We’ve talked before about the importance of building a data portfolio that showcases your skills as the first step toward breaking into a career in analytics. We also shared where to look for real datasets in a recent post: 5 resources for building your data portfolio.
Because you can never have too much data (or too much practice), we’re sharing 6 more resources you can use to build out your portfolio. If your data portfolio could use a lift, or if you’re just getting started and looking for projects, check out the resources below:
1. Data.gov
Housing hundreds of thousands of datasets, Data.gov is the largest repository of the US government’s public data and a first stop for many analysts building their data portfolios. The catalog covers a wide range of topics, from finance and education to government and consumer data, to name just a few. Most datasets can be downloaded in several formats, which makes Data.gov a go-to source for data that works with your analytics tools.
2. GitHub
GitHub is a developer’s playground, but it is also a place to share data and collaborate on data portfolio projects. Its Awesome Public Datasets list includes open datasets organized under 30 popular topics. GitHub also hosts many other data-focused repositories that you can access and contribute to.
3. BuzzData
Like GitHub, BuzzData is a sharing service that lets you upload your own data and share it with other users. While you might think of data analysis as solitary work, data projects are often highly collaborative. Opening your dataset to the BuzzData community gives you the chance to mind meld with other analysts and can help you see your data from a new perspective.
4. DataCite
DataCite makes research data searchable and shareable through metadata tagging, with the goal of providing lasting value to the research community. You can browse more than 2,000 science-related data repositories for open data and use APIs to access those datasets for your own analysis. One popular example is the CancerData.org repository. Because DataCite tags records with metadata, you can easily search the site by keyword.
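If you want to try the API route, the sketch below shows one way to run a keyword search from Python using only the standard library. It assumes DataCite’s public REST endpoint `https://api.datacite.org/dois`, its `query` and `page[size]` parameters, and a JSON:API-style response in which each record’s `attributes` holds a `doi` and a list of `titles`; treat these as assumptions to check against DataCite’s API documentation. The helper names `build_search_url`, `extract_records`, and `search_datacite` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: DataCite's public REST endpoint for DOI metadata records.
DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def build_search_url(keyword: str, page_size: int = 10) -> str:
    """Build a keyword-search URL for the (assumed) DataCite /dois endpoint."""
    params = {"query": keyword, "page[size]": page_size}
    return f"{DATACITE_DOIS_URL}?{urlencode(params)}"


def extract_records(response: dict) -> list[tuple[str, str]]:
    """Return (DOI, first title) pairs from a JSON:API-style response body.

    Assumes each item in response["data"] has an "attributes" object with a
    "doi" string and a "titles" list of {"title": ...} objects; missing
    fields fall back to the item id or an empty title.
    """
    records = []
    for item in response.get("data", []):
        attrs = item.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        doi = attrs.get("doi", item.get("id", ""))
        records.append((doi, titles[0].get("title", "")))
    return records


def search_datacite(keyword: str, page_size: int = 10) -> list[tuple[str, str]]:
    """Fetch one page of search results over the network as (DOI, title) pairs."""
    with urlopen(build_search_url(keyword, page_size), timeout=30) as resp:
        return extract_records(json.load(resp))
```

Calling `search_datacite("cancer", page_size=5)` would request up to five matching records and return them as (DOI, title) pairs. Keeping the URL builder and the response parser separate from the network call makes each piece easy to test offline before you point it at live data.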
5. Reddit r/opendata
If you spend time online, you’ve probably found yourself on Reddit. Dubbed “the front page of the Internet,” Reddit has countless subreddits and threads to explore. Of course, you’ll come across your standard memes, conspiracy theories about the “non-existence of Finland,” and more, but you can also find an extensive list of free public data sources for your portfolio projects, if you know where to look. Enter: the r/opendata community. This subreddit is filled with interesting datasets and conversation threads, as well as posts from people around the world looking for project ideas and requesting unusual types of data. So, if you’re browsing for a free open dataset to use, ask Reddit! The community has your back.
6. R Datasets Package
R, an industry-standard statistical programming language that many analysts use through the RStudio IDE, ships with a built-in datasets package that you can take advantage of for your own ad hoc analysis. While these native datasets are smaller and contain fewer rows than those you’ll find browsing the resources above, they are great for practice while you get familiar with R programming. This step-by-step guide breaks down how you can load and use some of these popular datasets yourself.
Your data portfolio is the best reflection of your work as an analyst, making it a critical tool for your success. It gives potential employers a tangible sense of what you can do, and a strong portfolio is one of the best ways to stand out from the other candidates for a role.
And finally, if you feel you’re really lacking hands-on experience with analytical tools and concepts, a data analytics bootcamp program like Level can help you quickly gain in-demand skills while you build your data portfolio.