25+ free datasets for data science projects

Here are the top 25 websites for gathering datasets to use in your data science projects, whether you work in R, Python, SAS, Excel, or another programming language or statistical software package. Best part: these are all free, free, free!

Government and UN/World Bank websites:

  1. US government database with 190k+ datasets – http://catalog.data.gov/dataset
  2. UK government database with 25k+ datasets – https://data.gov.uk/data/search
  3. Canada government database – http://open.canada.ca/data/en/dataset?q=education
  4. FBI crime statistics – http://1.usa.gov/1LltHEQ
  5. Centers for Disease Control and Prevention (CDC WONDER) – http://wonder.cdc.gov/
  6. Bureau of Labor Statistics – http://www.bls.gov/data/
  7. NASA datasets – http://nssdc.gsfc.nasa.gov/
  8. World Bank Data – http://datacatalog.worldbank.org/
  9. UN database with 34 sets and 60 million records – http://data.un.org/
  10. European Commission open data – https://open-data.europa.eu/en/data/
  11. NIST (National Institute of Standards and Technology) – http://1.usa.gov/1JpmcNI
  12. National Center for Education Statistics – http://1.usa.gov/1mAjH0A
  13. U.S. National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) – a dataset from a survey designed to determine the magnitude of alcohol use and psychiatric disorders in the U.S. population.

Academic websites:

  1. Yelp academic data – https://www.yelp.com/academic_dataset
  2. University of California, Irvine (UCI Machine Learning Repository) – http://archive.ics.uci.edu/ml/datasets.html
  3. Harvard University – http://gis.harvard.edu/resources/data
  4. Harvard Dataverse database – http://bit.ly/1RlXNKa
  5. MIT – http://web.mit.edu/towtank/www/vivdr/datasets.html; also http://bit.ly/1IMJVri
  6. University of North Carolina, adolescent health (Add Health) data – http://www.cpc.unc.edu/projects/addhealth/data
  7. Mars Crater Study, a global database of over 300,000 Mars craters 1 km or larger, provided by Wesleyan University.

Kaggle & data science resources:

  1. A few of my favorites from the Kaggle website
  2. Databits.io – http://databits.io/challenges/opensource
  3. Datasets on climate, the human genome, Enron emails, and more – https://www.quandl.com/search?type=free
  4. Gapminder – http://www.gapminder.org/data/

Curated lists:

  1. KDnuggets provides a great list of datasets from almost every field imaginable: space, music, books, and more. It may repeat some datasets from the lists above.
  2. An eclectic mix of datasets about gun ownership, NYPD crime rates, college student study habits, and caffeine concentrations in popular beverages – https://www.reddit.com/r/datasets
  3. Data Science Central has also curated a list of free datasets – http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
  4. List of open datasets from DataFloq – https://datafloq.com/public-data/?sp=6358335213372237508418

Other datasets:

  1. MRI brain scan images and data – http://bit.ly/1kFfcke
  2. Economic, education, health, and other datasets from Quandl. Note that the site also offers premium (paid) datasets – https://www.quandl.com/search?type=free
  3. Google Books Ngram Viewer, built on Google's repository of digitized books – https://books.google.com/ngrams
  4. Catalog of free geographic (GIS) data – http://freegisdata.rtwilson.com/
  5. Loan information from Lending Club – https://www.lendingclub.com/info/download-data.action

