Finding Datasets for Machine Learning Explorations

I am currently writing a short (100 pages) e-book. The topic is the CNTK code library for deep neural networks. I spent a lot of time looking for datasets in the primary repository for students and researchers, the University of California at Irvine Web site. (I got one of my undergraduate degrees there. Go Anteaters!)

However, the datasets there are quite limited in scope and variety. While hunting around, I found a very neat commercial site that aggregates tons of state, local, and federal government data. You can get subsets of the data for free, but you need to pay a small subscription fee to get full datasets. The site has a very slick interface that pops up a graph when you mouse over a link to some data.

Anyway, that led me to just start Googling and Binging away for government datasets. Wow! There are absolutely tons of good data available. For example, I was looking for some real data for time series regression. I quickly found some nice data for monthly initial unemployment claims for King County, Washington (the Seattle area).

Moral of the story: if you need data for exploring machine learning techniques, search government Web sites.

