I am currently writing a short (100 pages) e-book. The topic is the CNTK code library for deep neural networks. I spent a lot of time looking for datasets in the primary repository for students and researchers: the University of California at Irvine Web site at https://archive.ics.uci.edu/ml/index.php. (I got one of my undergraduate degrees there. Go Anteaters!)
However, the datasets there are quite limited in scope and variety. While hunting around, I found a very neat site at http://www.economagic.com. It’s a commercial site that aggregates tons of state, local, and federal government data. You can get subsets of the data for free, but you need to pay a small subscription fee to get full datasets. The economagic.com site has a very slick interface that pops up a graph when you mouse over a link to some data.
Anyway, that led me to just start Googling and Binging away for government datasets. Wow! There are absolutely tons of good data available. For example, I was looking for some real data for time series regression. I quickly found some nice data for monthly initial unemployment claims for King County, Washington (the Seattle area).
Moral of the story: if you need data for exploring machine learning techniques, search government Web sites.