A list of publicly available datasets
General
- Amazon Public Data Sets
Public Data Sets on AWS: a centralized repository of public datasets that can be integrated into AWS cloud-based applications.
- Wikipedia
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use, or database queries.
- Freebase
A community-curated database of people, places, and things.
- World Bank
DataBank is an analysis and visualization tool that contains collections of time series data on a variety of topics.
- Windows Azure Marketplace
Free datasets via the Windows Azure Data Market, including academic data, speech recognition data, etc.
- Machine Learning Repository
200+ datasets from the Center for Machine Learning and Intelligent Systems.
- Deep Learning Data Sets
Music, natural image, text, speech, face, and recommendation system datasets for benchmarking algorithms.
- Stanford Large Network Dataset Collection
A collection of about 50 large network datasets, ranging from tens of thousands of nodes and edges to tens of millions of nodes and edges. It includes social networks, web graphs, road networks, internet networks, citation networks, collaboration networks, and communication networks.
- Yahoo Datasets
Yahoo shares various types of data. They are categorized into Ratings, Language, Graph, Advertising and Market Data, and Computing Systems, plus an appendix of other relevant data and resources available via the Yahoo! Developer Network.
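Many of the network datasets in the Stanford collection are distributed as plain-text edge lists. The sketch below assumes the common layout: lines beginning with `#` are comments, and every other line holds a source and a target node ID separated by whitespace. It loads such lines into an adjacency map so you can count nodes and edges. `load_edge_list` is a hypothetical helper, not part of any SNAP tool. Check each dataset's own page for its exact format before relying on this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_edge_list(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Build a directed adjacency map from edge-list lines.

    Assumed format (hypothetical helper, not a SNAP API): '#' starts a
    comment line; every other non-blank line is "<source> <target>".
    Duplicate edges collapse into one because targets are stored in a set.
    """
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        src, dst = (int(tok) for tok in line.split()[:2])
        adjacency[src].add(dst)
        adjacency.setdefault(dst, set())  # also record nodes with no out-edges
    return dict(adjacency)


if __name__ == "__main__":
    sample = "# Toy directed graph\n# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n2\t0\n"
    graph = load_edge_list(sample.splitlines())
    num_nodes = len(graph)
    num_edges = sum(len(targets) for targets in graph.values())
    print(num_nodes, num_edges)  # prints "3 4"
```

For files with tens of millions of edges, you would pass the open file object instead of a string (for example `load_edge_list(open(path))`) so lines are read lazily. The adjacency map itself still has to fit in memory.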
And if you are looking for something specific, you can always try your luck posting on Reddit's r/datasets or on Open Data Stack Exchange.