NB :: Not all of the databases listed here are freely available.
- LAW
- SOFTWARE
- ACTUARIAL SCIENCE
- ASTRONOMY
- BIOLOGY
- CHEMISTRY
- DATA
- EARTH SCIENCE
- GENDER VIOLENCE
- MACHINE LEARNING
- MATH
- PHYSICS
- EU data protection law :: http://ec.europa.eu/justice/newsroom/data-protection/infographic/2017/index_en.htm
- aristotle-metadata-registry :: Aristotle-MDR is an open-source metadata registry that follows the requirements laid out in the ISO/IEC 11179 specification.
- camlipy :: The unofficial Python client for Camlistore. Documentation.
- caffe-oxford102 :: Caffe CNNs for the Oxford 102 flower dataset.
- eggo :: Ready-to-go Parquet-formatted public Genomics datasets.
- data-projects :: Scripts and data for various Vox Media stories and news projects.
- scrapi :: A data processing pipeline that schedules and runs content harvesters, normalizes their data, and outputs that normalized data to a variety of output streams.
- simmetrica :: Lightweight framework for collecting and aggregating event metrics as timeseries data.
- FRED (Federal Reserve Economic Data) :: Economic time-series data from the Federal Reserve Bank of St. Louis.
- NASA Open Datasets :: NASA data, tools, and resources for solving looming challenges here on Earth.
- sndatasets :: Download and normalize published supernova photometric data.
- Wikipedia's list of biological databases.
- ChromosomeMappings :: This repository contains chromosome/contig name mappings between UCSC <-> Ensembl <-> Gencode for a variety of genomes.
- A database of open-access/free RNA-seq and genomic datasets used in research projects at JHU.
- Gene data, downloadable via FTP, integrating information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
- Saccharomyces Genome Database
- Genome Project Database
- RefSeqGene defines genomic sequences to be used as reference standards for well-characterized genes and is part of the Locus Reference Genomic (LRG) Project.
- The 3000 Rice Genomes Project data, GigaScience Database and Journal; see also the [blog article in BMC](http://blogs.biomedcentral.com/gigablog/2014/05/29/publish-data-fight-world-hunger/).
- NCBI's Sequence Read Archive (SRA)
- DataLad :: aims to provide access to scientific data available from various sources (e.g. lab or consortium web-sites such as Human connectome; data sharing portals such as OpenFMRI and CRCNS) through a single convenient interface and integrated with your software package managers (such as APT in Debian). Although initially targeting neuroimaging and neuroscience data in general, it will not be limited by the domain and a wide range of contributions are welcome. Get the source code on github.
- BLAST :: BLAST Assembled Genomes.
- Chimpanzee Genome Project
- dbGaP :: The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.
- Ensembl DB.
- Ensembl Genomes DB.
- Entrez DB.
- Genome project :: http://en.wikipedia.org/wiki/Genome_project
- Human Genome Project
- Human Microbiome Project
- Neanderthal Genome Project
- NHLBI (National Heart, Lung, and Blood Institute) resources, NIH (National Institutes of Health).
- Personal Genome Project
- Reactome DB.
- Sequence Read Archive (SRA) from NCBI, and NCBI GenBank.
- MANUELA :: The central MANUELA (Meiobenthic And Nematode biodiversity Unravelling Ecological and Latitudinal Aspects) database, compiled by capturing the available data on meiobenthos on a broad European scale.
- Nematode DB from the Blaxter Lab, based on analyses of ESTs or GSSs from "neglected taxa" using the PartiGene suite of programmes.
- Nematode Transcriptome Analyses.
- WormBase :: Species genomes with standardized sequence and annotations.
- NCBI Resources for Genetics and Medicine.
- HIV-1, Human Protein Interaction Database :: A database of known interactions of HIV-1 proteins with proteins from human hosts. It provides annotated bibliographies of published reports of protein interactions, with links to the corresponding PubMed records and sequence data.
- Computed Tomography Emphysema Database.
- Cornell's Public Medical Image Databases.
- SASBDB :: Small Angle Scattering Biological Data Bank.
- http://www.embl-hamburg.de/biosaxs/
- http://www.embl-hamburg.de/biosaxs/software.html
- Codeneuro-Datasets :: Shared data sets for collaborating, testing, and benchmarking.
- Neuroscience Databases list.
- NeuroVault :: A web database of human brain statistical maps, atlases, and parcellation maps, where researchers can publicly store and share the unthresholded statistical maps produced by their MRI and PET studies. Source code.
- OpenfMRI.org :: A project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data.
- Open PHACTS search service.
- OSDD :: Open Source Drug Discovery.
- Wikipedia's list of chemical databases.
- Wikipedia's list of crystallographic databases.
- Crystallography Open Database.
- Protein Data Bank (PDB) on Wikipedia.
- Inorganic Crystal Structure Database
- Payer or Prayer - A Look at NYC’s $650 Million Property Tax Breaks Related to Religion
- The European Data Portal Library. Their site's source code is available on gitlab.
- awesome-public-datasets :: A collection of large-scale public datasets on the Internet.
- common-workflow-language :: Repository for the CWL specifications.
- datasets :: Original data, or aggregated, cleaned, or restructured versions of existing datasets. Released under the Creative Commons Attribution-ShareAlike 4.0 International License.
- Freebase :: A community-curated database of well-known people, places, and things.
- Wikidata :: A free linked database that acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others, that can be read and edited by both humans and machines.
- SPARQL endpoints for Wikidata, with a general introduction to Wikidata and its data model.
- World Bank Open Data :: Free and open access to data about development in countries around the globe.
- Decision trees.
- Registry of Research Data Repositories :: Provides researchers, funding organisations, libraries, and publishers with over 1,000 listed research data repositories from all over the world, making it the largest and most comprehensive online catalog of research data repositories on the web.
- https://schema.datacite.org/
- http://www.researchobject.org/
- http://wf4ever.github.io/ro-primer/
- https://databasin.org/datasets/
- Wikipedia's list of scientific databases.
- British Geological Survey data, and a blog post on the 900 GB of digital and electronic data deposited with the British Geological Survey since July 2014.
- Free corpus (MPQA) :: http://www.cs.pitt.edu/mpqa/
- Juliaset.jl :: Generate Julia set images. This is created primarily as an example for JuliaBox hosted REST APIs.
- CERN OpenData Portal.
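The Wikidata SPARQL entry in this list can be queried over plain HTTP. Below is a minimal sketch using only Python's standard library, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql (the URL is not given in this list). The helper names `build_wdqs_url` and `run_wdqs_query` and the example query (items that are an instance of, `wdt:P31`, house cat, `wd:Q146`) are hypothetical illustrations, not part of any listed project.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Wikidata Query Service endpoint (SPARQL over HTTP).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical example query: five items that are an instance of (P31) house cat (Q146),
# with English labels filled in by the WDQS label service.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_wdqs_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def run_wdqs_query(query: str, user_agent: str) -> list:
    """Send the query and return the list of result bindings.

    Wikimedia asks clients to identify themselves with a descriptive User-Agent.
    This call needs network access.
    """
    request = urllib.request.Request(
        build_wdqs_url(query),
        headers={
            "User-Agent": user_agent,
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # SPARQL JSON results keep one dict per row under results.bindings.
    return payload["results"]["bindings"]
```

For example, `run_wdqs_query(EXAMPLE_QUERY, "my-db-notes/0.1 (me@example.org)")` (a placeholder User-Agent) should return up to five rows, each a dict whose `item` and `itemLabel` entries carry a `value` key. Keeping URL construction in its own function makes it easy to inspect or reuse the encoded request without touching the network.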