- https://en.wikipedia.org/wiki/Document-oriented_database
- How to create an RDF DB, https://www.quora.com/How-do-I-create-a-RDF-database
This cheat sheet presents a basic blueprint for applying MapReduce to large-scale, unstructured data processing problems by showing how to deploy and use an Apache Hadoop computational cluster. http://bit.ly/StartedApache
It complements DZone Refcardz #43 and #103, which introduce high-performance computational scalability and high-volume data handling techniques, including MapReduce. Download here: http://opensourceuniverse.tradepub.com/free/w_dzon04/prgm.cgi
- Also pick up the Apache HBase: The NoSQL Database for Hadoop and Big Data cheat sheet here: http://opensourceuniverse.tradepub.com/free/w_dzon07/prgm.cgi
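The MapReduce pattern the Hadoop cheat sheets cover can be sketched in plain Python: a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. This is only an illustrative simulation of the three phases (the canonical word count), not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit (word, 1) for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key, as Hadoop
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "data pipeline"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'pipeline': 1}
```

In a real Hadoop job the mapper and reducer run as separate distributed tasks and the shuffle happens over the network; the data flow, however, is exactly this.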
- https://www.arangodb.com/2016/06/arangodb-3-0-a-solid-ground-to-scale/
- https://github.com/shekhargulati/52-technologies-in-2016/blob/master/13-arangodb/README.md
- Arango vs. MongoDB: https://www.arangodb.com/tutorials/mongodb-to-arangodb-tutorial/
- Cheat sheet, http://www.arangodb.org/2012/08/05/arangodb-shell-cheat-sheet
- Try Arango online, http://www.arangodb.org/try
- Cayley for RDF data, https://news.ycombinator.com/item?id=7946024
- https://github.com/google/cayley
- https://johngoodwin225.wordpress.com/2014/06/29/quick-play-with-cayley-graph-db-and-ordnance-survey-linked-data/
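Cayley-style graph stores model data as subject–predicate–object triples (Cayley itself stores quads) and answer queries by pattern matching with variables. A toy in-memory triple store makes the idea concrete; this is a hypothetical sketch, not Cayley's API (Cayley is queried with Gizmo or GraphQL):

```python
class TripleStore:
    """Toy in-memory RDF-style triple store for illustration only."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, s=None, p=None, o=None):
        # None acts as a wildcard, like a variable in a SPARQL pattern.
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

g = TripleStore()
g.add("cayley", "written_in", "go")
g.add("cayley", "type", "graph_db")
g.add("neo4j", "type", "graph_db")

# "Which subjects have type graph_db?"
print(sorted(t[0] for t in g.match(p="type", o="graph_db")))
```

The same wildcard-matching idea, indexed and distributed, is what makes triple/quad stores efficient at link-following queries over linked data like the Ordnance Survey dataset above.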
- Time series in MongoDB: https://dev.to/riccardo_cardin/implementing-time-series-in-mongodb
- http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
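The classic approach to time series in MongoDB is the bucket pattern: instead of one document per reading, store one document per sensor per time window, with pre-aggregated fields. The sketch below builds such bucket documents as plain dicts (hypothetical field names; no pymongo connection, since the shaping logic is the interesting part):

```python
from collections import defaultdict
from datetime import datetime

def bucket_readings(readings):
    """Group raw (sensor_id, timestamp, value) readings into one
    document per sensor per hour -- the MongoDB bucket pattern."""
    buckets = defaultdict(lambda: {"measurements": [], "count": 0, "sum": 0.0})
    for sensor_id, ts, value in readings:
        # Truncate the timestamp to the hour to pick the bucket.
        key = (sensor_id, ts.replace(minute=0, second=0, microsecond=0))
        doc = buckets[key]
        doc["measurements"].append({"ts": ts, "value": value})
        doc["count"] += 1
        doc["sum"] += value
    # Shape each bucket like the document you would insert with pymongo.
    return [{"sensor_id": sid, "bucket_start": start, **doc}
            for (sid, start), doc in buckets.items()]

readings = [
    ("s1", datetime(2024, 1, 1, 10, 5), 21.0),
    ("s1", datetime(2024, 1, 1, 10, 35), 23.0),
    ("s1", datetime(2024, 1, 1, 11, 1), 22.0),
]
docs = bucket_readings(readings)
print(len(docs))  # 2 -- one bucket for 10:00, one for 11:00
```

Bucketing cuts document count and index size dramatically and makes per-window averages a single field read (`sum / count`) instead of an aggregation over thousands of documents.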
- Neo4j is a robust, high-performance, scalable graph NoSQL database for the complex, connected-data challenges that enterprises face today: http://neo4j.org/
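The "connected data" queries Neo4j excels at are multi-hop traversals, e.g. friends-of-friends, which in Cypher would be a `MATCH (p)-[:KNOWS]-()-[:KNOWS]-(f)` pattern. A pure-Python sketch over an adjacency map (not the Neo4j API) shows the shape of such a query:

```python
def friends_of_friends(graph, person):
    """Depth-2 neighbourhood query: people reachable in exactly
    two hops, excluding the person and their direct friends.
    Illustrative sketch of a graph traversal, not Neo4j's API."""
    direct = set(graph.get(person, []))
    fof = set()
    for friend in direct:
        fof.update(graph.get(friend, []))
    return fof - direct - {person}

graph = {
    "ann": ["bob"],
    "bob": ["ann", "cid"],
    "cid": ["bob"],
}
print(friends_of_friends(graph, "ann"))  # {'cid'}
```

In a relational store this query becomes self-joins that grow with the data; a native graph database follows pointers from node to node, so traversal cost depends on the neighbourhood size, not the total table size.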
An open-source artificial-intelligence framework (OpenCog) with a graph database, dubbed the AtomSpace, that holds terms, atomic formulas, sentences, and relationships as hypergraphs and gives them a probabilistic truth-value interpretation.
- Working with large datasets, http://news.ycombinator.com/item?id=3614706 and http://www.bigfastblog.com/how-to-get-experience-working-with-large-datasets
SciDB, the array database.
- http://www.datanami.com/2014/04/09/array_databases:_the_next_big_thing_in_data_analytics_/
- http://www.forbes.com/sites/petercohan/2014/02/07/paradigm4-the-next-big-thing-in-big-data/
- http://www.odbms.org/blog/2014/04/interview-mike-stonebraker-paul-brown/
- HDF5 to SciDB, https://groups.google.com/forum/#!topic/pydata/S3kLxyrizkI
- HDF5 vs. PyTables, http://stackoverflow.com/questions/7883646/exporting-from-importing-to-numpy-scipy-in-sqlite-and-hdf5-formats/7891137#7891137
- http://semanticommunity.info/AOL_Government/Data_Science_Visualizations_Past_Present_and_Future
- Check out SciDB, http://www.scidb.org/, a NoSQL database built specifically for scientific workloads, developed by Stonebraker and others. Here's a paper demonstrating how to use it: http://people.csail.mit.edu/pcm/papers/SciDB_Demo.pdf
- Data Duplication, Server Redundancy, and Master Failover, http://www.scidb.org/forum/viewtopic.php?f=6&t=1068
- https://github.com/jmeehan16/whitematter/blob/master/read_me.txt
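The core idea behind array databases like SciDB is that a multidimensional array is split into fixed-size chunks distributed across a cluster, and aggregates are pushed down to each chunk. A minimal pure-Python sketch of that partitioning (illustrative only; SciDB's real chunking is declared in its AFL/AQL schema):

```python
def chunk_array(data, chunk_rows, chunk_cols):
    """Split a dense 2-D list into fixed-size chunks keyed by their
    origin coordinates, mimicking how an array database partitions
    an array across nodes. Illustrative sketch, not SciDB's API."""
    n_rows, n_cols = len(data), len(data[0])
    chunks = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            chunks[(r0, c0)] = [row[c0:c0 + chunk_cols]
                                for row in data[r0:r0 + chunk_rows]]
    return chunks

def chunk_sums(chunks):
    # Per-chunk aggregate -- the kind of work pushed to each node.
    return {origin: sum(sum(row) for row in block)
            for origin, block in chunks.items()}

data = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
chunks = chunk_array(data, 2, 2)
print(chunk_sums(chunks))  # {(0, 0): 14, (0, 2): 22}
```

Because chunks carry their own coordinates, operations like windowed aggregates or joins on dimensions can run chunk-locally and in parallel, which is what makes the array model attractive for large scientific datasets such as those exported from HDF5.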