Jan Kotek edited this page Aug 23, 2020 · 2 revisions

Quick and dirty notes and an intro to MapDB 4 (aka V4). They will be rewritten into proper documentation later. This is a brain dump; feel free to send a pull request to fix typos or clarify anything.

Build

Production code is written in Java only. Test code and part of the build system are written in Kotlin.

There is a code generator for some source files, located in the srcGen folder. It is run before compilation to generate .java files. It is just Kotlin code that manipulates strings and files.

Preprocessor marks live in the source code and start with a //- comment (//-WLOCK, //-newRWLOCK, etc.).

There are two test gradle tasks:

  • gradle test is the quick test run during normal development. It should finish in under 10 minutes and use less than 5 GB of RAM.
  • gradle testlong is the long acceptance test run before each release. It should finish in under one week and requires 64 GB of RAM and 500 GB of disk space. Use the java.io.tmpdir property to change the disk location.
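
As a rough illustration of why java.io.tmpdir matters here: the JDK temp-file helpers take their default directory from that property, so overriding it moves scratch files to another disk. The sketch below assumes the tests create their files in this way. The class name, method name, and file prefix are hypothetical, and these notes do not confirm whether the Gradle build forwards the property to the test JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of MapDB: shows how a test could pick up java.io.tmpdir.
class TmpDirSketch {

    /** Creates an empty scratch file under the directory named by java.io.tmpdir. */
    static Path newScratchFile() {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        try {
            return Files.createTempFile(dir, "mapdb-sketch", ".db");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = newScratchFile();
        System.out.println("scratch file: " + file);
        Files.delete(file); // clean up after the demo
    }
}
```

Starting the JVM with -Djava.io.tmpdir pointing at a larger disk changes where the file above is created.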

MapDB has some deps (Guava, Eclipse Collections), but those will be removed before release.

Data representation

MapDB is a hybrid between a database and Java on-heap collections. This section describes the internal representation of data.

Serializers

Older versions used serializers, but over time we extended their role to cover hashing, comparators, array inserts, etc. The constructor BTreeMap(Serializer.LONG, Serializer.STRING) is a simple way to infer data types. This also allowed a great memory optimization (Long[] vs long[] for btree nodes). However, if the Comparator is fused into the Serializer, it cements ascending order.
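
To make the last point concrete, here is a minimal sketch in plain JDK code. It does not use MapDB's actual Serializer API, and the class and method names are made up. Searching a primitive long[] relies on the natural ascending order of long, while a boxed Long[] with a separate Comparator can be searched in any order, at the cost of one object per key.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration, not MapDB code.
class FusedOrderSketch {

    /** Compact form: ordering is implied by the primitive type, so keys must be ascending. */
    static int searchPrimitive(long[] ascendingKeys, long key) {
        return Arrays.binarySearch(ascendingKeys, key);
    }

    /** Boxed form: the comparator is a separate argument, so descending order works too. */
    static int searchBoxed(Long[] keys, Long key, Comparator<Long> order) {
        return Arrays.binarySearch(keys, key, order);
    }

    public static void main(String[] args) {
        long[] ascending = {1L, 5L, 9L};
        Long[] descending = {9L, 5L, 1L};
        System.out.println(searchPrimitive(ascending, 9L));                          // prints 2
        System.out.println(searchBoxed(descending, 9L, Comparator.reverseOrder())); // prints 0
    }
}
```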

Over time this grew into a complicated design. V4 replaces Serializer with Shaper (as in Data Shape). A single-object Serializer is now a special subcase of Shaper.

Data Shape

Data Form

Data form was introduced to reduce the memory overhead of the heap cache (see BTreeKeySerializer in v3).

For example, a sorted array of Long numbers in a btree node is only used internally; the user does not see its content. So on heap it can be stored in many forms, as long as binary search capability is preserved:

  • Object[] with generic comparator
  • long[] with some pluggable comparators
  • int[] if all
  • long start (first value) and byte[] deltas to save memory.
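
A minimal sketch of two of these forms, showing that both answer the same binary search. This is not MapDB code and all names are hypothetical. It also assumes one particular reading of "deltas": each byte holds the unsigned offset of a key from start, so every key must lie within 255 of the first one. Storing differences between consecutive keys would need a running sum instead.

```java
import java.util.Arrays;

// Hypothetical illustration of two heap forms for the same sorted keys.
class KeyFormsSketch {

    /** Form 1: plain long[]; the JDK binary search works directly. */
    static int searchLongForm(long[] keys, long key) {
        return Arrays.binarySearch(keys, key);
    }

    /** Converts sorted keys to one unsigned byte per key: the offset from keys[0]. */
    static byte[] packOffsets(long[] keys) {
        byte[] offsets = new byte[keys.length];
        for (int i = 0; i < keys.length; i++) {
            long off = keys[i] - keys[0];
            if (off < 0 || off > 255) {
                throw new IllegalArgumentException("offset does not fit in one byte");
            }
            offsets[i] = (byte) off;
        }
        return offsets;
    }

    /** Form 2: start + byte[] offsets; same result convention as Arrays.binarySearch. */
    static int searchPackedForm(long start, byte[] offsets, long key) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long midKey = start + (offsets[mid] & 0xFF); // decode a single key on demand
            if (midKey < key) lo = mid + 1;
            else if (midKey > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1); // not found: encodes the insertion point
    }

    public static void main(String[] args) {
        long[] keys = {100L, 103L, 110L, 355L};
        byte[] packed = packOffsets(keys);
        System.out.println(searchLongForm(keys, 110L));              // prints 2
        System.out.println(searchPackedForm(keys[0], packed, 110L)); // prints 2
    }
}
```

In this sketch the packed form uses 8 bytes for start plus one byte per key, compared with 8 bytes per key for long[].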

MapDB can also operate directly over a binary ByteBuffer. It is possible to compare keys without deserializing them. In this case the btree node only needs a ByteBuffer offset.
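
A minimal sketch of the idea, assuming a made-up node layout (an int key count followed by that many sorted 8-byte big-endian keys). It is not MapDB's actual binary format and the names are hypothetical. The search reads keys in place with absolute getLong calls, so no array is copied and no Long objects are allocated.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: binary search inside a ByteBuffer, given only a node offset.
class BufferNodeSketch {

    /** Writes the assumed layout at nodeOffset: int count, then count sorted longs. */
    static void writeNode(ByteBuffer buf, int nodeOffset, long[] sortedKeys) {
        buf.putInt(nodeOffset, sortedKeys.length);
        int keysStart = nodeOffset + Integer.BYTES;
        for (int i = 0; i < sortedKeys.length; i++) {
            buf.putLong(keysStart + i * Long.BYTES, sortedKeys[i]);
        }
    }

    /** Returns the key index, or -(insertionPoint) - 1 when the key is absent. */
    static int search(ByteBuffer buf, int nodeOffset, long key) {
        int count = buf.getInt(nodeOffset);
        int keysStart = nodeOffset + Integer.BYTES;
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long midKey = buf.getLong(keysStart + mid * Long.BYTES); // absolute read, no copy
            if (midKey < key) lo = mid + 1;
            else if (midKey > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeNode(buf, 16, new long[]{5L, 9L, 42L});
        System.out.println(search(buf, 16, 42L)); // prints 2
        System.out.println(search(buf, 16, 6L));  // prints -2
    }
}
```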

Older versions had this concept (here called data form) bolted onto existing code, which was a bit dirty. V4 includes it in the design from the start.

Internally, MapDB can represent data in many forms, depending on performance, caching, etc. In a BTree, high dir nodes could use a fast long[] held in cache, while lower dir nodes could use a binary ByteBuffer.

The user can usually access data only in heap form. For Map<Long,Adress>, only the Long form of the key is accessible. However, in the future other forms should be accessible. For example, an Adress value could be transferred from an mmap file directly to a Netty buffer. This would skip serialization, data copying, and the CPU cache.
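
A rough sketch of the building block such a transfer could rely on, using only java.nio. It is not a MapDB API, the names are hypothetical, and the Netty handoff itself is not shown. A slice of a ByteBuffer is a view over the same memory, so a value stored at a known offset and length can be exposed without copying its bytes. The demo uses a heap buffer for simplicity; FileChannel.map returns a MappedByteBuffer, which is also a ByteBuffer, so the same method would apply to an mmap file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: expose a stored value as a view instead of copying it.
class ZeroCopyViewSketch {

    /** Returns a view of store[offset, offset + length) that shares the underlying memory. */
    static ByteBuffer valueView(ByteBuffer store, int offset, int length) {
        ByteBuffer dup = store.duplicate(); // independent position and limit, same memory
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.wrap("xxHELLOyy".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer view = valueView(store, 2, 5);
        store.put(2, (byte) 'J');               // write through the original buffer
        System.out.println((char) view.get(0)); // prints J, because the view shares memory
    }
}
```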
