Skip to content

dinosaure/art

Folders and files

NameName
Last commit message
Last commit date
Aug 5, 2024
Aug 5, 2024
Aug 5, 2024
Aug 5, 2024
Aug 5, 2024
Aug 5, 2024
Aug 6, 2024
Aug 5, 2024
Nov 19, 2019
Aug 5, 2024
Jul 20, 2022
Jan 22, 2021
Jan 22, 2021
Feb 16, 2023
Mar 30, 2023
Jul 11, 2021
Aug 5, 2024

Repository files navigation

Adaptive Radix Tree (ART) in OCaml

This is an implementation in OCaml of ART. Adaptive Radix Tree is like a simple Hashtbl with order:

# let tree = Art.make () ;;
# Art.insert tree (Art.key "foo") 42 ;;
# Art.insert tree (Art.key "bar") 21 ;;
# Art.find tree (Art.key "foo")
- : int = 42

Operation like minimum or maximum are available (which don't exist for a simple Hashtbl.t):

# let tree = Art.make () ;;
# Art.insert tree (Art.key "0") 0 ;;
# Art.insert tree (Art.key "1") 1 ;;
# Art.insert tree (Art.key "2") 2 ;;
# Art.minimum tree
- : int = 0
# Art.maximum tree
- : int = 2

If you want the order and the speed of Hashtbl.t, Art is your library:

The function prefix_iter is also available if you want to get a subset of your tree:

# let t = Art.make () ;;
# Art.insert t (Art.key
# Art.insert t (Art.key "Dalton Joe") 0 ;;
# Art.insert t (Art.key "Dalton Jack") 1 ;;
# Art.insert t (Art.key "Dalton William") 2 ;;
# Art.insert t (Art.key "Dalton Averell") 3 ;;
# Art.insert t (Art.key "Rantanplan") 4 ;;
# let dalton = Art.prefix_iter ~prefix:(Art.key "Dalton")
  (fun k _ a -> (k :> string) :: a) [] t ;;
- : string list = [ "Dalton Joe"
                  ; "Dalton Jack"
		  ; "Dalton William"
		  ; "Dalton Averell" ]

Read Optimised Write Exclusion (ROWEX) in OCaml

ROWEX is a second implementation of ART with atomic operations. It's a functor which expects an implementation of atomic operations such as load or store.

Parallelism, atomic operation & OCaml

The current version of OCaml has a global lock for the GC. By this way, it's not possible for us to execute ROWEX operations (find/insert) with true parallelism if we use the same OCaml runtime. Even if you use LWT or ASYNC, you execute jobs concurrently.

However, ROWEX wants to provide an implementation where find/insert can be executed in parallel without any problems (race condition or ABA problem). So ROWEX provides an implementation, persistent, which implements atomic operations on a memory area. Then, we are able, as parmap, to simulate true parallelism as long as each operations are executed into their own fork().

The goal of this library is provide:

  • the most easy way to switch the implementation to ocaml-multicore
  • a baby step to be able to manipulate a file by several processes (consumers/find, producers/insert) in parallel

ROWEX follows two main papers:

  • The initial implementation of ROWEX
  • A derivation of it to be persistent: PART

Tools

The distribution comes with some tools to manipulate an index:

$ opam pin add -y https://github.com/dinosaure/art
$ opam install rowex
$ part.make index.idx
$ ls -lh
-rw-r--r-- 1 user user 8,0M ----- -- --:-- index.idx
prw------- 1 user user    0 ----- -- --:-- index.idx.socket
prw------- 1 user user    0 ----- -- --:-- index.idx-truncate.socket
$ part.insert index.idx foo 1
$ part.find index.idx foo
1

On the OCaml side, a Part module exists which implements these functions:

type 'a t constraint 'a = [< `Rd | `Wr ]

val create : ?len:int -> string -> unit
val insert : [> `Rd | `Wr ] t -> string -> int -> unit
val lookup : [> `Rd ] t -> string -> int

part is Unix dependent (and it need an Unix named pipe). It ensures with explained internal mechanisms to use multiple readers and one writer:

  • The writer can take the exclusive ownership on the index file and its named pipe
  • readers don't need to take the ownership but they must send a signal into the named pipe (to the writer) that they start to introspect the index

For readers, some functions exist to signal their existence to the write:

val append_reader : Ipc.t -> unit
val delete_reader : Ipc.t -> unit

val ipc : _ t -> Ipc.t

Status: experimental

This part of the distribution is experimental - even if the distribution comes with several tests to ensure that the implementation works, ROWEX is fragile! It still need a synchronization mechanism fsync() which is added pervasively in some parts of the code according to outcomes of errors.