-
Notifications
You must be signed in to change notification settings - Fork 106
ListTypes
Batteries has a number of list types: List
, LazyList
, Seq
, and Enum
. This page aims to clarify the differences between them and help programmers make the right choice for their program.
Plain old OCaml lists. Top-notch syntax for literals and pattern matching.
let a = [1;2;3] in
match a with
| [] -> "Empty"
| 1::_ -> "Starts with 1"
| _h::_t -> "Not empty"
Lists are eagerly evaluated - when you map from one list to another, the result is immediately created as a whole.
List.map (fun x -> x * 2) [1;2;3]
is immediately evaluated, the result list created in memory at once. There is an issue with building lists from head to tail, as immutability generally requires the opposite construction order, but Batteries uses extlib's hacks to build the list in-order despite this restriction. List has a very large library of functions.
The simplest on-demand sequences are built out of just functions: a sequence is a function that, when called, either tells you that the sequence is empty or give you the first element and the rest as a sequence (a function to be called again). In Batteries, this is done by Seq
, and this is what you should use if you need nothing more than that (and its API suits you, of course).
LazyList
adds memoization: if you have the same sequence value and ask for its head element twice, Seq
will call the function twice which may incur some recomputation. LazyList
values are memoized, so that on the second call they are not recomputed but just returned directly (but this requires some book-keeping that has some performance impact). If you are going to access the same elements several times, LazyList
may be better.
Enum
, on the contrary, is "more imperative": when you access an element once, it is consumed, removed from the enum, and you cannot get it back. This in theory guarantees a nice memory property: You cannot leak memory by mistakenly keeping a reference to the head of the sequence.
Lazy lists are similar to lists, with the exception that their contents are only computed whenever requested. This makes them particularly useful in contexts where streams of data are to be handled.
Enumerations (as featured in module Enum
) and lazy lists are quite similar in purpose. Lazy lists are slightly higher level, insofar as no cloning is required to get them to work, which makes them slightly more useful in contexts where backtracking is common. Enumerations, on the other hand, are closer to traditional stream processing, and require more low-level marking whenever backtracking is required, but may be faster and more memory-efficient when used properly. Either choice is recommended over OCaml's built-in Stream
.
Enumerations are a representation of finite or infinite sequences of elements. In Batteries, enumerations are used pervasively, both as a uniform manner of reading and manipulating the contents of a data structure, or as a simple manner of reading or writing sequences of characters, numbers, strings, etc. from/to files, network connections or other inputs/outputs.
Enumerations are typically computed as needed, which allows the definition and manipulation of huge (possibly infinite) sequences. Manipulating an enumeration is a uniform and often comfortable way of extracting subsequences (function BatEnum.filter
or operator //
et al), converting sequences into other sequences (function BatEnum.map
or operators /@
and @/
et al), gathering information (function BatEnum.scanl
et al) or performing loops (functions BatEnum.iter
and BatEnum.map
).
For instance, function BatRandom.enum_int
creates an infinite enumeration of random numbers. Combined with //
and BatEnum.map
, we may turn this into an infinite enumeration of squares of random even numbers: map (fun x -> x * x) ( (Random.enum_int 100) // even
)
Similarly, to obtain an enumeration of 50 random integers, we may use BatEnum.take, as follows: take 50 (Random.enum_int 100)
As most data structures in Batteries can be enumerated and built from enumerations, these operations may be used also on lists, arrays, hashtables, etc. When designing a new data structure, it is usuallly a good idea to allow enumeration and construction from an enumeration.
Note Enumerations are not thread-safe. You should not attempt to access one enumeration from different threads.
Enum
has a very large library of functions.
A sequence represents a collection of elements for which you never construct the complete representation.
Basically you should use a sequence when you would prefer using a list or a lazy-list but constructing the whole list explicitly would explode your memory.
All functions returning a sequence operates in time and space O(1).
Note that if you want a "consumable sequence", you should prefer using enumerations (from module Enum
).