Commit 586ad7c

[NFC][ntuple] add schema evolution docs
1 parent c3e4cad commit 586ad7c

1 file changed

+210
-0
lines changed

tree/ntuple/doc/SchemaEvolution.md

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
1+
# Schema Evolution
2+
3+
Schema evolution is the capability of the ROOT I/O to read data
4+
into in-memory models that are different from but compatible with the on-disk schema.
5+
6+
Schema evolution allows for data models to evolve over time
7+
such that old data can be read into current models ("backward compatibility")
8+
and old software can read newer data models ("forward compatibility").
9+
For instance, data model authors may over time add and reorder class members, change data types
10+
(e.g. `std::vector<float>` --> `ROOT::RVec<double>`), rename classes, etc.
11+
12+
ROOT applies automatic schema evolution rules for common, safe and unambiguous cases.
13+
Users can complement the automatic rules by manual schema evolution ("I/O customization rules")
14+
where custom code snippets implement the transformation logic.
15+
If neither the automatic rules nor any of the provided I/O customization rules suffice
16+
to transform the on-disk schema into the in-memory model, ROOT will error out and refrain from reading data.
17+
18+
This document describes schema evolution support implemented in RNTuple.
19+
For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple).
20+
The exceptions are listed in the last section of this document.
21+
22+
## Automatic schema evolution
23+
24+
ROOT applies a number of rules to read data transparently into in-memory models
25+
that are not an exact match to the on-disk schema.
26+
The automatic rules apply recursively to compound types (classes, tuples, collections, etc.);
27+
the outer types are evolved before the inner types.
28+
29+
Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes
30+
as listed in the following, exhaustive tables.
31+
32+
### Class shape transformations
33+
34+
User-defined classes can automatically evolve their layout in the following ways.
35+
Note that users should increase the class version number when the layout changes.
36+
However, for RNTuple automatic rules that is not mandatory;
37+
RNTuple will always compare the current on-disk layout with the in-memory model.
38+
39+
| Layout Change | Also supported in Untyped Records | Comment |
40+
| --------------------------------------- | --------------------------------- | -------------------- |
41+
| Remove member | Yes | Match by member name |
42+
| Add member | Yes | Match by member name |
43+
| Reorder members | Yes | Match by member name |
44+
| Remove all base classes | n/a | |
45+
| Add base class(es) where there were none | n/a | |
46+
47+
Reordering and incremental addition or removal of base classes are currently unsupported
48+
but may be supported in future RNTuple versions.
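
The name-based matching in the table above can be pictured with a small standalone sketch. It does not use ROOT; `OnDiskMembers` and `ReadByName` are hypothetical names, and using a default value for members that are absent on disk is an assumption made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an on-disk record: member name -> stored value.
using OnDiskMembers = std::map<std::string, double>;

// Fill the in-memory members (listed in in-memory declaration order) from the on-disk record.
// - Members are matched by name, so the on-disk order does not matter (reorder).
// - Members missing on disk get the default value (member added in memory).
// - On-disk members without an in-memory counterpart are ignored (member removed in memory).
std::vector<double> ReadByName(const OnDiskMembers &onDisk,
                               const std::vector<std::string> &inMemoryNames,
                               double defaultValue = 0.0)
{
   std::vector<double> values;
   values.reserve(inMemoryNames.size());
   for (const auto &name : inMemoryNames) {
      auto it = onDisk.find(name);
      values.push_back(it != onDisk.end() ? it->second : defaultValue);
   }
   return values;
}
```

Calling `ReadByName` with on-disk members `x`, `y`, `obsolete` and in-memory members `y`, `x`, `added` yields the values of `y` and `x` followed by the default for `added`.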
49+
50+
### Type transformations
51+
52+
ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type.
53+
In the following tables, `T'` denotes a type that is compatible with `T`.
54+
55+
#### Plain fields
56+
57+
| In-memory type | Compatible on-disk types | Comment |
58+
| --------------------------- | --------------------------- | ---------------------|
59+
| `bool` | `char` | |
60+
| | `std::[u]int[8,16,32,64]_t` | |
61+
| | enum | |
62+
|-----------------------------|-----------------------------|----------------------|
63+
| `char` | `bool` | |
64+
| | `std::[u]int[8,16,32,64]_t` | with bounds check |
65+
| | enum | with bounds check |
66+
|-----------------------------|-----------------------------|----------------------|
67+
| `std::[u]int[8,16,32,64]_t` | `bool` | |
68+
| | `char` | |
69+
| | `std::[u]int[8,16,32,64]_t` | with bounds check |
70+
| | enum | with bounds check |
71+
|-----------------------------|-----------------------------|----------------------|
72+
| enum | enum of different type | with bounds check |
73+
|-----------------------------|-----------------------------|----------------------|
74+
| `float` | `double` | |
75+
|-----------------------------|-----------------------------|----------------------|
76+
| `double` | `float` | |
77+
|-----------------------------|-----------------------------|----------------------|
78+
| `std::atomic<T>` | `T'` | |
79+
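
As an illustration of what "with bounds check" means for integer conversions, the following standalone sketch narrows a 64-bit on-disk value into a smaller in-memory integer type. It is not ROOT code: the function name `NarrowWithBoundsCheck` and the use of `std::range_error` are assumptions chosen for the example.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Convert an on-disk 64-bit integer to a narrower in-memory integer type,
// refusing values that the in-memory type cannot represent.
template <typename To>
To NarrowWithBoundsCheck(std::int64_t onDiskValue)
{
   static_assert(std::is_integral<To>::value && sizeof(To) < sizeof(std::int64_t),
                 "this sketch only covers integer targets narrower than 64 bits");
   // Both limits fit into int64_t because To is narrower than 64 bits.
   const auto lo = static_cast<std::int64_t>(std::numeric_limits<To>::min());
   const auto hi = static_cast<std::int64_t>(std::numeric_limits<To>::max());
   if (onDiskValue < lo || onDiskValue > hi)
      throw std::range_error("on-disk value does not fit into the in-memory type");
   return static_cast<To>(onDiskValue);
}
```

For example, `NarrowWithBoundsCheck<std::int8_t>(127)` succeeds, while passing `128` throws.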
80+
81+
#### Variable-length collections
82+
83+
| In-memory type | Compatible on-disk types | Comment |
84+
| -------------------------------- | ------------------------------------ | ------------------------------------- |
85+
| `std::vector<T>` | `ROOT::RVec<T'>` | |
86+
| | `std::array<T', N>` | |
87+
| | `std::[unordered_][multi]set<T'>` | |
88+
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
89+
| | `std::optional<T'>` | |
90+
| | `std::unique_ptr<T'>` | |
91+
| | User-defined collection of `T'` | |
92+
| | Untyped collection of `T'` | |
93+
|----------------------------------|--------------------------------------|---------------------------------------|
94+
| `ROOT::RVec<T>` | `std::vector<T'>` | with size check |
95+
| | `std::array<T', N>` | with size check |
96+
| | `std::[unordered_][multi]set<T'>` | with size check |
97+
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`, |
98+
| | | with size check |
99+
| | `std::optional<T'>` | |
100+
| | `std::unique_ptr<T'>` | |
101+
| | User-defined collection of `T'` | with size check |
102+
| | Untyped collection of `T'` | with size check |
103+
|----------------------------------|--------------------------------------|---------------------------------------|
104+
| `std::[unordered_]set<T>` | `std::[unordered_]set<T'>` | |
105+
| | `std::[unordered_]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
106+
|----------------------------------|--------------------------------------|---------------------------------------|
107+
| `std::[unordered_]multiset<T>` | `std::vector<T'>` | |
108+
| | `std::array<T', N>` | |
109+
| | `std::[unordered_][multi]set<T'>` | |
110+
| | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>` |
111+
| | User-defined collection of `T'` | |
112+
| | Untyped collection of `T'` | |
113+
|----------------------------------|--------------------------------------|---------------------------------------|
114+
| `std::[unordered_]map<K,V>` | `std::[unordered_]map<K',V'>` | |
115+
| | `std::[unordered_]set<T>` | only `T` = `std::[pair,tuple]<K',V'>` |
116+
|----------------------------------|--------------------------------------|---------------------------------------|
117+
| `std::[unordered_]multimap<K,V>` | `std::vector<T>` | only `T` = `std::[pair,tuple]<K,V>` |
118+
| | `std::array<T, N>` | only `T` = `std::[pair,tuple]<K,V>` |
119+
| | `std::[unordered_][multi]set<T>` | only `T` = `std::[pair,tuple]<K,V>` |
120+
| | `std::[unordered_][multi]map<K',V'>` | |
121+
| | User-defined collection of `T` | only `T` = `std::[pair,tuple]<K,V>` |
122+
| | Untyped collection of `T` | only `T` = `std::[pair,tuple]<K,V>` |
123+
124+
#### Nullable fields
125+
126+
| In-memory type | Compatible on-disk types |
127+
| -------------------- | ------------------------ |
128+
| `std::optional<T>` | `std::unique_ptr<T'>` |
129+
| | `T'` |
130+
|----------------------|--------------------------|
131+
| `std::unique_ptr<T>` | `std::optional<T'>` |
132+
| | `T'` |
133+
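
The nullable conversions in the table above can be sketched in plain C++17. The helper names `OptionalFromPointer` and `OptionalFromValue` are hypothetical and not part of ROOT; the sketch only shows the intended semantics: an empty on-disk pointer maps to an empty optional, and a plain on-disk value always yields an engaged optional.

```cpp
#include <memory>
#include <optional>

// On-disk std::unique_ptr<U> read into an in-memory std::optional<T>.
template <typename T, typename U>
std::optional<T> OptionalFromPointer(const std::unique_ptr<U> &onDisk)
{
   if (!onDisk)
      return std::nullopt; // null pointer on disk -> empty optional in memory
   return static_cast<T>(*onDisk);
}

// On-disk plain value U read into an in-memory std::optional<T>; the result is always engaged.
template <typename T, typename U>
std::optional<T> OptionalFromValue(const U &onDisk)
{
   return static_cast<T>(onDisk);
}
```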
134+
#### Records
135+
136+
| In-memory type | Compatible on-disk types |
137+
| --------------------------- | -------------------------------------- |
138+
| `std::pair<T,U>` | `std::tuple<T',U'>` |
139+
|-----------------------------|----------------------------------------|
140+
| `std::tuple<T,U>` | `std::pair<T',U'>` |
141+
|-----------------------------|----------------------------------------|
142+
| Untyped record | User-defined class of compatible shape |
143+
144+
Note that for emulated classes, the in-memory untyped record is constructed from on-disk information.
145+
146+
#### Additional rules
147+
148+
All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model.
149+
150+
If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field,
151+
existing files with on-disk streamer fields will continue to be read as streamer fields.
152+
This can be seen as "schema evolution out of streamer fields".
153+
154+
## Manual schema evolution (I/O customization rules)
155+
156+
ROOT I/O customization rules allow for custom code handling the transformation
157+
from the on-disk schema to the in-memory model.
158+
Customization rules are part of the class dictionary.
159+
For the exact syntax of customization rules, we refer to the ROOT manual.
160+
161+
Generally, customization rules consist of:
162+
- A target class.
163+
- Target members of the target class, i.e. those class members whose value is set by the rule.
164+
Target members must be direct members, i.e. not part of a base class.
165+
- A source class (possibly having a different class name than the target class)
166+
together with class versions or class checksums
167+
that describe all the possible on-disk class versions the rule applies to.
168+
- Source members of the source class; the given source members will be read as the given type.
169+
Source members can also be from a base class.
170+
Note that there is no way to specify a base class member that has the same name as a member in the derived class.
171+
- The custom code snippet; the code snippet has access to the (whole) target object and to the given source members.
172+
173+
At runtime, for any given target member there must be at most one applicable rule.
174+
A source member can be read into any type compatible with its on-disk type
175+
but any given source member can only be read into one type for a given target class
176+
(i.e. multiple rules for the same target/source class must not use different types for the same source member).
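
The constraint that a source member is read into only one type per target class can be expressed as a simple consistency check. The sketch below is standalone and hypothetical: `RuleSketch` and `HaveConsistentSourceTypes` are illustrative names, and types are compared as plain strings, which is a simplification.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced view of a rule: source member name -> type it is read as.
struct RuleSketch {
   std::map<std::string, std::string> sourceMembers;
};

// Returns true if all rules (assumed to share the same target/source class)
// agree on the type of every source member they have in common.
bool HaveConsistentSourceTypes(const std::vector<RuleSketch> &rules)
{
   std::map<std::string, std::string> seen;
   for (const auto &rule : rules) {
      for (const auto &[name, type] : rule.sourceMembers) {
         auto [it, inserted] = seen.emplace(name, type);
         if (!inserted && it->second != type)
            return false; // same source member requested with two different types
      }
   }
   return true;
}
```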
177+
178+
There are two special types of rules:
179+
1. Pure class rename rules consisting only of source and target class
180+
2. Whole-object rules that have no target members
181+
182+
Class rename rules (pure or not) are not transitive
183+
(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`,
184+
in-memory `A` cannot automatically read from on-disk `C`).
185+
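
Non-transitivity of rename rules can be illustrated with a lookup that only ever consults direct rules. This is a standalone sketch; `RenameRules` and `CanReadFrom` are hypothetical names and do not correspond to ROOT interfaces.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical registry: in-memory class name -> on-disk class names it has a rename rule for.
using RenameRules = std::map<std::string, std::set<std::string>>;

// A class can read its own on-disk form or any on-disk class it has a direct rule for.
// Rules are deliberately not chained.
bool CanReadFrom(const RenameRules &rules, const std::string &inMemory, const std::string &onDisk)
{
   if (inMemory == onDisk)
      return true;
   auto it = rules.find(inMemory);
   return it != rules.end() && it->second.count(onDisk) > 0;
}
```

With rules `A <- B` and `B <- C`, the lookup accepts `A` from `B` and `B` from `C` but rejects `A` from `C`.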
186+
Note that customization rules operate on partially read objects.
187+
Customization rules are executed after all members not subject to customization rules have been read from disk.
188+
Whole-object rules are executed after other rules.
189+
Otherwise, the scheduling of rules is unspecified.
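
The ordering guarantees above can be sketched as follows. The code is standalone and hypothetical (`ScheduledRule` and `ExecutionOrder` are illustrative names). Keeping the relative order within each group is just one valid choice, since the order is otherwise unspecified.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, reduced view of a rule for scheduling purposes.
struct ScheduledRule {
   std::string name;
   bool isWholeObject; // true if the rule has no target members
};

// Returns the names of the processing steps in execution order:
// first the members not subject to rules, then member rules, then whole-object rules.
std::vector<std::string> ExecutionOrder(std::vector<ScheduledRule> rules)
{
   std::vector<std::string> order{"<members without rules>"};
   // Move member rules to the front and whole-object rules to the back.
   std::stable_partition(rules.begin(), rules.end(),
                         [](const ScheduledRule &r) { return !r.isWholeObject; });
   for (const auto &r : rules)
      order.push_back(r.name);
   return order;
}
```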
190+
191+
## Interplay between automatic and manual schema evolution
192+
193+
The target members of I/O customization rules are exempt from automatic schema evolution
194+
(applies to the corresponding field of the target member and all its subfields).
195+
Otherwise, automatic and manual schema evolution work side by side.
196+
For instance, a renamed class is still subject to automatic schema evolution.
197+
198+
The source member of a customization rule is subject to the same automatic and manual schema evolution rules
199+
as if it were read normally, e.g. in an `RNTupleView`.
200+
201+
## Schema evolution differences between RNTuple and Classic I/O
202+
203+
In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules:
204+
- Conversion between floating point and integer types
205+
- Conversion from `std::unique_ptr<T>` to `T'`
206+
- Complete conversion matrix of all collection types
207+
- Insertion and removal of intermediate classes
208+
- Move of a member between base class and derived class
209+
- Reordering of base classes
210+
