|  | 
|  | 1 | +# Schema Evolution | 
|  | 2 | + | 
|  | 3 | +Schema evolution is the capability of ROOT I/O to read data | 
|  | 4 | +into in-memory models that differ from, but are compatible with, the on-disk schema. | 
|  | 5 | + | 
|  | 6 | +Schema evolution allows for data models to evolve over time | 
|  | 7 | +such that old data can be read into current models ("backward compatibility") | 
|  | 8 | +and old software can read newer data models ("forward compatibility"). | 
|  | 9 | +For instance, data model authors may over time add and reorder class members, change data types | 
|  | 10 | +(e.g. `std::vector<float>` --> `ROOT::RVec<double>`), rename classes, etc. | 
|  | 11 | + | 
|  | 12 | +ROOT applies automatic schema evolution rules for common, safe and unambiguous cases. | 
|  | 13 | +Users can complement the automatic rules by manual schema evolution ("I/O customization rules") | 
|  | 14 | +where custom code snippets implement the transformation logic. | 
|  | 15 | +If neither the automatic rules nor any of the provided I/O customization rules suffice | 
|  | 16 | +to transform the on-disk schema into the in-memory model, ROOT reports an error and refrains from reading data. | 
|  | 17 | + | 
|  | 18 | +This document describes schema evolution support implemented in RNTuple. | 
|  | 19 | +For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple). | 
|  | 20 | +The exceptions are listed in the last section of this document. | 
|  | 21 | + | 
|  | 22 | +## Automatic schema evolution | 
|  | 23 | + | 
|  | 24 | +ROOT applies a number of rules to read data transparently into in-memory models | 
|  | 25 | +that are not an exact match to the on-disk schema. | 
|  | 26 | +The automatic rules apply recursively to compound types (classes, tuples, collections, etc.); | 
|  | 27 | +the outer types are evolved before the inner types. | 
|  | 28 | + | 
|  | 29 | +Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes | 
|  | 30 | +as listed in the following, exhaustive tables. | 
|  | 31 | + | 
|  | 32 | +### Class shape transformations | 
|  | 33 | + | 
|  | 34 | +User-defined classes can automatically evolve their layout in the following ways. | 
|  | 35 | +Note that users should increase the class version number when the layout changes. | 
|  | 36 | +However, for RNTuple automatic rules that is not mandatory; | 
|  | 37 | +RNTuple will always compare the current on-disk layout with the in-memory model. | 
|  | 38 | + | 
|  | 39 | +| Layout Change                            | Also supported in Untyped Records | Comment              | | 
|  | 40 | +| ---------------------------------------- | --------------------------------- | -------------------- | | 
|  | 41 | +| Remove member                            | Yes                               | Match by member name | | 
|  | 42 | +| Add member                               | Yes                               | Match by member name | | 
|  | 43 | +| Reorder members                          | Yes                               | Match by member name | | 
|  | 44 | +| Remove all base classes                  | n/a                               |                      | | 
|  | 45 | +| Add base class(es) where there were none | n/a                               |                      | | 
|  | 46 | + | 
|  | 47 | +Reordering and incremental addition or removal of base classes are currently unsupported | 
|  | 48 | +but may be supported in future RNTuple versions. | 
|  | 49 | + | 
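
The "match by member name" behavior in the table above can be illustrated with a small, ROOT-independent sketch. Everything in it is hypothetical: the `Hit` class, the `OnDiskRecord` stand-in for stored data, and the helper functions are made up for illustration and are not RNTuple API. The sketch also assumes that a member added in memory keeps its default value when the on-disk data has no counterpart.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for on-disk data: member name -> stored value.
using OnDiskRecord = std::map<std::string, double>;

// Hypothetical current in-memory class. Compared to the version that wrote
// the data, fEnergy was added, fX and fY were reordered, and fZ was removed.
struct Hit {
   double fY = 0.0;
   double fX = 0.0;
   double fEnergy = -1.0; // not on disk (assumption: keeps its default value)
};

// Copy one member if the on-disk record contains a member of that name.
inline void ReadMember(const OnDiskRecord &rec, const std::string &name, double &target)
{
   auto it = rec.find(name);
   if (it != rec.end())
      target = it->second;
}

// Fill the in-memory object by name, independent of the on-disk member order.
inline Hit ReadHit(const OnDiskRecord &rec)
{
   Hit hit;
   ReadMember(rec, "fX", hit.fX);
   ReadMember(rec, "fY", hit.fY);
   ReadMember(rec, "fEnergy", hit.fEnergy);
   // The on-disk member "fZ" has no in-memory counterpart and is skipped.
   return hit;
}
```

Reading a record that holds `fX`, `fY` and `fZ` fills the first two members by name, ignores `fZ`, and leaves `fEnergy` untouched.
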
|  | 50 | +### Type transformations | 
|  | 51 | + | 
|  | 52 | +ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type. | 
|  | 53 | +In the following tables, `T'` denotes a type that is compatible with `T`. | 
|  | 54 | + | 
|  | 55 | +#### Plain fields | 
|  | 56 | + | 
|  | 57 | +| In-memory type              | Compatible on-disk types    | Comment              | | 
|  | 58 | +| --------------------------- | --------------------------- | ---------------------| | 
|  | 59 | +| `bool`                      | `char`                      |                      | | 
|  | 60 | +|                             | `std::[u]int[8,16,32,64]_t` |                      | | 
|  | 61 | +|                             | enum                        |                      | | 
|  | 62 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 63 | +| `char`                      | `bool`                      |                      | | 
|  | 64 | +|                             | `std::[u]int[8,16,32,64]_t` | with bounds check    | | 
|  | 65 | +|                             | enum                        | with bounds check    | | 
|  | 66 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 67 | +| `std::[u]int[8,16,32,64]_t` | `bool`                      |                      | | 
|  | 68 | +|                             | `char`                      |                      | | 
|  | 69 | +|                             | `std::[u]int[8,16,32,64]_t` | with bounds check    | | 
|  | 70 | +|                             | enum                        | with bounds check    | | 
|  | 71 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 72 | +| enum                        | enum of different type      | with bounds check    | | 
|  | 73 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 74 | +| `float`                     | `double`                    |                      | | 
|  | 75 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 76 | +| `double`                    | `float`                     |                      | | 
|  | 77 | +|-----------------------------|-----------------------------|----------------------| | 
|  | 78 | +| `std::atomic<T>`            | `T'`                        |                      | | 
|  | 79 | + | 
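
As an illustration of what "with bounds check" means for integer conversions in the table above, the following is a minimal, ROOT-independent sketch. The function name `CheckedIntCast` and the choice to signal a failure with `std::out_of_range` are assumptions made for this example; they do not describe how RNTuple implements or reports the check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: convert an on-disk integer value to the in-memory
// integer type, refusing values that the target type cannot represent.
template <typename To, typename From>
To CheckedIntCast(From value)
{
   static_assert(std::is_integral_v<To> && std::is_integral_v<From>, "integer types only");

   if constexpr (std::is_signed_v<From>) {
      if (value < 0) {
         // Negative values: compare against the lower bound of the target.
         // For unsigned targets the lower bound is 0, so this always throws.
         if (static_cast<std::intmax_t>(value) < static_cast<std::intmax_t>(std::numeric_limits<To>::min()))
            throw std::out_of_range("value below range of target type");
         return static_cast<To>(value);
      }
   }
   // Non-negative values: compare against the upper bound of the target.
   if (static_cast<std::uintmax_t>(value) > static_cast<std::uintmax_t>(std::numeric_limits<To>::max()))
      throw std::out_of_range("value above range of target type");
   return static_cast<To>(value);
}
```

For example, converting the 32-bit value 100 to `std::int8_t` succeeds, whereas 300 or, for an unsigned target, -1 is rejected.
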
|  | 80 | + | 
|  | 81 | +#### Variable-length collections | 
|  | 82 | + | 
|  | 83 | +| In-memory type                   | Compatible on-disk types             | Comment                               | | 
|  | 84 | +| -------------------------------- | ------------------------------------ | ------------------------------------- | | 
|  | 85 | +| `std::vector<T>`                 | `ROOT::RVec<T'>`                     |                                       | | 
|  | 86 | +|                                  | `std::array<T', N>`                  |                                       | | 
|  | 87 | +|                                  | `std::[unordered_][multi]set<T'>`    |                                       | | 
|  | 88 | +|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 89 | +|                                  | `std::optional<T'>`                  |                                       | | 
|  | 90 | +|                                  | `std::unique_ptr<T'>`                |                                       | | 
|  | 91 | +|                                  | User-defined collection of `T'`      |                                       | | 
|  | 92 | +|                                  | Untyped collection of `T'`           |                                       | | 
|  | 93 | +|----------------------------------|--------------------------------------|---------------------------------------| | 
|  | 94 | +| `ROOT::RVec<T>`                  | `std::vector<T'>`                    | with size check                       | | 
|  | 95 | +|                                  | `std::array<T', N>`                  | with size check                       | | 
|  | 96 | +|                                  | `std::[unordered_][multi]set<T'>`    | with size check                       | | 
|  | 97 | +|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`,  | | 
|  | 98 | +|                                  |                                      | with size check                       | | 
|  | 99 | +|                                  | `std::optional<T'>`                  |                                       | | 
|  | 100 | +|                                  | `std::unique_ptr<T'>`                |                                       | | 
|  | 101 | +|                                  | User-defined collection of `T'`      | with size check                       | | 
|  | 102 | +|                                  | Untyped collection of `T'`           | with size check                       | | 
|  | 103 | +|----------------------------------|--------------------------------------|---------------------------------------| | 
|  | 104 | +| `std::[unordered_]set<T>`        | `std::[unordered_]set<T'>`           |                                       | | 
|  | 105 | +|                                  | `std::[unordered_]map<K',V'>`        | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 106 | +|----------------------------------|--------------------------------------|---------------------------------------| | 
|  | 107 | +| `std::[unordered_]multiset<T>`   | `std::vector<T'>`                    |                                       | | 
|  | 108 | +|                                  | `std::array<T', N>`                  |                                       | | 
|  | 109 | +|                                  | `std::[unordered_][multi]set<T'>`    |                                       | | 
|  | 110 | +|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 111 | +|                                  | User-defined collection of `T'`      |                                       | | 
|  | 112 | +|                                  | Untyped collection of `T'`           |                                       | | 
|  | 113 | +|----------------------------------|--------------------------------------|---------------------------------------| | 
|  | 114 | +| `std::[unordered_]map<K,V>`      | `std::[unordered_]map<K',V'>`        |                                       | | 
|  | 115 | +|                                  | `std::[unordered_]set<T>`            | only `T` = `std::[pair,tuple]<K',V'>` | | 
|  | 116 | +|----------------------------------|--------------------------------------|---------------------------------------| | 
|  | 117 | +| `std::[unordered_]multimap<K,V>` | `std::vector<T>`                     | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 118 | +|                                  | `std::array<T, N>`                   | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 119 | +|                                  | `std::[unordered_][multi]set<T>`     | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 120 | +|                                  | `std::[unordered_][multi]map<K',V'>` |                                       | | 
|  | 121 | +|                                  | User-defined collection of `T`       | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 122 | +|                                  | Untyped collection of `T`            | only `T` = `std::[pair,tuple]<K,V>`   | | 
|  | 123 | + | 
|  | 124 | +#### Nullable fields | 
|  | 125 | + | 
|  | 126 | +| In-memory type       | Compatible on-disk types | | 
|  | 127 | +| -------------------- | ------------------------ | | 
|  | 128 | +| `std::optional<T>`   | `std::unique_ptr<T'>`    | | 
|  | 129 | +|                      | `T'`                     | | 
|  | 130 | +|----------------------|--------------------------| | 
|  | 131 | +| `std::unique_ptr<T>` | `std::optional<T'>`      | | 
|  | 132 | +|                      | `T'`                     | | 
|  | 133 | + | 
|  | 134 | +#### Records | 
|  | 135 | + | 
|  | 136 | +| In-memory type              | Compatible on-disk types               | | 
|  | 137 | +| --------------------------- | -------------------------------------- | | 
|  | 138 | +| `std::pair<T,U>`            | `std::tuple<T',U'>`                    | | 
|  | 139 | +|-----------------------------|----------------------------------------| | 
|  | 140 | +| `std::tuple<T,U>`           | `std::pair<T',U'>`                     | | 
|  | 141 | +|-----------------------------|----------------------------------------| | 
|  | 142 | +| Untyped record              | User-defined class of compatible shape | | 
|  | 143 | + | 
|  | 144 | +Note that for emulated classes, the in-memory untyped record is constructed from on-disk information. | 
|  | 145 | + | 
|  | 146 | +#### Additional rules | 
|  | 147 | + | 
|  | 148 | +All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model. | 
|  | 149 | + | 
|  | 150 | +If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field, | 
|  | 151 | +existing files with on-disk streamer fields will continue to read as streamer fields. | 
|  | 152 | +This can be seen as "schema evolution out of streamer fields". | 
|  | 153 | + | 
|  | 154 | +## Manual schema evolution (I/O customization rules) | 
|  | 155 | + | 
|  | 156 | +ROOT I/O customization rules allow for custom code handling the transformation | 
|  | 157 | +from the on-disk schema to the in-memory model. | 
|  | 158 | +Customization rules are part of the class dictionary. | 
|  | 159 | +For the exact syntax of customization rules, we refer to the ROOT manual. | 
|  | 160 | + | 
|  | 161 | +Generally, customization rules consist of | 
|  | 162 | +  - A target class. | 
|  | 163 | +  - Target members of the target class, i.e. those class members whose value is set by the rule. | 
|  | 164 | +    Target members must be direct members, i.e. not part of a base class. | 
|  | 165 | +  - A source class (possibly having a different class name than the target class) | 
|  | 166 | +    together with class versions or class checksums | 
|  | 167 | +    that describe all the possible on-disk class versions the rule applies to. | 
|  | 168 | +  - Source members of the source class; the given source members will be read as the given type. | 
|  | 169 | +    Source members can also be from a base class. | 
|  | 170 | +    Note that there is no way to specify a base class member that has the same name as a member in the derived class. | 
|  | 171 | +  - The custom code snippet; the code snippet has access to the (whole) target object and to the given source members. | 
|  | 172 | + | 
|  | 173 | +At runtime, for any given target member there must be at most one applicable rule. | 
|  | 174 | +A source member can be read into any type compatible with its on-disk type, | 
|  | 175 | +but any given source member can only be read into one type for a given target class | 
|  | 176 | +(i.e. multiple rules for the same target/source class must not use different types for the same source member). | 
|  | 177 | + | 
|  | 178 | +There are two special types of rules: | 
|  | 179 | +  1. Pure class rename rules consisting only of source and target class | 
|  | 180 | +  2. Whole-object rules that have no target members | 
|  | 181 | + | 
|  | 182 | +Class rename rules (pure or not) are not transitive | 
|  | 183 | +(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`, | 
|  | 184 | +in-memory `A` cannot automatically read from on-disk `C`). | 
|  | 185 | + | 
|  | 186 | +Note that customization rules operate on partially read objects. | 
|  | 187 | +Customization rules are executed after all members not subject to customization rules have been read from disk. | 
|  | 188 | +Whole-object rules are executed after other rules. | 
|  | 189 | +Otherwise, the scheduling of rules is unspecified. | 
|  | 190 | + | 
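
To make the components listed above more concrete, the following is a hedged sketch of how a pure class rename rule and a rule with one target member might look in a dictionary `LinkDef.h` file. The class names `OldParticle` and `Particle` and the members `fEnergyMeV` and `fEnergy` are hypothetical; the ROOT manual remains the reference for the exact `#pragma read` syntax.

```cpp
// Hypothetical LinkDef.h fragment; all class and member names are made up.
#ifdef __CLING__

// Pure class rename rule: consists only of a source and a target class.
#pragma read sourceClass="OldParticle" targetClass="Particle" version="[1-]"

// Rule with one target member: on-disk class versions 1 and 2 of Particle
// stored the energy in MeV in fEnergyMeV; the current class stores GeV in fEnergy.
// The source member is read as float and is available through `onfile`.
#pragma read sourceClass="Particle" version="[1-2]" targetClass="Particle" \
   source="float fEnergyMeV" target="fEnergy" \
   code="{ fEnergy = onfile.fEnergyMeV / 1000.f; }"

#endif
```

In this sketch `fEnergy` is a direct member of the target class, and `fEnergyMeV` is read as a single type (`float`) by every rule that uses it, in line with the constraints described above.
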
|  | 191 | +## Interplay between automatic and manual schema evolution | 
|  | 192 | + | 
|  | 193 | +The target members of I/O customization rules are exempt from automatic schema evolution | 
|  | 194 | +(applies to the corresponding field of the target member and all its subfields). | 
|  | 195 | +Otherwise, automatic and manual schema evolution work side by side. | 
|  | 196 | +For instance, a renamed class is still subject to automatic schema evolution. | 
|  | 197 | + | 
|  | 198 | +The source member of a customization rule is subject to the same automatic and manual schema evolution rules | 
|  | 199 | +as if it were read normally, e.g. in an `RNTupleView`. | 
|  | 200 | + | 
|  | 201 | +## Schema evolution differences between RNTuple and Classic I/O | 
|  | 202 | + | 
|  | 203 | +In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules: | 
|  | 204 | +  - Conversion between floating point and integer types | 
|  | 205 | +  - Conversion from `unique_ptr<T>` --> `T'` | 
|  | 206 | +  - Complete conversion matrix of all collection types | 
|  | 207 | +  - Insertion and removal of intermediate classes | 
|  | 208 | +  - Move of a member between base class and derived class | 
|  | 209 | +  - Reordering of base classes | 
|  | 210 | + | 