diff --git a/site/specification/ORCv0.md b/site/specification/ORCv0.md
index 32ce14a151..b4fea4e81b 100644
--- a/site/specification/ORCv0.md
+++ b/site/specification/ORCv0.md
@@ -725,7 +725,7 @@ DIRECT | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
diff --git a/site/specification/ORCv1.md b/site/specification/ORCv1.md
index fb90c8353c..5dbd3d027f 100644
--- a/site/specification/ORCv1.md
+++ b/site/specification/ORCv1.md
@@ -581,8 +581,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values that
   didn't fit within W bits. Each entry in the list consists of a gap,
   which is the number of elements skipped from the previous
@@ -899,7 +897,7 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
@@ -978,7 +976,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the bloom
 filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding
diff --git a/site/specification/ORCv2.md b/site/specification/ORCv2.md
index 76ee571f0e..d91139c0fe 100644
--- a/site/specification/ORCv2.md
+++ b/site/specification/ORCv2.md
@@ -601,8 +601,6 @@ the index values and the additional value bits.
   bit is set, the entire value is negated.
 * Data values (W * L bits padded to the byte) - A sequence of W bit positive
   values that are added to the base value.
-* Data values (W * L bits padded to the byte) - A sequence of W bit positive
-  values that are added to the base value.
 * Patch list (PLL * (PGW + PW) bytes) - A list of patches for values that
   didn't fit within W bits. Each entry in the list consists of a gap,
   which is the number of elements skipped from the previous
@@ -916,7 +914,7 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
 ## Map Columns
 
 Maps are encoded as the PRESENT stream and a length stream with number
-of items in each list. They have a child column for the key and
+of items in each map. They have a child column for the key and
 another child column for the value.
 
 Encoding | Stream Kind | Optional | Contents
@@ -995,7 +993,7 @@ group (default to 10,000 rows) in a column. Only the row groups that
 satisfy min/max row index evaluation will be evaluated against the bloom
 filter index.
 
-Each BloomFilterEntry stores the number of hash functions ('k') used
+Each bloom filter entry stores the number of hash functions ('k') used
 and the bitset backing the bloom filter. The original encoding (pre
 ORC-101) of bloom filters used the bitset field encoded as a repeating
 sequence of longs in the bitset field with a little endian encoding