Improve VariableWidthBlock encoding#27377
a842e17 to
f1d3e60
Compare
10a9db5 to
e08afb0
Compare
@Language("SQL") String createTableSql =
        """
        CREATE TABLE test_table_writer_skew_mitigation WITH (%s = ARRAY['returnflag']) AS
        SELECT orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, returnflag
Why do we want to delete the comment column?
The test is highly dependent on the serialized page size, so adding 4 extra bytes per serialized page breaks the test. I'm duplicating the workaround from #15760 but this is indeed a hack.
I understand. The hacky part comes from the test setup itself, though.
I'm actually going to avoid changing the serialized size. We don't need the extra 4 bytes anyway, since we can send only the non-null ending offsets and omit the starting offset of 0 from the serialized representation.
...and it didn't work. Reintroducing the hack.
I actually don't think this is a good fix for the spurious test failure. We will probably keep changing block serde, so another code change may fail when comment is deleted but pass when comment is added back. I think we should change the assertion here instead: https://github.com/trinodb/trino/blob/master/testing/trino-faulttolerant-tests/src/test/java/io/trino/faulttolerant/BaseFaultTolerantExecutionTest.java#L62C20-L62C39. The two numbers just happen to be equal, and that equality is not guaranteed in all cases.
b076058 to
85afc34
Compare
Started benchmark workflow for this PR with test type =
Started benchmark workflow for this PR with test type =
db6b908 to
ddaef15
Compare
core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockEncoding.java
        .writeInts(rawOffsets, baseOffset + 1, positionCount);
}
else if (valueIsNull == null) {
    // Subtract starting offset from each offset value to translate them to start from zero, no null suppression required
Is it possible to reuse io.trino.spi.block.BlockUtil#compactOffsets?
Not quite, compactOffsets leaves the initial starting offset value of 0 in the resulting array whereas this branch omits it. I think we're better off with slight duplication to emphasize that we're working only with the ending offset for each position here.
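To illustrate the distinction drawn in the reply above, here is a minimal sketch of rebasing a region of an offsets array so that only the ending offset of each position is kept and the leading 0 is omitted. The class and method names (EndingOffsetsSketch, rebasedEndingOffsets) are hypothetical and chosen for illustration; this is not the code from the PR.

```java
import java.util.Arrays;

public class EndingOffsetsSketch
{
    // Hypothetical helper, not the Trino implementation.
    // Returns the ending offset of each position in the region, shifted so the
    // region starts at zero. The starting offset (always 0 after the shift) is
    // deliberately not included, so the result has positionCount entries.
    public static int[] rebasedEndingOffsets(int[] rawOffsets, int baseOffset, int positionCount)
    {
        int[] endingOffsets = new int[positionCount];
        int startingOffset = rawOffsets[baseOffset];
        for (int i = 0; i < positionCount; i++) {
            endingOffsets[i] = rawOffsets[baseOffset + i + 1] - startingOffset;
        }
        return endingOffsets;
    }

    public static void main(String[] args)
    {
        // Region covering positions 1..3 of a larger block, with value lengths 3, 0, 4
        int[] rawOffsets = {0, 5, 8, 8, 12};
        int[] result = rebasedEndingOffsets(rawOffsets, 1, 3);
        System.out.println(Arrays.toString(result)); // prints [3, 3, 7]
    }
}
```

An approach that keeps the leading 0 would instead produce positionCount + 1 entries, which is the difference the reply points out.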
Avoids converting between offsets and lengths when serializing and deserializing VariableWidthBlock instances, which enables a fast-path conversion for blocks without nulls present. When nulls are present, the compaction and expansion of offsets still outperforms the length-to-offset conversion.
Avoids encoding the slice length as a separate field, since it can be inferred directly from the offsets.
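The null-present path described in these commit messages can be sketched roughly as follows. This is an illustrative round trip under stated assumptions, not the PR's code: the class and method names are hypothetical, the offsets are assumed to already start at 0, and null positions are assumed to have zero length.

```java
import java.util.Arrays;

public class NullSuppressedOffsetsSketch
{
    // Writer side (hypothetical): keep only the ending offsets of non-null positions.
    // offsets has isNull.length + 1 entries and is assumed to start at 0.
    public static int[] compactEndingOffsets(int[] offsets, boolean[] isNull)
    {
        int[] compacted = new int[isNull.length];
        int nonNullCount = 0;
        for (int i = 0; i < isNull.length; i++) {
            if (!isNull[i]) {
                compacted[nonNullCount++] = offsets[i + 1];
            }
        }
        return Arrays.copyOf(compacted, nonNullCount);
    }

    // Reader side (hypothetical): rebuild the full offsets array. A null position
    // repeats the previous offset, which gives it zero length.
    public static int[] expandOffsets(int[] compacted, boolean[] isNull)
    {
        int[] offsets = new int[isNull.length + 1];
        int next = 0;
        for (int i = 0; i < isNull.length; i++) {
            offsets[i + 1] = isNull[i] ? offsets[i] : compacted[next++];
        }
        return offsets;
    }

    public static void main(String[] args)
    {
        int[] offsets = {0, 3, 3, 7, 7};
        boolean[] isNull = {false, true, false, true};

        int[] compacted = compactEndingOffsets(offsets, isNull);
        int[] expanded = expandOffsets(compacted, isNull);

        System.out.println(Arrays.toString(compacted)); // prints [3, 7]
        System.out.println(Arrays.toString(expanded));  // prints [0, 3, 3, 7, 7]

        // The total slice length is the last offset, so it needs no separate field.
        System.out.println(expanded[isNull.length]); // prints 7
    }
}
```

No per-position length is computed on either side; the reader only copies or repeats offsets.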
ddaef15 to
99fa73f
Compare
Description
Avoids converting between offsets and lengths when serializing and deserializing VariableWidthBlock instances, which enables a fast-path conversion for blocks without nulls present. When nulls are present, the compaction and expansion of offsets still outperforms the length-to-offset conversion.
Also removes an additional int per serialized block by referencing the offsets array to determine the slice length instead of sending it as a separate field.
Benchmarks
jmh.morethan.io - Significant improvements
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: