Tsdb doc values inline building jump table#126499
Conversation
Build jump table (disi) when iterating over SortedNumericDocValues, instead of separately iterating over SortedNumericDocValues. In case when indexing sorting is active, this requires an additional merge sort. Follow up from elastic#125403
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java
Outdated
Show resolved
Hide resolved
|
The latest micro benchmark result on top of this PR. Two new benchmark methods were added that test the sparse case. The dense benchmark methods were the same as before. The benchmark method changed to |
dnhatn
left a comment
There was a problem hiding this comment.
I am good with this approach. Thanks Martijn!
| encoder.encode(buffer, data); | ||
| IndexOutput disiTempOutput = null; | ||
| String skipListTempFileName = null; | ||
| IndexedDISIBuilder docIdSetBuilder = null; |
There was a problem hiding this comment.
Should we make IndexedDISIBuilder (or maybe Accumulator is a better name?) Closable, pass a Directory in its constructor, and pass the IndexOutput in the build (or flush) method to copy the temporary output to the data output? This way, we only need a single reference here, making the code more manageable.
There was a problem hiding this comment.
This really made the code more manageable! I also forked the original IndexedDISI tests and adapted that for DISIAccumulator.
| values = valuesProducer.getSortedNumeric(field); | ||
| final int bitsPerOrd = maxOrd >= 0 ? PackedInts.bitsRequired(maxOrd - 1) : -1; | ||
| if (enableOptimizedMerge && numDocsWithValue < maxDoc) { | ||
| // TODO: which IOContext should be used here? |
There was a problem hiding this comment.
I think we should use MERGE for this IOContext?
There was a problem hiding this comment.
I don't see a MERGE constant for io context. But I did the following instead: 45308e4
I think this way we always get most appropriate io context? In case of merge it will be an io context for merging?
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| this.denseRankPower = denseRankPower; | ||
|
|
||
| this.origo = disiTempOutput.getFilePointer(); // All jumps are relative to the origo | ||
| if ((denseRankPower < 7 || denseRankPower > 15) && denseRankPower != -1) { |
There was a problem hiding this comment.
nit: Move this check above dir.createTempOutput to avoid leaks if the check fails.
|
Recent run on top of latest commit of this pr: |
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table. In case when indexing sorting is active, this requires an additional merge sort. Follow up from elastic#125403
💚 Backport successful
|
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table. In case when indexing sorting is active, this requires an additional merge sort. Follow up from #125403
Build jump table (disi) while iterating over SortedNumericDocValues for encoding the values, instead of separately iterating over SortedNumericDocValues just to build the jump table.
In case when indexing sorting is active, this requires an additional sorting of segments while merging.
Follow up from #125403