Validate individual offset values in BULK_OFFSETS bounds checks#144643
Validate individual offset values in BULK_OFFSETS bounds checks#144643ChrisHegarty merged 18 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
benwtrent
left a comment
There was a problem hiding this comment.
This would have been a very sneaky bug to track down!
I think @thecoop already noticed and fixed that? |
ldematte
left a comment
There was a problem hiding this comment.
I like the change, but I wonder if we should make that a "assert-like" check, running only with assertions enabled (e.g. in tests)
It's still broken in main, but I did not try to fix it here. Just test and avoid it for noe.
I dunno. Maybe. I am worried about dereferencing the wrong memory, so checks are super important. The existing checks are always present, so I just followed the same pattern - tho this is expanding somewhat. I don't think that it will have any real noticeable affect on performance. However we could check afterwards and move these to asserts is needed? |
|
I was already not 100% sure about the existing checks to be honest :)
Well, that ship has sailed the day we decided to go with native code I think :D It's true that in this case the check is meaningful though. |
|
Do we have info on what performance effect this has? It's changing the check from O(1) to some kind of O(n), so it's going to have some effect. |
|
Could we have a more in-depth check with assertions, and a O(1) top-level sanity check for production? All the code paths should be covered during tests, so there should be no need to run the full checks in production...right? |
That could be a good middle ground; my suggestion was along the same lines, but a bit more radical -- assertions should be enough as tests should cover us, and we should have validation somewhere so that it's not possible to generate invalid data (e.g. ordinals that are negative or > the num of vectors). |
|
The O(count) per-offset bounds checks in checkBulkOffsets and checkBBQBulkOffsets are moved into separate validateBulkOffsets / validateBBQBulkOffsets methods, called via assert so they have zero cost in production. The validate methods also add alignment checks on the offsets/result segments and non-negative/positive guards on count, length, and pitch. |
libs/native/src/main/java/org/elasticsearch/nativeaccess/jdk/JdkVectorLibrary.java
Show resolved
Hide resolved
…o validate-bulk-offset-values
ldematte
left a comment
There was a problem hiding this comment.
LGTM!
But what about Int4? Is that fixed already, or do we want to address it here, or in a separate PR?
I already added an int4 unit test in this PR, so it should be covered and verified. Is there something else that I'm missing? |
I don't know.. probably not, I was referring to the note in the PR description. If that's solved, probably you just need to update the description. |
Oh yeah. That's a separate issue. If we want to do it at all. |
* upstream/main: (146 commits) Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)" Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385) ESQL: Correctly manage NULL data type for SUM (elastic#144942) [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944) Fix reader context leak when query response serialization fails (elastic#144708) Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643) Merge main21 source set into main in simdvec (elastic#144921) [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848) [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539) Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795) Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595) Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416) Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766) Add test utility for wrapping directories in FilterDirectory layer (elastic#143563) Fix ES|QL decay tests with negative scale (elastic#144657) Fix circuit breaker leak in percolator query construction (elastic#144827) Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744) [DOCS] Document how reindex work in CPS (elastic#144016) Fix Int4 vector library tests failing on Java 21 (elastic#144830) [DiskBBQ] Fix index sorting on flush (elastic#144938) ...
…tic#144643) While working on bulk sparse scoring (elastic#144557), I noticed that checkBulkOffsets and checkBBQBulkOffsets validated segment sizes but not individual offset values. An out-of-range or negative offset would silently read memory beyond the data segment, risking a crash or silently wrong results. The solution is to replace the sequential size check with per-offset validation that checks each offset points to a valid vector within the data segment. The O(count) loop should be negligible relative to the O(count * dims) native call, but we've made the checks conditional on asserts to avoid any potential negative cost of this, and asserts should be good enough given our testing. Note: INT4 skips size=2 (packedLen=1) because checkBulkOffsets computes rowBytes = packedLen * 4 / 8 which truncates to 0 via integer division, making the bounds check trivially pass. This is a pre-existing issue with how INT4 passes packed byte length (not element count) as the length parameter to the generic check formula. We can address this separately, if needed.
…tic#144643) While working on bulk sparse scoring (elastic#144557), I noticed that checkBulkOffsets and checkBBQBulkOffsets validated segment sizes but not individual offset values. An out-of-range or negative offset would silently read memory beyond the data segment, risking a crash or silently wrong results. The solution is to replace the sequential size check with per-offset validation that checks each offset points to a valid vector within the data segment. The O(count) loop should be negligible relative to the O(count * dims) native call, but we've made the checks conditional on asserts to avoid any potential negative cost of this, and asserts should be good enough given our testing. Note: INT4 skips size=2 (packedLen=1) because checkBulkOffsets computes rowBytes = packedLen * 4 / 8 which truncates to 0 via integer division, making the bounds check trivially pass. This is a pre-existing issue with how INT4 passes packed byte length (not element count) as the length parameter to the generic check formula. We can address this separately, if needed.
…tic#144643) While working on bulk sparse scoring (elastic#144557), I noticed that checkBulkOffsets and checkBBQBulkOffsets validated segment sizes but not individual offset values. An out-of-range or negative offset would silently read memory beyond the data segment, risking a crash or silently wrong results. The solution is to replace the sequential size check with per-offset validation that checks each offset points to a valid vector within the data segment. The O(count) loop should be negligible relative to the O(count * dims) native call, but we've made the checks conditional on asserts to avoid any potential negative cost of this, and asserts should be good enough given our testing. Note: INT4 skips size=2 (packedLen=1) because checkBulkOffsets computes rowBytes = packedLen * 4 / 8 which truncates to 0 via integer division, making the bounds check trivially pass. This is a pre-existing issue with how INT4 passes packed byte length (not element count) as the length parameter to the generic check formula. We can address this separately, if needed.
While working on bulk sparse scoring (#144557), I noticed that
checkBulkOffsetsandcheckBBQBulkOffsetsvalidated segment sizes but not individual offset values. An out-of-range or negative offset would silently read memory beyond the data segment, risking a crash or silently wrong results.The solution is to replace the sequential size check with per-offset validation that checks each offset points to a valid vector within the data segment. The O(count) loop should be negligible relative to the O(count * dims) native call, but we've made the checks conditional on asserts to avoid any potential negative cost of this, and asserts should be good enough given our testing.
Note: INT4 skips
size=2(packedLen=1) becausecheckBulkOffsetscomputesrowBytes = packedLen * 4 / 8which truncates to 0 via integer division, making the bounds check trivially pass. This is a pre-existing issue with how INT4 passes packed byte length (not element count) as thelengthparameter to the generic check formula. We can address this separately, if needed.