Skip to content

[9.1] fix(mapper): handle VOID entries in _ignored_source with FLS (#141506)#144765

Closed
salvatore-campagna wants to merge 1 commit intoelastic:9.1from
salvatore-campagna:fix/142341-fls-copy-to-synthetic-source
Closed

[9.1] fix(mapper): handle VOID entries in _ignored_source with FLS (#141506)#144765
salvatore-campagna wants to merge 1 commit intoelastic:9.1from
salvatore-campagna:fix/142341-fls-copy-to-synthetic-source

Conversation

@salvatore-campagna
Copy link
Copy Markdown
Contributor

Backport Note

This is a backport of #141506 to 9.1. Closes #142341.

On 9.1, FieldSubsetReader.java requires an explicit null check for the return value of decodeAsMap(). On main, this class was refactored with an IgnoredSourceFormat parameter that handles nulls through a different code path, so the null check is not needed there.

Also unmutes the test that was muted in #142341.


Problem

When Field Level Security (FLS) is active on an index using synthetic source, any field with a copy_to target causes search requests to fail with:

[-1:12] Unexpected Break (0xFF) token in definite length (-1) Object at [Source: (byte[])[12 bytes]; byte offset: #12]

The failure path is:

  1. During indexing, copy_to target fields are recorded in _ignored_source as VOID entries (XContentDataHelper.voidValue()): markers indicating the field exists but its data is stored elsewhere (in the copy_to source field).
  2. When a query runs with FLS enabled, FieldSubsetReader iterates over _ignored_source stored fields and calls IgnoredSourceFieldMapper.decodeAsMap for each entry to decide which fields to keep or strip based on FLS access permissions.
  3. decodeAsMap builds a CBOR object containing the field name but writes no value for VOID entries (XContentDataHelper.decodeAndWrite is a no-op). When XContentHelper.convertToMap tries to parse this malformed CBOR, the Jackson CBOR parser encounters an unexpected Break token (0xFF) and throws.

Fix

The core fix is in decodeAsMap(). It now checks nameValue.hasValue() before attempting to parse, returning null for VOID entries instead of building malformed CBOR.

The null return propagates to FieldSubsetReader, which has an explicit null check and drops the entry. VOID entries carry no field data, so omitting them is correct.

Tests

  • testDecodeAsMapReturnsNullForVoidEntry: unit test verifying decodeAsMap returns null for a VOID entry.
  • testSyntheticSourceWithCopyToAndFLS: integration test exercising FieldSubsetReader with copy_to, synthetic source, and FLS.
  • YAML REST tests: two end-to-end scenarios: one with API key FLS + synthetic source + copy_to, and one covering the skip_ignored_source_read workaround.
./gradlew :server:test --tests "org.elasticsearch.index.mapper.IgnoredSourceFieldMapperTests.testDecodeAsMapReturnsNullForVoidEntry"
./gradlew :x-pack:plugin:core:test --tests "org.elasticsearch.xpack.core.security.authz.accesscontrol.FieldSubsetReaderTests.testSyntheticSourceWithCopyToAndFLS"
./gradlew :x-pack:plugin:yamlRestTest -Dtests.method="test {p0=security/authz_api_keys/30_field_level_security_synthetic_source/*copy_to*}"

…#141506)

Backport of elastic#141506 to 9.1. When Field Level Security is active on an
index using synthetic source, fields with copy_to targets are recorded
in _ignored_source as VOID entries. FieldSubsetReader attempted to parse
these entries as CBOR, causing search failures with an unexpected Break
token (0xFF).

The fix checks hasValue() in decodeAsMap before parsing and returns null
for VOID entries. FieldSubsetReader now skips null entries gracefully.

Also unmutes the test from elastic#142341.
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

salvatore-campagna commented Apr 5, 2026

The yamlRestCompatTest suite on the 9.1 branch is broken due to fixes that were merged to main and backported to 9.3/9.2/8.19 but skipped for 9.1. The YAML tests from those fixes still get picked up by the test suite, so they fail for every PR targeting 9.1.

I tried fixing this incrementally:

  1. fix: dynamic template vector array is overridden by automatic dense_vector mapping #143733 (dynamic template: nested object float arrays mapped as dense_vector instead of float). I created backport PR [9.1] fix: dynamic template vector array is overridden by automatic dense_vector mapping (#143733) #145698 for 9.1 and the test failing before is not green but, once that CI run completed, a second unrelated failure surfaced:

  2. Support multi-value dimensions in downsampling (bug fix) #145458 (downsampling crashes on multi-value dimensions). This one (and other downsampling) crashes the node during the Downsample index with multi-value dimensions test, causing 40 cascading Connection refused failures across all downsample tests. Also backported to 8.19, 9.2, 9.3 but not 9.1.

There may be more hidden behind these. Right now no PR targeting 9.1 can get a green rest-compatibility check. It looks like there is a whole set of PRs not backported to 9.1 which stop CI from passing. Should these fixes be bulk-backported to 9.1, or is there another way to get the test suite into a known-good state?

cc @gmarouli @martijnvg

@gmarouli
Copy link
Copy Markdown
Contributor

gmarouli commented Apr 6, 2026

Hi @salvatore-campagna , as far as I know 9.1 is not maintained anymore, there are no releases planned and no branch:9.x available. This is why I did not backport it there.

If we need to backport these fixes to 9.1, we can do it, but what is the motivation for backporting this specific fix to 9.1? I am concerned with backporting only a subset of the fixes.

@salvatore-campagna
Copy link
Copy Markdown
Contributor Author

salvatore-campagna commented Apr 7, 2026

Hi @salvatore-campagna , as far as I know 9.1 is not maintained anymore, there are no releases planned and no branch:9.x available. This is why I did not backport it there.

If we need to backport these fixes to 9.1, we can do it, but what is the motivation for backporting this specific fix to 9.1? I am concerned with backporting only a subset of the fixes.

Because we have/had tests failing in CI: #142341

@gmarouli
Copy link
Copy Markdown
Contributor

gmarouli commented Apr 7, 2026

Because we have/had tests failing in CI: #142341

I thought that 9.1 pipelines wouldn't be running anymore. @elastic/es-delivery is fixing this in 9.1 still relevant, I see the test failure is from February so maybe we just close it now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants