Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parseToken array popping #13620

Merged

Conversation

naomichi-y
Copy link
Contributor

@naomichi-y naomichi-y commented May 10, 2024

Description

Fixed an issue in the flat_object where array field names were incorrectly removed after processing the first object, leading to inaccurate key-paths in flattened JSON data.

Related Issues

#13351

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 51d6d10: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 3862517: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@naomichi-y
Copy link
Contributor Author

@msfroh
I tried to fix the issue where array fields disappear in flat_object Is this content correct?
#13259 (comment)

Copy link
Contributor

❌ Gradle check result for fe60f0d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for c3adeb4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 7f7ee10: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

msfroh added 2 commits August 13, 2024 09:42
There is logic in DocumentParser that expects any object parser to start
with currentToken() pointing to START_OBJECT, and return with
currentToken() pointing to END_OBJECT.

My previous commits were stepping over the start/end, and returning on
the token following the end.

Signed-off-by: Michael Froh <[email protected]>
@msfroh msfroh force-pushed the feature/fix-parseToken-array-popping branch from 8b14531 to 02c12c6 Compare August 13, 2024 18:00
@msfroh msfroh requested a review from jed326 as a code owner August 13, 2024 18:00
@msfroh
Copy link
Collaborator

msfroh commented Aug 13, 2024

Oof... that was complicated. Even debugging the yamlRestTests, the cause of failure wasn't obvious.

It turns out that it was all because I was stepping past the final END_OBJECT token, which DocumentParser doesn't like.

@naomichi-y
Copy link
Contributor Author

@msfroh
Sorry for the late response. Thank you for your help!

Copy link
Contributor

❌ Gradle check result for 02c12c6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 02c12c6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh
Copy link
Collaborator

msfroh commented Aug 13, 2024

java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([D25DBB2E808359FD:17E94777B0CD7A53]:0)
	at org.opensearch.common.xcontent.JsonToStringXContentParser.parseObject(JsonToStringXContentParser.java:69)
	at org.opensearch.index.mapper.FlatObjectFieldMapper.parseCreateField(FlatObjectFieldMapper.java:582)
	at org.opensearch.index.mapper.FieldMapper.parse(FieldMapper.java:269)
	at org.opensearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:527)
	at org.opensearch.index.mapper.DocumentParser.parseNullValue(DocumentParser.java:725)
	at org.opensearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:451)
	at org.opensearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:419)
	at org.opensearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:138)
	at org.opensearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:93)
	at org.opensearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:251)

Hmm... apparently I missed the null case.

The existing behavior throws an exception on null flat object by
advancing the parser beyond the end of the flat object input.

We could simply skip a null flat_object field, but this would be
a behavioral change from 2.7.

Signed-off-by: Michael Froh <[email protected]>
Copy link
Contributor

✅ Gradle check result for cac0d94: SUCCESS

Copy link

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.85%. Comparing base (ea1ab7c) to head (cac0d94).
Report is 2 commits behind head on main.

Files Patch % Lines
...ch/common/xcontent/JsonToStringXContentParser.java 90.00% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13620      +/-   ##
============================================
- Coverage     71.90%   71.85%   -0.06%     
+ Complexity    63045    63029      -16     
============================================
  Files          5197     5197              
  Lines        295313   295309       -4     
  Branches      42677    42675       -2     
============================================
- Hits         212341   212189     -152     
- Misses        65558    65748     +190     
+ Partials      17414    17372      -42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reta reta merged commit 6e27e5a into opensearch-project:main Aug 13, 2024
34 checks passed
@reta reta added the bug Something isn't working label Aug 13, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 13, 2024
* Fix parseToken array popping

Signed-off-by: naomichi-y <[email protected]>

* Add fix description

Signed-off-by: naomichi-y <[email protected]>

* Refactor JsonToStringXContentParser#parseToken

This method was pretty complicated, hard to follow, and therefore prone
to bugs.

This change introduces a proper stack (rather than a StringBuilder) to
keep track of fields under fields. When ever we parse a field name, we
push it onto the stack, recursively parse its value (an object, an
array, or a primitive value), then we pop (guaranteeing pushes and pops
are balanced).

Signed-off-by: Michael Froh <[email protected]>

* Remove use of org.apache.logging.log4j.util.Strings

Signed-off-by: Michael Froh <[email protected]>

* Make sure JsonToStringXContentParser ends on END_OBJECT

There is logic in DocumentParser that expects any object parser to start
with currentToken() pointing to START_OBJECT, and return with
currentToken() pointing to END_OBJECT.

My previous commits were stepping over the start/end, and returning on
the token following the end.

Signed-off-by: Michael Froh <[email protected]>

* Throw exception when flat object is null

The existing behavior throws an exception on null flat object by
advancing the parser beyond the end of the flat object input.

We could simply skip a null flat_object field, but this would be
a behavioral change from 2.7.

Signed-off-by: Michael Froh <[email protected]>

---------

Signed-off-by: naomichi-y <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
(cherry picked from commit 6e27e5a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta pushed a commit that referenced this pull request Aug 13, 2024
* Fix parseToken array popping



* Add fix description



* Refactor JsonToStringXContentParser#parseToken

This method was pretty complicated, hard to follow, and therefore prone
to bugs.

This change introduces a proper stack (rather than a StringBuilder) to
keep track of fields under fields. When ever we parse a field name, we
push it onto the stack, recursively parse its value (an object, an
array, or a primitive value), then we pop (guaranteeing pushes and pops
are balanced).



* Remove use of org.apache.logging.log4j.util.Strings



* Make sure JsonToStringXContentParser ends on END_OBJECT

There is logic in DocumentParser that expects any object parser to start
with currentToken() pointing to START_OBJECT, and return with
currentToken() pointing to END_OBJECT.

My previous commits were stepping over the start/end, and returning on
the token following the end.



* Throw exception when flat object is null

The existing behavior throws an exception on null flat object by
advancing the parser beyond the end of the flat object input.

We could simply skip a null flat_object field, but this would be
a behavioral change from 2.7.



---------




(cherry picked from commit 6e27e5a)

Signed-off-by: naomichi-y <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael Froh <[email protected]>
@nicklasmoeller
Copy link

Hi - not sure if this is the place to ask, but can't find anything in the release docs.

When can this be expected to be tagged in a release? We're running into this on AWS OpenSearch

@dblock
Copy link
Member

dblock commented Aug 21, 2024

Hi - not sure if this is the place to ask, but can't find anything in the release docs.

When can this be expected to be tagged in a release? We're running into this on AWS OpenSearch

This was backported to 2.x in #15235, so it will be in the next open source 2.17 release (https://opensearch.org/releases.html).

For Amazon OpenSearch this is really a question for Amazon support. The service deploys every other release, see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/release-notes.html and you can see the past dates to get an idea of when you can expect something (2.9: October 2, 2023, 2.11: November 17, 2023, 2.13: May 21, 2024, 2.15 would be next).

wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
* Fix parseToken array popping

Signed-off-by: naomichi-y <[email protected]>

* Add fix description

Signed-off-by: naomichi-y <[email protected]>

* Refactor JsonToStringXContentParser#parseToken

This method was pretty complicated, hard to follow, and therefore prone
to bugs.

This change introduces a proper stack (rather than a StringBuilder) to
keep track of fields under fields. When ever we parse a field name, we
push it onto the stack, recursively parse its value (an object, an
array, or a primitive value), then we pop (guaranteeing pushes and pops
are balanced).

Signed-off-by: Michael Froh <[email protected]>

* Remove use of org.apache.logging.log4j.util.Strings

Signed-off-by: Michael Froh <[email protected]>

* Make sure JsonToStringXContentParser ends on END_OBJECT

There is logic in DocumentParser that expects any object parser to start
with currentToken() pointing to START_OBJECT, and return with
currentToken() pointing to END_OBJECT.

My previous commits were stepping over the start/end, and returning on
the token following the end.

Signed-off-by: Michael Froh <[email protected]>

* Throw exception when flat object is null

The existing behavior throws an exception on null flat object by
advancing the parser beyond the end of the flat object input.

We could simply skip a null flat_object field, but this would be
a behavioral change from 2.7.

Signed-off-by: Michael Froh <[email protected]>

---------

Signed-off-by: naomichi-y <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
* Fix parseToken array popping

Signed-off-by: naomichi-y <[email protected]>

* Add fix description

Signed-off-by: naomichi-y <[email protected]>

* Refactor JsonToStringXContentParser#parseToken

This method was pretty complicated, hard to follow, and therefore prone
to bugs.

This change introduces a proper stack (rather than a StringBuilder) to
keep track of fields under fields. When ever we parse a field name, we
push it onto the stack, recursively parse its value (an object, an
array, or a primitive value), then we pop (guaranteeing pushes and pops
are balanced).

Signed-off-by: Michael Froh <[email protected]>

* Remove use of org.apache.logging.log4j.util.Strings

Signed-off-by: Michael Froh <[email protected]>

* Make sure JsonToStringXContentParser ends on END_OBJECT

There is logic in DocumentParser that expects any object parser to start
with currentToken() pointing to START_OBJECT, and return with
currentToken() pointing to END_OBJECT.

My previous commits were stepping over the start/end, and returning on
the token following the end.

Signed-off-by: Michael Froh <[email protected]>

* Throw exception when flat object is null

The existing behavior throws an exception on null flat object by
advancing the parser beyond the end of the flat object input.

We could simply skip a null flat_object field, but this would be
a behavioral change from 2.7.

Signed-off-by: Michael Froh <[email protected]>

---------

Signed-off-by: naomichi-y <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch bug Something isn't working v2.17.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants