@zsxwing
Member

@zsxwing zsxwing commented Feb 24, 2020

What changes were proposed in this pull request?

Set FAIL_ON_UNKNOWN_PROPERTIES to false in JsonProtocol so that unknown fields in a Spark event are ignored. After this change, if we add new fields to a Spark event parsed by ObjectMapper, the event JSON string generated by a new Spark version can still be read by an old Spark History Server.
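As a rough illustration of the setting described above, here is a minimal sketch. It assumes Jackson and jackson-module-scala are on the classpath. `OldEvent`, `ForwardCompatSketch`, and the JSON payload are hypothetical names made up for this example, not Spark classes, and the mapper is not the exact one defined in JsonProtocol.

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical event as an older reader knows it (not a real Spark event class).
case class OldEvent(id: Long, name: String)

object ForwardCompatSketch {
  // Lenient mapper: properties with no matching field are skipped, not rejected.
  val mapper: ObjectMapper = new ObjectMapper()
    .registerModule(DefaultScalaModule)
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  // JSON as a hypothetical newer writer might emit it, with one extra field.
  val newJson: String = """{"id": 1, "name": "job", "addedLater": "x"}"""

  // Parses with the lenient mapper; the unknown "addedLater" field is dropped.
  def parse(json: String): OldEvent = mapper.readValue(json, classOf[OldEvent])
}
```

Calling `ForwardCompatSketch.parse(ForwardCompatSketch.newJson)` should return the event with only the fields the older class declares. The extra field is silently discarded, which is the trade-off the description accepts.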

Because the Spark History Server (SHS) is a separate service, it usually takes time to upgrade, so a Spark application may be upgraded before the SHS. Forwards compatibility lets an old SHS support new Spark applications (some new features may be unavailable, but most functionality should still work).

Why are the changes needed?

JsonProtocol is supposed to provide strong backwards-compatibility and forwards-compatibility guarantees: any version of Spark should be able to read JSON output written by any other version, including newer versions.

However, the forwards-compatibility guarantee is broken for events parsed by ObjectMapper. If a new field is added to an event parsed by ObjectMapper (e.g., 6dc5921#diff-dc5c7a41fbb7479cef48b67eb41ad254R33), the event JSON string generated by a new Spark version currently cannot be parsed by an older version of the SHS.
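The failure mode can be reproduced outside Spark with a small sketch. It assumes Jackson and jackson-module-scala are on the classpath. `KnownFields`, `StrictParsingSketch`, and the payload are hypothetical. Strict mode is enabled explicitly here so the example does not rely on any particular Jackson version's default.

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import scala.util.Try

// Hypothetical event shape known to an older reader; not a real Spark class.
case class KnownFields(id: Long)

object StrictParsingSketch {
  // Strict mode is switched on explicitly to mirror the behavior described above.
  val strictMapper: ObjectMapper = new ObjectMapper()
    .registerModule(DefaultScalaModule)
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, true)

  // Hypothetical payload from a newer writer that added one field.
  val newerJson: String = """{"id": 7, "addedInNewVersion": true}"""

  // Capture the outcome instead of letting the exception propagate.
  val outcome: Try[KnownFields] =
    Try(strictMapper.readValue(newerJson, classOf[KnownFields]))
}
```

Here `StrictParsingSketch.outcome` should be a `Failure`, since a strict mapper rejects the whole event because of the one field it does not recognize.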

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The newly added tests.

@zsxwing zsxwing requested a review from cloud-fan February 24, 2020 06:58
@zsxwing zsxwing changed the title from "[SPARK-30936][Core]Enable FAIL_ON_UNKNOWN_PROPERTIES by default to parse Spark events" to "[SPARK-30936][Core]Set FAIL_ON_UNKNOWN_PROPERTIES to false by default to parse Spark events" Feb 24, 2020
@SparkQA

SparkQA commented Feb 24, 2020

Test build #118851 has finished for PR 27680 at commit 4172676.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118859 has finished for PR 27680 at commit 4172676.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118864 has finished for PR 27680 at commit 4172676.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118875 has finished for PR 27680 at commit 4172676.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

- testQuietly("ReplayListenerBus should ignore broken event jsons generated in 2.0.0") {
+ testQuietly("ReplayListenerBus should ignore broken event jsons generated in 2_0_0") {
Member Author

I changed the test names to remove the "." characters so that we get a better test report.

Member

Yes .. this is sad. It's actually a bug in SBT, see #25630.

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118882 has finished for PR 27680 at commit 7938701.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 24, 2020

Test build #118883 has finished for PR 27680 at commit 9b18998.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 3126557 Feb 25, 2020
cloud-fan pushed a commit that referenced this pull request Feb 25, 2020
…t to parse Spark events

Closes #27680 from zsxwing/SPARK-30936.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 3126557)
Signed-off-by: Wenchen Fan <[email protected]>
@zsxwing zsxwing deleted the SPARK-30936 branch February 25, 2020 04:52
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…t to parse Spark events

Closes apache#27680 from zsxwing/SPARK-30936.

Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>