Spark 3.x: Support rewrite data files with starting sequence number #4701

rajarshisarkar · 2022-05-05T08:46:01Z

While developing #4377 and #4579, I noticed that the use-starting-sequence-number option is missing. This PR adds support for it.

When use-starting-sequence-number is enabled, compaction assigns new data files the sequence number of the snapshot that was current when compaction started, instead of the sequence number of the newly produced snapshot. This avoids commit conflicts with concurrent updates that add newer equality deletes at a higher sequence number.
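To illustrate why the starting sequence number matters, here is a minimal toy model rather than Iceberg or Spark code. It assumes the Iceberg spec rule that an equality delete applies to a data file only when the data file's sequence number is strictly lower than the delete's. All class names, function names, and sequence values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int


@dataclass(frozen=True)
class EqualityDelete:
    path: str
    sequence_number: int


def delete_applies(delete: EqualityDelete, data_file: DataFile) -> bool:
    # Assumed rule: an equality delete only affects data files with a
    # strictly lower sequence number than its own.
    return data_file.sequence_number < delete.sequence_number


def compacted_sequence_number(starting_seq: int, commit_seq: int,
                              use_starting_sequence_number: bool) -> int:
    # Hypothetical helper: choose the sequence number given to files
    # written by compaction.
    return starting_seq if use_starting_sequence_number else commit_seq


# Illustrative timeline (values are made up):
starting_seq = 5  # snapshot that was current when compaction started
concurrent_delete = EqualityDelete("eq-delete.parquet", sequence_number=6)
commit_seq = 7    # sequence number of the snapshot produced by compaction

with_option = DataFile(
    "compacted.parquet",
    compacted_sequence_number(starting_seq, commit_seq, True),
)
without_option = DataFile(
    "compacted.parquet",
    compacted_sequence_number(starting_seq, commit_seq, False),
)

# With the option, the concurrent delete still applies to the rewritten rows.
still_applies_with_option = delete_applies(concurrent_delete, with_option)
# Without it, the rewritten file would look newer than the delete, so the
# deleted rows would reappear unless the commit is rejected as a conflict.
still_applies_without_option = delete_applies(concurrent_delete, without_option)
```

In this sketch the compacted file keeps sequence number 5 when the option is on, so the equality delete at sequence number 6 continues to cover it and the two commits do not need to conflict.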

cc: @RussellSpitzer @jackye1995 @szehon-ho

RussellSpitzer · 2022-05-05T15:28:38Z

Thanks @rajarshisarkar!

Spark 3.1: Support rewrite data files with starting sequence number

6fe3779

github-actions bot added the spark label May 5, 2022

Spark 3.0: Support rewrite data files with starting sequence number

9e4a797

rajarshisarkar force-pushed the rewrite-data-files-with-starting-sequence-number branch from 92d29b8 to 9e4a797 Compare May 5, 2022 12:47

RussellSpitzer approved these changes May 5, 2022

RussellSpitzer merged commit bf582eb into apache:master May 5, 2022
