Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Jan 22, 2023

Co-authored-by: liliwei [email protected]
Co-authored-by: xuwei [email protected]
Co-authored-by: chidayong [email protected]

CC: @jackye1995 @hililiwei @flyrain

@github-actions github-actions bot added the spark label Jan 22, 2023
Comment on lines 76 to 77
| ALTER TABLE multipartIdentifier CREATE BRANCH identifier (AS OF VERSION snapshotId)? (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? (snapshotRetentionClause)? #createBranch
| ALTER TABLE multipartIdentifier REPLACE BRANCH identifier (AS OF VERSION snapshotId)? (RETAIN snapshotRefRetain snapshotRefRetainTimeUnit)? (snapshotRetentionClause)? #replaceBranch
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing this out, starting to change my mind :) maybe we combine {CREATE | REPLACE} in the same definition if that's possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I think in this case there are a few arguments for combining it:

  1. syntax is exactly the same, and adding replace does not complicate the logic too much
  2. CREATE/REPLACE table follows a similar pattern
  3. At API level, we also have similar thing like createOrReplaceTableTransaction

@amogh-jahagirdar amogh-jahagirdar force-pushed the replace-branch branch 4 times, most recently from 71206d9 to 245cf1b Compare January 29, 2023 09:06
return Nil
}

manageSnapshots.createBranch(branch, snapshotId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we are implicitly letting the API throw exception for branch already exists. It's fine to me, but would like to know what other people think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, it's implicits since the API provides those guarantees. We could also throw our own before calling createBranch but, I'll let @hililiwei @yyanyy @rdblue provide their thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, just 1 doubt regarding create existing branch. @hililiwei @yyanyy @rdblue any thoughts?

@jackye1995 jackye1995 requested review from flyrain and hililiwei and removed request for hililiwei February 2, 2023 05:52
Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for working on it.
Not related to this PR directly, neither a blocker. Do we need to delete a branch? I didn't go through all design details, but I assume this is how a branch is deleted now. We relies on the retain to keep the branch. A branch will be clean up by procedure expireSnapshots once it expired. Could you confirm?

Co-authored-by: liliwei [email protected]
Co-authored-by: xuwei [email protected]
Co-authored-by: chidayong [email protected]
@amogh-jahagirdar
Copy link
Contributor Author

amogh-jahagirdar commented Feb 2, 2023

Thanks for the review @flyrain really appreciate it!

So there are a few operations:

1.) replaceBranch: Replace branch will change the snapshot that the branch points to. Nothing will be removed here. The same principle applies for replaceTag.

2.) The removeBranch API (DROP BRANCH in SQL): This will remove the branch from the references in metadata. When this is committed it's an immediate removal of the reference.

Correct that a branch will be cleaned up during snapshot expiration if it is past the reference retention age maxRefAgeMs in the code. The updated expiration algorithm which considers branching and tagging can be found here https://iceberg.apache.org/spec/#snapshot-retention-policy

I'm also working on a PR for a dedicated branching/tagging doc page which has some details there. #6723 appreciate any feedback on this as well if anything is unclear!

@jackye1995
Copy link
Contributor

Looks like we have enough votes and all comments are addressed. I will go ahead to merge this, and we can address further comments in subsequent PRs like #6637

Thanks @amogh-jahagirdar and @hililiwei for the work, thanks @yyanyy and @flyrain for the review!

@jackye1995 jackye1995 merged commit 8b6c777 into apache:master Feb 2, 2023
@amogh-jahagirdar
Copy link
Contributor Author

Thanks for the reviews @flyrain @jackye1995 @yyanyy @hililiwei!

krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants