@zhangbutao
Contributor

@zhangbutao zhangbutao commented Apr 11, 2023

What changes were proposed in this pull request?

This PR follows the Spark SQL implementation of the Iceberg branch DDL in apache/iceberg#6617.
If anyone has a different opinion about the SQL syntax for branches, we can discuss it here.

Why are the changes needed?

Personally, I find branches more useful than snapshots in Iceberg, and they are friendlier to users. We can use branches to do a lot of meaningful work.

Does this PR introduce any user-facing change?

Yes. This adds new SQL syntax that lets Hive users create an Iceberg branch:

ALTER TABLE tableName
{CREATE BRANCH branchName [FOR SYSTEM_VERSION AS OF {snapshotId} | FOR SYSTEM_TIME AS OF {timestamp}]
[RETAIN interval {DAYS | HOURS | MINUTES}]
[WITH SNAPSHOT RETENTION {[num_snapshots SNAPSHOTS] [interval {DAYS | HOURS | MINUTES}]}]}
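
As a rough illustration of how the grammar above expands, here are a few statements it should accept. This is only a sketch: the table name `ice_orders`, the branch names, and the snapshot ID are hypothetical placeholders, and the `FOR SYSTEM_TIME AS OF` form is left out because the expected timestamp literal format is not stated here.

```sql
-- Hypothetical table and branch names; the snapshot ID is a placeholder.

-- Simplest form: every optional clause omitted.
ALTER TABLE ice_orders CREATE BRANCH audit_branch;

-- Branch created from a specific snapshot.
ALTER TABLE ice_orders CREATE BRANCH audit_from_snapshot
FOR SYSTEM_VERSION AS OF 1234567890123456789;

-- Branch with a retention period for the branch itself, plus
-- snapshot retention (keep 5 snapshots, for 2 days).
ALTER TABLE ice_orders CREATE BRANCH audit_retained
RETAIN 7 DAYS
WITH SNAPSHOT RETENTION 5 SNAPSHOTS 2 DAYS;
```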

How was this patch tested?

Unit tests.

@aturoczy

aturoczy commented Apr 19, 2023

Uhhhhh! This is a super cool idea! I want it! With this, it is actually possible to play with a dataset without any risk. I love it! I want it!
@deniskuzZ @ayushtkn @simhadri-g Could you please make this review a priority?

Documents about it:
https://docs.google.com/document/d/1tbATFPrKF3vNlzkgZQdaW8CAJmbjvryfrlg6C2Ci_aA/edit#heading=h.v8gsu2fe19q2

This is my new favorite PR (After the Docker support)

@ayushtkn
Member

ayushtkn commented Apr 19, 2023

I haven't checked the code yet, but coincidentally I was reading about this just 2-3 hours ago :)

The main thing to chase is not creating a branch. But to insert into those branches. @zhangbutao We should chase that once we have the create in. Will review this in a day or two along with Denys. :)

Just FYI: I think Iceberg now has an option to create tags as well, and an option to cherry-pick to branches. They are going to release it for Spark in 1.2.0.

Good to have stuff!!!

Was reading here:
https://www.dremio.com/blog/exploring-branch-tags-in-apache-iceberg-using-spark/

@aturoczy

Tags!!!! 😍

@zhangbutao
Contributor Author

zhangbutao commented Apr 20, 2023

@TuroczyX @ayushtkn Yes, branches and tags are important features in Iceberg 1.2.0. I have created an umbrella ticket, https://issues.apache.org/jira/browse/HIVE-27233, to track the new branch and tag features.

> The main thing to chase is not creating a branch. But to insert into those branches.

I absolutely agree. I am exploring how to achieve this in Hive. As a next step, maybe we should consider upgrading Iceberg to the latest version, 1.2.1. In addition, we can refer to apache/iceberg#6965 when implementing the SQL syntax for inserting into a branch.

@zhangbutao
Contributor Author

#4252 Upgrade iceberg to 1.2.1 in order to better integrate branch&tag features.

@aturoczy

yes, let's upgrade the iceberg version

@zhangbutao
Contributor Author

Git rebase to fix code conflicts.

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 2 (rating C)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 11 (rating A)

No coverage information
No duplication information

@zhangbutao
Contributor Author

@deniskuzZ @ayushtkn Can we merge this PR first? Thanks.

@deniskuzZ deniskuzZ merged commit e2b36e4 into apache:master May 15, 2023
yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
…eviewed by Attila Turoczy, Ayush Saxena, Denys Kuzmenko)

Closes apache#4216
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
…eviewed by Attila Turoczy, Ayush Saxena, Denys Kuzmenko)

Closes apache#4216