@dramaticlly (Contributor) commented Jan 27, 2025

Documentation for #11931 #11555

I took the wording and docs from the original authors, @flyrain and @szehon-ho.
Also, @manuzhang is interested in this and is wondering whether @amogh-jahagirdar is ready to take this as part of 1.8.

@github-actions github-actions bot added the docs label Jan 27, 2025
@dramaticlly (Contributor Author)

Thank you @flyrain and @RussellSpitzer for the prompt review. Updated per your suggestions!

Co-authored-by: Russell Spitzer <[email protected]>
Stages a copy of the Iceberg table's metadata files where every absolute path source prefix is replaced by the specified target prefix.
This can be the starting point to fully or incrementally copy an Iceberg table located under an absolute path under a
source prefix to another under the target prefix.
Member

"under an absolute path under a source prefix to another under the target prefix." is not easy to grok.

Contributor Author

Let me simplify the second sentence to: "This can be the starting point to fully or incrementally copy an Iceberg table under a source prefix to another under the target prefix."

Member

Now 'another' seems like 'another Iceberg table'. How about, just:

This can be the starting point to fully or incrementally move or copy an Iceberg table to a new location.

?

source prefix to another under the target prefix.
!!! info
This procedure only prepares metadata or/and data files for an existing Iceberg table for copying or moving to a new location.
@manuzhang (Member) commented Feb 8, 2025

Data files here means delete files, right?

Contributor Author

Not exactly. To fully replicate the table from one existing location to another, all files referenced by the given Iceberg table need to move, which includes all metadata and data files. The rewrite table path procedure only rewrites the files that contain path references.

@manuzhang (Member) commented Feb 11, 2025

I'm confused. Does it copy data files (not delete files) to staging folder?

Contributor Author

No, it does not. This is just the first of three steps: rewrite, copy, and re-register. Rewrite focuses on the Iceberg metadata files that contain path references: metadata.json, manifest lists, manifests, and position delete files all contain file pointers in their content, which require a path rewrite. Copy is the second step, where we copy the needed files to the target. The set of files to copy is a superset of the rewritten files and also includes the data files.
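
To make the relationship between the rewritten files and the copy list concrete, here is a minimal Python sketch. It is not Iceberg code: `TableFile`, `replace_prefix`, `plan_copy`, and the staging layout (rewritten files kept under the same relative path below a staging prefix) are all hypothetical, chosen only to illustrate that every file of the table ends up in the copy plan while only path-bearing files come from staging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableFile:
    """Hypothetical model of one file that belongs to a table."""
    path: str           # absolute location of the file
    embeds_paths: bool  # True if the file's content stores absolute paths


def replace_prefix(path: str, source_prefix: str, target_prefix: str) -> str:
    """Swap source_prefix for target_prefix at the start of an absolute path."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path} is not under {source_prefix}")
    return target_prefix + path[len(source_prefix):]


def plan_copy(files, source_prefix, target_prefix, staging_prefix):
    """Return (copy_from, copy_to) pairs covering every file of the table.

    Files that embed paths are taken from the staging location, where their
    rewritten version is assumed to live (an assumed layout, for illustration).
    All other files, such as data files, are copied unchanged from the source.
    """
    plan = []
    for f in files:
        copy_to = replace_prefix(f.path, source_prefix, target_prefix)
        if f.embeds_paths:
            copy_from = replace_prefix(f.path, source_prefix, staging_prefix)
        else:
            copy_from = f.path
        plan.append((copy_from, copy_to))
    return plan


if __name__ == "__main__":
    files = [
        TableFile("s3://old/tbl/metadata/v1.metadata.json", embeds_paths=True),
        TableFile("s3://old/tbl/data/a.parquet", embeds_paths=False),
    ]
    for pair in plan_copy(files, "s3://old/tbl", "s3://new/tbl", "s3://stage/tbl"):
        print(pair)
```

In this model the rewrite step produces only the staged files and the plan; performing the copies and re-registering the table at the target remain separate steps, as described above.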

Member

How about:

`This procedure only stages rewritten metadata files and prepares a list of files to copy. The actual file copy is not part of this procedure.`

Member

Adding "copy or move" everywhere in the doc is a bit awkward. I think it is not a common case to `mv` the files, as that destroys the original table. Even so, I think the user will get it, and we don't have to make the doc awkward for that?

@dramaticlly (Contributor Author)

@RussellSpitzer @flyrain @szehon-ho, I would appreciate another look at the docs.

Signed-off-by: Hongyue Zhang <[email protected]>
@szehon-ho (Member) left a comment

Thanks! Some more comments

Signed-off-by: Hongyue Zhang <[email protected]>
@szehon-ho (Member) commented Feb 17, 2025

I think it almost looks ready. I just have one last comment; sorry about not putting it correctly before!

I think when I looked last night at #12115 (comment), my thought was just to unify the two list formats, as you had both

// newline after the title
* Full Rewrite

This is a full rewrite

* Incremental Rewrite

This is an incremental rewrite

and

// no newline after the title
* Source Path: The original path...
* Target Path: The path with... 

Sorry, I did not consider carefully which one to use last night, and I think I ended up slightly suggesting the former for both. But now, after reading the rest of the doc, I see there is a precedent for the latter. The former may also be a bit confusing, as the description is not indented and looks like a first-level paragraph. Can we change both lists to be like the latter? (That is, remove the newline and put the text on the same line as the sub-title, putting the colon back between them.)

* Full Rewrite:  A full rewrite...
* Incremental Rewrite: An incremental rewrite...

Signed-off-by: Hongyue Zhang <[email protected]>
@dramaticlly (Contributor Author)

Thank you @szehon-ho, updated as suggested. I appreciate your detailed walkthrough.

@szehon-ho (Member) left a comment

Thanks! I noticed a few more things. I hope this is the last round; thanks for your patience!

@szehon-ho (Member) left a comment

Looks good, thanks a lot for your patience, @dramaticlly!

@szehon-ho szehon-ho merged commit 7e4f0ca into apache:main Feb 19, 2025
2 checks passed
@szehon-ho (Member)

Merged, thanks again!

ankurbansal-tradedoubler added a commit to ankurbansal-tradedoubler/iceberg that referenced this pull request Feb 19, 2025
* Docs: Add rewrite_table_path Spark Procedure (apache#12115)

@dramaticlly dramaticlly deleted the rewrite-doc branch February 19, 2025 18:26
@dramaticlly (Contributor Author)

Special shoutout to @szehon-ho for iterating on and refining the documentation. I also appreciate the additional input and review from @RussellSpitzer, @flyrain, @singhpk234, and @manuzhang.
