@jackye1995
Contributor

Follow-up for #2845. Because we now allow a fallback mechanism for the object storage data path, it is safe to remove this property when taking a table snapshot. The destination table will use its default location as the data path, and users can configure a new path later if necessary.
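
For readers following along, the fallback order can be sketched roughly as below. This is only an illustration under assumptions, not the actual Iceberg implementation: the class `DataPathFallback`, the helper `resolveDataPath`, and the `/data` default suffix are hypothetical, and the two property keys are assumed to be the string values behind `OBJECT_STORE_PATH` and `WRITE_NEW_DATA_LOCATION`.

```java
import java.util.HashMap;
import java.util.Map;

public class DataPathFallback {
  // Assumed string values of TableProperties.OBJECT_STORE_PATH and
  // TableProperties.WRITE_NEW_DATA_LOCATION; verify against the codebase.
  static final String OBJECT_STORE_PATH = "write.object-storage.path";
  static final String FOLDER_STORAGE_PATH = "write.folder-storage.path";

  // Hypothetical helper: prefer the object storage path, then the folder
  // storage path, then a default derived from the table location.
  static String resolveDataPath(Map<String, String> properties, String tableLocation) {
    String path = properties.get(OBJECT_STORE_PATH);
    if (path == null) {
      path = properties.get(FOLDER_STORAGE_PATH);
    }
    if (path == null) {
      // Assumed default layout, for illustration only.
      path = tableLocation + "/data";
    }
    return path;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(FOLDER_STORAGE_PATH, "s3://bucket/folder");
    // The object storage path is absent, so the folder storage path is used.
    System.out.println(resolveDataPath(props, "s3://bucket/db/tbl"));
  }
}
```

With the object storage property removed from a snapshot's properties, a lookup like this would land on the folder storage path if one is set, and otherwise on the table's default location.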

@kbendick @aokolnychyi

@github-actions github-actions bot added the spark label Aug 13, 2021
Contributor

@kbendick kbendick left a comment

A few comments, particularly around documenting somehow (via code comments or in the docs) that we’re explicitly not bringing the object storage path with the snapshot, but overall I think this is a good idea.

If a user snapshots a table, sharing the same object storage path could possibly have some funky implications for things like removing orphan files. So this seems safer in my opinion.

properties.remove(LOCATION);
properties.remove(TableProperties.WRITE_METADATA_LOCATION);
properties.remove(TableProperties.WRITE_NEW_DATA_LOCATION);
properties.remove(TableProperties.OBJECT_STORE_PATH);
Contributor

Non-blocking: it might make sense to add a comment here that we’re explicitly choosing not to bring along OBJECT_STORE_PATH in the snapshot?

Either a comment, or possibly updating the ObjectStorageLocationProvider docs / snapshot docs with this detail would be great 🙂. Documentation updates can be done in a separate PR of course (and happy to assist there if you’d like).
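
As a rough sketch of the kind of code comment I have in mind (the class `SnapshotPropertiesSketch` and the method `destinationProperties` are hypothetical, and the string keys are stand-ins that I am assuming match the real constants):

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotPropertiesSketch {
  // Stand-ins for the real constants; the values are assumptions.
  static final String LOCATION = "location";
  static final String WRITE_METADATA_LOCATION = "write.metadata.path";
  static final String WRITE_NEW_DATA_LOCATION = "write.folder-storage.path";
  static final String OBJECT_STORE_PATH = "write.object-storage.path";

  // Hypothetical helper: copies the source table properties and drops the
  // location-related ones before they are applied to the snapshot table.
  static Map<String, String> destinationProperties(Map<String, String> source) {
    Map<String, String> properties = new HashMap<>(source);
    properties.remove(LOCATION);
    properties.remove(WRITE_METADATA_LOCATION);
    properties.remove(WRITE_NEW_DATA_LOCATION);
    // Explicitly not carried over: the snapshot table should not write into
    // the source table's object storage path. It falls back to its own
    // default data location, and users can set a new path later.
    properties.remove(OBJECT_STORE_PATH);
    return properties;
  }

  public static void main(String[] args) {
    Map<String, String> source = new HashMap<>();
    source.put(OBJECT_STORE_PATH, "s3://bucket/shared");
    source.put("write.format.default", "parquet");
    // Only the non-location property remains.
    System.out.println(destinationProperties(source));
  }
}
```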

Assert.assertEquals("should use object storage location provider",
"org.apache.iceberg.LocationProviders$ObjectStoreLocationProvider",
locationProvider.getClass().getName());
Assert.assertTrue("should use table folder storage path",
Contributor

Nit: we might want to further clarify what we’re testing for in the assertion.

Something like "Should use table folder storage path after unsetting the object storage location path" or "Should use table folder storage path if present when object storage path is not present".

One could argue that these assertions could be subject to the same problems as comment rot if tests get changed, so I’ll defer to your judgement.

Also: Given that the names of the constants and their string representations are a little funky (particularly folder storage path / WRITE_NEW_DATA_LOCATION), it might make sense to refer to both at some point? Again, will leave that to your discretion but I think it might help clarify for readers. 🙂
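
One possible way to get both names into the message, purely as a sketch: the class `AssertionMessageSketch` and the helper `fallbackMessage` are hypothetical, and the string keys passed in `main` are my assumption of what the constants resolve to.

```java
public class AssertionMessageSketch {
  // Hypothetical helper: builds an assertion message that names both the
  // constant and its string key, for the property that is used and for the
  // property that is unset.
  static String fallbackMessage(
      String usedConstant, String usedKey, String unsetConstant, String unsetKey) {
    return String.format(
        "Should use %s (%s) when %s (%s) is not set",
        usedConstant, usedKey, unsetConstant, unsetKey);
  }

  public static void main(String[] args) {
    // Key values below are assumptions; check them against TableProperties.
    System.out.println(fallbackMessage(
        "WRITE_NEW_DATA_LOCATION", "write.folder-storage.path",
        "OBJECT_STORE_PATH", "write.object-storage.path"));
  }
}
```

The resulting string could then be passed as the first argument to `Assert.assertTrue`, so a failing test reports both the constant name and the key a user would actually set.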

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 19, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jul 27, 2024