Skip to content

Conversation

@jackye1995
Copy link
Contributor

closes #2964

@cobookman, @kbendick and @openinx brought up the issue during #2845 that the variable name WRITE_NEW_DATA_LOCATION is inconsistent with the actual value write.folder-storage.path. This PR deprecates the old variable name to use a name that is consistent with the actual value.

I looked into the original PR that introduced this which is #6, I think @rdblue suggested using the value write.folder-storage.path, but the author @mccheah did not update the variable name.

Please let me know if there is any concern about this deprecation.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I agree that the name is somewhat confusing, but now that we have added some consistency in the naming, does the multiple fallback resolution strategy (object storage path -> folder storage path -> default location) still make sense?

I believe that it does, to be somewhat consistent for users (as once a user has specified a path, they probably intend to keep it), but I’d be open to being swayed if somebody presented a strong argument.

With time, we should potentially consider deprecating allowing both to be set and keep them distinct (folder storage or object storage). But that’s definitely a discussion for the dev list and another day (if ever).

Overall the PR looks clean to me though 👍

I’m interested in other people’s thoughts, but in the absence of feedback this looks good to me.

/**
* @deprecated will be removed in 0.14.0, use {@link #WRITE_FOLDER_STORAGE_LOCATION} instead
*/
@Deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the deprecation.

When the deprecation comes around, we should consider providing an action to migrate tables off of deprecated properties. In general, I think that would allow us to make smarter decisions about table property names as new needs surface. Something similar to the v2 table migration action. But that’s a long ways off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think migration is unnecessary. The actual value stored is always write.folder-storage.path, we are just changing the Java variable name. And as I noted, we will deprecate after 2 releases, so that people can have time to change their variable name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why will be removed in 0.14.0? Is there a convention to follow?

public void testDefaultLocationProviderWithCustomDataLocation() {
this.table.updateProperties()
.set(TableProperties.WRITE_NEW_DATA_LOCATION, "new_location")
.set(TableProperties.WRITE_FOLDER_STORAGE_LOCATION, "new_location")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to leave a test for the deprecated config value?

I’d love to even log something if a user sets it, but that can be considered later (if at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same value, just using a different name to avoid confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see now. This won't affect any external folks unless they have some custom stuff that touches this config.

@jackye1995
Copy link
Contributor Author

close to force restart CI

@jackye1995 jackye1995 closed this Aug 13, 2021
@jackye1995 jackye1995 reopened this Aug 13, 2021
@flyrain
Copy link
Contributor

flyrain commented Aug 19, 2021

The PR looks good to me.

fallback strategy (object storage path -> folder storage path -> default location)

An open question: can we consolidate both write.folder-storage.path and write.object-storage.path? They are unnecessary complexity. Ironically, we have to introduce a new config to consolidate two olde ones, :-).

@flyrain
Copy link
Contributor

flyrain commented Aug 19, 2021

@jackye1995
Copy link
Contributor Author

@flyrain great point around this, I am fully for a consolidation, this PR is really to prepare for that. My thought is the following:

  1. in Core: allow default data location for ObjectStorageLocationProvider #2845 I implemented the fallback strategy
  2. in this PR, we first consolidate the variable name to be consistent
  3. in a subsequent PR, we can deprecate write.object-storage.path and only have 1 overwrite config.

For 3, my thinking is to avoid the need to introduce yet another config key, although I understand the name write.folder-storage.path isn't really ideal. Ideally we can have write.data.path which parallels write.metadata.path, but I really don't want to introduce yet another variable. What's your thoughts on this?

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm +1

Not sure about the deprecation message since i'm not sure we'll have the right version there when we actually release but otherwise seems good.

@flyrain
Copy link
Contributor

flyrain commented Aug 19, 2021

  1. in a subsequent PR, we can deprecate write.object-storage.path and only have 1 overwrite config.

For 3, my thinking is to avoid the need to introduce yet another config key, although I understand the name write.folder-storage.path isn't really ideal. Ideally we can have write.data.path which parallels write.metadata.path, but I really don't want to introduce yet another variable. What's your thoughts on this?

Nice plan! I like the name write.data.path, but write.folder-storage.path is fine since it is there for a while. We can bring it up in the next community sync-up. We can go with write.data.path if there aren't many users using write.folder-storage.path.

Also let us know if you don't have time for step 3, we are glad to help. Thanks @jackye1995!

@rdblue rdblue merged commit aef12c0 into apache:master Aug 23, 2021
@rdblue
Copy link
Contributor

rdblue commented Aug 23, 2021

Thanks, @jackye1995!

@jackye1995
Copy link
Contributor Author

Thanks for approving! @flyrain great, let's bring this up in the next community meeting then. Meanwhile I still need to catch up with your discussion on dev list 😝

@rdblue
Copy link
Contributor

rdblue commented Sep 8, 2021

+1 for deprecating write.folder-storage.path, but I'm not sure we'll be able to actually remove it, since that would break forward-compatibility when older writers write to a table. We should definitely update docs to use write.data.path, though.

@flyrain
Copy link
Contributor

flyrain commented Sep 9, 2021

Cool. Let's deprecate both write.folder-storage.path and write.object-storage.path, and use a new name write.data.path instead.

@kbendick
Copy link
Contributor

kbendick commented Sep 9, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename TableProperties.WRITE_NEW_DATA_LOCATION to TableProperties.FOLDER_STORE_PATH

5 participants