Skip to content

Conversation

@mccheah
Copy link
Contributor

@mccheah mccheah commented Dec 13, 2018

Tricky because the table operations needs to be exposed. Added a mixed interface of Table + TableOperations accordingly. Therefore now the Spark data source is opinionated about ensuring the returned table also implements HasTableOperations.

Tricky because the table operations needs to be exposed. Added a mixed interface of Table + TableOperations accordingly.
}

protected Table findTable(DataSourceOptions options) {
protected TableWithTableOperations findTable(DataSourceOptions options) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue this is the most interesting question raised by this patch. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding io to the Table interface? I'd rather do that since FileIO is a public interface. I think that is mostly what HasTableOperations is used for anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine.

/**
* @return a {@link FileIO} to read and write table data and metadata files
*/
FileIO io();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should TableOperations#io continue to exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because TableOperations is still how implementations are passed.

int hash = HASH_FUNC.apply(partitionAndFilename);
return new Path(objectStorePath,
String.format("%08x/%s/%s", hash, context, partitionAndFilename));
return UriBuilder.fromUri(URI.create(objectStore))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Path/URI methods cause problems with escape characters. This should use strings for reliability.


TestTableOperations ops() {
@Override
public TestTableOperations operations() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this rename needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous version didn't actually override the superclass's method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because we didn't expose TableOperations at first. Now that FileIO will be public, we can probably remove HasTableOperations in a follow-up.

@rdblue
Copy link
Contributor

rdblue commented Dec 14, 2018

@mccheah, thanks for working on this! It's about ready to go, but we need to change back to using String instead of URI. There's also an unnecessary rename that we should remove.

Function<PartitionKey, String> outputPathFunc = key ->
String.format("%s/%s/%s",
stripTrailingSlash(baseDataPath),
stripTrailingSlash(stripLeadingSlash(key.toPath())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always implemented by PartitionSpec#partitionToPath. That never adds a starting or trailing / so this can rely on that and doesn't need to do the extra check.

String baseDataPath = stripTrailingSlash(dataLocation); // avoid calling this in the output path function
Function<PartitionKey, String> outputPathFunc = key ->
String.format("%s/%s/%s",
stripTrailingSlash(baseDataPath),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

baseDataPath already stripped the trailing /.

Path baseDataPath = lazyDataPath(); // avoid calling this in the output path function
Function<PartitionKey, Path> outputPathFunc = key ->
new Path(new Path(baseDataPath, key.toPath()), filename);
String baseDataPath = stripTrailingSlash(dataLocation); // avoid calling this in the output path function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an internal implementation, I think that the dataLocation() method should guarantee that the trailing / is removed. That way we never need to do this in the tasks.

String.format("%08x/%s/%s", hash, context, partitionAndFilename));
return String.format(
"%s/%08x/%s/%s/%s",
stripTrailingSlash(objectStore),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same general ideas here as above. This shouldn't remove slashes if it can be done in one central place or strings are generated by Iceberg code. We may need to document that other methods must not produce trailing or starting /, but that is a better way to go.

@rdblue
Copy link
Contributor

rdblue commented Jan 7, 2019

@mccheah, I think this is just about ready to merge, but it doesn't need most of the calls to strip slashes. Once those are fixed, I'll merge it. No rush since you're probably just back from the holidays.

@mccheah
Copy link
Contributor Author

mccheah commented Jan 8, 2019

Almost there, addressed the comments, fixed the merge conflicts.

@rdblue rdblue merged commit 67fbe5d into apache:master Jan 9, 2019
@rdblue
Copy link
Contributor

rdblue commented Jan 9, 2019

Merged. Thanks @mccheah!

renato2099 pushed a commit to renato2099/incubator-iceberg that referenced this pull request Jan 30, 2019
jun-ma-0 pushed a commit to jun-ma-0/incubator-iceberg that referenced this pull request Feb 6, 2020
PLAT-41915 Add Base ADLS Table ops implementation for atomic version_hint io
puchengy added a commit to puchengy/iceberg that referenced this pull request Jun 20, 2023
…cked Iceberg Parquet table (apache#28) (apache#52)

(cherry picked from commit 165c5e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants