Merged
3 changes: 3 additions & 0 deletions .palantir/revapi.yml
@@ -45,6 +45,9 @@ acceptedBreaks:
    - code: "java.method.addedToInterface"
      new: "method java.util.List<org.apache.iceberg.ManifestFile> org.apache.iceberg.Snapshot::deleteManifests(org.apache.iceberg.io.FileIO)"
      justification: "Allow adding a new method to the interface - old method is deprecated"
    - code: "java.method.addedToInterface"
      new: "method java.util.Map<java.lang.String, org.apache.iceberg.SnapshotRef> org.apache.iceberg.Table::refs()"
      justification: "Adding new refs method to Table API for easier access"
    - code: "java.method.addedToInterface"
      new: "method long org.apache.iceberg.actions.ExpireSnapshots.Result::deletedEqualityDeleteFilesCount()"
      justification: "Interface is backward compatible, very unlikely anyone implements this Result bean interface"
21 changes: 21 additions & 0 deletions api/src/main/java/org/apache/iceberg/Table.java
@@ -305,4 +305,25 @@ default AppendFiles newFastAppend() {

  /** Returns a {@link LocationProvider} to provide locations for new data files. */
  LocationProvider locationProvider();

  /**
   * Returns the current refs for the table
   *
   * @return the current refs for the table
   */
  Map<String, SnapshotRef> refs();

Contributor

Do we want to expose SnapshotRef in the Table API? Why not return Snapshot for ref names?

Contributor Author

I was thinking it made sense to expose Refs in the Table API because refs are maintained per table and any operation which leverages the table API can easily access them.

I think refs() is helpful mostly as a convenience method to list all the refs in a table (similar to being able to list the table properties, the schemas, or the partition specs).

When a user of the table API already knows which ref they are looking for, ref(name) becomes helpful. If we don't return a Snapshot for a given ref name, then a caller has to write

table.snapshot(table.ref(name).snapshotId())

and also handle the null case when no ref with that name exists.

So my conclusion is the following:

1.) Exposing SnapshotRef in the Table API should be fine: refs are maintained at the table level, so an API that exposes them, either as a collection or through a convenience method that looks them up by name, fits the model.

2.) It will be common to want the snapshot for a given ref, and it also makes sense to have an API for returning a Snapshot for a given ref name.

Alternative:

The tradeoff of the above is that the API is less minimal. For the purpose of the table scan we will ultimately need the snapshot for a given ref. So if we want to start minimal, we can add only the Snapshot snapshot(String ref) signature, and add refs() and ref(String name) only when they are truly needed.

@rdblue @jackye1995 @singhpk234 Let me know your thoughts!
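
To make the caller-side burden described above concrete, here is a minimal sketch of the two-step lookup with its null check. The `Ref` and `Snap` classes and the `snapshotForRef` helper are hypothetical stand-ins for illustration, not the real Iceberg types or API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not the real Iceberg classes.
class RefLookupSketch {
  static final class Ref {
    final long snapshotId;

    Ref(long snapshotId) {
      this.snapshotId = snapshotId;
    }
  }

  static final class Snap {
    final long id;

    Snap(long id) {
      this.id = id;
    }
  }

  // The lookup every caller would have to write if only the ref map were exposed:
  // resolve the ref by name, guard against a missing ref, then resolve the snapshot by ID.
  static Snap snapshotForRef(Map<String, Ref> refs, Map<Long, Snap> snapshotsById, String name) {
    Ref ref = refs.get(name);
    if (ref == null) {
      return null; // no ref with this name
    }
    return snapshotsById.get(ref.snapshotId);
  }

  public static void main(String[] args) {
    Map<String, Ref> refs = new HashMap<>();
    refs.put("main", new Ref(1L));
    Map<Long, Snap> snapshots = new HashMap<>();
    snapshots.put(1L, new Snap(1L));

    System.out.println("main resolved: " + (snapshotForRef(refs, snapshots, "main") != null));
    System.out.println("missing resolved: " + (snapshotForRef(refs, snapshots, "no-such-ref") != null));
  }
}
```

A single snapshot(String name) method on the table would fold this guard into one place instead of repeating it at each call site.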

Contributor

For now, I'd probably add the snapshot(String name) method and put off adding methods to Table until we know that we will definitely need them.

Contributor Author (@amogh-jahagirdar, May 30, 2022)

Actually, one thing I forgot: for SerializableTable, we'll need to pass the refs through (https://github.com/apache/iceberg/pull/4428/files#diff-46dafb425240806166ccc8f27a6c301781c5082bf1a187291cc92c2a8f17a588R85), so we'll need Map<String, SnapshotRef> refs() to implement the contract of snapshot(String name) in SerializableTable.

So then we need both refs() and snapshot(String name).

I think we can leave off ref(String name) for now, until we know that convenience method is really useful. I anticipate it will mostly be useful for avoiding another level of indirection when looking up from the map, but for now we can aim to keep the API changes minimal.
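
As a rough illustration of why a serialized table needs the ref map captured up front, the sketch below copies the map at construction time and round-trips it through Java serialization. `FrozenTable` and `roundTrip` are hypothetical names, and the map is simplified to ref name to snapshot ID rather than the real SnapshotRef type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class SerializableRefsSketch {
  // Hypothetical table stand-in: it has no access to live metadata after
  // deserialization, so it must carry its own copy of the refs.
  static final class FrozenTable implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, Long> refs; // simplified: ref name -> snapshot ID

    FrozenTable(Map<String, Long> liveRefs) {
      this.refs = new HashMap<>(liveRefs); // serializable copy taken at construction time
    }

    Long snapshotId(String name) {
      return refs.get(name); // null when no ref with this name exists
    }
  }

  // Serializes and deserializes the table in memory, as a remote worker would receive it.
  static FrozenTable roundTrip(FrozenTable table) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(table);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (FrozenTable) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Map<String, Long> live = new HashMap<>();
    live.put("main", 3L);
    FrozenTable copy = roundTrip(new FrozenTable(live));
    System.out.println("main after round trip: " + copy.snapshotId("main"));
  }
}
```

Without the captured map, a name-based lookup on the deserialized copy would have nothing to resolve against.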

Contributor

Shouldn't this also have a default that creates a map of "main" to the current snapshot ID?

Contributor Author

That's one way to go. I've been thinking it would actually be simpler to handle this when parsing the metadata: if a main branch does not exist, set it to the current snapshot. That way much of the API logic can rely on the assumption that main exists, and we don't need a lot of special-case code for when the new Iceberg library reads an older metadata file where refs (and thus main) may not exist.

This PR would be blocked on a PR for doing that, so I will raise one.
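
The parse-time defaulting idea could look roughly like the sketch below. `withMainRef` is a hypothetical helper, the map is simplified to ref name to snapshot ID, and treating a null current snapshot ID as "table has no snapshots" is an assumption made for this example, not something taken from the Iceberg metadata parser.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MainRefDefaultSketch {
  static final String MAIN = "main";

  // Returns the parsed refs, adding "main" -> currentSnapshotId only when the
  // metadata did not already define main. A null currentSnapshotId (assumed here
  // to mean a table without snapshots) leaves the refs unchanged.
  static Map<String, Long> withMainRef(Map<String, Long> parsedRefs, Long currentSnapshotId) {
    Map<String, Long> refs = new HashMap<>(parsedRefs);
    if (currentSnapshotId != null) {
      refs.putIfAbsent(MAIN, currentSnapshotId);
    }
    return Collections.unmodifiableMap(refs);
  }

  public static void main(String[] args) {
    // Older metadata file: no refs recorded, current snapshot is 42.
    Map<String, Long> refs = withMainRef(new HashMap<>(), 42L);
    System.out.println("main present: " + refs.containsKey(MAIN));
  }
}
```

With the default applied once at parse time, downstream code can look up main without branching on whether the metadata predates refs.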

Contributor

Yeah, that sounds reasonable as well.


  /**
   * Returns the snapshot referenced by the given name or null if no such reference exists.
   *
   * @return the snapshot which is referenced by the given name or null if no such reference exists.
   */
  default Snapshot snapshot(String name) {
    SnapshotRef ref = refs().get(name);
    if (ref != null) {
      return snapshot(ref.snapshotId());
    }

    return null;
  }
}
5 changes: 5 additions & 0 deletions core/src/main/java/org/apache/iceberg/BaseMetadataTable.java
@@ -158,6 +158,11 @@ public List<HistoryEntry> history() {
    return table().history();
  }

  @Override
  public Map<String, SnapshotRef> refs() {
    return table().refs();
  }

  @Override
  public UpdateSchema updateSchema() {
    throw new UnsupportedOperationException("Cannot update the schema of a metadata table");
5 changes: 5 additions & 0 deletions core/src/main/java/org/apache/iceberg/BaseTable.java
@@ -239,6 +239,11 @@ public LocationProvider locationProvider() {
    return operations().locationProvider();
  }

  @Override
  public Map<String, SnapshotRef> refs() {
    return ops.current().refs();
  }

  @Override
  public String toString() {
    return name();
5 changes: 5 additions & 0 deletions core/src/main/java/org/apache/iceberg/BaseTransaction.java
@@ -747,6 +747,11 @@ public LocationProvider locationProvider() {
      return transactionOps.locationProvider();
    }

    @Override
    public Map<String, SnapshotRef> refs() {
      return current.refs();
    }

    @Override
    public String toString() {
      return name();
7 changes: 7 additions & 0 deletions core/src/main/java/org/apache/iceberg/SerializableTable.java
@@ -61,6 +61,7 @@ public class SerializableTable implements Table, Serializable {
  private final FileIO io;
  private final EncryptionManager encryption;
  private final LocationProvider locationProvider;
  private final Map<String, SnapshotRef> refs;

  private transient volatile Table lazyTable = null;
  private transient volatile Schema lazySchema = null;
@@ -81,6 +82,7 @@ protected SerializableTable(Table table) {
    this.io = fileIO(table);
    this.encryption = table.encryption();
    this.locationProvider = table.locationProvider();
    this.refs = table.refs();
  }

  /**
@@ -235,6 +237,11 @@ public LocationProvider locationProvider() {
    return locationProvider;
  }

  @Override
  public Map<String, SnapshotRef> refs() {
    return refs;
  }

  @Override
  public void refresh() {
    throw new UnsupportedOperationException(errorMsg("refresh"));