Conversation
There was a problem hiding this comment.
It's quite difficult to test DATE type, I'm still thinking about how to simulate that... meanwhile I am just supporting the 2 timestamp types.
There was a problem hiding this comment.
It's also unclear to me if we should take the beginning of a day or end of a day for time travel, so I think it's better to just not support it until explicit user request.
There was a problem hiding this comment.
I agree, using dates would be problematic as Iceberg stores the snapshot timestamp in millis.
There was a problem hiding this comment.
It's also unclear to me if we should take the beginning of a day or end of a day
TIMESTAMP 2021-01-02 03:04:05 means TIMESTAMP 2021-01-02 03:04:05.000000000
TIMESTAMP 2021-01-02 03:04 means TIMESTAMP 2021-01-02 03:04:00.000000000
TIMESTAMP 2021-01-02 means TIMESTAMP 2021-01-02 00:00:00.000000000
DATE 2021-01-02 means TIMESTAMP 2021-01-02
i.e. beginning of the day
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergReadVersionedTable.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergReadVersionedTable.java
Outdated
Show resolved
Hide resolved
2124cdb to
891221f
Compare
891221f to
d47082f
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
d47082f to
82af1cc
Compare
82af1cc to
aaf1cd9
Compare
alexjo2144
left a comment
There was a problem hiding this comment.
Overall, looks good to me.
One more general testing question, what happens if you have the FOR TIMESTAMP AS OF syntax in a DML statement like an insert? Does that parse?
There was a problem hiding this comment.
Any reason not to reuse the code in IcebergQueryRunner#createIcebergQueryRunner?
There was a problem hiding this comment.
because I need the catalog object to perform operations like loadTable and inspect information in the table.
There was a problem hiding this comment.
Please do use a helper method like the following
private Optional<Long> getLatestSnapshotId(String tableName)
{
MaterializedResult result = computeActual(format("SELECT snapshot_id FROM \"%s$snapshots\" ORDER BY committed_at DESC LIMIT 1", tableName));
return result.getRowCount() > 0 ? Optional.of((Long) result.getOnlyValue()) : Optional.empty();
}
and replace the createQueryRunner() method with:
@Override
protected DistributedQueryRunner createQueryRunner()
throws Exception
{
return createIcebergQueryRunner();
}
In this fashion you can avoid the low-level details of building from scratch the IcebergQueryRunner and working with Iceberg specific API and the maintenance of the test can be made easier.
There was a problem hiding this comment.
More of a semantic question for the AS OF syntax, but should this fail or should it return an empty set of data?
There was a problem hiding this comment.
I think it should fail, returning an empty set of data should mean the table snapshot at that point of time actually contains no data.
There was a problem hiding this comment.
And this will also match the Spark SQL behavior in Iceberg.
There was a problem hiding this comment.
I agree with @jackye1995 that the query should fail instead of returning an empty set of data.
|
@alexjo2144 thanks for the review!
if you are talking about If you are talking about |
There was a problem hiding this comment.
Please do check whether the timestamp is in the future.
I think we shouldn't allow future timestamps here.
There was a problem hiding this comment.
why not? I think there could be a valid use case where a process provide a fixed timestamp for query to get consistent query result, and the timestamp could be in the future at the beginning.
There was a problem hiding this comment.
@findinpath i think this is a really important question.
The time travel syntax is supposed to provide a consistent, time-related view of a table.
Allowing timestamps in the future doesn't provide this, since future state naturally still may change.
i think this should be handled on the engine level though.
There was a problem hiding this comment.
From a user perspective, if the oldest history timestamp is shown, I think it should be formatted to be easily readable.
Use for reference (in case you want to show the timestamp to the user) :
- https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HistoryTable.java#L85
- https://github.com/trinodb/trino/blob/master/core/trino-spi/src/main/java/io/trino/spi/type/Timestamps.java#L147 (this corresponds to the what is being seen in the
trino cliwhen retrievingselect current_timestamp;)
Just as an alternative, Snowflake uses the following message in such situations:
Time travel data is not available for table %s. The requested time is either beyond the allowed time travel period or before the object creation time.
There was a problem hiding this comment.
Thanks, I have made the changes (similar to the Snowflake's message).
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
|
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla. |
|
I have rebased the changes. |
There was a problem hiding this comment.
Please refactor the test to sleep at most few millis.
There was a problem hiding this comment.
"unixtime" can be misunderstood.
call it "epochSeconds"
(same below)
There was a problem hiding this comment.
You don't need a new schema. Use the default schema provided by the query runner.
(i.e. don't qualify table references with schema at all)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Avoid string concatenation when condition holds
verify(versionType == BIGINT, "Iceberg target ID time travel only supports type Bigint, but got %s", versionType);
There was a problem hiding this comment.
I am not sure that the interpretation in UTC is the right one.
This is user-provided, so interpreting in session zone is probably appropriate.
cc @martint
There was a problem hiding this comment.
bump.
for now you can remove timestamp support from here, to unblock the PR
There was a problem hiding this comment.
Sure, I have removed the timestamp support.
There was a problem hiding this comment.
It's also unclear to me if we should take the beginning of a day or end of a day
TIMESTAMP 2021-01-02 03:04:05 means TIMESTAMP 2021-01-02 03:04:05.000000000
TIMESTAMP 2021-01-02 03:04 means TIMESTAMP 2021-01-02 03:04:00.000000000
TIMESTAMP 2021-01-02 means TIMESTAMP 2021-01-02 00:00:00.000000000
DATE 2021-01-02 means TIMESTAMP 2021-01-02
i.e. beginning of the day
There was a problem hiding this comment.
What is the order of table.history() elements?
is it guaranteed?
(the API doesn't seem to document this and this is being relied on)
There was a problem hiding this comment.
Yes, it is guaranteed
I can read the implementation myself, and the usages.
Still, to me "guaranteed" means something more than that.
It's not "guaranteed" unless documentation says it is guaranteed.
So for now we should do filter + max.
Or, make a PR to Iceberg to update table.history() documentation.
There was a problem hiding this comment.
Was going through this, just wanted to mention I don't think table.history() ordered is guaranteed due to staged commits. See apache/iceberg#3891 (comment) for more details.
It's part the reason why for time travel lookups of snapshots, we don't leverage binary search
There was a problem hiding this comment.
Thanks @amogh-jahagirdar for the reference
@rajarshisarkar can you add that link as an explanatory code comment?
There was a problem hiding this comment.
Ah, I see. Sure, let me use the filter + max logic here and add the link as an explanatory code comment.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Use the pattern from io.trino.plugin.iceberg.BaseIcebergConnectorTest#getLatestSnapshotId or extract an utility method
b5e412b to
935996e
Compare
There was a problem hiding this comment.
| if (table.snapshot((long) version.getVersion()) == null) { | |
| throw new TrinoException(GENERIC_USER_ERROR, "Iceberg target ID does not exists: " + version.getVersion()); | |
| } | |
| return (long) version.getVersion(); | |
| ilong snapshotId = (long) version.getVersion(); | |
| if (table.snapshot(snapshotId) == null) { | |
| throw new TrinoException(GENERIC_USER_ERROR, "Iceberg snapshot ID does not exists: " + snapshotId); | |
| } | |
| return snapshotId; |
There was a problem hiding this comment.
bump.
for now you can remove timestamp support from here, to unblock the PR
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
| Long snapshotId = null; | |
| for (HistoryEntry logEntry : table.history()) { | |
| if (logEntry.timestampMillis() <= timestampMillis) { | |
| snapshotId = logEntry.snapshotId(); | |
| } | |
| } | |
| if (snapshotId == null) { | |
| throw new TrinoException(GENERIC_USER_ERROR, format("Time travel data is not available for table " + | |
| "%s. The requested time is beyond the allowed time travel period.", table.name())); | |
| } | |
| return snapshotId; | |
| return table.history().stream() | |
| .filter(logEntry -> logEntry.timestampMillis() <= timestampMillis) | |
| .max(comparing(HistoryEntry::timestampMillis)) | |
| .orElseThrow(() -> new TrinoException(GENERIC_USER_ERROR, format( | |
| "Time travel data is not available for table %s. The requested time is beyond the allowed time travel period.", | |
| table.name()))) | |
| .snapshotId(); |
There was a problem hiding this comment.
Please refactor the test to sleep at most few millis.
bump
There was a problem hiding this comment.
v1SnapshotId is the ID of the v1 snapshot
v1Unixtime should be timestamp of the v1 snapshot, not "some timestamp after that"
read actual snapshot's timestamp (committed_at) instead of assuming (otherwise code requires explanation)
There was a problem hiding this comment.
Makes sense to me, I have made the change. Thanks, for the suggestion.
There was a problem hiding this comment.
redundant, query runner cleanup will do that
There was a problem hiding this comment.
inline timestamp(1, 8)
just use some fixed timestamp that's before today, as a SQL literal value
There was a problem hiding this comment.
Sure, I have used TIMESTAMP '1970-01-01 00:00:00.001000000 Z' now
There was a problem hiding this comment.
from_unixtime semantics are never obvious to me (eg is it session zone dependent?)
Make the code more explicit
private String timestampLiteral(long epochSeconds, int precision)
{
return DateTimeFormatter.ofPattern("'TIMESTAMP '''uuuu-MM-dd HH:mm:ss." + "S".repeat(precision) + " VV''")
.format(Instant.ofEpochMilli(epochSeconds).atZone(UTC));
}this also frees you from needing to wrap the values in CAST and differentiate between "short" and "long".
There was a problem hiding this comment.
This looks a lot cleaner, I have made the change. Thanks, for the suggestion.
There was a problem hiding this comment.
"throws" should be on next line
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
|
@rajarshisarkar did you maybe have time to apply the comments? |
|
@findepi I will update the PR this week. |
935996e to
2008a31
Compare
|
@findepi I have addressed the review comments + rebased the changes. |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergReadVersionedTable.java
Outdated
Show resolved
Hide resolved
|
@rajarshisarkar i've applied some changes myself. |
|
@rajarshisarkar @jackye1995 thank you! |

Support AS OF TIMESTAMP and AS OF VERSION queries for Iceberg.
@electrum @findepi @losipiuk @alexjo2144