Merged
@@ -420,7 +420,7 @@ public void testOlderThanTimestamp() throws InterruptedException {

Contributor


I can't comment on it, but do any of the other waitUntilAfter calls need this change? There's one right above this comment.

I would think the one below would be sufficient, as it's two in a row (essentially this is sort of like thread.sleep(1000)).

Contributor Author

@sumeetgajjar May 22, 2022


> I can't comment on it, but do any of the other waitUntilAfter calls need this change?

This is the only place where the change is necessary.
I took a look at all the other tests in this suite when I filed the PR, and I did not find any other place that would require such a change.

> as it's two in a row (essentially this is sort of like thread.sleep(1000)).

Yes, exactly :-)

long timestamp = System.currentTimeMillis();

waitUntilAfter(System.currentTimeMillis());
waitUntilAfter(System.currentTimeMillis() + 1000L);
Contributor


What would be better, if possible, would be to access the timestamp from the snapshot summary and then wait until after that (which is what we do in many other tests).

However, given that this is a Spark test, the time it would take to access the summary from the commit means that it would likely take longer to do that than any of the waitUntilAfter (i.e. that time would likely have passed by at least a few hundred milliseconds on any machine).

Contributor Author


Ok, let me try that.

Contributor Author


So I believe that in this particular case, using the timestamp from the snapshot summary won't help. The primary reason is that we are trying to delete parquet files at the "data/c2_trunc=AA/c3=AAAA" location, which are not managed by Iceberg.

df.write().mode("append").parquet(tableLocation + "/data/c2_trunc=AA/c3=AAAA");
df.write().mode("append").parquet(tableLocation + "/data/c2_trunc=AA/c3=AAAA");

Had the supposed-to-be-orphan files been created using Iceberg, we could have leveraged the timestamp from the snapshot summary.

Contributor


I think the point is still valid. waitUntilAfter should be used so that you know that a certain amount of time, in milliseconds, has elapsed. That gives you the ability to create one file, wait until the next millisecond, create another, wait, etc. to make sure they don't have the same write time. It doesn't need to be a snapshot timestamp, but you shouldn't need to wait for an entire second.
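For illustration, here is a minimal sketch of what a waitUntilAfter helper along these lines typically looks like. The name and signature mirror the test utility discussed in this thread, but the body is an assumption, not Iceberg's actual implementation:

```java
public class WaitUntilAfterSketch {

  // Spin until the wall clock has moved strictly past the given timestamp,
  // so anything created afterwards has a later millisecond write time.
  static void waitUntilAfter(long timestampMillis) {
    long current = System.currentTimeMillis();
    while (current <= timestampMillis) {
      current = System.currentTimeMillis();
    }
  }

  public static void main(String[] args) {
    long ts = System.currentTimeMillis();
    waitUntilAfter(ts);
    System.out.println(System.currentTimeMillis() > ts); // prints "true"
  }
}
```

With a helper like this, waiting past "now" costs at most a couple of milliseconds, which is why it is normally preferred over a fixed one-second sleep.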

Contributor Author


> That gives you the ability to create one file, wait until the next millisecond, create another, wait, etc. to make sure they don't have the same write time.

Hi @rdblue - I agree with you.
Writing the file at the very next millisecond should do the trick here, but I believe that is exactly what the original code does, and we still see the failure.

We experimented with various values below 1000L milliseconds, but none of them gave a 100% success rate for the test. 1000L was the lowest value that did.

I can investigate more if required, or use the 1000L value. Using 1000L would make this test equivalent to what we had before #4711.

Please let me know your thoughts.

Contributor


@sumeetgajjar, I left a comment on the main page of this PR, but check out the solution we came up with in another PR / issue discussion: #4859

We noticed a different test in this suite fail in GitHub CI as well. We just set the olderThan argument in the future to account for the time-precision issue without having to do excessive busy waiting.

Contributor


@sumeetgajjar, sorry, I think this was my misunderstanding. After going through your write-up more carefully, I see that the timestamps reported by the files are in seconds. So you're right: to ensure that the timestamp is different, we need to wait a full second.


df.write().mode("append").parquet(tableLocation + "/data/c2_trunc=AA/c3=AAAA");
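The precision issue described above can be seen with plain arithmetic: if the file system reports modification times at whole-second granularity, two writes less than a second apart can collapse to the same timestamp, while a full-second wait always separates them. A small standalone illustration (the timestamp values are made up):

```java
public class SecondPrecisionDemo {

  // Truncate a millisecond timestamp to whole seconds, the way a file
  // system with second-granularity modification times effectively does.
  static long toSeconds(long millis) {
    return millis / 1000;
  }

  public static void main(String[] args) {
    long firstWrite = 1_653_200_000_250L; // hypothetical write time (ms)
    long secondWrite = firstWrite + 500;  // a second write 500 ms later
    long thirdWrite = firstWrite + 1000;  // a write a full second later

    // The first two writes land in the same second, so their reported
    // timestamps are identical; adding a full 1000 ms always moves the
    // truncated value forward by one second.
    System.out.println(toSeconds(firstWrite) == toSeconds(secondWrite)); // prints "true"
    System.out.println(toSeconds(firstWrite) == toSeconds(thirdWrite));  // prints "false"
  }
}
```

This is why values below 1000L could not make the test fully reliable: only a wait of at least one full second guarantees a different second-precision timestamp.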
