-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Test: Fix flaky testOlderThanTimestamp in TestRemoveOrphanFilesAction3 #4825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -420,7 +420,7 @@ public void testOlderThanTimestamp() throws InterruptedException { | |||||
|
|
||||||
| long timestamp = System.currentTimeMillis(); | ||||||
|
|
||||||
| waitUntilAfter(System.currentTimeMillis()); | ||||||
| waitUntilAfter(System.currentTimeMillis() + 1000L); | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would be better, if possible, would be to access the timestamp from the snapshot summary and then wait until after that (which is what we do in many other tests). However, given that this is a Spark test, the time it would take to access the summary from the commit means that it would likely take longer to do that than any of the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I believe in this explicit case, using the timestamp from the snapshot summary won't be useful. The primary reason being we are trying to delete parquet files at the "data/c2_trunc=AA/c3=AAAA" location which are not managed by Iceberg. Lines 416 to 417 in 566b2fe
Had it been the case if the supposed to be orphan files were created using iceberg we could have leveraged the timestamp from the snapshot summary.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the point is still valid.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hi @rdblue - I agree with you. We experimented with various values less than 1000L milliseconds but none of them got us a 100% success rate for the test. 1000L was the lowest value that gave a 100% success rate. I can investigate more if required or use the 1000L value. Using a value of 1000L would make this test equivalent to what we had before #4711 Please let me know your thoughts.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sumeetgajjar left a comment on the main page of this PR, but check out the solution we came up with in another PR / issue discussion: #4859 We noticed a different test in this suite fail in GitHub CI as well. We just made the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sumeetgajjar, sorry, I think this was my misunderstanding. After going through your write-up more carefully, I see that the timestamps reported by the files are in seconds. So you're right: to ensure that the timestamp is different, we need to wait a full second. |
||||||
|
|
||||||
| df.write().mode("append").parquet(tableLocation + "/data/c2_trunc=AA/c3=AAAA"); | ||||||
|
|
||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't comment on it, but do any of the other
waitUntilAftercalls need this change? There's one right above this comment.I would think the one below would be sufficient, as it's two in a row (essentially this is sort of like thread.sleep(1000)).
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where it is necessary.
I took a look at all the other tests in this Suite when I filed the PR, and I did not find any places where we would require such a change.
Yes, exactly :-)