-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Maintain passed in ordering of files in Manifest Lists #13411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Theoretically I think we can use an Array instead of the AtomicReference Array but performance is basically the same again. Perf with vanilla array |
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
| forceGC = true | ||
| includeTests = true | ||
| humanOutputFile = file(jmhOutputPath) | ||
| jvmArgs = ['-Xmx32g'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[doubt] is this required for the change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to make the benchmarks run without ooming. The OOM had to due with caching the files to be added before committing. The other change fixes a "Table Already Exists" exception that gets thrown if you run the benchmark suite
|
|
||
| @Setup | ||
| public void setupBenchmark() { | ||
| dropTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't cleanup already called in line 96 in the tearDown method? in the other comment, you mentioned that table exist exception, is it leftover from previous run that was interrupted and not cleaned up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly didn't check, I think it was probably the previous run OOMing
|
thanks @RussellSpitzer for the change and @singhpk234 for the review |
…es in manifests apache#13411 help maintain passed in ordering of files in manifest lists and this help ensure this ordering guarantee
The fix in #11086 allows the base writer class to write manifests in parallel rather than one at a time but it also made the final ordering of the manifest files in the manifest list no longer deterministic.
In order to add back in the determinism and preserve the order of the passed in files (if the collection does have such an order) I added a change to regroup the results of the written manifests based on their original positions.
I don't think there are any performance implications of this (since we stored all the entries in memory both before and after I patched) but I ran the benchmarks just to check. The results look pretty much the same.
Pre-Patch
Post-patch