-
Notifications
You must be signed in to change notification settings - Fork 100
file upload: do not refresh on every chunk #6048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This pull request does not have a backport label. Could you fix it @pzl? 🙏
|
|
It doesn't seem like I have access to add/edit labels on this? |
Is there any way to get this into a test? There is nothing preventing us from regressing on the fix in this PR in the future right now. Even calls against mocks to ensure a refresh request happens after the whole file is received would do it. |
I think the reason here is that there isn't a behavioral change, but an implementation detail change. Testing implementation this narrowly would be fairly brittle, and would fail if modified at all (even if it maintains valid behavior). How the function validates that all chunks are present is interior. Could alternatively call search, count chunks, and wait+retry for some I do grant, "we found a bug, we should add tests to make sure we don't do it again by accident" is right, but "calling refresh 3,000 times in rapid succession" on an index was the bug. |
IMO this is the point, that the next person to touch this in 6 months or whatever is pointed to this call and any comments around it should they happen to change it. I wouldn't want to test everything this way but a brittle test is fine in this specific scenario. |
|
got it: this is more of a "here be dragons, don't break it unless you know the ramifications," situation. Accepting some brittle tests to draw even more attention that changes to this need to be very intentional. Guarding against inadvertent changes that upend the learnings and testing here I can add some comments and tests to that effect, sure |
cmacknz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tests and additional comments, there is a nice paper trail to follow for use of refresh now.
|
Buildkite test this |
|
@pzl i saw you mention you didn't have the ability to add labels - do we need to backport this to any other branches? |
No, 9.3 and-up is good (and label permissions have since been fixed, thank you!) |
What is the problem this PR solves?
Improves elasticsearch performance when uploading large files
How does this PR solve the problem?
forcing
refreshis somewhat costly, and doing that on every chunk upload was unnecessary. For large files and many chunks, this was able to overwhelm elasticsearch. The chunk records only need to be searchable and found at the very end, so a single refresh call is used at the finalization step instead of every chunk.Note that the final refresh call is still necessary. It was observed that the finalization step was not always able to locate the most-recently-uploaded chunks immediately prior to the finish api call, and validation would fail.
How to test this PR locally
One could either do the "get agent diagnostics" zip function, or otherwise use elastic defend, go to responder and do a "get-file" command to execute this path.
Design Checklist
Checklist
./changelog/fragmentsusing the changelog tool