[Worker] Implement parallel upload processing #84
Comments
This is a great idea and merits further research. Please research this and create a Notion investigation into the feasibility here. Involve Scott, Dana, and Gio as needed to gain context and discuss the solution.
First step is to build something that will let us measure whether the idea works or not. The way the code is now, it's not dashboardable with statsd timers or Sentry traces. I have a PR up with a solution, and once it is in prod for a little while we can make a dashboard, deploy this proposal, and see how the metrics move.
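The instrumentation idea above (timing the processing step so its duration becomes dashboardable) can be sketched like this. This is an illustrative stand-in, not the PR's actual code: `emit_timer`, `timed`, and the metric name are hypothetical, and `emit_timer` plays the role a statsd client's timing call would in production.

```python
import time
from functools import wraps

# In-memory sink standing in for a statsd backend; in production this
# would be a statsd timing call feeding a dashboard.
timings = {}

def emit_timer(metric, elapsed_ms):
    """Record one timing sample for a metric (hypothetical stand-in)."""
    timings.setdefault(metric, []).append(elapsed_ms)

def timed(metric):
    """Decorator that measures how long the wrapped function takes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                # Emit the duration even if processing raises.
                emit_timer(metric, (time.monotonic() - start) * 1000.0)
        return wrapper
    return decorator

@timed("worker.upload_processing")
def process_upload(upload):
    # Placeholder for the real upload-processing work.
    return {"processed": upload}
```

With something like this in place, each upload-processing run emits a duration sample, which is what makes a before/after comparison of the parallelization possible.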
End-of-sprint update:

I've spent a couple of days hacking on this now and pushed where I left off. I think it's achievable, but implementation/validation/rollout will be a lot of work. Going to backburner it for now and will revisit later in Q3 or in Q4, so I'm leaving the task open, but it won't be added to sprints until maybe later. The high-level approach is:

The current report-merging code really only supports merging one upload at a time. When I come back to it, I think the fastest way to get a demo is to lower the batch size from 3 to 1. Then I should be able to reuse that for a real implementation; there's more thinking to do:

codecov/worker#127 is a draft PR that implements this; more validation needs to happen before merging:

Going to backburner this for now for two reasons:
Linking this issue: https://github.com/codecov/internal-issues/issues/699 as I think this can be resolved with parallelization.
UploadTask breaks the list of uploads for a commit into chunks of 3 and then dispatches UploadProcessorTasks for each chunk serially: https://github.com/codecov/worker/blob/master/tasks/upload.py#L374-L387
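The chunk-and-dispatch behavior described above can be sketched in plain Python. This is a hypothetical stand-in for the real Celery tasks in tasks/upload.py; `chunked`, `dispatch_serially`, and `CHUNK_SIZE` are illustrative names, not the worker's actual API.

```python
CHUNK_SIZE = 3  # the batch size of 3 mentioned in the issue

def chunked(items, size=CHUNK_SIZE):
    """Break the list of uploads for a commit into fixed-size chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def dispatch_serially(uploads, process_chunk):
    """Dispatch one processing call per chunk, one after another.

    Each chunk waits for the previous one to finish, which is the
    serial bottleneck the parallelization proposal targets.
    """
    results = []
    for chunk in chunked(uploads):
        results.append(process_chunk(chunk))
    return results
```

For example, seven uploads produce chunks of sizes 3, 3, and 1, processed strictly in order.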
UploadProcessorTask will lock the whole task in Redis, fetch the Report for the commit as it was left by the previous instance of UploadProcessorTask, and update it with the result of processing the current chunk:
https://github.com/codecov/worker/blob/master/tasks/upload_processor.py#L80-L84
https://github.com/codecov/worker/blob/master/tasks/upload_processor.py#L135-L140
https://github.com/codecov/worker/blob/master/tasks/upload_processor.py#L168-L174
https://github.com/codecov/worker/blob/master/tasks/upload_processor.py#L191-L192
https://github.com/codecov/worker/blob/master/tasks/upload_processor.py#L201-L209
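The lock-fetch-merge cycle described above can be sketched as follows. This is an illustrative stand-in, not the worker's real code: a `threading.Lock` plays the role of the Redis lock, a dict plays the role of the persisted Report, and the function names are hypothetical.

```python
import threading

commit_lock = threading.Lock()  # stand-in for the per-commit Redis lock
report_store = {}               # stand-in for the persisted Report

def process_chunk_with_lock(commit_sha, chunk):
    """Process one chunk the way each UploadProcessorTask instance does:
    take the lock, load the Report as the previous instance left it,
    merge in this chunk's results, and persist the updated Report."""
    with commit_lock:
        # Fetch the Report state left by the previous instance.
        report = report_store.get(commit_sha, {"uploads": []})
        # Merge the current chunk's processing results into it.
        for upload in chunk:
            report["uploads"].append(upload)
        report_store[commit_sha] = report
        return len(report["uploads"])
```

The key property is that every chunk serializes on the same lock and rereads the whole Report, so chunk N cannot start merging until chunk N-1 has finished.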
We may be able to improve performance for many-upload use cases by following more of a map/reduce pattern: run all the chunk-processing tasks in parallel (via a Celery chord) and then synchronize and merge all the reports in a single task at the end.