perf: experiment with bulk insert test results #319

Merged: giovanni-guidini merged 3 commits into main from gio/experiment-bulk-insert-testinstances on Mar 13, 2024

giovanni-guidini
Contributor

These changes have 2 objectives:

  1. (duh) Try to speed up writing tests and test instances to the DB.
    This is inspired by https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
    The idea is to bulk insert and let the DB handle the concurrency and the conflicts.

  2. Try out Sentry metrics + Features as a way to measure potential perf improvements.
    I suspect the time saved will be enough for us to prefer the bulk_insert technique, but I don't know how large the benefit actually is. There's also the drawback of using extra memory, and I'm not sure how much extra memory that is either.
    So I want to use this opportunity to explore this setup of running perf experiments :D
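
A minimal sketch of the "bulk insert and let the DB handle the conflicts" idea from objective 1. It is not the code from this PR: it swaps in Python's stdlib `sqlite3` with `executemany` and `INSERT OR IGNORE` so that it runs without a database server, and the `tests` table, its columns, and the `bulk_insert` helper are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per test, keyed by a unique id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id TEXT PRIMARY KEY, name TEXT NOT NULL)")


def bulk_insert(conn, rows):
    """Send all rows in one executemany call; the DB skips rows whose id already exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO tests (id, name) VALUES (?, ?)", rows
    )
    conn.commit()


# The first batch contains a duplicate id; the second overlaps with the first.
bulk_insert(conn, [("a", "test_a"), ("b", "test_b"), ("a", "test_a")])
bulk_insert(conn, [("b", "test_b"), ("c", "test_c")])

row_count = conn.execute("SELECT COUNT(*) FROM tests").fetchone()[0]
print(row_count)  # 3
```

The application code never checks whether a row exists before writing; the uniqueness constraint does that work. The trade-off mentioned in objective 2 shows up here too: the whole batch has to be held in memory before it is sent.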

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


sentry-io bot commented Mar 13, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: tasks/test_results_processor.py

Function: process_individual_upload
Unhandled Issue: FileNotInStorageError: File test_results/v1/raw/2024-03-12/25D3451FB922A5B3C2F2E4A374E5B8F0/f7475b19e0a2366ab6d57730d0f0... ...
Event Count: 1


codecov-qa bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.12%. Comparing base (98df06e) to head (b6ba449).

@@           Coverage Diff           @@
##             main     #319   +/-   ##
=======================================
  Coverage   98.12%   98.12%           
=======================================
  Files         385      385           
  Lines       31901    31959   +58     
=======================================
+ Hits        31302    31360   +58     
  Misses        599      599           
Flag Coverage Δ
integration 98.12% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 98.12% <100.00%> (+<0.01%) ⬆️
unit 98.12% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.24% <100.00%> (+0.01%) ⬆️
OutsideTasks 97.91% <100.00%> (+<0.01%) ⬆️
Files Coverage Δ
rollouts/__init__.py 100.00% <100.00%> (ø)
tasks/test_results_processor.py 99.40% <100.00%> (+0.15%) ⬆️
...sks/tests/unit/test_test_results_processor_task.py 100.00% <100.00%> (ø)

codecov-public-qa bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (98df06e) 98.12% compared to head (b6ba449) 98.12%.

@@           Coverage Diff           @@
##             main     #319   +/-   ##
=======================================
  Coverage   98.12%   98.12%           
=======================================
  Files         385      385           
  Lines       31901    31959   +58     
=======================================
+ Hits        31302    31360   +58     
  Misses        599      599           
Flag Coverage Δ
integration 98.12% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 98.12% <100.00%> (+<0.01%) ⬆️
unit 98.12% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.24% <100.00%> (+0.01%) ⬆️
OutsideTasks 97.91% <100.00%> (+<0.01%) ⬆️
Files Coverage Δ
rollouts/__init__.py 100.00% <100.00%> (ø)
tasks/test_results_processor.py 99.40% <100.00%> (+0.15%) ⬆️
...sks/tests/unit/test_test_results_processor_task.py 100.00% <100.00%> (ø)

codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.11%. Comparing base (98df06e) to head (b6ba449).

@@           Coverage Diff           @@
##             main     #319   +/-   ##
=======================================
  Coverage   98.10%   98.11%           
=======================================
  Files         416      416           
  Lines       32601    32659   +58     
=======================================
+ Hits        31984    32042   +58     
  Misses        617      617           
Flag Coverage Δ
integration 98.12% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 98.12% <100.00%> (+<0.01%) ⬆️
unit 98.12% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 96.18% <100.00%> (+0.01%) ⬆️
OutsideTasks 97.91% <100.00%> (+<0.01%) ⬆️
Files Coverage Δ
rollouts/__init__.py 100.00% <100.00%> (ø)
tasks/test_results_processor.py 99.40% <100.00%> (+0.15%) ⬆️
...sks/tests/unit/test_test_results_processor_task.py 100.00% <100.00%> (ø)
Related Entrypoints
run/app.tasks.test_results.TestResultsProcessor

Contributor

@joseph-sentry joseph-sentry left a comment

Changes look good. I had trouble verifying the memory usage gathering locally, but I think we can see if that problem persists in prod.

# Obviously this is a very rough estimate of sizes. We are interested more
# in the difference between the insert approaches. So this should be fine.
# And these aux memory structures take the bulk of extra memory we need
memory_used += getsizeof(test_data) // 1024

joseph-sentry (Contributor) commented:

I tested this out locally and the memory_used I was getting was 0. I'm not sure if this is just a local thing I ran into or if there's a problem with this approach. I'm okay with trying it out to see if it works in prod, since that's just my experience running it locally.

giovanni-guidini (Contributor, Author) replied:

Depending on the number of tests you tested against, the data may have been small enough that it would return 0. After all, the value is in KB and the integer division rounds down.

Before adding the // 1024 I was getting values for those calls, so... maybe that 🤷
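
A small sketch of the rounding effect discussed in this thread, assuming the sizes are accumulated per structure as in the snippet under review. The helper names and the sample data are hypothetical. Note also that `sys.getsizeof` is shallow: it reports the size of the container object itself, not of the objects it references.

```python
from sys import getsizeof


def kb_per_item(items):
    """Convert each size to KB before summing; anything under 1024 bytes adds 0."""
    return sum(getsizeof(item) // 1024 for item in items)


def kb_total(items):
    """Sum sizes in bytes first, then convert to KB once at the end."""
    return sum(getsizeof(item) for item in items) // 1024


# Hypothetical stand-in for small per-test structures.
small_items = [{"name": f"test_{i}", "outcome": "pass"} for i in range(200)]

print(kb_per_item(small_items))  # 0: every dict is well under 1 KB on its own
print(kb_total(small_items))     # nonzero: the bytes add up before the division
```

Accumulating bytes and dividing once would avoid losing the small structures, at the cost of changing how the metric is computed; whether that matters depends on how large the real payloads are.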

giovanni-guidini force-pushed the gio/experiment-bulk-insert-testinstances branch from 878cb98 to b6ba449 on March 13, 2024 16:19
giovanni-guidini merged commit 7ea10af into main on Mar 13, 2024
26 checks passed
giovanni-guidini deleted the gio/experiment-bulk-insert-testinstances branch on March 13, 2024 16:33
This pull request was closed.