perf: experiment with bulk insert test results #319
Conversation
These changes have 2 objectives:

1. (duh) Try to speed up the time spent writing tests and test instances in the DB. This is inspired by https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow. The idea is to bulk insert and let the DB handle the concurrency and the conflicts.
2. Test using Sentry metrics + Features for potential perf improvements.

Although I suspect the time benefit will be enough for us to prefer the bulk_insert technique, I don't know how large that benefit is. And there's the drawback of using extra memory, which I'm also not sure how much extra memory it is. So I want to use this opportunity to explore this setup for running perf experiments :D
🔍 Existing Issues For ReviewYour pull request is modifying functions with the following pre-existing issues: 📄 File: tasks/test_results_processor.py
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #319 +/- ##
=======================================
Coverage 98.12% 98.12%
=======================================
Files 385 385
Lines 31901 31959 +58
=======================================
+ Hits 31302 31360 +58
Misses 599 599
Flags with carried forward coverage won't be shown. Click here to find out more.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #319 +/- ##
=======================================
Coverage 98.10% 98.11%
=======================================
Files 416 416
Lines 32601 32659 +58
=======================================
+ Hits 31984 32042 +58
Misses 617 617
Flags with carried forward coverage won't be shown. Click here to find out more.
Changes look good. I had trouble verifying the memory-usage gathering locally, but I think we can see if that problem persists in prod.
# Obviously this is a very rough estimate of sizes. We are more interested
# in the difference between the insert approaches, so this should be fine.
# And these aux memory structures take the bulk of the extra memory we need.
memory_used += getsizeof(test_data) // 1024
I tested this out locally and the memory_used I was getting was 0. I'm not sure if this is just a local thing I ran into or if there's a problem with this approach. I'm okay with trying it out to see if it works in prod, since that's just my experience running it locally.
Depending on the number of tests you tested against, it could be small enough in size that it would return 0. After all, the value is in KiB, and the integer division rounds down.
Before adding the // 1024
I was getting values for those calls, so... maybe that's it 🤷
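The rounding effect discussed above is easy to reproduce: `sys.getsizeof` returns bytes, so any structure smaller than 1024 bytes reports 0 after the `// 1024` division. A quick illustration with made-up payloads:

```python
from sys import getsizeof

# A one-element list is well under 1 KiB, so integer division yields 0.
small = [{"name": "test_a"}]

# A 100k-element list carries hundreds of KiB of pointer storage alone.
large = [{"name": f"test_{i}"} for i in range(100_000)]

print(getsizeof(small) // 1024)  # 0 -- rounds down, matching the observation above
print(getsizeof(large) // 1024)  # well above 0
```

Note that `getsizeof` on a list measures only the list object itself (its pointer array), not the dicts it references, which is consistent with the "very rough estimate" caveat in the code comment.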
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov, and as a result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.