Make database cleaner active record seeded deletion start/clean more threadsafe #9672
Conversation
Force-pushed from 66662c7 to 5047426
lib/extensions/database_cleaner-activerecord-seeded_deletion.rb (outdated review threads, resolved)
```diff
  def start
-   Rails.logger.info "SeededDeletion strategy start"
+   Rails.logger.info "SeededDeletion strategy start" if defined?(Rails) && Rails.logger
```
Curious why you need the Rails.logger check - do we actually have cases with no logger present?
I don't think so. I removed it.
Force-pushed from 5047426 to dc70812
```diff
- connection.transaction(:requires_new => true) do
+ # Use a transaction with serializable isolation to prevent other threads
+ # from modifying the tables during deletion
+ connection.transaction(:requires_new => true, :isolation => :read_committed) do
```
Switched to read_committed... it seems to prevent the errors we were seeing locally.
Force-pushed from dc70812 to d28c1f6
* Prevent other threads from modifying the tables we're deleting from
* Used read_committed since it's the lowest level that works
* Tested various isolations:
* read_uncommitted - works, but PG treats it as read_committed[1]
* read_committed - works
* repeatable_read - skipped
* serializable - works (highest isolation level)
* Don't allow changes to table_max_id_cache by other threads while we're
accessing it
[1] https://www.postgresql.org/docs/13/transaction-iso.html
"In PostgreSQL, you can request any of the four standard transaction isolation
levels, but internally only three distinct isolation levels are implemented,
i.e., PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is
because it is the only sensible way to map the standard isolation levels to
PostgreSQL's multiversion concurrency control architecture."
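The effect of passing an isolation level can be illustrated with a small stand-alone sketch. `FakeConnection` below is a made-up stand-in that just records the SQL an ActiveRecord-style adapter would issue (ActiveRecord's real `transaction` does accept `:requires_new` and `:isolation` options); the table name and id threshold are invented for the demo.

```ruby
# Illustrative only: records the statements a real adapter would send.
class FakeConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def transaction(requires_new: false, isolation: nil)
    @statements << "BEGIN"
    # PostgreSQL applies the requested level via SET TRANSACTION
    @statements << "SET TRANSACTION ISOLATION LEVEL #{isolation.to_s.tr('_', ' ').upcase}" if isolation
    yield
    @statements << "COMMIT"
  end

  def execute(sql)
    @statements << sql
  end
end

conn = FakeConnection.new

# read_committed was the lowest level that prevented the errors in testing
conn.transaction(requires_new: true, isolation: :read_committed) do
  conn.execute("DELETE FROM users WHERE id > 100") # hypothetical seeded max id
end

conn.statements.each { |sql| puts sql }
```

Running the block prints BEGIN, the SET TRANSACTION statement, the DELETE, and COMMIT in order, which is the shape of traffic you would expect to see in the PostgreSQL log for the real strategy.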
This reverts commit 9c20c47, putting back the changes reverted in ManageIQ#9684. This should allow us to verify that the db setup/teardown in cypress-on-rails is no longer causing sporadic test failures in this area of code.
Force-pushed from d28c1f6 to d1d4cbe
```diff
  );
  }

+ // TODO: Aside from test that validates deletion, replace with a more reliable cleanup mechanism when ready
```
The changes in this file are reverts of #9684
```diff
  def self.table_max_id_cache=(table_id_hash)
-   @table_max_id_cache ||= table_id_hash
+   @mutex.synchronize do
+     @table_max_id_cache = table_id_hash
```
This would be cleaner as

```diff
- @table_max_id_cache = table_id_hash
+ @table_max_id_cache ||= {}
+ @table_max_id_cache.replace(table_id_hash)
```
The reason is that if anyone has a reference to the "old" hash from the getter, then this would update that existing reference instead of creating a new one. Not a huge deal, but is a little safer.
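The aliasing difference can be seen in a stand-alone sketch. The class name `SeededDeletionCache` and the cache contents are hypothetical; only the setter semantics matter here.

```ruby
# Hypothetical minimal version of the cache, using the reviewer's suggestion:
# Hash#replace mutates the existing Hash in place, so any thread that already
# grabbed a reference via the getter sees the updated contents.
class SeededDeletionCache
  @mutex = Mutex.new
  @table_max_id_cache = {}

  class << self
    attr_reader :table_max_id_cache

    def table_max_id_cache=(table_id_hash)
      @mutex.synchronize do
        @table_max_id_cache ||= {}
        @table_max_id_cache.replace(table_id_hash)
      end
    end
  end
end

old_ref = SeededDeletionCache.table_max_id_cache
SeededDeletionCache.table_max_id_cache = { "users" => 42 }

old_ref["users"] # old_ref still points at the live cache, so it sees 42
```

With plain assignment (`@table_max_id_cache = table_id_hash`) the object held by `old_ref` would be orphaned and keep its stale contents; `replace` keeps every existing reference current.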
[WIP] I'm still investigating some local failures when I run it a bunch of times.
EDIT: The above failures appear to be sporadic failures in other specs and happen on master too.
EDIT: For now, I disabled the changes to the tenant test until this PR can be merged: #9684
I've been running the following in a loop and it seemed to fail about once every 8-10 runs before this PR. I noticed a BEGIN in a thread that never completed, so it looks like it was waiting on a lock. In the failures I debugged, an API request was being processed while the database tables were being cleaned. I suspect the UI had set up periodic API requests for notifications, so even though the test is not currently driving the page, other requests could still be handled in other threads.
With the PR changes, I've run it probably 30-40 times and have yet to have the error return.
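The failure mode described above, where a background request thread touches the tables mid-clean, can be simulated with plain threads and a lock. The sleep durations and event names below are invented for the demo; the point is only that the second thread blocks until the cleaner releases the lock instead of interleaving with it.

```ruby
mutex = Mutex.new
events = Queue.new

# Simulates the cleaner thread deleting rows while holding the lock
cleaner = Thread.new do
  mutex.synchronize do
    events << :clean_started
    sleep 0.05 # stand-in for the DELETE transaction
    events << :clean_finished
  end
end

sleep 0.01 # let the cleaner acquire the lock first

# Simulates a notification-polling API request arriving mid-clean;
# it must wait rather than interleave with the deletion
api_request = Thread.new do
  mutex.synchronize { events << :request_handled }
end

[cleaner, api_request].each(&:join)

order = []
order << events.pop until events.empty?
p order # the request runs only after the clean completes
```

Without the lock, `:request_handled` could land between `:clean_started` and `:clean_finished`, which is the interleaving that produced the sporadic failures.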
Example error 1
from:
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18583984959/job/52984103601 and
https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18568245125/job/52935140469
Example error 2
from: https://github.com/ManageIQ/manageiq-ui-classic/actions/runs/18553317637/job/52884994391?pr=9535