Fixes #38169 - Fix operatingsystem race condition with concurrent registrations #10425
I also just noticed this: the primary key may auto-increment on each create, even if the create fails. This can accelerate running out of integers if the underlying table still uses a primary key of type int (note: all Rails apps since 5.1 have defaulted to bigint, which is not liable to this problem). If this solution is accepted, I will probably need to add a migration to change the operatingsystem …
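To make the auto-increment point concrete, here is a toy in-memory model in plain Ruby (not PostgreSQL or Active Record; `ToyTable` and `UniqueViolation` are made-up names). The id is drawn from a counter before the uniqueness check, the way a PostgreSQL sequence value stays consumed even when the INSERT then fails:

```ruby
# Toy model only: a table whose id comes from a counter that, like a
# PostgreSQL sequence, is advanced before the uniqueness check and never
# rolled back.
class UniqueViolation < StandardError; end

class ToyTable
  attr_reader :rows, :next_id

  def initialize
    @rows = {}    # id => name
    @next_id = 1  # stands in for the id sequence
  end

  def insert(name)
    id = @next_id  # the id is taken first...
    @next_id += 1  # ...and stays consumed even if the insert fails below
    raise UniqueViolation, "duplicate name #{name}" if @rows.value?(name)

    @rows[id] = name
    id
  end
end

table = ToyTable.new
table.insert("RedHat 9")
3.times do
  begin
    table.insert("RedHat 9") # loses on the unique name every time
  rescue UniqueViolation
    # expected for this demo
  end
end

puts table.rows.size # prints 1 (only one row stored)
puts table.next_id   # prints 5 (ids 2, 3 and 4 were burned by failed inserts)
```

With a 32-bit `integer` key the ceiling is 2,147,483,647, so every losing INSERT in a registration burst moves the table closer to it; a `bigint` key raises the ceiling to 2^63 - 1.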
APJ with a test and migration
Branch updated from 9df388e to 6c6953a.
Updated with a test and also with an additional …
Branch updated from e245b4e to d298858.
The test failures are related :/
Branch updated from d298858 to 0db5fb2.
🍏
ACK
ACK
…istrations Refs #38169 - add test
Branch updated from 0db5fb2 to 4af85c4.
Thanks, @jeremylenz and @parthaa!
While testing concurrent Katello registrations in batches, we see the following error, which causes some hosts with the same OS version (usually 2 or 3) to fail to register:
After these errors, the rest of the registrations work without problems.
The error occurs with 50 concurrent registrations but does not occur with >120 concurrent registrations.

This PR switches from `find_or_create_by` to `create_or_find_by`. Per the Active Record documentation:

> Attempts to create a record with the given attributes in a table that has a unique database constraint on one or several of its columns. If a row already exists with one or several of these unique constraints, the exception such an insertion would normally raise is caught, and the existing record with those attributes is found using #find_by!.
>
> This is similar to #find_or_create_by, but avoids the problem of stale reads between the SELECT and the INSERT, as that method needs to first query the table, then attempt to insert a row if none is found.
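The difference between the two strategies can be sketched with a toy in-memory table in plain Ruby (not Active Record; `ToyOsTable`, `UniqueViolation` and `create_or_find` are hypothetical names). Two registrations are interleaved by hand so that both SELECTs run before either INSERT, which is the stale read the documentation describes:

```ruby
# Toy in-memory model (not Active Record) contrasting the two strategies.
class UniqueViolation < StandardError; end

class ToyOsTable
  def initialize
    @rows = []   # each row is a Hash such as { id: 1, name: "RedHat 9" }
    @next_id = 1
  end

  def find_by(name)
    @rows.find { |row| row[:name] == name }
  end

  # Emulates an INSERT guarded by a unique index on :name.
  def insert(name)
    raise UniqueViolation, "duplicate key: #{name}" if find_by(name)

    row = { id: @next_id, name: name }
    @next_id += 1
    @rows << row
    row
  end

  def size
    @rows.size
  end
end

# create_or_find_by style: INSERT first; on a unique violation, read the row.
def create_or_find(table, name)
  table.insert(name)
rescue UniqueViolation
  table.find_by(name)
end

table = ToyOsTable.new

# find_or_create_by style: SELECT, then INSERT if nothing was found.
# Both registrations SELECT before either one INSERTs.
a_found = table.find_by("RedHat 9")
b_found = table.find_by("RedHat 9")
a_row = a_found || table.insert("RedHat 9") # A wins the race
b_error = nil
begin
  b_found || table.insert("RedHat 9")       # B acts on its stale SELECT
rescue UniqueViolation => e
  b_error = e                               # the registration failure
end

# A third registration using the insert-first strategy still gets A's row.
c_row = create_or_find(table, "RedHat 9")

puts b_error.class            # prints UniqueViolation
puts c_row[:id] == a_row[:id] # prints true
puts table.size               # prints 1
```

In Rails itself the rescued exception is `ActiveRecord::RecordNotUnique`, and the follow-up lookup is `find_by!`, as the documentation quoted above states.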
In our testing we found that `create_or_find_by` successfully avoided the unique constraint error, but then, because

> This method will return a record if all given attributes are covered by unique constraints (unless the INSERT -> DELETE -> SELECT race condition is triggered), but if creation was attempted and failed due to validation errors it won’t be persisted, you get what #create returns in such situation.

the Rails validation for `:name` would cause it to return an unpersisted object with errors, and cause …

We get around this with the second code change here, doing another `find_by`.

With these two changes, Pablo says he ran the test 15 times and did not get any errors.
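The validation case can be sketched the same way. In this toy model (plain Ruby, not Active Record; `ToyOsModel`, `ToyRecord` and `find_or_register_os` are hypothetical names), `create` runs an application-level uniqueness check first and returns an unpersisted record with errors when it fails, so the rescue in `create_or_find_by` is never reached. The shape of the extra lookup is an assumption based only on the description "doing another `find_by`", not the PR's actual diff:

```ruby
# Toy model (not Active Record) of why the extra find_by is needed.
class UniqueViolation < StandardError; end

# In this toy, a record counts as persisted once it has an id.
ToyRecord = Struct.new(:id, :name, :errors) do
  def persisted?
    !id.nil?
  end
end

class ToyOsModel
  def initialize
    @rows = {} # name => id
    @next_id = 1
  end

  def find_by(name)
    id = @rows[name]
    id && ToyRecord.new(id, name, [])
  end

  # Like `create`: an application-level uniqueness validation runs first.
  # If it fails, nothing is inserted and an unpersisted record is returned.
  def create(name)
    if @rows.key?(name)
      return ToyRecord.new(nil, name, ["Name has already been taken"])
    end

    insert!(name)
  end

  # Like `create_or_find_by`: only a database unique violation is rescued,
  # so a validation failure falls straight through to the caller.
  def create_or_find_by(name)
    create(name)
  rescue UniqueViolation
    find_by(name)
  end

  private

  def insert!(name)
    raise UniqueViolation, name if @rows.key?(name) # the unique index
    id = @next_id
    @next_id += 1
    @rows[name] = id
    ToyRecord.new(id, name, [])
  end
end

# Hypothetical wrapper: an unpersisted result means another registration
# already created the row, so look it up again.
def find_or_register_os(model, name)
  os = model.create_or_find_by(name)
  os = model.find_by(name) unless os.persisted?
  os
end

model = ToyOsModel.new
first = model.create("RedHat 9")          # another host got here first
raw = model.create_or_find_by("RedHat 9") # validation fails, no exception
fixed = find_or_register_os(model, "RedHat 9")

puts raw.persisted?       # prints false
puts raw.errors.inspect   # prints ["Name has already been taken"]
puts fixed.id == first.id # prints true
```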
Considerations
The tradeoff with this solution is that you may end up with some (harmless) unique-constraint violation errors in your Postgres logs, one for each INSERT that loses the race and is rescued.
Another possible approach is to use `active_record_retry` from Katello: https://github.com/Katello/katello/blob/882b257afce348d1795db28fce81611543287b9e/app/lib/katello/util/support.rb#L96

But that would require moving that code to Foreman, and we have not yet tested that it would work as well as this.
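For comparison, the retry approach looks roughly like the following generic helper. This is a hypothetical sketch (`with_retry` and `UniqueViolation` are made-up names), not Katello's `active_record_retry`, whose signature and behavior are not reproduced here:

```ruby
# Hypothetical retry helper (not Katello's active_record_retry).
class UniqueViolation < StandardError; end

# Runs the block; if it raises one of `on`, runs it again, up to `attempts`
# times in total, then re-raises the last error.
def with_retry(attempts: 3, on: [UniqueViolation])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    retry if tries < attempts
    raise
  end
end

# Simulated registration: loses the race on the first try, then finds the row.
calls = []
result = with_retry do |attempt|
  calls << attempt
  raise UniqueViolation, "duplicate key" if attempt == 1

  "existing operating system row"
end

puts result        # prints existing operating system row
puts calls.inspect # prints [1, 2]
```

The idea is that on the second attempt a `find_or_create_by`-style block would find the row committed by the winning registration, so it succeeds without hitting the unique index again.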