Support Updating Counter Cache After Commit (to avoid deadlocks) #263

jbritten · 2019-07-01T17:22:02Z

Support updating the counter cache after commit (outside the primary transaction) where the SQL UPDATE calls would be less susceptible to deadlocks.

Let's say we've got a Campaign model which can have many Subscribers. We store lots of campaign metrics in a separate CampaignMetrics summary table. When a subscriber confirms their email address we may see a MySQL transaction such as the following, which illustrates counter_culture updating various counts as well as additional proprietary work being performed:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = '[email protected]' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
UPDATE `subscribers` SET `confirmed_at` = '2019-07-01 16:34:04', `status` = 'confirmed', `updated_at` = '2019-07-01 16:34:04' WHERE `subscribers`.`id` = 2001
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `unconfirmed_subscribers_count` = COALESCE(`unconfirmed_subscribers_count`, 0) - 1 WHERE `campaign_metrics`.`id` = 1001
 ... (some additional proprietary work) ...
COMMIT

There are other actions a Subscriber could take that also update data in the CampaignMetrics table. For example, opening a received email would update the email_opens_count on the CampaignMetrics table with a transaction such as the following:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = '[email protected]' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `email_opens_count` = COALESCE(`email_opens_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
 ... (some additional proprietary work) ...
COMMIT

Now, at scale, when many concurrent activities are occurring (many subscribers confirming their email addresses, opening emails, clicking emails, etc.), deadlocks such as the following can occur when updating the CampaignMetrics summary table:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-07-01 13:17:44 2b373c029700
*** (1) TRANSACTION:
TRANSACTION 2384728875, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 1358, OS thread handle 0x2b34803cb700, query id 8039221 10.0.102.43 deploy updating
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 663 page no 110 n bits 109 index `PRIMARY` of table `database_production`.`campaign_metrics` trx id 2384728875 lock_mode X locks rec but not gap waiting

Yes, the SQL calls that update the counter cache are atomic; however, transactions should be kept short, locking as few rows as possible for as little time as possible. Supporting an optional configuration flag to update the counter cache after commit would yield the following SQL, which would be much less susceptible to deadlocks:

BEGIN
SELECT 1 AS one FROM `subscribers` WHERE (`subscribers`.`email` = '[email protected]' AND `subscribers`.`campaign_id` = 1001) LIMIT 1
SELECT `campaigns`.* FROM `campaigns` WHERE `campaigns`.`id` = 1001 LIMIT 1
UPDATE `subscribers` SET `confirmed_at` = '2019-07-01 16:34:04', `status` = 'confirmed', `updated_at` = '2019-07-01 16:34:04' WHERE `subscribers`.`id` = 2001
 ... (some additional proprietary work) ...
COMMIT
BEGIN
SELECT `campaign_metrics`.* FROM `campaign_metrics` WHERE `campaign_metrics`.`campaign_id` = 1001 LIMIT 1
UPDATE `campaign_metrics` SET `confirmed_subscribers_count` = COALESCE(`confirmed_subscribers_count`, 0) + 1 WHERE `campaign_metrics`.`id` = 1001
COMMIT


magnusvk · 2019-07-02T22:00:49Z

Cross-posting this from #120 to explain why this was removed:

Interestingly, calling the counter update after commit was first added because of deadlock issues with Postgres. It's somewhat ironic that it's now MySQL causing a similar issue.

In any case, the history of removing this comes from a bugfix to allow multiple saves in one transaction, see 81dfbf5. Plus, in the absence of issues like this with the database layer, I don't see why you'd ever want to push the counter cache update outside of the transaction. But then again, you are seeing database issues.

If you can figure out how to write a test for this behavior, I think I'd probably be down to re-add this functionality, given that the original code for this was quite straightforward.

This is the first I'm hearing of this issue, so not sure how widespread it is, but of course still would be good to figure out how to address this for you.

magnusvk · 2019-07-02T22:05:31Z

I'm trying to think how we could add a test for this, probably with a test model with different callbacks. Have one after_save callback that runs before the commit and stores the current value of the counter cache in an ivar. Then we can test that ivar to make sure the counter doesn't change before the commit, and test as usual that it has changed after the commit.
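
As a rough illustration of that test idea, here is a minimal pure-Ruby sketch. It does not use ActiveRecord or counter_culture: SketchSubscriber, its save method, and the metrics hash are all hypothetical stand-ins that only mimic the ordering "after_save runs before the commit, the counter update runs after it". A real test would need an actual model and database.

```ruby
# Hypothetical in-memory model; none of these names come from counter_culture.
class SketchSubscriber
  attr_reader :counter_seen_in_after_save

  def initialize(metrics)
    @metrics = metrics
    @pending_after_commit = []
  end

  # Stand-in for a save inside a transaction: the counter update is queued,
  # after_save runs before the commit, and queued work runs at commit time.
  def save
    @pending_after_commit << -> { @metrics[:confirmed_subscribers_count] += 1 }
    after_save
    commit
  end

  private

  # Stores the counter value as seen before the commit, in an ivar.
  def after_save
    @counter_seen_in_after_save = @metrics[:confirmed_subscribers_count]
  end

  # Runs the deferred counter updates, then empties the queue.
  def commit
    @pending_after_commit.each(&:call)
    @pending_after_commit.clear
  end
end

metrics = { confirmed_subscribers_count: 0 }
subscriber = SketchSubscriber.new(metrics)
subscriber.save

puts subscriber.counter_seen_in_after_save  # prints 0 (value before commit)
puts metrics[:confirmed_subscribers_count]  # prints 1 (value after commit)
```

The two values printed at the end correspond to the two assertions described above: the ivar holds the pre-commit value, and the counter itself has changed once the commit has run.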

xtagon · 2019-08-27T16:02:40Z

I'm seeing similar deadlocks (Postgres) in my app. They're infrequent and I'm still trying to figure out how to reproduce in a test.

magnusvk · 2019-09-17T02:39:05Z

Just for the record—if we can come up with a sensible PR that adds this back as an option, I'm down to merge that. But I won't have time to work on it myself.

avit · 2021-02-03T07:01:45Z

I don't think this problem is unique to CounterCulture, or even caused by it. Still, this could be made an option I suppose, or deferred to before commit like touch: true does.

@jbritten, if you can make your association updates happen in a consistent order (generally, starting from the lowest model in your belonging hierarchy and going up) then you can eliminate those deadlocks. (I wrote a blog post about Transaction deadlocks on ActiveRecord associations in case it helps anyone else debug these.)

In your example transaction you have:

BEGIN
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001   -- lock A
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001
COMMIT

This is the order of the row locks, from when rows are first updated in the transaction (the third update doesn't matter, as it reuses an existing lock). You must have another transaction elsewhere that is updating these rows in the opposite order. e.g.

BEGIN
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001   -- lock A
COMMIT

I'm calling this second one "out of order", since if I'm assuming correctly, your belonging hierarchy starts from subscriber and belongs_to :campaign_metric. That's the common sequence you should follow for best results. You can fix this either by changing the update order if you can, or else by explicitly adding an earlier subscriber.lock! / Subscriber.lock.find(id) to claim a lock on that row first:

BEGIN
SELECT * FROM `subscribers` WHERE `subscribers`.`id` = 2001 FOR UPDATE   -- lock A
UPDATE `campaign_metrics` ... WHERE `campaign_metrics`.`id` = 1001   -- lock B
UPDATE `subscribers` ... WHERE `subscribers`.`id` = 2001
COMMIT
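
The consistent-ordering idea can also be shown outside the database. The sketch below is only an analogy: Ruby Mutex objects stand in for the row locks A and B, and LOCK_ORDER, ROW_LOCKS, with_locks, and acquire are hypothetical names, not ActiveRecord or counter_culture API. Every caller ends up taking the locks in one agreed order, even when it asks for them "out of order".

```ruby
# Mutexes stand in for row locks: :subscriber is lock A, :campaign_metrics
# is lock B. This is an analogy, not database code.
LOCK_ORDER = %i[subscriber campaign_metrics].freeze
ROW_LOCKS = LOCK_ORDER.to_h { |name| [name, Mutex.new] }.freeze

# Acquire the requested locks in the one agreed order, regardless of the
# order the caller listed them in, then run the block and release them.
def with_locks(*names, &block)
  ordered = names.uniq.sort_by { |name| LOCK_ORDER.index(name) }
  acquire(ordered, &block)
end

# Take the first lock, then recurse for the rest while still holding it.
def acquire(names, &block)
  return block.call if names.empty?

  ROW_LOCKS.fetch(names.first).synchronize { acquire(names.drop(1), &block) }
end

completed = 0

threads = 200.times.map do |i|
  Thread.new do
    # Half of the callers ask for the locks in the opposite order.
    wanted = i.even? ? %i[subscriber campaign_metrics] : %i[campaign_metrics subscriber]
    # Both locks are held inside the block, so the increment is serialized.
    with_locks(*wanted) { completed += 1 }
  end
end
threads.each(&:join)

puts completed  # prints 200
```

If with_locks took the locks in the order given instead of sorting them, two threads could each hold one lock while waiting for the other, which is the same shape as the A/B versus B/A transactions above.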

ActiveRecord has a bug related to this when using touch: true, which also executes within the transaction but with several records in indeterminate order.

lightyrs · 2021-02-04T03:58:40Z

@avit

Very interesting blog post and rails PR.
I'm wondering if the same logic of your PR could be applied specifically to the transaction logic of this gem.

xhs345 · 2021-02-26T21:09:40Z

I would also like to see the after_commit option added back in, since it was my original reason for considering this library.

In our case we have the following deadlock scenario (based on my current understanding):

Long running job A starts transaction
Short job B starts to update some fields on users table (using update_all), but gets stuck because of A
Job A inserts data which triggers an update on the users table counter cache.
Deadlock occurs and in some cases B gets rolled back

magnusvk · 2021-03-02T19:38:13Z

Hey guys—I took a quick stab at adding this back, see #309. Let's see if the tests go green on that, and I'd love some extra eyes on that PR, too.

jbritten · 2021-03-03T21:06:56Z

Hey @magnusvk, I really appreciate you taking the time to add this back! I'm testing your 'execute-after-commit' branch and getting the following error:

! Unable to load application: LoadError: cannot load such file -- after_commit_action

Looks like a missing dependency.

magnusvk · 2021-03-04T01:14:08Z

@jbritten so the thing is that you only need that gem if you set the execute_after_commit option to true, so I don't want to make it a straight-up gem dependency. I just pushed a commit that adds a more helpful error message. But the upshot is that you'll manually have to include the after_commit_action gem in your dependencies.
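
For readers unfamiliar with this optional-dependency pattern, here is a minimal sketch of the general idea: require the library only when the feature is used, and turn a LoadError into a message that says what to install. The helper name require_optional_gem, its needed_for parameter, and the message wording are hypothetical; the actual code in the gem may look different.

```ruby
# Hypothetical helper; counter_culture's actual implementation may differ.
# Loads a library on demand and re-raises a LoadError with a clearer message.
def require_optional_gem(gem_name, needed_for:)
  require gem_name
rescue LoadError
  raise LoadError,
        "The `#{gem_name}` gem is required when #{needed_for}. " \
        "Add it to your Gemfile to use this feature."
end

message = nil
begin
  # A deliberately nonexistent library name, so the failure path runs.
  require_optional_gem("sketch_missing_gem_for_demo",
                       needed_for: "execute_after_commit is true")
rescue LoadError => e
  message = e.message
end

puts message
```

The trade-off is the one described above: applications that never enable the option do not pay for the extra dependency, while those that do must list the gem themselves.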

jbritten · 2021-03-05T18:13:40Z

@magnusvk got it; thanks for the more helpful error message. I've been running this branch in staging for 2 days and it seems to be working as expected.

magnusvk · 2021-03-17T01:41:20Z

@jbritten any problems with this branch? I’m thinking I should merge and release as it seems to be working.

jbritten · 2021-03-17T01:44:31Z

@magnusvk I've had the branch running in our production app for over a week and haven't encountered issues yet. I'd say go ahead and merge and release.

magnusvk · 2021-03-17T03:31:37Z

Awesome, thanks for the update. Just released this as gem version 2.8.0. See documentation here.

jbritten mentioned this issue Jul 1, 2019

Move counter cache update back into transaction #120

Closed

lightyrs mentioned this issue Feb 4, 2021

Ability to batch multiple counter cache updates for same row #307

Closed

magnusvk closed this as completed Mar 17, 2021
