Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-21489 fix DatabaseObject code to retry when deadlock is met. #197

Merged
merged 1 commit into from
Mar 28, 2018

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Nov 28, 2017

This is restoring previous functionality broken when we meddled with deadlocks.

The Database class has handling in it for deadlocks but some time ago, likely around 4.2, we introduced changes that cause it to be bypassed. When

$result = $DB->query($string)

is called & there is a deadlock, it used to return a PEAR_Error object. If the $result was a PEAR_Error then 3 retries were attempted.

However, we made changes to exception handling a while back (around 4.2 I think) that meant that an exception was thrown, by-passing the retry code.

Since this is an unmaintained (& previously hacked) package I think we just patch our version & I propose that we also add a Constant to allow sites to specify the number of retries. This is pretty low-level developer stuff so I don't propose a setting, although some helper text can go in civicrm.settings.php on new installs.

Note re doing retries - from mysql docs

https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

"Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock."

Replication
Note I simulated a deadlock by hacking website_get api with this

function civicrm_api3_website_get($params) {
CRM_Core_DAO::executeQuery('START TRANSACTION');
CRM_Core_DAO::executeQuery('SELECT * FROM civicrm_email WHERE id = 24220 LOCK IN SHARE MODE');
sleep(2000);

and calling it from another browser


@agileware-fj
Copy link
Contributor

agileware-fj commented Nov 28, 2017

(CiviCRM Review Template DEL-1.0)

  • JIRA (r-jira)
    • PASS : The PR has a JIRA reference.
  • Test results (r-test)
    • PASS: The test results are all-clear.
  • Code quality (r-code)
    • PASS: The functionality, purpose, and style of the code seems clear+sensible.
  • Documentation (r-doc)
    • PASS: The changes do not require documentation.
    • COMMENTS: Consider adding documentation on the CIVICRM_DEADLOCK_RETRIES to the default civicrm settings file?
  • Maintainability (r-maint)
    • PASS: The change sufficiently improves test coverage, or the change is trivial enough that it does not require tests.
  • Run it (r-run)
    • ISSUE: Weird output and possibly breaks in error log due to php interpreting ((string . int) + int) . string instead of string . (int + int) . string; needs parenthesises.
    • Otherwise works as advertised. Tested by:
      • MySQL Console: START TRANSACTION; SELECT * FROM civicrm_email WHERE id=238 LOCK IN SHARE MODE; -- and don't commit/rollback/quit
      • Shell: drush cvapi Email.create id=238 [email protected]
      • Check error log for output.
  • User impact (r-user)
    • PASS: The change would be intuitive or unnoticeable for a majority of users who work with this feature.

It's probably also a good idea to throw a better described exception than re-throwing 'DB Error: Unknown Error'

if (!stristr($e->getCause()->getUserInfo(), 'nativecode=1205') || ($tries + 1) === $maxTries) {
throw $e;
}
CRM_Core_Error::debug_log_message('Retrying after deadlock hit on attempt ' . $tries + 1 . ' at query : ' . $string);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This didn't work properly for me with PHP 7 - . and + seem to have the same precendence, so the output was always 1 at query: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah - the wrong thing in the log - so it should be like the line above?

@eileenmcnaughton eileenmcnaughton force-pushed the dead branch 2 times, most recently from d12daec to f1225ee Compare November 28, 2017 06:55
@eileenmcnaughton
Copy link
Contributor Author

@agileware-fj I have pushed fixes to address the issues that are in this repo (the settings file is in civicrm-core).

Note to self. Never call a branch 'dead'. issuing the following command is disconcerting:

git push -f eileen dead

@eileenmcnaughton
Copy link
Contributor Author

& here is the other PR

civicrm/civicrm-core#11336

I don't really know the convention for the template - editing it to 'pass' once the things in it have been addressed loses history, but otherwise it looks like the review is incomplete ... @totten @seanmadsen

@eileenmcnaughton eileenmcnaughton force-pushed the dead branch 3 times, most recently from 1c02211 to 57cf229 Compare November 28, 2017 22:28
throw $e;
}
if (($tries + 1) === $maxTries) {
$message = (stristr($dbErrorMessage, 'nativecode=1213') ? 'Database deadlock encountered' : 'Database lock encountered');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgm - I'm never sure if we are supposed to ts() exception messages. In this case it's not one we would hope the user saw except very rarely

This is restoring previous functionality broken when we meddled with deadlocks.
@agileware-fj
Copy link
Contributor

(CiviCRM Review Template DEL-1.0)

  • Run it (r-run)
    • PASS: Tested with:
      • MySQL Console: START TRANSACTION; SELECT * FROM civicrm_email WHERE id=238 LOCK IN SHARE MODE; -- and don't commit/rollback/quit
      • Shell: drush cvapi Email.create id=238 [email protected]
      • Check error log for output.

@agileware-fj
Copy link
Contributor

I think that's how I've seen it done elsewhere.

@eileenmcnaughton
Copy link
Contributor Author

We've been testing this pretty heavily on our staging too & it's had some internal review (hence minor changes). I tested it by running a query that took several minutes

CREATE TABLE hurt_my_server SELECT * FROM civicrm_email

and then in another session running a batch merge job. The latter hit the big query and had a couple of retries, then the big query ended and the batch merge finished.

@bhahumanists
Copy link

We're running this on production and it seems fine. I modified it to post in a slack channel for every retry, so I can see them happening in real-time. I've seen it loop multiple times when needed.

@eileenmcnaughton
Copy link
Contributor Author

I should also note we have had this in production since it was submitted.

@eileenmcnaughton
Copy link
Contributor Author

@colemanw @mlutfy could one of you make a call on merging this - it has had fairly good review / testing in comments above (including running through peak load on the WMF fundraiser last year)

@seamuslee001
Copy link
Contributor

AUG has been running this in our production for the last few weeks and have had good experience with this. i am a +1 for merging

@seamuslee001
Copy link
Contributor

(CiviCRM Review Template WORD-1.1)

@seamuslee001 seamuslee001 merged commit 6dcfe37 into civicrm:master Mar 28, 2018
@eileenmcnaughton eileenmcnaughton deleted the dead branch March 28, 2018 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants