Use UI Queue runner for import #23669

Merged
merged 4 commits into from
Jun 6, 2022

Conversation

eileenmcnaughton
Contributor

@eileenmcnaughton eileenmcnaughton commented Jun 2, 2022

Overview

Use UI Queue runner for import

Before

Faux AJAX still runs into the proxy timeout setting

After

image

Technical Details

I got it working in the UI - can test later via CLI

Comments

This has a really long chain - most are less scary than they look. The ones that set the path for the rest are the ones that are separately open as PRs with tests

@civibot

civibot bot commented Jun 2, 2022

(Standard links)

@civibot civibot bot added the master label Jun 2, 2022
@eileenmcnaughton eileenmcnaughton force-pushed the import_queue branch 2 times, most recently from 09f7d88 to 94bed93 Compare June 2, 2022 14:34
'errorMode' => CRM_Queue_Runner::ERROR_ABORT,
'onEndUrl' => CRM_Utils_System::url('civicrm/import/contact/summary', ['user_job_id' => $this->getUserJobID(), 'reset' => 1]),
]);
$runner->runAllViaWeb();
Member

So in a unit-test context, $runner->runAll() would be more appropriate than $runner->runAllViaWeb(). Maybe find a conditional/variable to pick the better runner??

Contributor Author

@totten - yeah that didn't work in the web UI for me - but ideally people could choose to run in background or foreground (depending on a setting, or perhaps a permission)

Contributor Author

@totten ok - I just looked & yes - that is the cause of the test fails - it didn't work for me in the web without that though - the specific tests ACTUALLY test the form flow - so I guess

  1. the running of it should be moved back to the form
  2. the task to set up the queue should take some parameters to make it easier to offer backend as well as front end
  3. the tests need to catch the premature exit exception & run the rest of the queue themselves
  4. side note - in the csv api explorer (and in fact in the wmf custom import implementation - which is on its way out) - batch size is a configuration option

image

Contributor Author

Ok - so what I'm thinking to try in conjunction with your PR is

  • if a queue job already exists with a certain name (civicrm_executor or something genecidal like that) THEN offer the option to process in the background
  • in this scenario we would create a queue - the same as now with a background config (error handling 'abort' ) and we would ALSO add an item to the pre-existing civicrm_executor to run that queue -(error handling 'discard' )
  • the tricky part is I'd need 'some sort of UI' that the user could be redirected to to see what is happening because they are still 'responsible' for checking it completes at this stage.

That's the flow I figure I'll try out @totten

Member

@totten totten Jun 2, 2022

Oh, the batch size option sounds like a good idea.

the task to set up the queue should take some parameters to make it easier to offer backend as well as front end

Yeah, I think it's good practice to decouple the "build queue" and "run queue" parts. A few other examples:

  • If you look at the core-upgrader, it knows how to build a queue of upgrade tasks (CRM_Upgrade_Form::buildQueue()). Then there are a few frontends:
    • The web page civicrm/upgrade calls buildQueue() + runAllViaWeb()
    • The CLI drush civicrm-upgrade-db calls buildQueue() + runAll()
    • The CLI cv upgrade:db calls buildQueue() and its own ConsoleQueueRunner (which just shows more colorful output and which respects the -verbose flag).
  • If you look at ext-upgrader, it knows how to build a queue of upgrade tasks (CRM_Extension_Upgrades::createQueue()). Then there are a few frontends:
    • The web page civicrm/admin/extensions/upgrade calls createQueue() + runAllViaWeb()
    • The API Extension.upgrade calls createQueue() + runAll().

(EDIT) So in this case - the queue() function could return the prepared CRM_Queue_Queue object. Then the web-based controller could fire runAllViaWeb(), and the headless unit-test could fire runAll().
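
A minimal sketch of that decoupling, using stand-in classes rather than the real CRM_Queue_Queue and CRM_Queue_Runner. SketchQueue, buildImportQueue() and runAllHeadless() are hypothetical names invented for illustration; the point is only that the builder returns a prepared queue and the caller decides how to run it.

```php
<?php
// Stand-in for a prepared queue; NOT the real CRM_Queue_Queue API.
final class SketchQueue {
  /** @var callable[] */
  private array $tasks = [];

  public function createItem(callable $task): void {
    $this->tasks[] = $task;
  }

  public function claimItem(): ?callable {
    // array_shift() returns NULL once the list is empty.
    return array_shift($this->tasks);
  }

  public function numberOfItems(): int {
    return count($this->tasks);
  }
}

// "Build queue": only prepares the tasks, never runs them.
function buildImportQueue(array $rows, int $batchSize): SketchQueue {
  $queue = new SketchQueue();
  foreach (array_chunk($rows, $batchSize) as $batch) {
    // Each task pretends to import one batch and reports how many rows it handled.
    $queue->createItem(fn(): int => count($batch));
  }
  return $queue;
}

// "Run queue" (headless): drains the queue in one process, as a unit test or CLI might.
function runAllHeadless(SketchQueue $queue): int {
  $processed = 0;
  while (($task = $queue->claimItem()) !== null) {
    $processed += $task();
  }
  return $processed;
}

$queue = buildImportQueue(range(1, 12), 5);
$batches = $queue->numberOfItems();
$processed = runAllHeadless($queue);
```

A web controller would take the same prepared queue and hand it to a web runner instead of calling runAllHeadless(), so the form tests would not need to go through the AJAX flow.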

Member

  • if a queue job already exists with a certain name (civicrm_executor or something genecidal like that) THEN offer the option to process in the background
  • in this scenario we would create a queue - the same as now with a background config (error handling 'abort' ) and we would ALSO add an item to the pre-existing civicrm_executor to run that queue -(error handling 'discard' )

OK, so for the current PR, the original scope is spot-on (ie get it working with the foreground AJAX runner). This is going into follow-up work, right?

FWIW, coworker doesn't need a civicrm_executor like that -- its standard behavior is to pick up on any queue which matches runner IS NOT NULL and status = 'active'.

I would note that civicrm_queue has some options that you might call obscure tunables (eg lease_time, retry_limit, retry_interval) -- things which should have safe defaults, but which sysadmins might adjust in some edge-cases. A queue template could serve as both configuration-element and flag-record. Ex:

// Register queue-template
\Civi\Api4\Queue::create()
  ->setValues([
    'name' => 'import/template',
    'type' => 'SqlParallel',
    'runner' => 'task',
    'status' => 'template', /* ?? OR: 'is_template' => TRUE, ?? */
    'error' => 'abort',
    'lease_time' => 5*60,
    'retry_limit' => 0,
  ])
  ->execute();

Then when it's time to prepare queued tasks, conditionally copy the template:

if (queue('import/template') is an active template) {
  // Found template. Run in background.
  $q = Civi::queue("import/{$jobId}", ['tpl' => 'import/template', 'status' => 'draft']);
  Importer::buildQueue($q);
  $q->setStatus('active'); /* Background worker will take over */
}
else {
  // No template. Run in foreground.
  $q = Civi::queue("import/{$jobId}", ['type' => 'Sql', 'error' => 'abort']);
  Importer::buildQueue($q);
  $r = new CRM_Queue_Runner(...);
  $r->runAllViaWeb();
}

Contributor Author

@totten I'm trying to use this use case to also test & QA your PR - so yeah - I have pulled down your PR & am trying to get it to add the background queue option in conjunction with that - but I'm gonna have to have a snooze & come back to it. About 50% of my brain power won't be released until #23666 is merged & I can rebase this over it, & another 45% seems to be taken up by lack of sleep, so if I can chase the rabbit into the cage I'm gonna see if I can take a kip & free up some of the 45%

Contributor Author

& it has been merged - I rebased this

@eileenmcnaughton eileenmcnaughton force-pushed the import_queue branch 4 times, most recently from c948a76 to 27bbff8 Compare June 3, 2022 03:19
@eileenmcnaughton
Contributor Author

Same test just passed for me locally - re-running to see impact of merging the Queue PR in the meantime - I might have written to that spec?

I also rebased in a commit that moves around the tpl file a bit - it is technically simple but part of the contact flow r-run

image

@eileenmcnaughton eileenmcnaughton force-pushed the import_queue branch 4 times, most recently from 8734f23 to fd9a1df Compare June 3, 2022 07:14
@eileenmcnaughton
Contributor Author

Those tests are NOT failing for me locally in isolation - I have added to the tearDown in case it is that. I also have run the whole class CRM_Contact_Import_Parser_ContactTest locally - it's definitely something about test leakage - hopefully the thing I just tried

@eileenmcnaughton
Contributor Author

Dang - still passing locally - failing in the suite

image

@eileenmcnaughton
Contributor Author

@totten this part is passing - I'm very keen to work on the queue issue more tomorrow & keep the momentum over the next few days - but I think we can merge this

@eileenmcnaughton
Contributor Author

I originally thought this UI bit would be a fairly straight forward merge & once merged we could focus on the background processing - but I think we are treating that as a blocker to merging this - so I've moved some of the parts of this to other PRs so they don't block me cleaning up the other imports - I have rebased that into this PR - hence the rebase & the PR having become 'bigger' again

Additional minor code simplification

Tear down fix

Maybe the var name needs to match in the CI php version
@totten
Member

totten commented Jun 5, 2022

(@eileenmcnaughton) I originally thought this UI bit would be a fairly straight forward merge & once merged we could focus on the background processing - but I think we are treating that as a blocker to merging this

I agree with your original thought -- the option for background processing should be a follow-up issue.

Since this has been rebased a couple times since I last did an r-run -- I suppose I should do another (light) r-run? Then merge-on-pass -- and finally (mostly) rely on the RC testers?

@eileenmcnaughton
Contributor Author

@totten yeah - I think this is mostly about getting it out of the way so we can focus on the background processing issue - I feel like we are close enough to that that we could get it in before we branch & that if we do it will really make a more 'compelling' case for rc testing

As I mentioned @MegaphoneJon has indicated some rc resource from one of his clients

@eileenmcnaughton
Contributor Author

@totten I'm also planning to document how the import flow is working - although I wanted to get the last few entities switched over to it first & then I can - there is a PR up for the membership import & while there are 3 more they become increasingly simple once that is merged

@totten
Member

totten commented Jun 6, 2022

@eileenmcnaughton Since this removes a couple inherited members, and since the class-hierarchy is somewhat shared by different importers, I was curious if any of those properties would be an issue. Some (like disableUSPS and _newRelatedContacts, getAllFields) clearly don't have any more references.

There's a couple which I'm not sure about - maybe you could skim these grep results:

[bknix-max:~/bknix/build/dmaster/web/sites/all/modules/civicrm] git grep _onDuplicate
CRM/Contact/Import/ImportJob.php:  protected $_onDuplicate;
CRM/Contact/Import/Parser/Contact.php:          if (empty($val) && !is_numeric($val) && $this->_onDuplicate == CRM_Import_Parser::DUPLICATE_FILL) {
CRM/Contribute/Import/Form/MapField.php:            $self->_onDuplicate != CRM_Import_Parser::DUPLICATE_UPDATE
CRM/Contribute/Import/Form/MapField.php:          elseif ($self->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE &&
CRM/Contribute/Import/Form/MapField.php:    $this->_onDuplicate = $this->getSubmittedValue('onDuplicate');
CRM/Contribute/Import/Form/MapField.php:    if ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE) {
CRM/Contribute/Import/Form/MapField.php:    elseif ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_SKIP) {
CRM/Contribute/Import/Form/MapField.php:    $contactORContributionId = $self->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE ? 'contribution_id' : 'contribution_contact_id';
CRM/Contribute/Import/Form/MapField.php:      if ($self->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE) {
CRM/Custom/Import/Form/MapField.php:    $this->_onDuplicate = $this->get('onDuplicate');
CRM/Event/Import/Form/MapField.php:    $this->_onDuplicate = $this->get('onDuplicate');
CRM/Event/Import/Form/MapField.php:    if ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE) {
CRM/Event/Import/Form/MapField.php:    elseif ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_SKIP ||
CRM/Event/Import/Form/MapField.php:      $this->_onDuplicate == CRM_Import_Parser::DUPLICATE_NOCHECK
CRM/Event/Import/Form/MapField.php:            if ($self->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE) {
CRM/Member/Import/Form/MapField.php:    $this->_onDuplicate = $this->get('onDuplicate', $onDuplicate ?? "");
CRM/Member/Import/Form/MapField.php:    if ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_UPDATE) {
CRM/Member/Import/Form/MapField.php:    elseif ($this->_onDuplicate == CRM_Import_Parser::DUPLICATE_SKIP) {
CRM/Member/Import/Form/MapField.php:              $self->_onDuplicate != CRM_Import_Parser::DUPLICATE_UPDATE

[bknix-max:~/bknix/build/dmaster/web/sites/all/modules/civicrm] git grep _dedupeRuleGroupID
tests/phpunit/CRM/Contact/Import/Parser/ContactTest.php:    $parser->_dedupeRuleGroupID = $ruleGroupId;

@eileenmcnaughton
Contributor Author

@totten - the other imports share a lot of code with each other - but not with the contact import

I've pushed up a PR to remove the remaining references to onDuplicate as they affect the contact import

$totalRowCount = $totalRows = $dataSource->getRowCount(['new']);
$queue = Civi::queue('user_job_' . $this->getUserJobID(), ['type' => 'Sql', 'error' => 'abort']);
$offset = 0;
$batchSize = 5;
Member

@eileenmcnaughton I assume the small batch size (5) is to facilitate inspection (make it easy to watch the batching mechanism)? There's a separate/follow-up issue to extract this to a setting? (That's fine - not a blocker.)

But if we wind up with a constant-value in the stable-release -- it should probably be a bit higher (like... 20 or 100 or 500 or 1000...)

Contributor Author

@totten yeah - I think the batch size will be an option on the first page of the import

Contributor Author

I would probably go for around 50 as a default - that should import in 30 seconds on most sites
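
For illustration, a rough sketch of how a row count might be split into (offset, limit) batches for different batch sizes. planBatches() is a hypothetical helper, not a function from this PR, and the real queue items presumably carry more than these two numbers.

```php
<?php
/**
 * Split $totalRows into consecutive batches of at most $batchSize rows.
 * Hypothetical helper for illustration only.
 *
 * @return array<int, array{offset: int, limit: int}>
 */
function planBatches(int $totalRows, int $batchSize): array {
  if ($batchSize < 1) {
    throw new InvalidArgumentException('Batch size must be at least 1.');
  }
  $batches = [];
  for ($offset = 0; $offset < $totalRows; $offset += $batchSize) {
    $batches[] = [
      'offset' => $offset,
      // The final batch may be shorter than $batchSize.
      'limit' => min($batchSize, $totalRows - $offset),
    ];
  }
  return $batches;
}

$small = planBatches(1000, 5);
$default = planBatches(1000, 50);
```

For a 1,000-row import, a batch size of 5 produces 200 queue items, while 50 produces 20, which is the trade-off between easy inspection and fewer steps being discussed here.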

Contributor

@andyburnsco andyburnsco Aug 2, 2022

Seems like there may be a sweet spot for it; I tended to use 50 to 500 in the API CSV Importer. I get 5 for testing, but I'm doing imports now and it makes them quite slow. I will try changing that hard-coded limit and see.

Contributor Author

Thanks - let us know what you find

Member

I played around with some guesstimates about what determines the max batch limit. For example, one might assume that:

  • Each PHP worker is limited to 64mb RAM (the baseline required by Civi installer)
  • Each HTTP request is bound by the default timeout specified by apache.org and nginx.org (ie 60s)
    • Per php.net, the default timeout for a PHP request is actually lower (30s), but most importers call set_time_limit(0).
  • Each import-task requires an average 100ms runtime and 100kb memory.

The gist is a bit fiddly, and (while fiddling) the projected limits ranged somewhere around 300 - 600.

For a hard-coded/application-level default, I'd probably vote for 100...
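
As a back-of-envelope illustration only (this is not the gist mentioned above), the listed assumptions can be turned into a small calculation. estimateMaxBatchSize() and its $safetyFactor parameter are hypothetical, and the memory bound ignores whatever PHP and CiviCRM use before the first row is processed.

```php
<?php
/**
 * Estimate the largest batch that fits both the time and the memory budget.
 * All inputs are assumptions; $safetyFactor is a hypothetical fudge factor.
 */
function estimateMaxBatchSize(
  int $timeoutMs,
  int $msPerRow,
  int $memoryLimitBytes,
  int $bytesPerRow,
  float $safetyFactor = 1.0
): int {
  $byTime = intdiv($timeoutMs, $msPerRow);
  $byMemory = intdiv($memoryLimitBytes, $bytesPerRow);
  // The tighter of the two bounds wins, scaled down by the safety factor.
  return (int) floor(min($byTime, $byMemory) * $safetyFactor);
}

// 60s timeout, 100ms per row, 64MB RAM, 100KB per row.
$upper = estimateMaxBatchSize(60000, 100, 64 * 1024 * 1024, 100 * 1024);
// Same inputs, keeping half the budget in reserve.
$cautious = estimateMaxBatchSize(60000, 100, 64 * 1024 * 1024, 100 * 1024, 0.5);
```

Under these inputs the time bound (600 rows) is slightly tighter than the memory bound (655 rows), and halving the budget gives 300, so a default of 50 or 100 leaves a wide margin.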

@totten totten added the merge ready PR will be merged after a few days if there are no objections label Jun 6, 2022
@totten
Member

totten commented Jun 6, 2022

@eileenmcnaughton OK, I've done a bit more r-run. It was by no means exhaustive, but I did check some bits like "Add to existing group" and "Add to new tag" -- and I checked a few other importers (eg Contributions, Activities). It seems to work (at least as well as before). There are follow-up tasks/testing, and they don't need to block this PR.

Merge ready.

@eileenmcnaughton
Contributor Author

Erm - @totten what does merge-ready mean - is that merge-on-pass or we put things on hold for a few days now?

@totten
Member

totten commented Jun 6, 2022

@eileenmcnaughton If you're happy with the grep-results (eg w/onDuplicate update) and if the plan is to address batch-size separately, then it's merge-on-pass.

If there are any more tweaks you want first (like bumping default batch-size to 50), then do that (and then consider it merge-on-pass).

@eileenmcnaughton
Contributor Author

@totten hmm - OK - re batch size - I could go either way - but my suspicion is it might be easier to test the background processing with it at 5 than 50 & then bump it (if I haven't added a checkbox by then)

@eileenmcnaughton
Contributor Author

Tests all failed because I didn't update the tests to reflect the code was no longer in use - I've done that now

@eileenmcnaughton eileenmcnaughton added merge on pass and removed merge ready PR will be merged after a few days if there are no objections labels Jun 6, 2022
@eileenmcnaughton eileenmcnaughton merged commit 2b19182 into civicrm:master Jun 6, 2022
@eileenmcnaughton eileenmcnaughton deleted the import_queue branch June 6, 2022 04:27