-
-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use UI Queue runner for import #23669
Use UI Queue runner for import #23669
Conversation
(Standard links)
|
09f7d88
to
94bed93
Compare
CRM/Import/Parser.php
Outdated
'errorMode' => CRM_Queue_Runner::ERROR_ABORT, | ||
'onEndUrl' => CRM_Utils_System::url('civicrm/import/contact/summary', ['user_job_id' => $this->getUserJobID(), 'reset' => 1]), | ||
]); | ||
$runner->runAllViaWeb(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in a unit-test context, $runner->runAll()
would be more appropriate than $runner->runAllViaWeb()
. Maybe find a conditional/variable to pick the better runner??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@totten - yeah that didn't work in the web UI for me - but ideally people could choose to run in background or foreground (setting? dependent or perhaps permission)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@totten ok - I just looked & yes - that is the cause of the test fails - it didn't work for me in the web without that though - the specific tests ACTUALLY test the form flow - so I guess
- the running of it should be moved back to the form
- the task to set up the queue should take some parameters to make it easier to offer backend as well as front end
- the tests need to catch the premature exit exception & run the rest of the queue themselves
- side note - in the csv api explorer (and in fact in the wmf custom import implementation - which is on it's way out) - batch size is a configuration option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - so what I'm thinking to try in conjunction with your PR is
- if a queue job already exists with a certain name (civicrm_executor or something genecidal like that) THEN offer the option to process in the background
- in this scenario we would create a queue - the same as now with a background config (error handling 'abort' ) and we would ALSO add an item to the pre-existing
civicrm_executor
to run that queue -(error handling 'discard' ) - the tricky part is I'd need 'some sort of UI' that the user could be redirected to to see what is happening because they are still 'responsible' for checking it completes at this stage.
That's the flow I figure I'll try out @totten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the batch size option sounds like a good idea.
the task to set up the queue should take some parameters to make it easier to offer backend as well as front end
Yeah, I think its good practice to decouple the "build queue" and "run queue" parts. A few other examples:
- If you look at the core-upgrader, it knows how build a queue of upgrade tasks (
CRM_Upgrade_Form::buildQueue()
). Then there are a few frontends:- The web page
civicrm/upgrade
callsbuildQueue()
+runAllViaWeb()
- The CLI
drush civicrm-upgrade-db
callsbuildQueue()
+runAll()
- The CLI
cv upgrade:db
callsbuildQueue()
and its ownConsoleQueueRunner
(which just shows more colorful output and which respects the-v
erbose flag).
- The web page
- If you look at ext-upgrader, it knows how to build a queue of upgrade tasks (
CRM_Extension_Upgrades::createQueue()
). Then there are a few frontends:- The web page
civicrm/admin/extensions/upgrade
callscreateQueue()
+runAllViaWeb()
- The API
Extension.upgrade
callscreateQueue()
+runAll()
.
- The web page
(EDIT) So in this case - the queue()
function could return the prepared CRM_Queue_Queue
object. Then the web-based controller could fire runAllViaWeb()
, and the headless unit-test could fire runAll()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- if a queue job already exists with a certain name (civicrm_executor or something genecidal like that) THEN offer the option to process in the background
- in this scenario we would create a queue - the same as now with a background config (error handling 'abort' ) and we would ALSO add an item to the pre-existing civicrm_executor to run that queue -(error handling 'discard' )
OK, so for the current PR, the original scope is spot-on (ie get it working with the foreground AJAX runner). This is going into follow-up work, right?
FWIW, coworker
doesn't need a civicrm_executor
like that -- its standard behavior is to pick up on any queue which matches runner IS NOT NULL and status = 'active'
.
I would note that civicrm_queue
has some options that you might call obscure tunables (eg lease_time
, retry_limit
, retry_interval
) -- things which should have safe defaults, but which sysadmins might adjust in some edge-cases. A queue template could serve as both configuration-element and flag-record. Ex:
// Register queue-template
\Civi\Api4\Queue::create()
->setValues([
'name' => 'import/template',
'type' => 'SqlParallel',
'runner' => 'task',
'status' => 'template', /* ?? OR: 'is_template' => TRUE, ?? */
'error' => 'abort',
'lease_time' => 5*60,
'retry_limit' => 0,
]),
->execute();
Then when it's time to prepare queued tasks, conditionally copy the template:
if (queue('import/template') is an active template) {
// Found template. Run in background.
$q = Civi::queue("import/{$jobId}", ['tpl' => 'import/template', 'status' => 'draft']);
Importer::buildQueue($q);
$q->setStatus('active'); /* Background worker will take over */
}
else {
// No template. Run in foreground.
$q = Civi::queue("import/{$jobId}", ['type' => 'Sql', 'error' => 'abort']);
Importer::buildQueue($q);
$r = new CRM_Queue_Runner(...);
$r->runAllViaWeb();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@totten I'm trying to use this use case to also test & QA your PR - so yeah - I have pulled down your PR & am trying to get it to add the background queue option in conjunction with that - but I've gonna have to have a snooze & come back to it - about 50% of my brain power won't be released until #23666 is merged & I can rebase this over it & another 45% seems to be taken up by lack of sleep so if I can chase the rabbit into the cage I'm gonna see if I can take a kip & free up some of the 45%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& it has been merged - I rebased this
c948a76
to
27bbff8
Compare
8734f23
to
fd9a1df
Compare
Those tests are NOT failing for me locally in isolation - I have added to the |
@totten this part is passing - I'm very keen to work on the queue issue more tomorrow & keep the momentum over the next few days - but I think we can merge this |
5624e98
to
d604210
Compare
d604210
to
27d0ded
Compare
I originally thought this UI bit would be a fairly straight forward merge & once merged we could focus on the background processing - but I think we are treating that as a blocker to merging this - so I've moved some of the parts of this to other PRs so they don't block me cleaning up the other imports - I have rebased that into this PR - hence the rebase & the PR having become 'bigger' again |
27d0ded
to
df0f2c9
Compare
Additional minor code simplification Tear down fix Maybe the var name needs to match in the CI php version
df0f2c9
to
3592a5e
Compare
I agree with your original thought -- the option for background processing should be a follow-up issue. Since this has been rebased a couple times since I last did an |
@totten yeah - I think this is mostly about getting it out of the way so we can focus on the background processing issue - I feel like we are close enough to that that we could get it in before we branch & that if we do it will really make a more 'compelling' case for rc testing As I mentioned @MegaphoneJon has indicated some rc resource from one of his clients |
@totten I've also planning to document how the import flow is working - although I wanted to get the last few entities switched over to it first & then I can - there is a PR up for the membership import & while there are 3 more they become increasingly simply once that is merged |
@eileenmcnaughton Since this removes a couple inherited members, and since the class-hierarchy is somewhat shared by different importers, I was curious if any of those properties would be an issue. Some (like There's a couple which I'm not sure about - maybe you could skim these grep results:
|
@totten - the other imports share a lot of code with each other - but not with the contact import I've pushed up a PR to remove the remaining references to |
6f93873
to
77c96d8
Compare
$totalRowCount = $totalRows = $dataSource->getRowCount(['new']); | ||
$queue = Civi::queue('user_job_' . $this->getUserJobID(), ['type' => 'Sql', 'error' => 'abort']); | ||
$offset = 0; | ||
$batchSize = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eileenmcnaughton I assume the small batch size (5
) is to facilitate inspection (make it easy to watch the batching mechanism)? There's a separate/follow-up issue to extract this to a setting? (That's fine - not a blocker.)
But if we wind up with a constant-value in the stable-release -- it should probably be a bit higher (like... 20 or 100 or 500 or 1000...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@totten yeah - I think the batch size will be an option on the first page of the import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably go for around 50 as a default - that should import in 30 seconds on most sites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like there may be sweet spot for it, I seemed to use 50 to 500 in API CSV Importer. 5 I get for testing but doing imports now and it makes it quite slow. I will try switching that hard coded limit around and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - let us know what you find
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with some guesstimates about what determines the max batch limit. For example, one might assume that:
- Each PHP worker is limited to 64mb RAM (the baseline required by Civi installer)
- Each HTTP request is bound by the default timeout specified by
apache.org
andnginx.org
(ie 60s)- Per
php.net
, the default timeout for a PHP request is actually lower (30s), but most importers callset_time_limit(0)
.
- Per
- Each import-task requires an average 100ms runtime and 100kb memory.
The gist is a bit fiddly, and (while fiddling) the projected limits ranged somewhere around 300 - 600.
For a hard-coded/application-level default, I'd probably vote for 100...
@eileenmcnaughton OK, I've done a bit more Merge ready. |
Erm - @totten what does merge-ready mean - is that merge-on-pass or we put things on hold for a few days now? |
@eileenmcnaughton If you're happy with the grep-results (eg w/onDuplicate update) and if the plan is to address batch-size separately, then it's merge-on-pass. If there are any more tweaks you want first (like bumping default batch-size to 50), then do that (and then consider it merge-on-pass). |
@totten hmm - OK - re batch size - I could go either way - but my suspicion is it might be easier to test the background processing with it at 5 than 50 & then bump it (if I haven't added a checkbox by then) |
0b83ca4
to
b75fe83
Compare
Tests all failed because I didn't update the tests to reflect the code was no longer in use - Ive done that now |
Overview
Use UI Queue runner for import
Before
Faux ajax still times out the proxy setting
After
Technical Details
I got it working in the UI - can test later via CLI
Comments
This has a really long chain - most are less scary than they look. The ones that set the path for the rest are the ones that are separately open as PRs with tests