-
-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(dev/core#1304) Queues - Allow background worker to drive tasks via APIv4 #22762
Conversation
(Standard links)
|
2b221bc
to
0c118d7
Compare
13435c6
to
3b41c8e
Compare
jenkins, test this please |
3b41c8e
to
64cdd43
Compare
@eileenmcnaughton @artfulrobot @seamuslee001 I'm removing the "WIP/Draft" flag from here, rebased, fleshed out the description, and added a couple examples. This should be ripe for a round of review: |
More copy-editing on description. Added a pithier example https://gist.github.com/totten/168f15ccf7a1ab14a6a8b695f33074e8. |
I started digging into queuing today - a lot of the things I'm trying to make sense are around how it can (and can't be used) but one thing I hit that relates to the api in this PR is that I currently think claimItems is pretty confusing re dates. What I'm seeing is that the relevant fields are all datetime and that the date that was written when I tried to claim the item is converted into GMT - 12 hours ago - I think that is being handled at other places too - so it seems weird that the item is claimed before it was submitted but it may still work |
CRM/Queue/BAO/Queue.php
Outdated
@@ -18,7 +18,7 @@ | |||
/** | |||
* Track a list of known queues. | |||
*/ | |||
class CRM_Queue_BAO_Queue extends CRM_Queue_DAO_Queue { | |||
class CRM_Queue_BAO_Queue extends CRM_Queue_DAO_Queue implements \Civi\Test\HookInterface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now \Civi\Core\HookInterface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eileenmcnaughton Yup, that's prettier. Pushed update.
test this please |
2 similar comments
test this please |
test this please |
@totten running this locally I'm finding the hook is not invoked. I am runnning it on 5.48 patched with this PR - I'm somewhat inclined to merge this and keep testing on a generated tarball (on the understanding that the review might still require follow up fixes) as I'm not seeing anything that would adversely affect pre-existing functionality. My queue item IS being claimed - it just isn't doing anything as the hook is not firing |
Ah - system.flush did it |
OK - this is working for me as I think intended - in terms of usefulness I'm having issues though. What I have is a queue with a bunch of items - this is somewhat analagous to an import job
The hope I had was
I'm still working through this use case & having the api functions be usable in practice may well be out of the scope of this PR. |
Trying to recapture some points from MM discussion:
This could be done as a separate driver/type, eg Additionally, queue utilities (fork/branch and wait/merge) could address the same use-case. That might be more portable, though it may also be more verbose. Both of those can be pursued as additional/follow-up topics that don't change the API semantics here.
Yeah, we should have a small library of tasks/task-adapters -- lke "run api4", "run api5", "fork/branch subqueue", "wait/merge for subqueue". But this is also something that can be improved in additional/follow-up work.
We do need some different+well-defined patterns for error-handling. The retry mechanism here is fine -- but what happens when you exhaust all the retries (or don't use retries)? Some error-handling patterns that came up:
It's ideal for the queue-dev to choose a policy for their queue, but that leaves discussion topics re: (a) defaults and (b) how to indicate the policy. |
So when I tried to use this the debug issue I hit was that I was 'losing' queue items and their associated data and not realising - ie they failed out & disappeared. I don't think our default error_handling should be to remove data from the queue because in my experience I was unknowingly losing data.... In the Ajax UI the error_mode is set up when the queue is run - but I think the error handling is actually implicit to the queue config - not how it is run. The 2 error modes currently in the runner are
In the latter case it deletes and continues. At minimum I think we should make the error_mode part of the queue config and default to ERROR_ABORT such that people have to choose ERROR_CONTINUE deliberately (and understand the risk of data loss) Separately there is the question of whether to re-queue - I prefer being able to configure the queue name to which it is requeued (and in practice can see that we would sometimes requeue to a generic failed/dead_letter queue and in other cases to specific ones). That does raise the question of how that config would be stored Note that one the can-be-later things
|
cae4a9a
to
88cf0c0
Compare
The error policy stuff is WIP, but I rebased and added commits to define the schema for |
@totten sadly the upgrade script is conflicted now - maybe you should pull that schema change into a separate PR for a quick more if the rest is WIP? |
1eaddd0
to
cd22596
Compare
cd22596
to
0875af1
Compare
$claimAndRun('abort', 1); | ||
$this->assertEquals(FALSE, $queue->isActive()); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eileenmcnaughton This bit shows how the error=>delete
and error=>abort
lead to different outcomes (after a series of retries).
Does this need rebasing now the other is merged - I'll try r-running tomorrow |
Note: The return types are a bit wonky here. Unfortunately, as I recall, the original specification for all the methods here allowed backend-specific return types -- anti-standardized return-types.
Before: Whenever you `claimItem()` from the queue, it marks the item `release_time`. After: Whenever you `claimItem()` from the queue, it marks _both_ the `release_time` and the `run_count`. Comments: * This is the basis for enforcing a `retry_limit` policy. * This doesn't require any extra queries or joins - it fits into the existing update query.
Background: * A queue runner should call `releaseItem()` if it tries and aborts some task. * The `retry_interval` is defined as the extra time to wait before trying again. Before: The `releaseItem()` always releases for immediate execution. After: The `releaseItem()` checks `retry_interval`. If it's set, then it will add an extra delay before retrying.
Before: The lease-time is one of the following: 1. A value requested at runtime 2. The constant 3600 After: The lease-time is either supplied as 1. A value requested at runtime 2. A value configured for the queue 3. The constant 3600
Before ------ Each of the `CRM_Queue_Queue_*` drivers supports a set of methods for claiming/releasing one queue-item at a time (eg `claimItem()`, `releaseItem()`, `deleteItem()`). After ----- All drivers still support the same queue-item methods. Additionally, the `SqlParallel` driver supports batch-oriented equivalents (`claimItems()`, `deleteItems()`, etc). Comments -------- I initially looked at updating all of the drivers to support queues. There were a few obstacles. Firstly, batched-claims seem semantically questionable for queues that run 1-by-1 (`Sql`, `Memory`). Secondly, there are a few extensions in contrib that extend these classes and override methods (consequently, they're looking for stable signatures). The approach here seemed to resolve those two concerns in an OOP-safe way: define an optional interface (`BatchQueueInterface`) which can be used to mark enhanced functionality on queues where it makes sense (eg `SqlParallel`).
0875af1
to
8369e17
Compare
@eileenmcnaughton OK, rebased. Also added the required |
The issue that was blocking for me (the ability to non-deliberately create a queue that discards failed items) is resolved so I'm merging this now |
Overview
This adds two APIs,
Queue.claimItems
andQueue.runItems
. These APIs may be used by background agents to reserve and execute tasks in a persistent queue. This PR is a major step towards providing generic support for background processing.As described in previous PRs (esp #22812), application-developers can specify queue behavior by setting various options, eg:
Background agents call the
Queue.runItems
API (and/orQueue.claimItems
). TheQueue
API respects the options from above. Tasks may be executed like so:# Example 1: Claim and run a task - single API call, bash cv api4 Queue.runItems queue=my-stuff
Cross-Reference
runner
,lease_time
,batch_limit
,retry_limit
).CRM_Queue_Task
. You may still fill a queue with tasks, but you may also fill queues with bespokearray()
data.hook_queueRun_{runner}
as a way to run queued items with bespoke and/or batched data.Before
The
CRM_Queue
allows you to build a queue or task-list. It encourages task-running in the foreground but not in the background:CRM_Queue_Runner
. (Note: it was primarily tuned for executing database upgrades in the web UI.)After
You may still use all the same patterns as before. Additionally, you may register a queue that will run in the background. The
Queue.runItems
API executes them. Here are a few examples:Technical Details
The present PR only adds the
Queue
APIs (building-blocks) -- there must also be some agent that uses the APIs. Here are a few ways to create an agent:Queue.get +w 'runner IS NOT EMPTY'
and thenQueue.runItems
. Here is pseudocode: https://gist.github.com/totten/d7a7e29de238b7892d49366966787759Further notes:
hook_civicrm_queueTaskError()
. You can use this hook to inspect the error, to enqueue an alternative action, to send a notification, etc. The following example discriminates based on the specific error - for some errors, it retries; for others, it gives up:runItems()
) or 2-steps (claimItems()
+runItems()
). Why? The 1-step protocol is more convenient in some cases (egcv api4 Queue.runItems...
). However, incoworker
s architecture, the 2-step allows it to better balance work across multiple processes.