The pipeline run reconciler builds a pipeline run state on every
run, which resolves task references, expands results and processes
matrix fan-outs.
The current process is incremental in a single loop, where each
new PipelineTask resolution depends on the state of PipelineTasks
resolved before. This is problematic because tasks are not
necessarily defined in the pipeline in order of execution (which
is undefined, given that pipelines are DAGs).
Since this PR is a fix to a regression, it aims to be as minimal
as possible. The smallest solution available is to implement some
sorting in the list of tasks, so that the incremental state
can work correctly.
This PR splits the process into two runs, one for tasks that have
been already started (and possibly completed), and a second one
that includes all remaining tasks. The first group of tasks does
not need matrix fan-outs (they have already been processed) or
result resolution, so its state can be safely built incrementally.
The second group is processed starting from the state of the first
group. Any task that is a candidate for execution in this
reconcile cycle must have its results resolved through the state
of the first group.
Testing with the current code arrangement is a bit challenging,
as we ignore result resolution errors in the code, which is ok
only in some cases:
- a result resolution failure due to a task not found or a result
not defined is permanent and should not be ignored
- a result resolution failure due to a result not being available
yet is (possibly) ephemeral and should not cause a failure
Currently one function checks for all these conditions and
returns one error, so it's not possible to safely distinguish
them. Fixing this will require some refactoring, to be done in a
follow-up patch.
For now, a reconcile unit test (TBD) may be able to test the fix.
Fixes: #7103
Signed-off-by: Andrea Frittoli <[email protected]>