Pipeline refactoring proposal #512

r-c-n · 2024-03-28T13:51:14Z

While thinking of ways of making KernelCI more flexible for future requirements, I identified some design elements as the sources of the hardships we've been finding so far during development and as potential flexibility limitations:

The current pipeline stages are separate but don't really work independently: they are meant to run as a linear pipeline, i.e. the output of one stage feeds into the next, and some stages expect input from the stage right before them.
The scheduler stage conflates scheduling logic and job generation/running, so it's not trivial to test-run a job generation without involving the whole scheduler machinery and configuration.

I think this pipeline design probably started as a first sketch and we've been stacking patches on top of it since, but I think the KernelCI API design is meant to be used in a different way by the pipeline and other clients. As often happens with rushed designs, once we got a hammer we started using it for nails, but also for screws and for punching holes in walls, instead of finding ways to get a screwdriver and a drill as well.

So this is a proposal for change to overcome these limitations and make KernelCI both easier to develop on and easier to adapt to future use cases.

Current test flow

As I understand it, the current flow for running a test looks something like this (the numbers in parentheses show the event order):

Proposal

Introduce an additional "Runner" stage that generates jobs and runs them. That is, move the second half of the current "Scheduler" to a separate stage and leave the "Scheduler" to handle exclusively the logic of which jobs to run.
Make better use of events for communication between clients. If we introduce a new type of event, such as run (this is free to do, afaik), we can use it to kickstart processes in individual stages selectively from any source. This event may contain all the data needed to describe the event and for the target stage to perform the task.
Introduce a new (maybe optional) "Job Dispatcher" stage that can take high-level job descriptions and then trigger the appropriate stages to get a result. A job description could simply be a JSON definition to "run a test on a specific platform with a specific kernel version and setup", and the dispatcher would decompose that into specific stage runs to fetch the kernel code, build it and run the test.
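To make the last two points more concrete, here is a minimal sketch of what a run event and the dispatcher's decomposition step could look like. Everything in it is an assumption made for illustration: the `RunEvent` class, the `decompose` function, the stage names ("tarball", "kbuild", "runner") and the fields of the job description are hypothetical and not part of the existing KernelCI API. It also ignores ordering, whereas a real dispatcher would have to wait for each stage's result before triggering the next one.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunEvent:
    """Hypothetical 'run' event: addressed to one stage, carries its input."""
    target: str                               # stage that should react
    data: dict = field(default_factory=dict)  # everything the stage needs

    def to_json(self) -> str:
        # Serialize as {"op": "run", "target": ..., "data": ...}
        return json.dumps({"op": "run", **asdict(self)}, sort_keys=True)


def decompose(job: dict) -> list[RunEvent]:
    """Split a high-level job description into per-stage run events."""
    kernel = {"tree": job["tree"], "revision": job["revision"]}
    return [
        RunEvent("tarball", kernel),
        RunEvent("kbuild", {**kernel, "arch": job["arch"],
                            "config": job["config"]}),
        RunEvent("runner", {"test": job["test"],
                            "platform": job["platform"]}),
    ]


# Hypothetical high-level job description, as a client might submit it.
job_description = json.loads("""
{
  "tree": "mainline",
  "revision": "v6.8",
  "arch": "arm64",
  "config": "defconfig",
  "test": "baseline",
  "platform": "example-board"
}
""")

for event in decompose(job_description):
    print(event.to_json())
```

The point of the sketch is that the job description stays declarative, and only the dispatcher knows which stages are needed to satisfy it.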

An example scenario would look like this (note: this isn't a sequence diagram, the order of API interactions between components doesn't matter):

So, the current test flow could still work just the same; the sequence would be:

"Trigger" detects a repo change, or it receives a run event addressed to it. Then it does its usual tasks, submits the checkout node and notifies the "tarball" stage by sending a run event.
"Tarball" starts when it receives a run event addressed to it, does its usual tasks and submits the node update.
The "Scheduler" works as usual, receiving node events, with the difference that instead of running the jobs itself, it decides which jobs to run based on the pipeline configuration and notifies the "Runner" stage to run them, passing all the necessary information about the job in the run event.
The "Runner" generates the jobs and runs them as it receives them via run events.
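The sequence above can be simulated with a small in-memory sketch. This is not how the pipeline talks to the API: `EventBus`, the topic names ("run:trigger", "run:tarball", "run:runner", "node") and the hardcoded job names are all hypothetical stand-ins, used only to show each stage reacting to the events addressed to it.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for the API event channel (hypothetical)."""

    def __init__(self):
        self.handlers = defaultdict(list)  # topic -> list of callbacks
        self.queue = deque()               # pending (topic, data) events

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, data):
        self.queue.append((topic, data))

    def drain(self):
        """Deliver events until none are pending; return delivered topics."""
        delivered = []
        while self.queue:
            topic, data = self.queue.popleft()
            delivered.append(topic)
            for handler in self.handlers.get(topic, []):
                handler(data)
        return delivered


def build_pipeline(bus, executed):
    def trigger(data):
        # Checkout node submission omitted; wake up the tarball stage.
        bus.publish("run:tarball", {"revision": data["revision"]})

    def tarball(data):
        # Tarball creation omitted; submit the node update.
        bus.publish("node", {"revision": data["revision"], "tarball": "ready"})

    def scheduler(node):
        # Decide which jobs to run, but do not run them here.
        for job in ("kbuild-arm64", "kbuild-x86_64"):
            bus.publish("run:runner", {"job": job,
                                       "revision": node["revision"]})

    def runner(data):
        # Job generation and submission would happen here.
        executed.append((data["job"], data["revision"]))

    bus.subscribe("run:trigger", trigger)
    bus.subscribe("run:tarball", tarball)
    bus.subscribe("node", scheduler)
    bus.subscribe("run:runner", runner)


bus = EventBus()
executed = []
build_pipeline(bus, executed)

bus.publish("run:trigger", {"revision": "v6.8"})
print(bus.drain())
print(executed)
```

Note that the scheduler function only publishes events, so its decision logic could be tested without any job generation or lab configuration in place.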

Additionally, this decoupling of functionalities allows other use cases that we can't do at the moment:

A client can trigger an individual stage to have KernelCI perform an operation on demand: build a kernel remotely and upload it (running a standalone kbuild), run a test with a custom kernel build
External tools can plug into the API at any stage
Higher-level processes and logic can be built on top of these primitive stages
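As a rough illustration of the first use case, the sketch below shows a client addressing a single stage directly, without going through the earlier ones. The stage registry, the `handle_run_event` function, the event fields and the example URL are hypothetical, and the stage bodies are placeholders that only echo what they would do.

```python
STAGES = {}  # stage name -> handler (hypothetical registry)


def stage(name):
    """Register a function as the handler for run events sent to `name`."""
    def register(func):
        STAGES[name] = func
        return func
    return register


@stage("kbuild")
def kbuild(data):
    # Placeholder: a real stage would build the kernel and upload it.
    return {"stage": "kbuild",
            "artifact": f"{data['revision']}-{data['arch']}/Image"}


@stage("runner")
def runner(data):
    # Placeholder: a real stage would generate the job and submit it.
    return {"stage": "runner", "test": data["test"], "kernel": data["kernel"]}


def handle_run_event(event):
    """Route a run event to the stage it is addressed to."""
    if event.get("op") != "run":
        raise ValueError("not a run event")
    if event["target"] not in STAGES:
        raise KeyError(f"unknown stage: {event['target']}")
    return STAGES[event["target"]](event["data"])


# Standalone kernel build on demand.
build = handle_run_event({
    "op": "run", "target": "kbuild",
    "data": {"revision": "v6.8", "arch": "arm64"},
})

# Test run with a custom kernel build, skipping checkout and build stages.
test_run = handle_run_event({
    "op": "run", "target": "runner",
    "data": {"test": "baseline",
             "kernel": "https://example.com/custom/Image"},
})

print(build)
print(test_run)
```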


padovan · 2024-03-28T14:54:31Z

From a high-level perspective that makes sense. When interacting with the kernel test ecosystem, I can see the following needs:

We need an easy way to eventually fit things coming from GitLabCI. It may be either a build request or just a test run request with a kernel image attached.
Or the trigger step may evolve into a trigger for more than just KernelCI. Other services could rely on it. The idea here is that KernelCI would start building a more central place for triggering jobs for various CIs, so from a maintainer's perspective life becomes easier.
All the test specs should also move out of KernelCI v2 one day too, so other CIs can just reuse them. Some of that information should even go to mainline.

And maybe some of these pipeline steps may eventually become services on their own.

r-c-n · 2024-03-28T15:19:25Z

> We need an easy way to eventually fit things coming from GitLabCI. It may be either a build request or just a test run request with a kernel image attached.

Having a description of the expected inputs and outputs of GitLabCI would be a good start. If there's a well-defined interface then we can check right away whether this proposal fits the problem or whether we need to adapt the pipeline further. I'll ask Helen about it.

> Or the trigger step may evolve into a trigger for more than just KernelCI. Other services could rely on it. The idea here is that KernelCI would start building a more central place for triggering jobs for various CIs, so from a maintainer's perspective life becomes easier.

Yes, the current "trigger" stage is very simple and does only one thing; it's by no means a generic trigger.

> All the test specs should also move out of KernelCI v2 one day too, so other CIs can just reuse them. Some of that information should even go to mainline.

From this and the previous point I gather you mean that all test definitions could adhere to a common schema for all CI systems to follow, and that the KernelCI pipeline could be used to interoperate with other CI systems (launching tests for other systems, getting result callbacks from tests launched by others)? This sounds like an honorable initiative, though an almost utopian one in terms of viability. That doesn't mean we shouldn't push for it, though.

> And maybe some of these pipeline steps may eventually become services on their own.

They should be. The only reason they aren't right now is that they're tightly coupled to each other and are only ready to deal with intra-KernelCI requests. But if we opened up their interfaces properly, they would essentially be independent services talking to each other and to external clients, using the API as the communication channel.

r-c-n mentioned this issue Jun 28, 2024

Design and implementation ideas for job retries kernelci/kernelci-api#509

Open
