Pipeline refactoring proposal #512

Open
r-c-n opened this issue Mar 28, 2024 · 2 comments

r-c-n commented Mar 28, 2024

While thinking of ways of making KernelCI more flexible for future requirements, I identified some design elements as the sources of the hardships we've been finding so far during development and as potential flexibility limitations:

  • The current pipeline stages are nominally separate but don't really work independently: they are meant to run as a linear pipeline, i.e. the output of one stage feeds into the next, and some stages expect an input from the stage right before them.
  • The scheduler stage conflates scheduling logic and job generation/running, so it's not trivial to test-run a job generation without involving the whole scheduler machinery and configuration.

I think this pipeline design was probably done as a first sketch and we've been stacking patches on top of it ever since, but the KernelCI API design is meant to be used in a different way by the pipeline and other clients. As often happens with rushed designs, once we got a hammer we started using it for hammering nails, but also for driving screws and punching holes in the walls, instead of finding ways to get ourselves a screwdriver and a drill as well.

So this is a proposal for change to overcome these limitations and make KernelCI both easier to develop on and easier to adapt to future use cases.

Current test flow

As I understand it, the current flow for running a test looks something like this (the numbers in parentheses show the event order):

[Image: current_flow]

Proposal

  1. Introduce an additional "Runner" stage that generates jobs and runs them. That is, move the second half of the current "Scheduler" to a separate stage and leave the "Scheduler" to handle exclusively the logic of which jobs to run.
  2. Make better use of events to let clients communicate with each other. If we introduce a new type of event, such as run (this is free to do afaik), we can use it to kickstart processes in individual stages selectively from any source. This event may contain all the data needed to describe the request and for the target stage to perform the task.
  3. Introduce a new (maybe optional) "Job Dispatcher" stage that can take high-level job descriptions and then trigger the appropriate stages to get a result. A job description could be simply a JSON definition to "run a test on a specific platform with a specific kernel version and setup", and the dispatcher would decompose that into specific stage runs to fetch the kernel code, build it and run the test.
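
To make points 2 and 3 a bit more concrete, here is a rough sketch of what a run event and the dispatcher's decomposition step could look like. Everything in it is hypothetical: `RunEvent`, `decompose`, the stage names used as targets and the JSON fields are made up for illustration and are not part of the current KernelCI API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunEvent:
    """Hypothetical 'run' event: names the target stage and carries its input."""
    target: str                               # stage expected to react to the event
    data: dict = field(default_factory=dict)  # everything the stage needs for the task
    op: str = "run"


def decompose(job_description: str) -> list[RunEvent]:
    """Split a high-level JSON job description into one run event per stage."""
    job = json.loads(job_description)
    kernel = {"tree": job["tree"], "revision": job["revision"]}
    return [
        # Fetch the kernel code.
        RunEvent("tarball", kernel),
        # Build it.
        RunEvent("kbuild", {**kernel, "arch": job["arch"], "defconfig": job["defconfig"]}),
        # Run the test.
        RunEvent("runner", {**kernel, "platform": job["platform"], "test": job["test"]}),
    ]


# All values below are placeholders.
description = json.dumps({
    "tree": "mainline",
    "revision": "v6.8",
    "arch": "arm64",
    "defconfig": "defconfig",
    "platform": "example-board",
    "test": "baseline",
})

for event in decompose(description):
    print(event.target, event.data)
```

A real dispatcher would also have to wait for each stage's result before triggering the next one; the sketch only covers the decomposition.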

An example scenario would look like this (note: this isn't a sequence diagram, the order of API interactions between components doesn't matter):

[Image: proposal]

So, the current test flow could still work just the same, the sequence would be:

  1. "Trigger" detects a repo change, or it receives a run event addressed to it. Then it does its usual tasks, submits the checkout node and notifies the "tarball" stage by sending a run event.
  2. "Tarball" starts when it receives a run event addressed to it. It does its usual tasks and submits the node update.
  3. The "Scheduler" works as usual, receiving node events, with the difference that instead of running the jobs itself, it decides which jobs to run based on the pipeline configuration and notifies the "Runner" stage to run them, passing all the necessary information about the job in the run event.
  4. The "Runner" generates the jobs and runs them as it receives them via run events.
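
The sequence above can be simulated with a small in-memory event bus. This is only a sketch under assumptions: `EventBus` stands in for the API's event channel, the `stage` decorator and the event fields are invented for the example, and the job names are placeholders for whatever the pipeline configuration would select.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for the API's event channel (an assumption)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, kind, handler):
        self.handlers[kind].append(handler)

    def publish(self, kind, target=None, **data):
        self.queue.append({"kind": kind, "target": target, "data": data})

    def drain(self):
        # Deliver queued events in order; handlers may publish new ones.
        while self.queue:
            event = self.queue.popleft()
            for handler in list(self.handlers[event["kind"]]):
                handler(event)


bus = EventBus()
log = []  # names of the stages that reacted, in order


def stage(name, kind="run"):
    """Register a stage; 'run' handlers only react to events addressed to `name`."""
    def register(func):
        def handler(event):
            if kind != "run" or event["target"] == name:
                log.append(name)
                func(event["data"])
        bus.subscribe(kind, handler)
        return func
    return register


@stage("trigger")
def trigger(data):
    # Submit the checkout node (omitted), then notify the tarball stage.
    bus.publish("run", target="tarball", revision=data["revision"])


@stage("tarball")
def tarball(data):
    # Create the tarball (omitted) and submit the node update.
    bus.publish("node", state="available", revision=data["revision"])


@stage("scheduler", kind="node")
def scheduler(data):
    # Only decide which jobs to run; this tuple stands in for the pipeline config.
    for job in ("baseline", "kselftest"):
        bus.publish("run", target="runner", job=job, revision=data["revision"])


@stage("runner")
def runner(data):
    # Generate and run the job described entirely by the event data.
    print(f"running {data['job']} on {data['revision']}")


bus.publish("run", target="trigger", revision="v6.8")
bus.drain()
```

Note that the scheduler never runs anything itself, and the runner can be exercised on its own by publishing a run event addressed to it.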

Additionally, this decoupling of functionalities allows other use cases that we can't do at the moment:

  • A client can trigger an individual stage to have KernelCI perform an operation on demand, such as building a kernel remotely and uploading it (running a standalone kbuild) or running a test with a custom kernel build.
  • External tools can plug into the API at any stage
  • Higher-level processes and logic can be built on top of these primitive stages
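
For the on-demand case, a client would only need to know which data each stage expects in its run event. A minimal sketch, assuming a per-stage list of required fields (the stage names, field names and `make_request` helper are all hypothetical):

```python
# Hypothetical input contract for each stage that accepts run events.
REQUIRED_FIELDS = {
    "kbuild": {"tree", "revision", "arch", "defconfig"},
    "runner": {"kernel_url", "platform", "test"},
}


def make_request(stage, **data):
    """Build a run event for a single stage, rejecting incomplete requests."""
    missing = REQUIRED_FIELDS[stage] - set(data)
    if missing:
        raise ValueError(f"{stage}: missing fields {sorted(missing)}")
    return {"op": "run", "target": stage, "data": data}


# Standalone kbuild: build a kernel remotely, no test involved.
build = make_request(
    "kbuild", tree="mainline", revision="v6.8", arch="arm64", defconfig="defconfig"
)

# Run a test with a custom kernel build, skipping the checkout and build stages.
test = make_request(
    "runner",
    kernel_url="https://example.com/custom/Image",  # placeholder URL
    platform="example-board",
    test="baseline",
)

print(build["target"], test["target"])
```

With a documented contract like this, external tools and higher-level logic can address any stage without knowing about the rest of the pipeline.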

padovan commented Mar 28, 2024

From a very high-level perspective, that makes sense. When interacting with the kernel test ecosystem, I can see the following needs:

  • We need an easy way to eventually fit things coming from GitLabCI. It may be either a build request or just a test run request with a kernel image attached.
  • Or the trigger step may evolve into a trigger for more than just KernelCI. Other services could rely on it. The idea here is that KernelCI would start becoming a more central place for triggering jobs for various CIs, so from a maintainer perspective life becomes easier.
  • All the test specs should also move out of KernelCI v2 one day, so other CIs can just re-use them. Some of that information should even go to mainline.

And maybe some of these pipeline steps may eventually become services of their own.

r-c-n commented Mar 28, 2024

> We need an easy way to eventually fit things coming from GitLabCI. It may be either a build request or just a test run request with a kernel image attached.

Having a description of the expected inputs and outputs of GitLabCI would be a good start. If there's a well-defined interface, we can check right away whether this proposal fits the problem or whether we need to adapt the pipeline further. I'll ask Helen about it.

> Or the trigger step may evolve into a trigger for more than just KernelCI. Other services could rely on it. The idea here is that KernelCI would start becoming a more central place for triggering jobs for various CIs, so from a maintainer perspective life becomes easier.

Yes, the current "trigger" stage is very simple and does only one thing. It's by no means a generic trigger.

> All the test specs should also move out of KernelCI v2 one day, so other CIs can just re-use them. Some of that information should even go to mainline.

From this and the previous point I gather you mean that all test definitions could adhere to a common schema for all CI systems to follow, and that the KernelCI pipeline could be used to interoperate with other CI systems (launching tests for other systems, getting result callbacks from tests launched by others)? This sounds like an honorable initiative but almost utopian viability-wise. That doesn't mean we shouldn't push for it, though.

> And maybe some of these pipeline steps may eventually become services of their own.

They should be. The only reason they aren't right now is that they're tightly coupled to each other and are only prepared to deal with intra-KernelCI requests. But if we opened up their interfaces properly, they would essentially be independent services talking to each other and to external clients, using the API as the communication channel.
