Parallelising work in CI systems #505

Open · tomato42 opened this issue Jan 15, 2021 · 3 comments

@tomato42 (Contributor)

Both GitLab CI and GitHub Actions allow having tasks that depend on each other: https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/migrating-from-gitlab-cicd-to-github-actions#dependencies-between-jobs

It would be nice to have the ability to:

  1. prepare work for N workers in one job
  2. start N workers to process the mutation runs, each getting a single file with mutations to execute
  3. have a summary task that combines results from the N workers and provides the overall mutation score

tomato42 changed the title from "Paralelising work in CI systems" to "Parallelising work in CI systems" on Jan 15, 2021
@abingham (Contributor) commented Jan 17, 2021

There are a few ways to approach this that come to mind. First, you could have each worker handle a particular module (or subset of modules). For each worker, the cosmic-ray.module-path config option would tell it which modules to mutate/test. If you wanted to get a unified result at the end, you'd need some method to combine their WorkDBs afterward; this shouldn't be difficult, and might be a generally useful tool for CR to have.
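
For combining the WorkDBs, something along these lines might work. This is only a sketch that treats each WorkDB as a plain SQLite file; the helper name and schema handling are illustrative, not an existing CR tool:

```python
# Hypothetical helper, not part of cosmic-ray: merge per-worker WorkDB SQLite
# files into a single database by copying every row of every user table.
import shutil
import sqlite3
import sys

def merge_workdbs(output_path, workdb_paths):
    # Start from a copy of the first worker's database so the schema exists.
    shutil.copyfile(workdb_paths[0], output_path)
    conn = sqlite3.connect(output_path)
    for path in workdb_paths[1:]:
        conn.execute("ATTACH DATABASE ? AS other", (path,))
        tables = conn.execute(
            "SELECT name FROM other.sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        for (table,) in tables:
            # INSERT OR IGNORE avoids failing on rows whose primary key is
            # already present in the output database.
            conn.execute(f"INSERT OR IGNORE INTO {table} SELECT * FROM other.{table}")
        conn.commit()
        conn.execute("DETACH DATABASE other")
    conn.close()

if __name__ == "__main__":
    # e.g. python merge_workdbs.py combined.sqlite worker-0.sqlite worker-1.sqlite
    merge_workdbs(sys.argv[1], sys.argv[2:])
```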

Another option is to give each worker access to the entire set of mutations for your project, but to have them only actually perform a subset of them. So if a worker knew, for example, that it was number 3 out of 5, then it would only work on the third fifth of all mutations...or something like that; I'm glossing over details. As before, you might want some way to combine all of the results to get a unified report.
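
A toy illustration of that slicing idea; the WORKER_INDEX / WORKER_COUNT environment variables are made up and would come from the CI system (e.g. a job matrix), not from CR:

```python
# Illustration only: each worker sees the full list of mutations but executes
# just its own slice of them.
import os

def my_share(all_mutations):
    index = int(os.environ["WORKER_INDEX"])  # 0-based worker number from the CI matrix
    count = int(os.environ["WORKER_COUNT"])  # total number of workers
    return all_mutations[index::count]       # every count-th mutation, offset by index
```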

Of course, if the workers are actually able to communicate with one another, you could also use e.g. the celery execution engine to distribute work among them. I'm not sure if this is possible or not.

So I think we already have most of what you need to do this. It'll require a little creativity, and we might find that there are even better ways, e.g. perhaps a new execution engine. I'm happy to help you work on a solution (though I don't have much bandwidth to actually implement something right now).

@tomato42 (Contributor, Author)

Doesn't celery require real-time access between the controller and the runners? I'm thinking of files, as those are typically handled well by CI systems (as build artefacts), so the jobs would be runnable even if the workers don't have access to the network.

I'm thinking that the split should happen on a single machine, with each job file containing a subset of mutations to execute. We probably want to preserve the runner's behaviour of executing mutations in random order, so that we can kill workers after a specific amount of time rather than when they finish the job (for CI we want results quickly, even if they are incomplete).
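
Roughly what I have in mind for the time-boxed part; `run_one()` below stands in for whatever actually executes a single mutation, it is not a real cosmic-ray call:

```python
# Sketch of the time-boxed idea: shuffle the pending mutations and stop when the
# CI time budget runs out, reporting whatever results exist at that point.
import random
import time

def run_until_deadline(mutations, budget_seconds, run_one):
    deadline = time.monotonic() + budget_seconds
    random.shuffle(mutations)  # random order keeps partial results a fair sample
    results = []
    for mutation in mutations:
        if time.monotonic() >= deadline:
            break  # out of time: return the incomplete but still usable results
        results.append(run_one(mutation))
    return results
```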

While using filtering and module-path would work, modules are rarely of the same size or complexity, so a split like that would be rather rough. More of a crutch than a solution.

Build artefact handling is also why I'm thinking that the combining step should use files as inputs.

So basically, I think that we need something that splits the sqlite file into N files with random mutations that can be executed by the existing runner, and then something that takes all the files after the runners are done with them and melds them together.
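
Something like this for the splitting side, assuming the session is a SQLite file whose pending mutations live in a single table (called `work_items` below purely for illustration; the real WorkDB schema may differ). The "meld" step would essentially be the inverse, copying the rows from all the worker files back into one database:

```python
# Rough sketch, not a real cosmic-ray tool: copy the session N times and keep a
# different random subset of mutations in each copy, so each CI worker gets one file.
import random
import shutil
import sqlite3

def split_workdb(session_path, n_workers):
    src = sqlite3.connect(session_path)
    ids = [row[0] for row in src.execute("SELECT rowid FROM work_items")]
    src.close()

    random.shuffle(ids)  # randomise so every worker gets a representative mix
    for worker in range(n_workers):
        part_path = f"worker-{worker}.sqlite"
        shutil.copyfile(session_path, part_path)
        keep = set(ids[worker::n_workers])  # this worker's share of the mutations
        conn = sqlite3.connect(part_path)
        conn.executemany(
            "DELETE FROM work_items WHERE rowid = ?",
            [(i,) for i in ids if i not in keep],
        )
        conn.commit()
        conn.close()

if __name__ == "__main__":
    split_workdb("session.sqlite", 5)
```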

@abingham (Contributor) commented Jan 19, 2021

Doesn't celery require real-time access between the controller and the runners?

That's right, hence the caveat about the workers needing to be able to communicate. I figured this was not likely to be possible, but I thought I should include it for completeness.

More of a crutch than a solution.

I agree, this is a pretty crude approach. Its primary benefit is its simplicity, but it's not so much simpler than the other approaches that I'd try it first.

I think that we need something that splits the sqlite file into N files with random mutations that can be executed by the existing runner, and then something that takes all the files after the runners are done with them and melds them together.

Right, I think this is the best way to start. I think we even have most of the parts we need. I'm not sure what channels there are for communicating between the workers, but I guess we'll need some way of serializing WorkDBs or WorkItems between them. This should be pretty straightforward.
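
For the serialization piece, even something as simple as dumping work items to JSON build artefacts might be enough. The `WorkItem` fields below are placeholders, not the real cosmic-ray ones:

```python
# Illustration only: a made-up WorkItem shape serialized to JSON so results can
# travel between CI jobs as build artefacts.
import dataclasses
import json
from typing import Optional

@dataclasses.dataclass
class WorkItem:
    job_id: str
    module_path: str
    operator: str
    occurrence: int
    result: Optional[str] = None  # e.g. "killed", "survived", or None if not run

def dump_items(items, path):
    with open(path, "w") as f:
        json.dump([dataclasses.asdict(item) for item in items], f)

def load_items(path):
    with open(path) as f:
        return [WorkItem(**entry) for entry in json.load(f)]
```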
