Use multiple self-hosted runner instances. #2722
Merged: rectified95 merged 11 commits into microsoft:main from rectified95:multiple_ebpf_runners on Aug 2, 2023
rectified95 force-pushed the multiple_ebpf_runners branch from d69968b to c4eebf3 (August 1, 2023 08:14)
rectified95 force-pushed the multiple_ebpf_runners branch from 7760539 to fd8ea66 (August 1, 2023 17:17)
rectified95 force-pushed the multiple_ebpf_runners branch from ecd35d0 to d8ee2a0 (August 2, 2023 04:10)
rectified95 requested review from dthaler, poornagmsft, Alan-Jowett, saxena-anurag, shankarseal, dv-msft, delaramamiri, gtrevi and shpalani as code owners (August 2, 2023 05:42)
rectified95 changed the title from "Multiple ebpf runners" to "Use multiple self-hosted runner instances." (Aug 2, 2023)
nit: can you reformat the PR description with the template?
dthaler previously approved these changes (Aug 2, 2023)
Alan-Jowett previously approved these changes (Aug 2, 2023)
gtrevi previously approved these changes (Aug 2, 2023)
gtrevi approved these changes (Aug 2, 2023)
@dthaler / @Alan-Jowett would you mind approving again? I fixed up the labels everywhere and now all checks are passing again for me.
Alan-Jowett approved these changes (Aug 2, 2023)
dthaler approved these changes (Aug 2, 2023)
This was referenced Aug 2, 2023
Description
Fixes #2229
This PR modifies the CI so it can use multiple GH runner service instances on the self-hosted runner machines running the kernel tests (`driver` and `driver_native_only` jobs).
For now, we're doubling the capacity by having 2 GH service instances per host. This change capitalizes on #2699, which converted our multi-VM tests to only need one VM.
Key change: runner to VM mapping
Each of the runner machines hosts 2 VMs named `vm1` and `vm2` as before. The key mechanism is the mapping of known runner service names to VM names, resolved at runtime by passing the `runner.name` from the GH context via the new test script parameter `SelfHostedRunnerName`. This can scale to any number of machines, subject to available RAM on the hosts, only requiring an update to `test_execution.json`.
Runner service separation was achieved by installing the services in separate working directories. Since our test collateral is copied onto the VMs and run there, there is little potential for state leaking between jobs running on the same host. The only step in which jobs store state in a shared location is artifact compression, because they continue using `[System.IO.Path]::GetTempFileName()` to create the file name. However, that function guarantees name uniqueness for over 65k invocations, and our jobs remove the files they create regardless.
Ref: https://learn.microsoft.com/en-us/dotnet/api/system.io.path.gettempfilename?view=net-7.0#remarks
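The runner-to-VM lookup described above can be illustrated with a minimal sketch. This is not the PR's implementation: the actual test scripts are PowerShell, and the real layout of `test_execution.json` is not shown in this description. The `runner_to_vm` key, the runner names `runner_1` and `runner_2`, and the helper `resolve_vm_name` are all hypothetical, chosen only to show the idea of resolving a VM from the runner service name at runtime.

```python
import json

# Hypothetical config shape. The real test_execution.json layout is not shown
# in this PR description, so the key and the runner names are assumptions.
EXAMPLE_CONFIG = """
{
  "runner_to_vm": {
    "runner_1": "vm1",
    "runner_2": "vm2"
  }
}
"""


def resolve_vm_name(config_text: str, self_hosted_runner_name: str) -> str:
    """Return the VM name mapped to the given runner service name.

    self_hosted_runner_name stands in for the value of runner.name that the
    workflow would pass through the SelfHostedRunnerName script parameter.
    """
    mapping = json.loads(config_text)["runner_to_vm"]
    if self_hosted_runner_name not in mapping:
        # Fail loudly rather than silently sharing a VM between two runners.
        raise ValueError(f"No VM mapped for runner {self_hosted_runner_name!r}")
    return mapping[self_hosted_runner_name]
```

Under these assumptions, adding capacity would mean adding one more entry to the mapping for each new runner service, which matches the claim that only the config file needs updating.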
Bugs fixed and other changes
- `connect_redirect` test pass. I exposed this bug accidentally when deleting the c:\eBPF folder from the runners and updating the checkpoints.
- `APPLICATION_VERIFIER_HANDLES_INVALID_HANDLE` crash in `socket_tests.exe` due to invalidated socket handles being closed again in the destructor. Since AppVerifier was enabled on the runners long ago, this showing up now may have to do with me deleting the stale binaries and also forcing a less predictable (serialized) assignment of runners to jobs.
- `self-hosted` label, while the new ones don't have the `ebpf_cicd_label`; other PRs will not be assigned the new runners until this PR is merged, after which all runners will be used.
Testing
Verified in the 'Set up job' sections of the CI output that the `driver` and `driver_native_only` jobs are scheduled in parallel using 4 different runners.
Documentation
Cosmetic change to `SelfHostedRunnerSetup.md`
Installation
N/A