build06: init #1807
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| -----BEGIN CERTIFICATE----- | ||
| MIIBiDCCATqgAwIBAgIUTTdhCYYDhAXynqPBmh+1/ucQ5C4wBQYDK2VwMD4xHDAa | ||
| BgNVBAoME05peCBDb21tdW5pdHkgSW5mcmExHjAcBgNVBAMMFWh5ZHJhLXF1ZXVl | ||
| LXJ1bm5lci1jYTAgFw0yNTA5MDIwOTU4NTJaGA8yMDc1MDgyMTA5NTg1MlowRDEc | ||
| MBoGA1UECgwTTml4IENvbW11bml0eSBJbmZyYTEkMCIGA1UEAwwbaHlkcmEtcXVl | ||
| dWUtYnVpbGRlci1idWlsZDA2MCowBQYDK2VwAyEApKDc0kAVdrLZumtqYtjwA+KM | ||
| JDSP7hF7pDjE1mmEXsyjQjBAMB0GA1UdDgQWBBRtyDT1KiqnAHpKunWhlsdFYtwm | ||
| eDAfBgNVHSMEGDAWgBSs13lAhWgE2ji+4Yvm6b5bCI9pYjAFBgMrZXADQQCgd7FL | ||
| Y8S8lhHZIh5vUhNG3qsaTzFAvFgLoLqUf5lArjTEti/1cbGpzPn2iurP6P5J3I1U | ||
| AdNLUPYWxHWeGH0G | ||
| -----END CERTIFICATE----- |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| { inputs, ... }: | ||
| { | ||
| imports = [ | ||
| # currently only works with new hydra-queue-builder, not hercules or buildbot (nix.distributedBuilds) | ||
| ./nvidia.nix | ||
| inputs.self.nixosModules.cgroups | ||
| inputs.self.nixosModules.ci-builder | ||
| inputs.self.nixosModules.disko-zfs | ||
| inputs.srvos.nixosModules.hardware-hetzner-online-intel | ||
| ]; | ||
|
|
||
| nixCommunity.hydra-queue-builder-v2 = { | ||
| maxJobs = 2; | ||
| mandatoryFeatures = [ "cuda" ]; | ||
| }; | ||
|
|
||
| nix.settings.max-jobs = 14; | ||
|
|
||
| systemd.network.networks."10-uplink".networkConfig.Address = "1.2.3.4"; | ||
|
|
||
| system.stateVersion = "24.11"; | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| { | ||
| hardware.graphics = { | ||
| enable = true; | ||
| }; | ||
|
|
||
| hardware.nvidia = { | ||
| open = true; | ||
| }; | ||
|
|
||
| programs.nix-required-mounts = { | ||
|
Contributor
We'll need to hack around NixOS/nix#9272, my tentative plan was to see if it's enough to ad hoc patch Nix just on the remote side without touching the requesting side
Contributor
Author
We can try that but as long as it can eventually be upstreamed patching both sides is fine. @Mic92 What do you think a fix for NixOS/nix#9272 would look like?
Contributor
Author
As the only builds that'll run on this machine are the cuda tests could we set the sandbox paths directly instead of using the hook?
Member
What is the issue here? Is this needed for some cachix pre-build hook?
Contributor
Author
Did you read the issue linked in the previous comments?
Contributor
Author
Everything built gets pushed to our cachix cache. To avoid that, we'd need a separate hydra instance just for running the tests. That has its own problems: either we'd need to build the non-test derivations twice, or put the test derivations on a much slower schedule to ensure that they have already been built on the main hydra (and we'd still have the problem of derivations that failed on the main hydra being attempted a second time on the test hydra).
Yes. As long as the feature is a nix default (which big-parallel is) and also correct (I think big-parallel does apply for most of them) I don't see an issue?
Contributor
What I meant to say is: cachix contents eventually get garbage-collected...
Contributor
Ah, I see. Clever. This should work?
Contributor
Author
https://github.com/helsinki-systems/hydra-queue-runner
Using the new queue runner seems like it would address this problem. Working on this in #1912, not useable yet.

This is a little tricky, as the hook performs operations like symlink resolution that cannot be done at evaluation time. Behold, my solution:
programs.nix-required-mounts.extraWrapperArgs = [
"--run shift"
"--add-flag '${builtins.unsafeDiscardOutputDependency (derivation { name = "needs-cuda"; builder = "_"; system = "_"; requiredSystemFeatures = [ "cuda" ]; }).drvPath}'"
]; |
||
| enable = true; | ||
| presets.nvidia-gpu.enable = true; | ||
| }; | ||
|
|
||
| services.xserver.videoDrivers = [ "nvidia" ]; | ||
|
|
||
| #services.telegraf.extraConfig.inputs.nvidia_smi.bin_path = "/run/current-system/sw/bin/nvidia-smi"; | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| queue-runner-client-key: ENC[AES256_GCM,data:E9EUEo+qa0ZgRLrPWWJ9ghbb5l4helRyYjl3EtGKEtuFiKDRQLZ2zyN6tyYQr8v5NcoqC/t9vFKZbpRl1P/5LPUS84ZdYDMPvul7cygvlnl9uH2kEpzoYMaWaYdr5UknBOC0znYOp3+ghkcD5Bdkg0fAZyHbXNs=,iv:NmOwnOVN9BELgCWvBONSfvSDpkfvSzOHrO8wikz4shM=,tag:IMpFRvs32hV6MlHx1yUK/A==,type:str] | ||
| sops: | ||
| age: | ||
| - recipient: age1rxh5g2ckvgtfwgwsrjxcl6kzx6esqmzkpswc6r3984uzgjj9eg3q9arjzc | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBIc2lvVnAzdW5EZnFvdnM2 | ||
| Mkx3bzVXUTVwQlJDVER1WnBrejlWU3gyTnpjCkVUMnhkT1Y2ZWRtZ1hPUUptVGRa | ||
| VkduWGVxbTZpRkJGUWpIS1lycnVtVDgKLS0tIEI0bmY3VHdkNTgyVTNnOE9DZkh2 | ||
| SzlNaHROd2NwMFU0T3F3UEZzY3FwZTAKpy8wPYNKV1QYlgLf+JoTZRR3mHjNmguA | ||
| 0wbwv3rr+AulWk0/lzm3dErq3WMTOcnDDCK/qG8mBjyuMLwTPRM+pw== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| - recipient: age1dzvjjum2p240qtdt2qcxpm7pl2s5w36mh4fs3q9dhhq0uezvdqaq9vrgfy | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBpL0tWeVBOZ0Vsb1BPTWNB | ||
| UTI4QXRtenltbjU3OHhQYUh0S3NSdnRHNmtNCkZYbW9xZkN0czBQbkdDcHZCWmxC | ||
| Y0pVNFIvZ2hxWi9UVFErdHd0Qjk3RjgKLS0tIHU5TytUanBoSWl6LzRzK0Z0YWdF | ||
| dmhZcVN1YUo4Z1g0QmdJMEVkVEd0Mk0KGZTTCXpBIJUUeWc1VKCrC2c7hfZiPcMx | ||
| r4ZGADYea2x+t+9drFKX4qhk5tLPOhn0LChhmMgttXPnxKha8hoecQ== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| - recipient: age17n64ahe3wesh8l8lj0zylf4nljdmqn28hvqns2g7hgm9mdkhlsvsjuvkxz | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBpRnhsa0FzWG5vdlRXazNu | ||
| VGN2WkhacDF1QzFwY0NxalhPcjNRWENtYmxJCmdodVJMUkh3bk5VSHBWZlZ4NS9n | ||
| dzJabEpFVy9ISTlOZnBFS2pVWHFaaXcKLS0tIGRuWG9tQVNmazM4YkV0VDR5TWN6 | ||
| bXU4TkNvby92OWhiNnk0S1RSZDI0YkkKQOtZ23IQiFeXscTGQXbD7HmBknGwAziM | ||
| oaluOn4Gm6rXBkpzyStwC45VsG9H25NKuALm2pfkq58hB9kRdJEk7w== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| - recipient: age1d87z3zqlv6ullnzyng8l722xzxwqr677csacf3zf3l28dau7avfs6pc7ay | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBhNkk5Q1NyTk84Vjc3bHVi | ||
| Q1Bsc0FUSEhJVDVBU1RxaGhvd3NnaXJZbGhJCkFmOXlqR3dNK1lUNWFVa0NxUXJZ | ||
| VVBVcGg1Qk1iM1JXNVA4cklSb3JZRTgKLS0tIGEwTDVBWm9DVmg2NmJYbkNyejVj | ||
| SnpuckIyN1FCYzdzc2pZVWZtRXp5eWsKrtDJI1bnctBI0FkenWxSOZzSSh+IvkAz | ||
| 1dIlaEZ/TQpDRHI3sJihZcW8sHRBhs4AYLGZMuAIJi2CUIRuav+Csg== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| - recipient: age1jrh8yyq3swjru09s75s4mspu0mphh7h6z54z946raa9wx3pcdegq0x8t4h | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBtYUZaSHE5aWEwTDg5Wk9G | ||
| S1MrR1dFeWl1N0s4WmdoK1d1R3dWM2s3RndBClpqa0d2VjRnc0JwdFhvVFAvWEVn | ||
| WnNsZitqVzZWRWRqU0htSEtFcTEwMm8KLS0tIHJoNzliSkxDNWZaUU81WXNkcUlR | ||
| c293UXBHdkp2bWRjZmlhYUF2REZvZXcKpNXsBb0TnyY8BmEHVuIGcu+zElooKW0A | ||
| tXTQBPZ2gSZHTZUBk7oUcJPF4XKVpyEVLFlJQsCSZGDZSqsSj/uIBQ== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| - recipient: age1m7xhem3qll35d539f364pm6txexvnp6k0tk34d8jxu4ry3pptv7smm0k5n | ||
| enc: | | ||
| -----BEGIN AGE ENCRYPTED FILE----- | ||
| YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBoZ1BJLzlZcUgxczFZY0xm | ||
| WVdyUXpBb3JGa0d4SW1NVE9CTmhjZU9kUzNZClQrTHhVc0N6WjFWYjJ4eklmS0F3 | ||
| WHQzVmtYM2c2b29yVXB5SUswenJqQkUKLS0tIHpycWRwWkFOOVBSUEF3VENTc092 | ||
| V3UyRXJCQmY3bjkvK01EcEk5WGMxZU0KEKUGD9ne9mHjKmRWWeMMnfgw9YrBBGbl | ||
| E2Akm7TcYHPkn7SogUUv1TVcyjhz7j11skpgCaMmOpQ29QAPIgNOfQ== | ||
| -----END AGE ENCRYPTED FILE----- | ||
| lastmodified: "2025-09-02T10:07:51Z" | ||
| mac: ENC[AES256_GCM,data:ZcBCFYZCO+HxQhup+lSO1gSK2dWUaPf6KDqFKbobyDdVr/EbrRQW9jaGROqsBhNRPHCwX/P2rQ/jOYrRcO3jkXhH5QrOlzQS84qdCwjS9FZnDQuE41Hn829e2ElWnKsK+qmEt9/EtVmaaDKINBdX65CK6KxPrU8h/zrVkS6N3bs=,iv:/50PalAkRkMuouKTpsSD7hX0oBake7x7c/ubB/OEbv8=,tag:1lQbSTZcImbh6dH7TGpG6A==,type:str] | ||
| unencrypted_suffix: _unencrypted | ||
| version: 3.10.2 |
I'd recommend setting this to two -- both in the case this is used as an actual builder and because running a bunch of GPU tests simultaneously could cause an OOM event.
👍🏻 this is the place where we suffer very much from lack of a "real" scheduler for Nix, with support for negative affinity and resource constraints like in SLURM. We'll definitely run into issues running things like pytorch or pytorch-lightning test-suites, where you can have matrices of tests run in parallel
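
For illustration only, and not part of this PR: without a scheduler that understands resource constraints, the OOM risk raised above can only be bounded coarsely on the host. The sketch below assumes builds run as children of `nix-daemon` (so they share its cgroup) and that the test suites respect `NIX_BUILD_CORES`; neither has been verified for the hydra-queue-builder setup on build06, and the specific values are placeholders.

```nix
# Hypothetical host-level limits; values are placeholders, not tuned for build06.
{
  # Cores offered to each individual build (exported as NIX_BUILD_CORES).
  # Assumption: parallel test matrices size their worker pool from this value.
  nix.settings.cores = 8;

  # Coarse memory ceiling on the daemon's cgroup.
  # Assumption: builds are descendants of nix-daemon and therefore counted here.
  systemd.services.nix-daemon.serviceConfig = {
    MemoryHigh = "80%"; # start throttling and reclaiming above this share of RAM
    MemoryMax = "90%";  # hard limit; the kernel OOM-kills inside the cgroup beyond it
  };
}
```

This only limits total usage; it does not give the per-job affinity or GPU memory accounting that a scheduler like SLURM provides, so keeping the builder's `maxJobs` low is still needed.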