Disk full errors should be spurious #700

Open
Noratrieb opened this issue Sep 20, 2023 · 11 comments

Comments

@Noratrieb
Member

https://crater-reports.s3.amazonaws.com/pr-115235/try%235a3d3b91048a0adf280e7a4e589c1dda6443f172/gh/SapientAsh.solstice_calculator/log.txt

This is a spurious error but isn't marked as such.

[INFO] [stdout]   = note: /usr/bin/ld: final link failed: No space left on device
[INFO] [stdout]           collect2: error: ld returned 1 exit status
@RalfJung
Member

RalfJung commented Oct 2, 2023

https://crater-reports.s3.amazonaws.com/pr-116284/index.html also has quite a few of these.

bors added a commit that referenced this issue Nov 25, 2023
detect new spurious error and fix a regression considered spurious

- change from TestFail to non-spurious BuildFail is a regression (#703)
- no space left on drive is spurious (#700)
@tbu-

tbu- commented May 13, 2024

Also happened here: https://crater.rust-lang.org/ex/pr-124636.

@Skgland
Contributor

Skgland commented May 31, 2024

> Also happened here: https://crater.rust-lang.org/ex/pr-124636.

I see 213 entries correctly sorted under spurious-regressed: build no space left on device,
and 9 or 10 under regressed: build failed (unknown).

For those 10, the errors are:
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/d0392f4632b22de6528ebe4efcc10f002656b8723c4a1b710ed4458e6b02ea3e-init: no space left on device
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/bd2885f2ca702c73f2089fd560ca5812020f20a703dd920f909fcf68b043099e/diff: no space left on device
  • [INFO] [stderr] Error response from daemon: symlink ../1744f2c6d6c1b7751a30adc89afb973fac6960b6b240c59be797348b8d713855/diff /var/lib/docker/overlay2/l/YQMKEEA6H4GSSXCHI2Y5LO3XBH: no space left on device
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/0fa118698fba3bdf766053a429fda884e20458aa114597c09bd675a92e77adcb-init: no space left on device
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/35445b29951a97ad96f344bd33abb3c7976dcd0a7db9620eb7ebfaed05284b94-init: no space left on device
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/5cd8254e4074e5a5dc108cf93c21d54d17667b05d9d70dd70a7404d4a2600f55-init: no space left on device
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/d1f525505d6a64c02a2fc42e1e7e0fa163877b4776a8c7ca3c94d69358ca848e-init: no space left on device
  • [INFO] [stderr] Error response from daemon: write /var/lib/docker/containers/08cdd65fbdc00b34e781e9ffe505de039f45012ef9e480a41d13c7b4780739d5/.tmp-config.v2.json2346282875: no space left on device
  • log truncated, no error shown, so it is uncertain whether this is even a no space left on device error
  • [INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/9404567f07eaeadab3dc07a00c6d4ea16806738d6eaa4b83b6ed78c401868307-init: no space left on device

So it looks like in all of these cases the docker daemon failed to perform some action because the device was full.
Another source, not seen here, is described in #715: when a dependency fails to compile, the result is always a DependsOn failure, even if a no space left on device error occurred.
This can be seen in the crater result posted above by RalfJung, under regressed: dependencies.

> https://crater-reports.s3.amazonaws.com/pr-116284/index.html also has quite a few of these.

@tbu-

tbu- commented May 31, 2024

What leads to the disk being full for crater?

@Skgland
Contributor

Skgland commented May 31, 2024

To reduce computation time, crater does not reset to a pristine environment between crates, so that the builds of dependencies can be reused. As a result, build artifacts accumulate and can fill up the disk.

There is supposed to be a disk space check every now and then (see this comment: https://github.com/rust-lang/crater/blob/master/src/runner/tasks.rs#L172-L185) that cleans up when that happens.
The disk-space watcher is started over here https://github.com/rust-lang/crater/blob/master/src/runner/mod.rs#L98 and appears to be currently configured to run every 30 seconds with a threshold of 80%.
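
As a rough illustration of the check described above, here is a minimal sketch of a threshold test. It is not crater's actual code: `WatcherConfig`, `needs_cleanup`, and the choice of `>=` for the comparison are assumptions made for the example; only the 30-second interval and the 80% threshold come from the comment above.

```rust
use std::time::Duration;

/// Hypothetical settings; only the two default values are taken from the thread.
pub struct WatcherConfig {
    /// How often the watcher polls disk usage.
    pub interval: Duration,
    /// Usage percentage at which a cleanup is requested.
    pub threshold_percent: u8,
}

impl Default for WatcherConfig {
    fn default() -> Self {
        WatcherConfig {
            interval: Duration::from_secs(30),
            threshold_percent: 80,
        }
    }
}

/// Returns true when `used_bytes / total_bytes` has reached the threshold.
/// Integer arithmetic in u128 avoids both float rounding and overflow.
/// Whether the real watcher uses `>=` or `>` is an assumption here.
pub fn needs_cleanup(used_bytes: u64, total_bytes: u64, cfg: &WatcherConfig) -> bool {
    if total_bytes == 0 {
        // No meaningful usage figure; do not trigger a cleanup.
        return false;
    }
    (used_bytes as u128) * 100 >= (total_bytes as u128) * (cfg.threshold_percent as u128)
}
```

With a periodic check like this, the disk can still fill up between two polls if a build writes more than the remaining 20% within one interval, which is one plausible way for the errors in this issue to occur despite the watcher.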

@tbu-

tbu- commented May 31, 2024

I see, thank you.

So in a sense, disk-full errors are spurious, but they shouldn't be ignored; instead, the affected crates should maybe be retried after cleaning up some disk space.
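
The retry idea could look roughly like the sketch below. This is only an illustration of the suggestion, not something crater does today as far as this thread shows; `BuildError`, `run_with_retry`, and the closure-based interface are hypothetical names invented for the example.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    /// The build hit "no space left on device".
    NoSpace,
    /// Any other failure, carried through unchanged.
    Other(String),
}

/// Runs `build`; on a disk-full error, calls `cleanup` and tries again,
/// up to `max_retries` additional attempts. Any other result is returned as-is.
pub fn run_with_retry<B, C>(
    mut build: B,
    mut cleanup: C,
    max_retries: u32,
) -> Result<(), BuildError>
where
    B: FnMut() -> Result<(), BuildError>,
    C: FnMut(),
{
    let mut retries = 0;
    loop {
        match build() {
            Err(BuildError::NoSpace) if retries < max_retries => {
                // Free some space, then give the crate another chance.
                cleanup();
                retries += 1;
            }
            // Success, a non-disk error, or retries exhausted.
            other => return other,
        }
    }
}
```

If the retries are exhausted, the error stays `NoSpace`, so the result could still be reported as spurious rather than as a regression.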

@RalfJung
Member

https://crater-reports.s3.amazonaws.com/pr-134300/index.html is an example of a report that says there are 110 regressions, but as far as I can tell every single one of them is "no disk space left on device", or linker failures. None of them are related to the actual compiler change being tested. In that sense it'd be great if crater could mark them as spurious automatically, that would save a bunch of work that currently I have to do manually.

@Skgland
Contributor

Skgland commented Dec 19, 2024

The linker failures should be simple to handle: add a new case at https://github.com/rust-lang/crater/blob/master/src/runner/test.rs#L150-L159 that looks for collect2: fatal error: ld terminated with signal 7 [Bus error], and then sort it into an appropriate (new) failure reason below at https://github.com/rust-lang/crater/blob/master/src/runner/test.rs#L234.

The no disk space left on device case is more difficult, as it originates in docker rather than cargo, so the logs don't appear to be processed in the same location; otherwise they should already have been sorted.
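
To make the log-matching idea concrete, here is a small standalone sketch. It does not reproduce the code at the links above: `SketchFailure`, `classify_log`, and `is_spurious` are hypothetical names, and treating the bus error as spurious is an assumption based on the discussion in this thread. The matched strings are the ones quoted in the logs above.

```rust
/// Hypothetical failure categories for the sketch (not crater's FailureReason).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SketchFailure {
    NoSpace,
    LinkerBusError,
    Unknown,
}

impl SketchFailure {
    /// Assumption: both recognised categories are unrelated to the compiler change.
    pub fn is_spurious(self) -> bool {
        !matches!(self, SketchFailure::Unknown)
    }
}

/// Scans a build log line by line and returns the first recognised category.
pub fn classify_log(log: &str) -> SketchFailure {
    for line in log.lines() {
        // ld prints "No space left on device", docker prints it in lowercase.
        if line.to_ascii_lowercase().contains("no space left on device") {
            return SketchFailure::NoSpace;
        }
        if line.contains("ld terminated with signal 7 [Bus error]") {
            return SketchFailure::LinkerBusError;
        }
    }
    SketchFailure::Unknown
}
```

A plain substring match like this only helps if the relevant output actually reaches the code doing the matching, which is the difficulty with the docker-originated errors.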

@Skgland
Contributor

Skgland commented Dec 19, 2024

> https://crater-reports.s3.amazonaws.com/pr-134300/index.html is an example of a report that says there are 110 regressions, but as far as I can tell every single one of them is "no disk space left on device", or linker failures. None of them are related to the actual compiler change being tested. In that sense it'd be great if crater could mark them as spurious automatically, that would save a bunch of work that currently I have to do manually.

Running https://github.com/Skgland/Crater-Analysis over the crater run gives me

  • 95 times no disk space left on device
  • 14 times collect2: fatal error: ld terminated with signal 7 [Bus error]

and two which have neither

  • massivebird.hex-conversion-game.a266014ea5cda407b8a6ccfde32f856ef2d8ca50 has a different docker error
  • sunkit02.distest.be120251513011124b20b9bea9f21f7b3438ba29 has a different linker error

@Skgland
Contributor

Skgland commented Dec 19, 2024

The no disk space left on device errors all appear to happen during the creation of the docker container in SandboxBuilder::create.
To filter these, rustwide would need to catch them and provide sufficient context to crater.
It might be sufficient to turn a CommandError from docker sandbox commands (not including the commands we run in the container via docker) into a CommandError::SandboxCreate(inner_error), so that crater can turn this into a spurious FailureReason::Docker in failure_reason.
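
The wrapping proposed here can be sketched in isolation as follows. The types below are stand-ins invented for the example and do not mirror the real rustwide or crater definitions; only the names `SandboxCreate` and `Docker` are taken from the proposal, and `ExecutionFailed`, `create_sandbox`, and the shape of `failure_reason` are assumptions.

```rust
/// Stand-in for rustwide's command error type (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum SketchCommandError {
    /// Some command failed; the message is kept for context.
    ExecutionFailed(String),
    /// Proposed wrapper: the failure happened while creating the sandbox.
    SandboxCreate(Box<SketchCommandError>),
}

/// Stand-in for crater's failure reason (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum SketchFailureReason {
    /// Infrastructure problem, to be treated as spurious.
    Docker,
    Unknown,
}

/// Wraps any error from the docker setup step, so callers can tell it apart
/// from errors produced by commands run inside the container.
pub fn create_sandbox(
    docker_result: Result<(), SketchCommandError>,
) -> Result<(), SketchCommandError> {
    docker_result.map_err(|e| SketchCommandError::SandboxCreate(Box::new(e)))
}

/// Maps the wrapped error to a spurious reason; everything else stays unknown.
pub fn failure_reason(err: &SketchCommandError) -> SketchFailureReason {
    match err {
        SketchCommandError::SandboxCreate(_) => SketchFailureReason::Docker,
        SketchCommandError::ExecutionFailed(_) => SketchFailureReason::Unknown,
    }
}
```

The point of the wrapper is that the classification no longer depends on parsing log text: the error type itself records where the failure happened.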

@Skgland
Contributor

Skgland commented Dec 28, 2024

#758, #759 and #760 should eliminate most of the unknown results
