Reduce variance of perf. CI machine #1450

Closed
Kobzol opened this issue Sep 29, 2022 · 11 comments

@Kobzol
Contributor

Kobzol commented Sep 29, 2022

Currently, the machine that runs perf. benchmarks on CI uses both turbo-boost and hyper-threading. It would be nice to try to disable these features for some time (e.g. a week or two weeks). We could then observe if disabling them reduces the variance of e.g. wall-time measurements.

Experiment results:

  • Hyperthreading ✖️, Turboboost ✖️
    • 2022-10-01 -> 2022-10-08, results (the data shown starts a week before the experiment began, so that the difference is visible). Wall-time variance has been reduced substantially. CI perf. time increased by ~10%, from 1.2 to 1.3 hours.
  • Hyperthreading ✔️, Turboboost ✖️
    • 2022-10-08 -> 2022-10-16, results (data shown from the start of the whole experiment). Wall-time variance has increased, but only slightly and only on some benchmarks. The additional threads helped mainly in opt builds and for large crates (e.g. cargo). CI perf. time decreased only very slightly, to about 1.28 hours.
  • Hyperthreading ✖️, Turboboost ✔️
    • 2022-10-16 -> 2022-10-22, results (data shown from the start of the whole experiment). CI perf. time was about 1.25 hours.

CC @Mark-Simulacrum

@Kobzol
Contributor Author

Kobzol commented Sep 29, 2022

Commands to do this:

Turbo-boost

Disable (apply)

$ sudo bash -c "echo 0 > /sys/devices/system/cpu/cpufreq/boost"

Enable (revert)

$ sudo bash -c "echo 1 > /sys/devices/system/cpu/cpufreq/boost"

Hyper-threading

Disable (apply)

$ sudo bash -c "echo off > /sys/devices/system/cpu/smt/control"

Enable (revert)

$ sudo bash -c "echo on > /sys/devices/system/cpu/smt/control"
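
Before and after toggling, it can help to confirm what state the machine is actually in. The following is only a sketch: the function name cpu_state and its optional root-directory argument are made up for illustration (the argument exists so the function can be pointed at a fake tree), and it assumes the same two sysfs files used by the commands above.

```shell
#!/usr/bin/env bash
# Sketch: print the current turbo-boost and SMT settings.
# cpu_state and its optional argument are hypothetical helpers, not part of any tool.
cpu_state() {
    local root="${1:-/sys/devices/system/cpu}"
    local boost="unknown" smt="unknown"
    # cpufreq/boost holds 1 (boost allowed) or 0 (boost disabled) when the driver exposes it.
    [ -r "$root/cpufreq/boost" ] && boost="$(cat "$root/cpufreq/boost")"
    # smt/control holds values such as on, off, forceoff or notsupported.
    [ -r "$root/smt/control" ] && smt="$(cat "$root/smt/control")"
    echo "boost=$boost smt=$smt"
}
```

Calling cpu_state with no argument reads the real sysfs tree. If cpufreq/boost is missing, the function reports boost=unknown; machines using the intel_pstate driver expose /sys/devices/system/cpu/intel_pstate/no_turbo instead, which this sketch does not read.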

@Mark-Simulacrum
Member

rust-lang/rust@744e397 will (hopefully) be the last commit benchmarked before I apply the changes here.

@Mark-Simulacrum
Member

OK, applied:

  • echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
  • echo off | sudo tee /sys/devices/system/cpu/smt/control

@the8472
Member

the8472 commented Oct 1, 2022

Is ASLR already disabled, at least for the profiled processes?

@Mark-Simulacrum
Member

I think so. We have kernel.randomize_va_space = 0 set globally (which disables ASLR system-wide), and we also run processes under setarch -R, which I believe disables ASLR for them.

@Kobzol
Contributor Author

Kobzol commented Oct 1, 2022

Yes, it is disabled kernel-wise and also explicitly in code.
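
For anyone wanting to verify the kernel-side setting on their own machine, here is a minimal sketch. The function name aslr_status and its optional file argument are hypothetical; the mapping of values follows the documented meaning of kernel.randomize_va_space.

```shell
#!/usr/bin/env bash
# Sketch: interpret kernel.randomize_va_space.
# aslr_status and its optional argument are hypothetical, added only for illustration.
aslr_status() {
    local file="${1:-/proc/sys/kernel/randomize_va_space}"
    case "$(cat "$file" 2>/dev/null)" in
        0) echo "disabled" ;;  # no address space randomization
        1) echo "partial" ;;   # stack, mmap base and VDSO are randomized
        2) echo "full" ;;      # as for 1, plus the heap (brk)
        *) echo "unknown" ;;   # file missing or unexpected value
    esac
}

# Per-process alternative, independent of the global setting
# (./benchmark is a placeholder command):
#   setarch "$(uname -m)" -R ./benchmark
```

The global sysctl and setarch -R are complementary: the first affects every process, the second only the command it launches.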

@Kobzol
Contributor Author

Kobzol commented Oct 8, 2022

The experiment with TB and HT turned off has been concluded (I added the results to the issue description).
@Mark-Simulacrum Please change the configuration to Turbo-boost off, Hyper-threading on (so currently it should be enough to just enable Hyper-threading).

@Mark-Simulacrum
Member

Switched hyper-threading on. Last commit benchmarked w/o it is rust-lang/rust@c27948d (on master).

@Mark-Simulacrum
Member

Hyper-threading off, turboboost on; last commit benchmarked is rust-lang/rust@8be3ce9.

@Mark-Simulacrum
Member

https://perf.rust-lang.org/compare.html?start=bed4ad65bf7a1cef39e3d66b3670189581b3b073&end=bed4ad65bf7a1cef39e3d66b3670189581b3b073-noisy compares a single commit with "A" being turbo + hyperthreads off, and "B" being turbo and hyperthreads on.

We have left the machine with turboboost and hyperthreads both off. I think I managed to make this automatic on boot (/etc/systemd/system/turbo-off.service that disables both).
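
The contents of the actual turbo-off.service are not shown in this thread, so the following is only a guess at what such a unit could look like. The helper write_turbo_off_unit, the unit description, and the choice of a oneshot service are all assumptions; the two sysfs writes are the ones used earlier in this issue.

```shell
#!/usr/bin/env bash
# Sketch: generate a oneshot unit that disables turbo boost and SMT at boot.
# write_turbo_off_unit is a hypothetical helper; the real unit on the perf machine may differ.
write_turbo_off_unit() {
    local dir="${1:?target directory required}"
    # Quoted heredoc delimiter: the content is written verbatim, without shell expansion.
    cat > "$dir/turbo-off.service" <<'EOF'
[Unit]
Description=Disable turbo boost and SMT for benchmarking

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 0 > /sys/devices/system/cpu/cpufreq/boost'
ExecStart=/bin/sh -c 'echo off > /sys/devices/system/cpu/smt/control'

[Install]
WantedBy=multi-user.target
EOF
}
```

To try it, the file would be written to /etc/systemd/system (as root), followed by sudo systemctl daemon-reload and sudo systemctl enable --now turbo-off.service.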

@tindzk

tindzk commented Oct 29, 2022

Here are some of the experiments the Scala team has run to reduce variance of their CI machine: scala/scala-dev#338

Stopping the irqbalance service could be worth looking into. This would prevent hardware interrupts from being distributed across processors.

Also, by setting the kernel parameter nohz_full we could reduce timer interrupts.
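
A rough sketch of how these two suggestions could be checked or tried. The function name nohz_full_cpus is made up, the CPU range 1-7 is only an example value, and none of this has been applied to the perf machine as far as this thread shows.

```shell
#!/usr/bin/env bash
# Sketch: report which CPUs (if any) were passed to the kernel via nohz_full=.
# nohz_full_cpus is a hypothetical helper; it parses a kernel command line string.
nohz_full_cpus() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    local arg
    # Word splitting is intentional here: kernel parameters are space separated.
    for arg in $cmdline; do
        case "$arg" in
            nohz_full=*) echo "${arg#nohz_full=}"; return 0 ;;
        esac
    done
    echo "none"
}

# Stopping irqbalance until the next boot, and keeping it off afterwards:
#   sudo systemctl stop irqbalance
#   sudo systemctl disable irqbalance
```

nohz_full itself has to be set on the kernel command line (for example nohz_full=1-7 in the bootloader configuration) and only takes effect after a reboot.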
