linux/aarch64 Now() should be actually_monotonic() #88652

AGSaidi · 2021-09-04T20:38:41Z

While issues have been seen on arm64 platforms the Arm architecture requires
that the counter monotonically increases and that it must provide a uniform
view of system time (e.g. it must not be possible for a core to receive a
message from another core with a time stamp and observe time going backwards
(ARM DDI 0487G.b D11.1.2). While there have been a few 64bit SoCs that have
bugs (#49281, #56940) which cause time to not monotonically increase, these have
been fixed in the Linux kernel and we shouldn't penalize all Arm SoCs for those
who refuse to update their kernels:
SUN50I_ERRATUM_UNKNOWN1 - Allwinner A64 / Pine A64 - fixed in 5.1
FSL_ERRATUM_A008585 - Freescale LS2080A/LS1043A - fixed in 4.10
HISILICON_ERRATUM_161010101 - Hisilicon 1610 - fixed in 4.11
ARM64_ERRATUM_858921 - Cortex A73 - fixed in 4.12

255a3f3 std: Force Instant::now() to be monotonic added a Mutex to work around
this problem and a small test program using glommio shows the majority of time spent
acquiring and releasing this Mutex. 3914a7b tries to improve this, but actually
makes it worse on big systems as for 128b atomics a ldxp/stxp pair (and successful loop)
for v8.4 systems that don't support FEAT_LSE2 is required which is expensive as a lock
and because of how the load/store-exclusives scale on large Arm systems is both unfair
to threads and tends to go backwards in performance.

A small sample program using glommio improves by 70x on a 32 core Graviton2
system with this change.

While issues have been seen on arm64 platforms the Arm architecture requires that the counter monotonically increases and that it must provide a uniform view of system time (e.g. it must not be possible for a core to receive a message from another core with a time stamp and observe time going backwards (ARM DDI 0487G.b D11.1.2). While there have been a few 64bit SoCs that have bugs (rust-lang#49281, rust-lang#56940) which cause time to not monotonically increase, these have been fixed in the Linux kernel and we shouldn't penalize all Arm SoCs for those who refuse to update their kernels: SUN50I_ERRATUM_UNKNOWN1 - Allwinner A64 / Pine A64 - fixed in 5.1 FSL_ERRATUM_A008585 - Freescale LS2080A/LS1043A - fixed in 4.10 HISILICON_ERRATUM_161010101 - Hisilicon 1610 - fixed in 4.11 ARM64_ERRATUM_858921 - Cortex A73 - fixed in 4.12 255a3f3 std: Force `Instant::now()` to be monotonic added a mutex to work around this problem and a small test program using glommio shows the majority of time spent acquiring and releasing this Mutex. 3914a7b tries to improve this, but actually makes it worse on big systems as for 128b atomics a ldxp/stxp pair (and successful loop) is required which is expensive as a lock and because of how the load/store-exclusives scale on large Arm systems is both unfair to threads and tends to go backwards in performance.

rust-highfive · 2021-09-04T20:38:44Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @yaahc (or someone else) soon.

Please see the contribution instructions for more information.

the8472 · 2021-09-05T13:14:49Z

but actually makes it worse on big systems as for 128b atomics a ldxp/stxp pair (and successful loop)
for v8.4 systems that don't support FEAT_LSE2 is required which is expensive as a lock
and because of how the load/store-exclusives scale on large Arm systems is both unfair
to threads and tends to go backwards in performance.

This is unexpected since in the contended case an optimal implementation of a relaxed fetch_max should make better forward progress than a mutex because it can skip the conditional store part when the load returns an equal or larger value than local value. And the uncontended case should be fast either way.
But I don't know much about arm instructions, so idk if better instruction sequences would be available than what llvm is emitting.

AGSaidi · 2021-09-05T14:03:37Z

This is unexpected since in the contended case an optimal implementation of a relaxed fetch_max should make better forward progress than a mutex because it can skip the conditional store part when the load returns an equal or larger value than local value. And the uncontended case should be fast either way.
But I don't know much about arm instructions, so idk if better instruction sequences would be available than what llvm is emitting.

There isn't a 128b un-torn read instruction until arm v8.4, so unless someone (a) has an armv8.4 system (almost none exist in production today); (b) has configured RUSTFLAGS to build for v8.4+ only; (c) LLVM understands FEAT_LSE2 (i'm not sure it does) an atomic read+write is required to architectually guarantee an untorn read (e.g. both the upper and lower 8 bytes of the 128b value were read simultaneously). So even in the unconteded case every core is doing a read+write instruction. For older systems this looks like a ldxp/stxp (and the store must complete successfully for the ldxp value to be an un-torn 128b read) or on newer systems a casp. For some reason LLVM doesn't emit the casp even when configured with RUSTFLAGS="-Ctarget-feature=+lse" but even if it did, that would still be substantially more expensive than the 64b implementation.

I created PR #88651 which uses 64b reads and improves performance by about ~13x in my tests, however without this change, there is still ~5x being left on the table.

AGSaidi · 2021-09-07T14:13:14Z

A little program that demonstrates the performance impact:
instant-now-repro.tar.gz

AGSaidi · 2021-09-14T22:22:35Z

Any feedback?

yaahc · 2021-09-15T19:09:17Z

It looks good to me, I'm checking to see if we have any other ARM / time module experts that should take a look at the PR but assuming nobody else has feedback I'm happy to approve.

library/std/src/sys/unix/time.rs

m-ou-se · 2021-09-15T19:31:06Z

library/std/src/time.rs

+        // SUN50I_ERRATUM_UNKNOWN1 - Allwinner A64 / Pine A64 - fixed in 5.1
+        // FSL_ERRATUM_A008585 - Freescale LS2080A/LS1043A - fixed in 4.10
+        // HISILICON_ERRATUM_161010101 - Hisilicon 1610 - fixed in 4.11
+        // ARM64_ERRATUM_858921 - Cortex A73 - fixed in 4.12


On our platform support page we specifically name Linux 4.2 as the supported kernel. These fixes were all in later versions.

@rust-lang/libs-api How do we feel about merging this? It'd mean that running the code on an older (but supported) kernel version might give an unexpected panic where we'd have to tell the user to update their kernel (or downgrade their Rust).

While these issues were fixed in the kernel versions I mention above, the patches were also adopted into earlier stable kernels. Assuming they're on a maintained branch of a stable kernel they should have the fixes.

For what it is worth, in practice, portable code needs to defend against non-monotonic cases already. For example, here is where we fixed it in Tokio and the associated discussion is here.

It may be worth to consider documenting Instant as "best-effort monotonic".

Documenting the backports too might help. At least we can then point users encountering the error on old distros to patched versions and show that this is considered a kernel bug and there is something they can update to.

I can dig them up, but fundamentally the kernel should guarantee this is fixed. Taking it to an extreme, if a kernel+driver occasionally corrupted packets, I don't think anyone would suggest that rust's stdlib should wrap all data a user sends in an checksummed container to work around it. The answer there and the answer here should be please switch to a kernel that includes the fix.

I totally agree, but the workaround exists and if we take it back and some strange system out there starts breaking it would be helpful to be able to point out that this is an already fixed kernel bug or at least that the linux devs generally see this as their turf.

The strategic problem here is that that the fix is "easy" (if done badly...), the downsides of a fix are only noticed by a few (what's a few nanoseconds here and there?), opening a github issue is lower-friction than joining a mailing list and the libs team is excessively nice about these things because the panics may be encountered by users rather than developers in a position to fix them.

Anything that'll make future bug reports easier to handle will be helpful.

Understood... I've gone digging for backports and here is what I've found. Another approach which I'm a little hesitant to suggest would be to make is_actually_monotonic() dependent on if the system booted via UEFI (common on servers, and this issue is almost certainly fixed because time not working right has a lot of interesting implications) vs device tree (typical on dev boards and similar).

In terms of backports i can find:
HISILICON_ERRATUM_161010101 - i can't find any backports before 4.11, but this HiSilicon chip is for a server and RHEL7/CentOS 7 are the earliest OSes that supported Arm64 and use a 4.11 kernel, so I think were fine there.
FSL_ERRATUM_A008585 - Same for the same logic above.
ARM64_ERRATUM_858921 - This was fixed in 4.12 and the first hardware that used this CPU was released after the 4.12 release, so it's unlikely anyone is using an older kernel on it.
SUN50I_ERRATUM_UNKNOWN1 - The original was back-ported to the stable v4.19 (v4.19.31+) kernel, a follow on fix was back ported to v4.19.198. I've found Debian Buster and Ubuntu Bionic have the workaround.

system booted via UEFI (common on servers,

On x86 the server situation isn't great due to some hypervisors introducing non-monotonoicity in the timestamp counters which should provide monotonic results. At least KVM even has an explicit flag where the host would promise that TSC is reliable and even in those cases there still seem to be a few systems out there that make and then break the promise.
I'm not sure what the exact mechanisms are (emulating the instruction? migrating virtual cores between physical ones?).
So a chunk of monotonicity violations come from virtualized systems.

Is the ARM situation different, i.e. is the time source immune to hypervisor interference?

and this issue is almost certainly fixed because time not working right has a lot of interesting implications

I wouldn't bet on it. Some reporters say actually observing panics due to time going backwards low occurrence rate on their systems. A few times a month or something like that. Applications rarely consist of hot loops doing nothing but calling Instant::now().duration_since(Instant::now()).

Is the ARM situation different, i.e. is the time source immune to hypervisor interference?

It's possible to for the hypervisor to trap it, but at least KVM doesn't. The value is read from the hardware and then offset by a value originally programmed by the hyperivsor for that VM. If the hardware is functional the VM would have to work to break it.

I wouldn't bet on it.

I'm not specifically talking about Rust, but other applications where the time going backwards of forwards would result in certificate errors, and consensus algorithms going awry, etc.

cuviper · 2021-09-19T00:20:19Z

I'm tempted to say we should just make elapsed use saturating subtraction, and leave the monotonicity bugs with the system. Maybe even Sub<Instant> for Instant, just saturate it all the time.

yaahc · 2021-09-20T22:38:07Z

I thought I did this earlier but apparently I did not.

@rustbot ping arm

relevant comment: #88652 (comment)

rustbot · 2021-09-20T22:38:08Z

Hey ARM Group! This bug has been identified as a good "ARM candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @adamgemmell @hug-dev @jacobbramley @JamieCunliffe @joaopaulocarreiro @raw-bin @Stammark

yaahc · 2021-09-23T22:51:18Z

Absent any concerns I want to treat this like any change to our stable API since it changes observable behavior.

@rfcbot merge

rfcbot · 2021-09-23T22:51:18Z

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

BurntSushi · 2021-09-30T15:16:39Z

Could someone from the ARM group sign off on this? I'd feel more comfortable if we had a domain expert chime in with their support. Pinging them again: @adamgemmell @hug-dev @jacobbramley @JamieCunliffe @joaopaulocarreiro @raw-bin @Stammark

jacobbramley · 2021-09-30T19:31:00Z

Hi! Sorry, I'd got my threads mixed up.

I'm not a kernel/hypervisor expert by any stretch, but I've been assured (by one of the timer driver maintainers) that the kernel is the proper place to fix any issues here. He pointed out that the kernel will trap and emulate if necessary (to work around errata) so if we do the same in userspace, we're just doing it twice.

Overall, I think @AGSaidi's reasoning is sound, and I'm happy to approve this from the Arm perspective.

AGSaidi · 2021-10-01T13:45:33Z

@BurntSushi does that address your concern?

BurntSushi · 2021-10-05T12:44:03Z

SGTM!

rfcbot · 2021-10-05T15:59:16Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2021-10-15T16:03:55Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

yaahc · 2021-10-15T19:14:48Z

@bors r+

bors · 2021-10-15T19:14:50Z

📌 Commit a333b91 has been approved by yaahc

…ually-monotonic, r=yaahc linux/aarch64 Now() should be actually_monotonic() While issues have been seen on arm64 platforms the Arm architecture requires that the counter monotonically increases and that it must provide a uniform view of system time (e.g. it must not be possible for a core to receive a message from another core with a time stamp and observe time going backwards (ARM DDI 0487G.b D11.1.2). While there have been a few 64bit SoCs that have bugs (rust-lang#49281, rust-lang#56940) which cause time to not monotonically increase, these have been fixed in the Linux kernel and we shouldn't penalize all Arm SoCs for those who refuse to update their kernels: SUN50I_ERRATUM_UNKNOWN1 - Allwinner A64 / Pine A64 - fixed in 5.1 FSL_ERRATUM_A008585 - Freescale LS2080A/LS1043A - fixed in 4.10 HISILICON_ERRATUM_161010101 - Hisilicon 1610 - fixed in 4.11 ARM64_ERRATUM_858921 - Cortex A73 - fixed in 4.12 255a3f3 std: Force `Instant::now()` to be monotonic added a Mutex to work around this problem and a small test program using glommio shows the majority of time spent acquiring and releasing this Mutex. 3914a7b tries to improve this, but actually makes it worse on big systems as for 128b atomics a ldxp/stxp pair (and successful loop) for v8.4 systems that don't support FEAT_LSE2 is required which is expensive as a lock and because of how the load/store-exclusives scale on large Arm systems is both unfair to threads and tends to go backwards in performance. A small sample program using glommio improves by 70x on a 32 core Graviton2 system with this change.

bors · 2021-10-16T22:09:55Z

⌛ Testing commit a333b91 with merge ba810332e0a6a101d3a86d41cf7562d21ed3e71e...

bors · 2021-10-17T02:24:56Z

💥 Test timed out

rust-log-analyzer · 2021-10-17T04:11:13Z

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

matthiaskrgr · 2021-10-17T06:27:02Z

@bors retry

bors · 2021-10-17T09:30:33Z

⌛ Testing commit a333b91 with merge 1d6f242...

GuillaumeGomez · 2021-10-17T10:41:55Z

It failed in #89959.

bors · 2021-10-17T12:27:50Z

☀️ Test successful - checks-actions
Approved by: yaahc
Pushing 1d6f242 to master...

rust-timer · 2021-10-17T14:00:35Z

Finished benchmarking commit (1d6f242): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

rust-highfive assigned yaahc Sep 4, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 4, 2021

the8472 mentioned this pull request Sep 5, 2021

Use the 64b inner:monotonize() implementation not the 128b one for aarch64 #88651

Merged

the8472 added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-time Area: Time labels Sep 8, 2021

dtolnay added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Sep 15, 2021

m-ou-se reviewed Sep 15, 2021

View reviewed changes

library/std/src/sys/unix/time.rs Show resolved Hide resolved

m-ou-se reviewed Sep 15, 2021

View reviewed changes

yaahc added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 20, 2021

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 23, 2021

rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Oct 5, 2021

rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 5, 2021

rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 15, 2021

cuviper mentioned this pull request Oct 15, 2021

add env var to override sys::Instant::actually_monotonic() for windows and unix #84448

Closed

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 15, 2021

GuillaumeGomez mentioned this pull request Oct 16, 2021

Rollup of 7 pull requests #89959

Closed

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 17, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 17, 2021

bors merged commit 1d6f242 into rust-lang:master Oct 17, 2021

rustbot added this to the 1.57.0 milestone Oct 17, 2021

bors mentioned this pull request Oct 17, 2021

make Instant::{duration_since, elapsed, sub} saturating and remove workarounds #89926

Merged

apiraino removed the to-announce Announce this issue on triage meeting label Oct 21, 2021

pacoxu mentioned this pull request Dec 11, 2023

Minimal kernel version plan: probably 4.18/4.19+ kubernetes/kubernetes#116799

Closed

27 tasks

linux/aarch64 Now() should be actually_monotonic() #88652

linux/aarch64 Now() should be actually_monotonic() #88652

Conversation

AGSaidi commented Sep 4, 2021

Uh oh!

rust-highfive commented Sep 4, 2021

Uh oh!

the8472 commented Sep 5, 2021

Uh oh!

AGSaidi commented Sep 5, 2021

Uh oh!

AGSaidi commented Sep 7, 2021

Uh oh!

AGSaidi commented Sep 14, 2021

Uh oh!

yaahc commented Sep 15, 2021

Uh oh!

Uh oh!

m-ou-se Sep 15, 2021

Choose a reason for hiding this comment

Uh oh!

AGSaidi Sep 15, 2021

Choose a reason for hiding this comment

Uh oh!

carllerche Sep 15, 2021

Choose a reason for hiding this comment

Uh oh!

the8472 Sep 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AGSaidi Sep 18, 2021

Choose a reason for hiding this comment

Uh oh!

the8472 Sep 18, 2021

Choose a reason for hiding this comment

Uh oh!

AGSaidi Sep 18, 2021

Choose a reason for hiding this comment

Uh oh!

the8472 Sep 18, 2021

Choose a reason for hiding this comment

Uh oh!

AGSaidi Sep 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cuviper commented Sep 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yaahc commented Sep 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rustbot commented Sep 20, 2021

Uh oh!

yaahc commented Sep 23, 2021

Uh oh!

rfcbot commented Sep 23, 2021 • edited by dtolnay Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

BurntSushi commented Sep 30, 2021

Uh oh!

jacobbramley commented Sep 30, 2021

Uh oh!

AGSaidi commented Oct 1, 2021

Uh oh!

BurntSushi commented Oct 5, 2021

Uh oh!

rfcbot commented Oct 5, 2021

Uh oh!

rfcbot commented Oct 15, 2021

Uh oh!

yaahc commented Oct 15, 2021

Uh oh!

bors commented Oct 15, 2021

Uh oh!

bors commented Oct 16, 2021

Uh oh!

bors commented Oct 17, 2021

Uh oh!

the8472 Sep 15, 2021 •

edited

Loading

AGSaidi Sep 18, 2021 •

edited

Loading

cuviper commented Sep 19, 2021 •

edited

Loading

yaahc commented Sep 20, 2021 •

edited

Loading

rfcbot commented Sep 23, 2021 •

edited by dtolnay

Loading