Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linux/aarch64 Now() should be actually_monotonic() #88652
linux/aarch64 Now() should be actually_monotonic() #88652
Changes from all commits
a333b91
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On our platform support page we specifically name Linux 4.2 as the supported kernel. These fixes were all in later versions.
@rust-lang/libs-api How do we feel about merging this? It'd mean that running the code on an older (but supported) kernel version might give an unexpected panic where we'd have to tell the user to update their kernel (or downgrade their Rust).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While these issues were fixed in the kernel versions I mention above, the patches were also adopted into earlier stable kernels. Assuming they're on a maintained branch of a stable kernel they should have the fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it is worth, in practice, portable code needs to defend against non-monotonic cases already. For example, here is where we fixed it in Tokio and the associated discussion is here.
It may be worth to consider documenting
Instant
as "best-effort monotonic".There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting the backports too might help. At least we can then point users encountering the error on old distros to patched versions and show that this is considered a kernel bug and there is something they can update to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can dig them up, but fundamentally the kernel should guarantee this is fixed. Taking it to an extreme, if a kernel+driver occasionally corrupted packets, I don't think anyone would suggest that rust's stdlib should wrap all data a user sends in an checksummed container to work around it. The answer there and the answer here should be please switch to a kernel that includes the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree, but the workaround exists and if we take it back and some strange system out there starts breaking it would be helpful to be able to point out that this is an already fixed kernel bug or at least that the linux devs generally see this as their turf.
The strategic problem here is that that the fix is "easy" (if done badly...), the downsides of a fix are only noticed by a few (what's a few nanoseconds here and there?), opening a github issue is lower-friction than joining a mailing list and the libs team is excessively nice about these things because the panics may be encountered by users rather than developers in a position to fix them.
Anything that'll make future bug reports easier to handle will be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood... I've gone digging for backports and here is what I've found. Another approach which I'm a little hesitant to suggest would be to make
is_actually_monotonic()
dependent on if the system booted via UEFI (common on servers, and this issue is almost certainly fixed because time not working right has a lot of interesting implications) vs device tree (typical on dev boards and similar).In terms of backports i can find:
HISILICON_ERRATUM_161010101 - i can't find any backports before 4.11, but this HiSilicon chip is for a server and RHEL7/CentOS 7 are the earliest OSes that supported Arm64 and use a 4.11 kernel, so I think were fine there.
FSL_ERRATUM_A008585 - Same for the same logic above.
ARM64_ERRATUM_858921 - This was fixed in 4.12 and the first hardware that used this CPU was released after the 4.12 release, so it's unlikely anyone is using an older kernel on it.
SUN50I_ERRATUM_UNKNOWN1 - The original was back-ported to the stable v4.19 (v4.19.31+) kernel, a follow on fix was back ported to v4.19.198. I've found Debian Buster and Ubuntu Bionic have the workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On x86 the server situation isn't great due to some hypervisors introducing non-monotonoicity in the timestamp counters which should provide monotonic results. At least KVM even has an explicit flag where the host would promise that TSC is reliable and even in those cases there still seem to be a few systems out there that make and then break the promise.
I'm not sure what the exact mechanisms are (emulating the instruction? migrating virtual cores between physical ones?).
So a chunk of monotonicity violations come from virtualized systems.
Is the ARM situation different, i.e. is the time source immune to hypervisor interference?
I wouldn't bet on it. Some reporters say actually observing panics due to time going backwards low occurrence rate on their systems. A few times a month or something like that. Applications rarely consist of hot loops doing nothing but calling
Instant::now().duration_since(Instant::now())
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to for the hypervisor to trap it, but at least KVM doesn't. The value is read from the hardware and then offset by a value originally programmed by the hyperivsor for that VM. If the hardware is functional the VM would have to work to break it.
I'm not specifically talking about Rust, but other applications where the time going backwards of forwards would result in certificate errors, and consensus algorithms going awry, etc.