-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: time goes backwards on windows-arm64 (frequent TestGcLastTime failures) #48072
Comments
These failures are new as of Go 1.17 because (If we can determine that this test failure does not demonstrate a significant problem on the new platform, it should be skipped on that platform and the issue can be moved to the backlog.) CC @golang/release |
I looked through some of the logs and they all show time (as measured by What's interesting is they all appear to roll over in the last 8 digits. The more significant digits are the same between the two reads. To me this suggests a shear when reading the time. 1630030642099881100 => 1630030642000853600 But I can't figure out how we would get shear at 8 decimal digits specifically. |
As an experiment, I did the same comparison in hex. There's no obvious roll-over pattern in hex, so this definitely seems to be related to a decimal representation of time.
|
Also cc @bufflig and @alexbrainman for thoughts. I'm not worried about the test itself, but this seems to indicate a more fundamental issue with reading time on windows/arm64. |
Actually, it's possible this was caused by #48476, which was fixed three days ago. |
Nevermind, #48476 only applied to arm32. |
It could be a bug in OS. Not many people use windows-arm64 OS. Alex |
cc @mknyszek |
Ping @bufflig |
CC @jeremyfaller. |
@jstarks Would you be able to also take a look at this issue? Thanks. |
In runtime·nanotime1, the high-low-high algorithm is used to read the current 64-bit interrupt time as three 32-bit values. This code was probably copied from the AMD64 version. However, ARM64 has a looser memory model than x86/AMD64, so you need some extra memory barriers to make the algorithm work. These are missing (as @aclements points out). The data points above are consistent with a reordering of the loads, such that the low 32 bits is not correctly matched to the high 32 bits. This can cause the low 32 bits to move backward between measurements, or more generally be wrong when the high bits are changing. The one confusing entry was where it appeared that high value was moving backward too, but that's because aclements's values were multiplied by 100 after they were read. The fix is not to add memory barriers but to do a single 64-bit read of the time in one instruction. That's what we do in Windows (and I recommend changing the AMD64 code to match, even though there's no race condition due to the stricter memory model). |
I missed the ping due to messed up mail notification config. I'll pick it up. Thanks for the analysis! |
Change https://golang.org/cl/361057 mentions this issue: |
@bufflig I submitted a CL already: https://go-review.googlesource.com/c/go/+/361057 |
Ah, awesome. Thanks! |
That's not really thoroughly tested, so if you want to run it through the paces on various hardware, feel free. |
Ack, I will run it on anything I can get my hands on :) |
Dividing by 100 (decimal) and then comparing in hex does indeed reveal a much clearer pattern: 0x39e90ad5f4229b => 0x39e90ad5e50658 What I'm still missing is that roll-over seems to be happening at 24 bits. If the two 32-bit reads were inconsistent, shouldn't we see the roll-over at 32 bits? (Using a single 64-bit load is a clear improvement and may well fix this bug, I'm just not quite seeing how all the dots connect yet. I blame any slowness on my part on baby-induced sleep deprivation. :) |
Is this just a matter of the actual timing of the race and the precision of the clock? That is, maybe those top 8 bits just didn't change between reads, because that takes longer? |
On 64-bit, this is more efficient, and on ARM64, this prevents the time from moving backwards due to the weaker memory model. On ARM32 due to the weaker memory model, we issue a memory barrier. Updates #48072. Change-Id: If4695716c3039d8af14e14808af217f5c99fc93a Reviewed-on: https://go-review.googlesource.com/c/go/+/361057 Trust: Jason A. Donenfeld <[email protected]> Run-TryBot: Jason A. Donenfeld <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Austin Clements <[email protected]>
I believe that this did not actually fix the problem. On arm64 time still moves backwards. Looking at the assembler, it appears to me that it's rather a typo - it calculates (x / 1000000000, x % 100000000) instead of (x / 1000000000, x % 1000000000) (one zero to few in the x % expression). That I think fits perfectly with the symptoms as well. Or am I reading it completely wrong? |
Haha! You're right. The comment has the right constant but the code has the wrong one. go/src/runtime/time_windows_arm64.s Line 45 in cfd016d
The memory ordering fix is still necessary for correctness but I guess it's not actually hitting in practice. |
Your reading of the assembly looks correct. And actually, this matches what's in the comment too: // Code stolen from compiler output for:
//
// var x uint64
// func f() (sec uint64, nsec uint32) { return x / 1000000000, uint32(x % 100000000) }
// One less zero in that second expression. But actually, that code is all over the place:
Seems like a good catch? |
Wait, they both look wrong to me... |
The comment has 9 zeroes, the constant has 8. |
Looks like on the remaining two platforms, the constant is fine and doesn't match the comment. But on ARM64 the constant and comment are both wrong. Sending a patch. |
Gah, thanks for spotting that. Perhaps the assembler should support the |
Change https://golang.org/cl/361474 mentions this issue: |
@gopherbot please backport this to 1.17 |
Backport issue(s) opened: #49369 (for 1.17). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
Change https://golang.org/cl/361476 mentions this issue: |
Change https://golang.org/cl/361475 mentions this issue: |
…nstruction or issue barrier On 64-bit, this is more efficient, and on ARM64, this prevents the time from moving backwards due to the weaker memory model. On ARM32 due to the weaker memory model, we issue a memory barrier. Updates #48072. Updates #49369. Change-Id: If4695716c3039d8af14e14808af217f5c99fc93a Reviewed-on: https://go-review.googlesource.com/c/go/+/361057 Trust: Jason A. Donenfeld <[email protected]> Run-TryBot: Jason A. Donenfeld <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-by: Austin Clements <[email protected]> Reviewed-on: https://go-review.googlesource.com/c/go/+/361475 Reviewed-by: Patrik Nyblom <[email protected]>
…nsec remainder A code comment on amd64 for windows and plan9 contained a snippet for splitting apart the sec and nsec components of a unix timestamp, with produced assembly below, which was then cleaned up by hand. When arm64 was ported, that code snippet in the comment went through the compiler to produce some code that was then pasted and cleaned up. Unfortunately, the comment had a typo in it, containing 8 zeros instead of 9. This resulted in the constant used in the assembly being wrong, spotted by @bufflig's eagle eyes. So, this commit fixes the comment on all three platforms, and the assembly on windows/arm64. Updates #48072. Fixes #49369. Change-Id: I786fe89147328b0d25544f52c927ddfdb9f6f1cf Reviewed-on: https://go-review.googlesource.com/c/go/+/361474 Trust: Jason A. Donenfeld <[email protected]> Run-TryBot: Jason A. Donenfeld <[email protected]> Reviewed-by: Patrik Nyblom <[email protected]> TryBot-Result: Go Bot <[email protected]> Reviewed-on: https://go-review.googlesource.com/c/go/+/361476
2021-08-26T19:07:27-1f8d456/windows-arm64-10
2021-08-22T13:14:19-bd68459/windows-arm64-10
2021-08-19T09:11:02-3bdc179/windows-arm64-10
2021-08-16T23:03:03-df9c5d8/windows-arm64-10
2021-08-13T00:20:28-4c8ffb3/windows-arm64-10
2021-06-29T19:30:24-e294b8a/windows-arm64-10
2021-06-08T05:02:19-c20bcb6/windows-arm64-aws
2021-06-05T19:52:26-e1fa260/windows-arm64-aws
The text was updated successfully, but these errors were encountered: