Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition causing ADCs to read stale values #203

Merged
merged 1 commit into from
Apr 20, 2020

Conversation

TheZoq2
Copy link
Member

@TheZoq2 TheZoq2 commented Apr 20, 2020

Fixes #201

This was a fun bug to track down.

Some decided that it would be a good idea to make one way of starting the ADC conversion to write a 1 to the adon bit if the adon bit is already one. But that might trigger a read when you want to modify other parts of the register, so this behaviour only happens if no other bits are modified. Thus, modify on the cr2 register will start a conversion unless at least one bit is actually modified.

This was accidentally done as the first part of the convert function, which started a conversion which would be ready roughly by the time the CPU starts checking for eoc bits.

The race condition occurs if the CPU gets to the EOC check, before the ADC has time to reset the bit as a result of the swstart trigger. This relies on the CPU being faster than the ADC clock which explains the behaviour at high prescalers.

Finally, it seems like the line

while self.rb.cr2.read().swstart().bit_is_set() {}

should have prevented this as the ADC should have taken care of the swstart and reset the EOC before that line. For whatever reason, that did not happen

Thnaks @adamgreig for helping debug this :), and for a more detailed log of the steps we used to track it down, see the commit message.

This was a *fun* bug to track down. Initially, it appeared as an out of
order read of multiple ADC channels (the second would read the first
channel, the third from the second, and so on). This only happened with
a prescaler value of `8`, for other values, the corect behaviour was
observed.

`git bisect` traced the issue back to a version bump of the `pac` crate
which shouldn't have affected things. Further git bisection hinted at
the commit bumping version number in the PAC was the issue. This is of
course nonsensical, and the key change that caused the **symptoms** was
a bump of the svd2rust crate, from 0.14 to 0.15.

Dumping the memory in the ADC and RCC peripherals showed barely any
difference, apart from the newer `pac` version randomly having the
`swstart` and `eoc` bits set after the second line of the convert
function. After further single stepping, those bits were set after a
write of `0001e0001` to `cr2`. This was the value that was already in
the register, so why would that start a conversion?

Well, someone decided that it would be a good idea to make one way of
starting the ADC conversion to write a `1` to the `adon` bit if the
`adon` bit is already one. But that might trigger a read when you want
to modify other parts of the register, so this behaviour only happens if
no other bits are modified. Thus, `modify` on the `cr2` register will
start a conversion unless at least one bit is *actually* modified.

This was accidentally done as the first part of the `convert` function,
which started a conversion which would be ready roughly by the time the
CPU starts checking for `eoc` bits.

The race condition occurs if the CPU gets to the `EOC` check, before the
ADC has time to reset the bit as a result of the `swstart` trigger. This
relies on the CPU being faster than the ADC clock which explains the
behaviour at high prescalers.

Finally, it seems like the line
```rust
while self.rb.cr2.read().swstart().bit_is_set() {}
```
*should* have prevented this as the ADC should have taken care of the
`swstart` and reset the `EOC` before that line. For whatever reason,
that did not happen

Co-authored-by: Adam Greig <[email protected]>
@TheZoq2
Copy link
Member Author

TheZoq2 commented Apr 20, 2020

I forgot to mention that we think this commit fixes the symptoms of this issue, but it is still possible for it to occur.

There is no way to tell if the ADC is currently doing a conversion, so if a conversion is accidentally started right before calling convert, it might not be done when the EOC is cleared on the first line. But in practice, this should not be a problem, and we can avoid it by not writing to cr2 right before convert

Copy link
Member

@therealprof therealprof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@therealprof therealprof merged commit 23cf91a into stm32-rs:master Apr 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ADC reads are out of order at certain frequencies
2 participants