
Microsecond round up code doesn't round up. #12

Closed
skliper opened this issue Sep 13, 2019 · 16 comments · Fixed by #406 or #433

@skliper
Contributor

skliper commented Sep 13, 2019

Describe the bug
Spawned from #1

The code comment claims it rounds up to never return zero. The formula implemented doesn’t actually round up in all cases, since generally when casting a float/double to an int you lose the fractional part (truncation, not rounding). So the code is not self-consistent. It’s not a POSIX or OS issue, it’s that the code doesn’t do what it says it does. The API document doesn’t specify a non-zero guarantee.

osal/src/os/posix/ostimer.c

Lines 284 to 290 in bfa7a33

/*
* Calculate microseconds per tick - Rounding UP
* - If the ratio is not an integer, this will choose the next higher value
* - The result is guaranteed not to be zero.
*/
OS_SharedGlobalVars.MicroSecPerTick = (1000000 + (OS_SharedGlobalVars.TicksPerSecond / 2)) /
OS_SharedGlobalVars.TicksPerSecond;

Similar misleading comment at:

osal/src/os/posix/ostimer.c

Lines 231 to 232 in bfa7a33

/* Round to the nearest microsecond */
POSIX_GlobalVars.ClockAccuracyNsec = (uint32)(clock_resolution.tv_nsec);

For what it’s worth, on Linux (our Ubuntu dev system) this code reports 100 ticks per second and 10000 usec per tick. But if you pass in a high enough value for ticks per second, it does return zero even though it claims to round up (try a value above 2,000,000 ticks per second).

To Reproduce
Steps to reproduce the behavior:

  1. Compile:
#include <stdio.h>

int main(void)
{
    float num = 0.7f;
    printf("float = %f, cast = %d\n", num, (int)num);
    return 0;
}
  2. Execute and observe that the cast truncates instead of rounding:
float = 0.700000, cast = 0

Expected behavior
Expected the code to match its comment: round up so the result never equals zero. The algorithm does not work as claimed in the comment.


Reporter Info
Jacob Hageman/NASA-GSFC

@jphickey
Contributor

OK, I reviewed this code again, and I concur that the comment is not quite accurate. It is simply rounding to the nearest value, not necessarily "up", and the result can be zero if there is less than 0.5us per tick or >2M ticks per second.

That being said, tick rates that fast are arguably impractical; in practice it is unusual to see a system with a tick period shorter than 1 ms (1000 us, i.e. more than 1000 ticks per second). OSAL has historically done its time calculations using this "microseconds per tick" value, so if we do need to support systems with a tick period of less than 1 us, then the whole calculation needs to change to support the higher resolution. I would think that would be a requirements-driven change.

Note that even though the comment (erroneously) says that the result cannot be zero, this is actually enforced later before OS_API_Init() returns, i.e.

   /*
    * Confirm that somewhere during initialization,
    * the time variables got set to something valid
    */
   if (return_code == OS_SUCCESS &&
         (OS_SharedGlobalVars.MicroSecPerTick == 0 ||
         OS_SharedGlobalVars.TicksPerSecond == 0))
   {
      OS_DEBUG("Implementation failed to initialize tick time globals\n");
      return_code = OS_ERROR;
   }

Therefore I think the code is safe: the result is guaranteed not to be zero, just not by the calculation itself but by the check/verification afterward.

The only change here might be to rephrase the comment to better reflect what actually happens.

@jphickey jphickey self-assigned this Sep 16, 2019
@skliper
Contributor Author

skliper commented Sep 16, 2019

Thanks for reviewing. I recommend simplifying the code as well, since the complexity doesn't accomplish anything. Concur with updating the comment (and it's likely worth noting in the comment that zero gets checked later).

Likely worth documenting somewhere at a top design level this current limitation (1000 Hz). I wonder if the drone folks ran into this, or anyone else trying to run high rate control systems.

@jphickey
Contributor

Note there is a factor of 1000 difference here. The "limitation" is not at 1000Hz, but 1000000Hz (1 MHz) tick frequency, or 1us per tick period.

The major thing to note is that since OSAL does delay calculations based on "microseconds per tick", anything that is not a whole/integer number of microseconds per tick will be subject to rounding errors. There is no way around this. At rates <= 1000 Hz the rounding errors will be relatively small and probably not noticeable, but at higher rates it could become more evident (i.e. in the extreme case, if a calculation used 2 usec/tick instead of 1.5 usec/tick, the delay would be 33% longer than expected).

@jphickey
Contributor

recommend simplifying the code also since the complexity doesn't do anything.

Can you clarify? The "rounding" action in the code is useful, I think, to minimize the impact of tick rates that don't equate to an integer number of microseconds per tick. For instance, again in an extreme case, if the period is 1.9 us/tick then the value will become 2, rather than 1, so at least the delay calculation will be closer to the real value.

@joelsherrill

joelsherrill commented Sep 20, 2019 via email

@skliper
Contributor Author

skliper commented Sep 23, 2019

recommend simplifying the code also since the complexity doesn't do anything.

Can you clarify? The "rounding" action in the code is useful, I think, to minimize the impact of tick rates that don't equate to an integer number of microseconds per tick. For instance, again in an extreme case, if the period is 1.9 us/tick then the value will become 2, rather than 1, so at least the delay calculation will be closer to the real value.

@jphickey - This code doesn't do that. 1.9 does not become 2, it becomes 1 with or without this extra logic attempting to round up. 1.1 converts to 1, 1.9 converts to 1.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@joelsherrill - agreed, debug reporting when a non-exact conversion is made is a great idea.

@jphickey
Contributor

jphickey commented Sep 23, 2019

@jphickey - This code doesn't do that. 1.9 does not become 2, it becomes 1 with or without this extra logic attempting to round up. 1.1 converts to 1, 1.9 converts to 1.

@skliper -- are we talking about the same code?

I'm looking at this line:

OS_SharedGlobalVars.MicroSecPerTick = (1000000 + (OS_SharedGlobalVars.TicksPerSecond / 2)) /
             OS_SharedGlobalVars.TicksPerSecond;

If TicksPerSecond, for argument's sake, is something like 525000 (which is an unrealistic number, but consider it anyway), that equates to about 1.905 usec/tick.

If you simply compute 1000000/525000 as integer division, you get 1.

If you compute the value as coded, you get (1000000+262500)/525000, which is 2.

We could warn about an inexact conversion to microseconds per tick, but I'm not really sure what it buys you, and it would only show up when debug is enabled and you are looking at the console. It certainly isn't something I would advocate returning OS_ERROR about (which is basically a panic).

Edit: Corrected a typo in my original comment. Logic is the same.

@jphickey
Contributor

Another Note: Upon further review it is worth mentioning that this MicroSecPerTick value is not actually used for any meaningful purpose by OSAL.

For OS_TaskDelay() implementation, the calculation is done in an OS-specific manner. VxWorks uses TicksPerSecond (the original value) and RTEMS uses nanoseconds per tick for only the fractional part of the delay.

The only thing this is used for is to output a normalized value for the accuracy output of OS_TimerCreate(), just in case the application cares about how accurate its timer is likely to be.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey I think we are both saying the same thing. Any value > 2M returns zero. By "doesn't do anything" I should instead have said "doesn't guarantee non-zero", or "doesn't do anything stated in a rationale or a requirement". If the code doesn't do what the comment states (guarantee non-zero), I'm curious why the code bothers to round at all. What is the required behavior, versus just implementing a round-up for its own sake? Is rounding up better than truncating? Neither answer is exact, which is why I also concur with the request to add user notification when the result isn't exact. A debug notification is better than nothing, and shows intent.

If the rationale is to round up so that the reported accuracy bounds the actual accuracy, I'd consider that sufficient as long as it gets added to the comment. But without a rationale or a requirement it's really not obvious to me why a round-up is necessary.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey oh, I see now my example wasn't a good one illustrating my concern. I should have said 0.9 becomes 0, in that it doesn't round up to avoid zero (which I guess the comment doesn't say exactly, but it does guarantee non-zero).

@jphickey
Contributor

@skliper As I said earlier, I do concur that the code comment is incorrect/inaccurate. It is not a round "up" specifically, it is rounding it to the nearest integer, which could be up or down, in an attempt to produce the closest integer approximation of the real value.

The value of 0.9 will get rounded to 1, not 0. (i.e. any fractional part greater than or equal to 0.5 goes to the next higher integer, whereas any fractional part less than 0.5 goes to the next lower integer).

I'd like to summarize the discussion here with some type of action item, if possible. I self assigned this ticket but I'm still not entirely sure what I'm changing here, aside from the comment itself.

To go back to requirements, perhaps there is one in an internal requirements document somewhere?
In the published OSAL Library API document, for the "accuracy" output of OS_TimerCreate, it only describes it as the following:

*clock_accuracy: A pointer to the variable where the accuracy of the timer is stored. The accuracy is in microseconds. This parameter will give an indication of the minimum clock resolution of the timer.

Likewise, For OS_Tick2Micros, the API document just says it returns "microseconds per operating tick".

In none of these cases does it definitively say what will happen if the operating system tick period is not an exact integer number of microseconds, nor does it even say it cannot be zero. So we don't have a specific "requirement" here to follow.

  • In the current implementation as it stands today, it is rounded to the nearest integer. IMO this is perfectly acceptable/reasonable given the lack of detail in the specification here.

  • It's been proposed to change the code to do the equivalent of floor() (i.e. truncate/next lower int) or do the equivalent of ceil() (i.e. to next higher int). But I'm not convinced these are more correct, given how these outputs are specified.

Here is what I propose as a potential resolution to this issue:

  • Fix the comment to say that it is choosing the nearest integer value. (not that it is rounding up, nor any indication of avoiding zero, at least not here).
  • In the shared layer, multiply the MicroSecPerTick by TicksPerSecond and confirm that the result is exactly 1000000. If not, generate an OS_DEBUG statement indicating that the value is not exact.

Please confirm whether this proposed change satisfies the concern.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey casting 0.9 to an integer = 0. That's all I'm saying. I agree the logic rounds to the nearest integer, and that behavior is fine as long as it's obvious as to why.

Since we don't have a requirement, with the comment update I'd prefer

  1. adding that it rounds to the nearest integer and why (it's only used internally for reporting accuracy, any rounding sends a debug message) so intent is clear to a user or maintainer
  2. remove the non-zero guarantee language, and state TicksPerSecond values over 2M (impractical number, more typically 1000 or less) will return zero

OS_DEBUG action is perfect.

Would it be helpful to update the OS_TimerCreate clock_accuracy description (in the header and API doc) to state if needed this value is rounded to the nearest integer (vs floor or ceiling to bound accuracy)?

*updated to clarify logic statement (vs "it")

@skliper skliper added this to the 5.1.0 milestone Oct 22, 2019
@skliper
Contributor Author

skliper commented Mar 15, 2020

Updated OS_TimerCreate API documentation to reflect the comments above as part of #364 (partial resolution).

Still pending OS_DEBUG action, and the two comment updates noted above.

EDIT - added related issue

skliper added a commit to skliper/osal that referenced this issue Mar 15, 2020
Document/comment changes only
Partially addresses nasa#255, nasa#12, nasa#18, nasa#366
@skliper
Contributor Author

skliper commented Mar 18, 2020

@dmknutsen - want to finish the rest of this one? Comments update and debug message.

@dmknutsen
Contributor

Apologies...just realized I missed this. Sure, if it is still open...go ahead and assign it to me.

astrogeco added a commit that referenced this issue Apr 21, 2020
Fix #12, Comment update to correct for microseconds not always rounding up + a…
@astrogeco astrogeco linked a pull request Apr 27, 2020 that will close this issue