
Microsecond round up code doesn't round up. #12

Closed
skliper opened this issue Sep 13, 2019 · 16 comments · Fixed by #406 or #433

@skliper
Contributor

skliper commented Sep 13, 2019

Describe the bug
Spawned from #1

The code comment claims it rounds up to never return zero. The formula implemented doesn’t actually round up in all cases, since generally when casting a float/double to an int you lose the fractional part (truncation, not rounding). So the code is not self-consistent. It’s not a POSIX or OS issue, it’s that the code doesn’t do what it says it does. The API document doesn’t specify a non-zero guarantee.

osal/src/os/posix/ostimer.c

Lines 284 to 290 in bfa7a33

/*
* Calculate microseconds per tick - Rounding UP
* - If the ratio is not an integer, this will choose the next higher value
* - The result is guaranteed not to be zero.
*/
OS_SharedGlobalVars.MicroSecPerTick = (1000000 + (OS_SharedGlobalVars.TicksPerSecond / 2)) /
OS_SharedGlobalVars.TicksPerSecond;

Similar misleading comment at:

osal/src/os/posix/ostimer.c

Lines 231 to 232 in bfa7a33

/* Round to the nearest microsecond */
POSIX_GlobalVars.ClockAccuracyNsec = (uint32)(clock_resolution.tv_nsec);

For what it’s worth, on Linux (our Ubuntu dev system) this code reports 100 ticks per second and 10000 usec per tick. But if you pass in a high enough value for ticks per second, it does return zero even though it claims to round up (try a value above 2,000,000 ticks per second).

To Reproduce
Steps to reproduce the behavior:

  1. Compile:
#include <stdio.h>

int main(void)
{
    float num = 0.7f;
    printf("float = %f, cast = %d\n", num, (int)num);
    return 0;
}
  2. Execute and observe that the cast truncates instead of rounding:
float = 0.700000, cast = 0

Expected behavior
Expected the code to match its comment: round up so the result never equals zero. The algorithm does not work as claimed in the comment.


Reporter Info
Jacob Hageman/NASA-GSFC

@jphickey
Contributor

OK, I reviewed this code again, and I concur that the comment is not quite accurate. It is simply rounding to the nearest value, not necessarily "up", and the result can be zero if there is less than 0.5us per tick or >2M ticks per second.

That being said, tick rates that fast are arguably impractical; in practice it is unusual to see a system with a tick period shorter than 1 ms (1000 us, i.e. more than 1000 ticks per second). OSAL has historically done its time calculations using this "microseconds per tick" value, so if we do need to support systems with a tick period of less than 1 us, then the whole calculation needs to change to support the higher resolution. I would think that would be a requirements-driven change.

Note that even though the comment (erroneously) says that the result cannot be zero, this is actually enforced later before OS_API_Init() returns, i.e.

   /*
    * Confirm that somewhere during initialization,
    * the time variables got set to something valid
    */
   if (return_code == OS_SUCCESS &&
         (OS_SharedGlobalVars.MicroSecPerTick == 0 ||
         OS_SharedGlobalVars.TicksPerSecond == 0))
   {
      OS_DEBUG("Implementation failed to initialize tick time globals\n");
      return_code = OS_ERROR;
   }

Therefore I think the code is safe: the result is guaranteed not to be zero, just not by the calculation itself but by the check/verification afterward.

The only change here might be to rephrase the comment to better reflect what actually happens.

@jphickey jphickey self-assigned this Sep 16, 2019
@skliper
Contributor Author

skliper commented Sep 16, 2019

Thanks for reviewing. I recommend simplifying the code as well, since the complexity doesn't accomplish anything. Concur with updating the comment (and it's likely worth noting in the comment that zero gets checked later).

Likely worth documenting somewhere at a top design level this current limitation (1000 Hz). I wonder if the drone folks ran into this, or anyone else trying to run high rate control systems.

@jphickey
Contributor

Note there is a factor of 1000 difference here. The "limitation" is not at 1000Hz, but 1000000Hz (1 MHz) tick frequency, or 1us per tick period.

The major thing to note is that since OSAL does delay calculations based on "microseconds per tick", anything that is not a whole/integer number of microseconds per tick will be subject to rounding errors. There is no way around this. At rates <= 1000 Hz the rounding errors will be relatively small and probably not noticeable, but at higher rates it could become more evident (i.e. in the extreme case, if a calculation used 2 usec/tick instead of 1.5 usec/tick, the delay would be 33% longer than expected).

@jphickey
Contributor

recommend simplifying the code also since the complexity doesn't do anything.

Can you clarify? The "rounding" action in the code is useful, I think, to minimize the impact of tick rates that don't equate to an integer number of microseconds per tick. For instance, again in an extreme case, if the period is 1.9 us/tick then the value will become 2, rather than 1, so at least the delay calculation will be closer to the real value.

@joelsherrill

joelsherrill commented Sep 20, 2019 via email

@skliper
Contributor Author

skliper commented Sep 23, 2019

recommend simplifying the code also since the complexity doesn't do anything.

Can you clarify? The "rounding" action in the code is useful, I think, to minimize the impact of tick rates that don't equate to an integer number of microseconds per tick. For instance, again in an extreme case, if the period is 1.9 us/tick then the value will become 2, rather than 1, so at least the delay calculation will be closer to the real value.

@jphickey - This code doesn't do that. 1.9 does not become 2, it becomes 1 with or without this extra logic attempting to round up. 1.1 converts to 1, 1.9 converts to 1.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@joelsherrill - agreed, debug reporting when a non-exact conversion is made is a great idea.

@jphickey
Contributor

jphickey commented Sep 23, 2019

@jphickey - This code doesn't do that. 1.9 does not become 2, it becomes 1 with or without this extra logic attempting to round up. 1.1 converts to 1, 1.9 converts to 1.

@skliper -- are we talking about the same code?

I'm looking at this line:

OS_SharedGlobalVars.MicroSecPerTick = (1000000 + (OS_SharedGlobalVars.TicksPerSecond / 2)) /
             OS_SharedGlobalVars.TicksPerSecond;

If TicksPerSecond, for argument's sake, is something like 525000 (which is an unrealistic number, but consider it anyway), that equates to about 1.905 usec/tick.

If you simply compute 1000000/525000 as integer division, you get 1.

If you compute the value as coded, you get (1000000+262500)/525000, which is 2.

We could warn about an inexact conversion to microseconds per tick, but I'm not really sure what it buys you, and it would only show up when debug is enabled and you are looking at the console. It certainly isn't something I would advocate returning OS_ERROR about (which is basically a panic).

Edit: Corrected a typo in my original comment. Logic is the same.

@jphickey
Contributor

Another Note: Upon further review it is worth mentioning that this MicroSecPerTick value is not actually used for any meaningful purpose by OSAL.

For OS_TaskDelay() implementation, the calculation is done in an OS-specific manner. VxWorks uses TicksPerSecond (the original value) and RTEMS uses nanoseconds per tick for only the fractional part of the delay.

The only thing this is used for is to output a normalized value for the accuracy output of OS_TimerCreate(), just in case the application cares about how accurate its timer is likely to be.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey I think we are both saying the same thing. Any value > 2M returns zero. By "doesn't do anything" I should instead have said "doesn't guarantee non-zero", or "doesn't do anything stated in a rationale or a requirement". If the code doesn't do what the comment states (guarantee non-zero), I'm curious why the code bothers to round at all. What is the required behavior, versus just implementing a round-up for its own sake? Is rounding up better than truncating? Neither answer is exact, which is why I also concur with the request to add user notification when the result isn't exact. A debug notification is better than nothing, and shows intent.

If the rationale is to round up so that the reported accuracy bounds the actual accuracy, I'd consider that sufficient as long as it gets added to the comment. But without a rationale or a requirement it's really not obvious to me why a round-up is necessary.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey oh, I see now my example wasn't a good one illustrating my concern. I should have said 0.9 becomes 0, in that it doesn't round up to avoid zero (which I guess the comment doesn't say exactly, but it does guarantee non-zero).

@jphickey
Contributor

@skliper As I said earlier, I do concur that the code comment is incorrect/inaccurate. It is not a round "up" specifically, it is rounding it to the nearest integer, which could be up or down, in an attempt to produce the closest integer approximation of the real value.

The value of 0.9 will get rounded to 1, not 0. (i.e. any fractional part greater than or equal to 0.5 goes to the next higher integer, whereas any fractional part less than 0.5 goes to the next lower integer).

I'd like to summarize the discussion here with some type of action item, if possible. I self assigned this ticket but I'm still not entirely sure what I'm changing here, aside from the comment itself.

To go back to requirements, perhaps there is one in an internal requirements document somewhere?
In the published OSAL Library API document, for the "accuracy" output of OS_TimerCreate, it only describes it as the following:

*clock_accuracy: A pointer to the variable where the accuracy of the timer is stored. The accuracy is in microseconds. This parameter will give an indication of the minimum clock resolution of the timer.

Likewise, For OS_Tick2Micros, the API document just says it returns "microseconds per operating tick".

In none of these cases does it definitively say what will happen if the operating system tick period is not an exact integer number of microseconds, nor does it even say it cannot be zero. So we don't have a specific "requirement" here to follow.

  • In the current implementation as it stands today, it is rounded to the nearest integer. IMO this is perfectly acceptable/reasonable given the lack of detail in the specification here.

  • It's been proposed to change the code to do the equivalent of floor() (i.e. truncate/next lower int) or do the equivalent of ceil() (i.e. to next higher int). But I'm not convinced these are more correct, given how these outputs are specified.

Here is what I propose as a potential resolution to this issue:

  • Fix the comment to say that it is choosing the nearest integer value. (not that it is rounding up, nor any indication of avoiding zero, at least not here).
  • In the shared layer, multiply the MicroSecPerTick by TicksPerSecond and confirm that the result is exactly 1000000. If not, generate an OS_DEBUG statement indicating that the value is not exact.

Please confirm whether this proposed change satisfies the concern.

@skliper
Contributor Author

skliper commented Sep 23, 2019

@jphickey casting 0.9 to an integer = 0. That's all I'm saying. I agree the logic rounds to the nearest integer, and that behavior is fine as long as it's obvious as to why.

Since we don't have a requirement, with the comment update I'd prefer

  1. adding that it rounds to the nearest integer and why (it's only used internally for reporting accuracy, any rounding sends a debug message) so intent is clear to a user or maintainer
  2. remove the non-zero guarantee language, and state TicksPerSecond values over 2M (impractical number, more typically 1000 or less) will return zero

OS_DEBUG action is perfect.

Would it be helpful to update the OS_TimerCreate clock_accuracy description (in the header and API doc) to state if needed this value is rounded to the nearest integer (vs floor or ceiling to bound accuracy)?

*updated to clarify logic statement (vs "it")

@skliper skliper added this to the 5.1.0 milestone Oct 22, 2019
@skliper
Contributor Author

skliper commented Mar 15, 2020

Updated OS_TimerCreate API documentation to reflect the comments above as part of #364 (partial resolution).

Still pending OS_DEBUG action, and the two comment updates noted above.

EDIT - added related issue

skliper added a commit to skliper/osal that referenced this issue Mar 15, 2020
Document/comment changes only
Partially addresses nasa#255, nasa#12, nasa#18, nasa#366
@skliper
Contributor Author

skliper commented Mar 18, 2020

@dmknutsen - want to finish the rest of this one? Comments update and debug message.

@dmknutsen
Contributor

Apologies...just realized I missed this. Sure, if it is still open...go ahead and assign it to me.

astrogeco added a commit that referenced this issue Apr 21, 2020
Fix #12, Comment update to correct for microseconds not always rounding up + a…
@astrogeco astrogeco linked a pull request Apr 27, 2020 that will close this issue