
@pwhelan pwhelan commented Apr 4, 2024

Summary

The flb-it-log test is failing sporadically. I believe this is because time is initially counted from 1 second before the message is first cached, for both messages being tested, TEST_RECORD_1 and TEST_RECORD_2.

[ FAILED ]
  log.c:18: Check (start + timeout) >= now... failed
    clock error, unsuppresed log: now=1712179281, timeout=1712179280, diff=1
  log.c:110: Check ret == 0... failed
  log.c:18: Check (start + timeout) >= now... failed
    clock error, unsuppresed log: now=1712179281, timeout=1712179280, diff=1
  log.c:114: Check ret == 0... failed
Test cache_one_slot...                          [ OK ]
FAILED: 1 of 2 unit tests has failed.

Only a single failure is registered even though the loop occurs twice. This lines up with the idea that it is an off-by-one caused by starting the clocks one second after initially sending the message(s) to be suppressed by the log cache.

This fix simply starts counting time for each record right before emitting each one.



pwhelan commented Apr 4, 2024

Apparently out_http also can suffer from flaky results:

Test in_http...                                 [ FAILED ]
  out_http.c:1157: Check num > 0... failed
    no outputs
FAILED: 1 of 15 unit tests has failed.

I'll see if I can get around to that one later. At the very least this PR did not fail in the flb-it-log test.

@edsiper edsiper merged commit bb510b9 into master Apr 10, 2024
@edsiper edsiper deleted the pwhelan-flb-it-log-timing-fix branch April 10, 2024 22:14