-
Notifications
You must be signed in to change notification settings - Fork 1.9k
reload: refactor watchdog to be preemptable on windows. #10968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Phillip Whelan <phillip.whelan@chronosphere.io>
WalkthroughThe hot-reload watchdog in src/flb_reload.c now uses a 100 ms iterative sleep loop instead of a single long sleep, improving cancellation responsiveness. Additional debug/info logs were added around successful reload completion and watchdog cleanup. Abort-on-timeout behavior is unchanged. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Main as Main Thread
participant Watchdog as Watchdog Thread
participant Reload as Reload Logic
Note over Main,Watchdog: Hot-reload initiated
Main->>Watchdog: Start watchdog(timeout_seconds)
Main->>Reload: Begin reload sequence
rect rgba(200,230,255,0.25)
Note over Watchdog: Granular loop: sleep 100ms × (timeout_seconds × 10)
loop 100ms intervals until timeout or cancel
Watchdog-->>Watchdog: Check cancel/completion flag
end
end
alt Reload completes before timeout
Reload-->>Main: Success signal
Main->>Watchdog: Cancel/cleanup request
Note right of Main: Debug: cleaning up watchdog
Watchdog-->>Main: Exit
Main-->>Main: Info: reload completed successfully
else Timeout reached
Watchdog-->>Main: Timeout signal
Main-->>Main: Abort reload path
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
It is necessary to enable hot reloading with the timeout: service:
Hot_Reload.Timeout: 5
Hot_Reload: On
Log_Level: debug
pipeline:
inputs:
- name: dummy
outputs:
- name: stdout
match: "*"Attaching a valgrind log: As well as a debug log: |
Summary
Due to how the watchdog thread works and how preemption of cancelled threads work on windows each hot reload on windows will take at least 5 minutes.
Description
Due to how cancel states and joining works this can delay the process of loading a new configuration by minutes since it is the main thread that does both operations.
The function flb_reload spins up a background watchdog thread (when enabled) which should abort() the process if the configuration does not cleanly load. In the process of deactivating it once the configuration is loaded that same thread is joined via pthread_join in the function flb_reload_watchdog_cleanup. On windows at least the sleep function is not preempted by the thread being cancelled which leads to a delay of 5 minutes after each reload.
To fix this I refactored the watchdog thread to sleep in increments of 100ms. This should be a decent tradeoff between responsiveness and performance.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit