Contributor

@cosmo0920 cosmo0920 commented Oct 6, 2025

Backport of #10869, #10826, and #10874 to the 4.0 branch.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

alecholmes and others added 6 commits October 6, 2025 16:18
Signed-off-by: Alec Holmes <alec@chronosphere.io>
Signed-off-by: Alec Holmes <alec@chronosphere.io>
- Add hot_reload_watchdog_timeout_seconds field to flb_config struct
- Add FLB_CONF_STR_HOT_RELOAD_TIMEOUT configuration string
- Add parsing logic in service_configs array
- Set default timeout to 300 seconds in flb_config_init

Signed-off-by: Bradley Laney <bradley.laney@chronosphere.io>
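
The configuration change described in the commit above could look roughly like the following sketch. It is not the Fluent Bit code: `example_config`, `example_config_init`, `example_config_set`, and the key string `example.hot_reload_timeout` are hypothetical stand-ins, since the actual value of `FLB_CONF_STR_HOT_RELOAD_TIMEOUT` and the layout of the `service_configs` array are not shown in this conversation. Only the field name and the 300-second default come from the commit message.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key name; the real string is not shown in this PR page. */
#define EXAMPLE_CONF_STR_HOT_RELOAD_TIMEOUT "example.hot_reload_timeout"
/* Default taken from the commit message: 300 seconds. */
#define EXAMPLE_DEFAULT_TIMEOUT_SECONDS 300

/* Hypothetical stand-in for the relevant part of the config struct. */
struct example_config {
    int hot_reload_watchdog_timeout_seconds;
};

/* Apply defaults, as an init function would. */
void example_config_init(struct example_config *cfg)
{
    cfg->hot_reload_watchdog_timeout_seconds = EXAMPLE_DEFAULT_TIMEOUT_SECONDS;
}

/*
 * Parse one key/value pair.
 * Returns 0 on success, -1 on an unknown key or an invalid value.
 * On failure the stored value is left unchanged.
 */
int example_config_set(struct example_config *cfg,
                       const char *key, const char *value)
{
    char *end = NULL;
    long parsed;

    if (strcmp(key, EXAMPLE_CONF_STR_HOT_RELOAD_TIMEOUT) != 0) {
        return -1;
    }

    errno = 0;
    parsed = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' ||
        parsed < 0 || parsed > INT_MAX) {
        return -1;
    }

    cfg->hot_reload_watchdog_timeout_seconds = (int) parsed;
    return 0;
}
```

Rejecting malformed values instead of silently falling back keeps a typo in the service section from producing an unexpectedly short or long timeout.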
- Add watchdog thread that monitors hot reload duration
- Thread sleeps for configured timeout then aborts if reload hasn't completed
- Watchdog is started at beginning of reload and cancelled on completion
- Uses pthread_cancel with async cancellation for immediate response
- Properly cleans up thread resources with pthread_join

Signed-off-by: Bradley Laney <bradley.laney@chronosphere.io>
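
A minimal sketch of the watchdog pattern this commit describes, assuming plain POSIX threads: a thread sleeps for the timeout and calls `abort()` if it is still alive afterwards, while the reload path cancels and joins it on completion. The names `example_watchdog`, `example_watchdog_start`, and `example_watchdog_stop` are hypothetical and do not correspond to functions in the Fluent Bit source.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical watchdog handle for illustration. */
struct example_watchdog {
    pthread_t thread;
    unsigned int timeout_seconds;
    int started;
};

static void *example_watchdog_main(void *arg)
{
    struct example_watchdog *wd = arg;
    int old;

    /* Asynchronous cancellation lets pthread_cancel() act immediately. */
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);

    sleep(wd->timeout_seconds);

    /* Still running after the timeout: the reload did not complete. */
    abort();
    return NULL;
}

/* Start the watchdog at the beginning of a reload. Returns 0 on success. */
int example_watchdog_start(struct example_watchdog *wd,
                           unsigned int timeout_seconds)
{
    wd->timeout_seconds = timeout_seconds;
    wd->started = 0;
    if (pthread_create(&wd->thread, NULL, example_watchdog_main, wd) != 0) {
        return -1;
    }
    wd->started = 1;
    return 0;
}

/* Cancel the watchdog once the reload has completed, then reap the thread. */
int example_watchdog_stop(struct example_watchdog *wd)
{
    void *result = NULL;

    if (!wd->started) {
        return 0;
    }
    pthread_cancel(wd->thread);
    pthread_join(wd->thread, &result);
    wd->started = 0;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

In this sketch the watchdog thread holds no locks and allocates nothing, which is what makes asynchronous cancellation reasonably safe here; a thread that did either would need deferred cancellation or cleanup handlers.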
- Add test_hang_on_collect configuration option
- When enabled, dummy plugin hangs during collect to simulate stuck reload
- Used only for testing hot reload watchdog functionality

Signed-off-by: Bradley Laney <bradley.laney@chronosphere.io>
- Add test_reload_watchdog_timeout to verify watchdog functionality
- Test forks a child process that triggers reload with hanging dummy input
- Verifies that watchdog aborts the process after timeout
- Checks that child terminates with SIGABRT signal

Signed-off-by: Bradley Laney <bradley.laney@chronosphere.io>
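
The test strategy above (fork a child, let it abort, inspect how it terminated) can be sketched with standard POSIX calls. This is an illustrative helper, not the code of `test_reload_watchdog_timeout`; `example_run_and_get_signal` and the two child functions are hypothetical names, and the child here simply calls `abort()` rather than triggering a real reload.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run child_fn in a forked child process.
 * Returns the number of the signal that terminated the child,
 * 0 if the child exited without being signaled, or -1 on error.
 */
int example_run_and_get_signal(void (*child_fn)(void))
{
    pid_t pid;
    pid_t waited;
    int status = 0;

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: run the scenario, then exit without running atexit handlers. */
        child_fn();
        _exit(0);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited != pid) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);
    }
    return 0;
}

/* Stand-in for a child whose watchdog fires. */
void example_child_aborts(void)
{
    abort();
}

/* Stand-in for a child whose reload completes in time. */
void example_child_returns(void)
{
}
```

Running the scenario in a child process is what lets the test observe `SIGABRT` without the test runner itself being killed.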

coderabbitai bot commented Oct 6, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@cosmo0920 cosmo0920 merged commit 0c874ca into 4.0 Oct 6, 2025
55 of 56 checks passed
@cosmo0920 cosmo0920 deleted the hotreload-watchdog-thread-4.0 branch October 6, 2025 08:55

4 participants