

@yaananth

@yaananth yaananth commented Nov 26, 2025

Disclaimer: I used Copilot to investigate the crash we hit and to help write this description.

Summary

Fixes a SIGSEGV crash in the Azure Kusto output plugin when processing buffered backlog data on startup.

Root Cause

The ingest_all_chunks() function had nested mk_list_foreach_safe loops that both used the same tmp variable as the iterator:

mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {     // outer loop
    ...
    mk_list_foreach_safe(f_head, tmp, &fs_stream->files) {  // inner loop - SAME tmp!

The mk_list_foreach_safe macro stores the "next" pointer in its second argument for safe iteration during list modification. When the inner loop overwrote tmp, it corrupted the outer loop's iteration state, causing undefined behavior and eventually a SIGSEGV when the outer loop tried to continue iteration with a corrupted pointer.
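To make the failure mode concrete, here is a minimal, self-contained C sketch. It re-implements just enough of monkey's mk_list (a circular doubly linked list with mk_list_foreach_safe and mk_list_entry) to reproduce the collision; the macro bodies are assumed to match the semantics described above, and struct stream / struct file are hypothetical stand-ins for the plugin's fs_stream and file entries. The function deliberately stops before the outer loop resumes, since resuming is exactly the undefined behavior that crashed.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on monkey's mk_list (assumed semantics). */
struct mk_list {
    struct mk_list *prev;
    struct mk_list *next;
};

static void mk_list_init(struct mk_list *list)
{
    list->next = list;
    list->prev = list;
}

/* Append entry at the tail of the list. */
static void mk_list_add(struct mk_list *entry, struct mk_list *head)
{
    entry->next = head;
    entry->prev = head->prev;
    head->prev->next = entry;
    head->prev = entry;
}

/* n caches curr->next so the body may unlink curr; n is rewritten on every step. */
#define mk_list_foreach_safe(curr, n, head) \
    for (curr = (head)->next, n = curr->next; curr != (head); curr = n, n = curr->next)

#define mk_list_entry(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the plugin's stream and file entries. */
struct stream {
    struct mk_list files;
    struct mk_list _head;
};

struct file {
    struct mk_list _head;
};

/* Returns 1 if the inner loop overwrote the outer loop's saved "next" pointer. */
static int inner_loop_clobbers_tmp(void)
{
    struct mk_list streams;
    struct stream s1, s2;
    struct file f1, f2;
    struct mk_list *head, *tmp, *f_head;
    struct mk_list *saved_next = NULL;
    int clobbered = 0;

    mk_list_init(&streams);
    mk_list_init(&s1.files);
    mk_list_init(&s2.files);
    mk_list_add(&s1._head, &streams);
    mk_list_add(&s2._head, &streams);
    mk_list_add(&f1._head, &s1.files);
    mk_list_add(&f2._head, &s1.files);

    mk_list_foreach_safe(head, tmp, &streams) {
        struct stream *st = mk_list_entry(head, struct stream, _head);

        saved_next = tmp;               /* outer "next": the second stream */
        assert(saved_next == &s2._head);

        /* BUG: the inner loop reuses tmp as its safety iterator. */
        mk_list_foreach_safe(f_head, tmp, &st->files) {
            (void) f_head;
        }

        /* On exit the macro has set tmp = (&st->files)->next, i.e. the
         * first file node, not the second stream. */
        clobbered = (tmp != saved_next);
        break;  /* resuming the outer loop here would be undefined behavior */
    }
    return clobbered;
}
```

With two files in the first stream, tmp ends up pointing at the first file node, so the outer loop's next step would reinterpret a file entry as a stream (if the inner list were empty, it would instead point at the stream's own files list head, which is equally wrong).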

Crash Stack Trace (from production)

[engine] caught signal (SIGSEGV)
#0 flush_init() at plugins/out_azure_kusto/azure_kusto.c:1169
#1 cb_azure_kusto_flush() at plugins/out_azure_kusto/azure_kusto.c:1257
#2 co_init() at lib/monkey/deps/flb_libco/amd64.c:117

Fix

Add a dedicated f_tmp variable for the inner loop to prevent iterator corruption:

struct mk_list *tmp;
struct mk_list *f_tmp;  // NEW: dedicated inner loop iterator
...
mk_list_foreach_safe(head, tmp, &ctx->fs->streams) {
    ...
    mk_list_foreach_safe(f_head, f_tmp, &fs_stream->files) {  // Use f_tmp
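For comparison, the corrected pattern can be exercised in the same self-contained form. As above, the list primitives are a minimal re-implementation; mk_list_del is assumed to unlink the entry and clear its pointers, and drain_all_files is a hypothetical stand-in for the body of ingest_all_chunks(), which in the real plugin does much more than unlink entries. The point is that each nesting level owns its own safety iterator, so deleting inner entries never disturbs the outer walk.

```c
#include <assert.h>
#include <stddef.h>

struct mk_list {
    struct mk_list *prev;
    struct mk_list *next;
};

static void mk_list_init(struct mk_list *list)
{
    list->next = list;
    list->prev = list;
}

static void mk_list_add(struct mk_list *entry, struct mk_list *head)
{
    entry->next = head;
    entry->prev = head->prev;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink entry and clear its links (assumed to mirror monkey's mk_list_del). */
static void mk_list_del(struct mk_list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->prev = NULL;
    entry->next = NULL;
}

#define mk_list_foreach_safe(curr, n, head) \
    for (curr = (head)->next, n = curr->next; curr != (head); curr = n, n = curr->next)

#define mk_list_entry(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the plugin's stream and file entries. */
struct stream {
    struct mk_list files;
    struct mk_list _head;
};

struct file {
    struct mk_list _head;
};

/* Hypothetical: unlink every file from every stream; returns how many were removed. */
static int drain_all_files(struct mk_list *streams)
{
    struct mk_list *head;
    struct mk_list *tmp;      /* outer-loop safety iterator */
    struct mk_list *f_head;
    struct mk_list *f_tmp;    /* dedicated inner-loop safety iterator */
    int count = 0;

    assert(streams != NULL);
    mk_list_foreach_safe(head, tmp, streams) {
        struct stream *st = mk_list_entry(head, struct stream, _head);

        mk_list_foreach_safe(f_head, f_tmp, &st->files) {
            mk_list_del(f_head);  /* safe: f_tmp already holds the next file */
            count++;
        }
        /* tmp was never written by the inner loop, so it still names the next stream */
    }
    return count;
}
```

The general rule: any `_safe` iteration macro that stores its lookahead in a caller-supplied variable needs a distinct variable per nesting level.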

Testing

  • Added a flb_test_azure_kusto_buffering_backlog regression test that exercises the buffering/backlog restart code path
  • All existing azure_kusto tests continue to pass
  • Verified that the build compiles without warnings
Test json_invalid...                            [ OK ]
Test managed_identity_system...                 [ OK ]
Test managed_identity_user...                   [ OK ]
Test service_principal...                       [ OK ]
Test workload_identity...                       [ OK ]
Test buffering_backlog...                       [ OK ]
SUCCESS: All unit tests have passed.

Changes

  • plugins/out_azure_kusto/azure_kusto.c: Add f_tmp iterator variable for inner loop (+2 lines, -1 line)
  • tests/runtime/out_azure_kusto.c: Add buffering backlog regression test (+73 lines)

Summary by CodeRabbit

  • Refactor

    • Internal code cleanup to improve iteration safety (no public API changes).
  • Tests

    • Added a regression test covering buffering and backlog processing with end-to-end runs and cleanup.
    • Added runtime helpers and test scaffolding to support buffered-backlog testing.


@coderabbitai

coderabbitai bot commented Nov 26, 2025

Walkthrough

Updated the Azure Kusto output plugin to use a dedicated inner safety iterator variable; added a new runtime test that verifies buffering/backlog processing with two-phase Fluent Bit runs and accompanying cleanup helpers.

Changes

  • Iterator variable fix (plugins/out_azure_kusto/azure_kusto.c): Introduced a local f_tmp variable and switched the inner mk_list_foreach_safe to use f_tmp as the safety iterator while retaining f_head for element access. No public API changes.
  • Buffering backlog test & helpers (tests/runtime/out_azure_kusto.c): Added flb_kusto_unlink_cb and flb_kusto_rm_rf cleanup helpers, declared and implemented flb_test_azure_kusto_buffering_backlog, and registered the test in TEST_LIST. Implements a two-phase test: run with buffering to write disk backlog, then restart to process backlog, plus buffer-dir setup/cleanup.

Sequence Diagram(s)

%%{init: {"themeVariables": {"primaryColor":"#2b8cbe","tertiaryColor":"#a6bddb","lineColor":"#1f3b57"}}}%%
sequenceDiagram
    participant Test as Test harness
    participant FB1 as Fluent Bit (run 1)
    participant Disk as Buffer dir (disk)
    participant FB2 as Fluent Bit (run 2)
    participant Kusto as Azure Kusto output

    Note over Test,FB1: Phase 1 — buffering enabled
    Test->>FB1: start with buffer_dir configured
    FB1->>Disk: write buffered records to disk
    FB1-->>Test: exit after flush to disk

    Note over Test,FB2: Phase 2 — restart to process backlog
    Test->>FB2: start pointing to same buffer_dir
    FB2->>Disk: read backlog files
    FB2->>Kusto: deliver buffered records
    FB2-->>Test: report success

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Iterator change is localized and simple.
  • New test introduces runtime logic, helpers, and two-phase flow that requires reading test behavior and temporary-file cleanup.
  • Reviewers should pay attention to:
    • Correctness of the mk_list_foreach_safe usage with f_tmp.
    • Test robustness (timing, cleanup, deterministic teardown).
    • Any platform-specific file removal behavior in cleanup helpers.

Suggested labels

backport to v4.0.x

Poem

🐰 I hopped through loops, found f_tmp to store,

No tangled iterators clambering the floor.
I buffered the carrots, then restarted my trail,
Backlog consumed — a floppy-tail tale.
Hooray for clean runs and tests that prevail!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title precisely describes the main fix: addressing a SIGSEGV crash in nested mk_list_foreach_safe loops in the Azure Kusto plugin, which matches the core change in the changeset.
  • Docstring coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea67b3e and e9e2389.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (4 hunks)
🔇 Additional comments (5)
plugins/out_azure_kusto/azure_kusto.c (1)

412-431: Correct fix for the nested loop iterator collision.

The addition of f_tmp as a dedicated safety iterator for the inner mk_list_foreach_safe loop correctly resolves the SIGSEGV. Previously, both loops shared tmp, allowing the inner loop to overwrite the outer loop's saved "next" pointer—causing undefined behavior when the outer loop resumed iteration.

tests/runtime/out_azure_kusto.c (4)

23-27: LGTM on the new includes.

The includes are appropriate for the cleanup helpers and test functionality (unistd.h for getpid/sleep, sys/stat.h for stat/mkdir, ftw.h for nftw, limits.h for PATH_MAX).


31-45: Clean implementation of recursive directory removal.

The helper functions correctly use nftw with FTW_DEPTH | FTW_PHYS for safe depth-first deletion without following symlinks. The stat check before nftw gracefully handles non-existent paths.
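The helpers themselves are not quoted in this thread, so the following is only a sketch of the approach described (nftw with FTW_DEPTH | FTW_PHYS, preceded by a stat check for missing paths). The names unlink_cb and rm_rf are illustrative, not the test file's flb_kusto_unlink_cb / flb_kusto_rm_rf, and the fd limit of 16 is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700  /* expose nftw() and the FTW_* flags */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw callback: with FTW_DEPTH, a directory's contents are visited before
 * the directory itself, so remove() only ever sees empty directories. */
static int unlink_cb(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
    (void) sb;
    (void) typeflag;
    (void) ftwbuf;
    return remove(path);  /* unlink() for files and symlinks, rmdir() for directories */
}

/* Recursively delete path. A path that cannot be stat'ed is treated as
 * already clean. FTW_PHYS reports symlinks instead of following them. */
static int rm_rf(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        return 0;
    }
    return nftw(path, unlink_cb, 16, FTW_DEPTH | FTW_PHYS);
}
```

Compared with the earlier `system("rm -rf ...")` suggestion, this avoids spawning a shell and does not depend on an external binary.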


55-65: Test properly registered.

The new test function prototype and TEST_LIST entry follow the existing patterns in the file.


239-321: Well-designed regression test for the nested loop fix.

The two-phase test approach correctly exercises the backlog processing path in ingest_all_chunks that triggered the SIGSEGV:

  1. First run buffers data to disk with buffering enabled
  2. Second run restarts and processes the backlog (exercising the fixed nested mk_list_foreach_safe loops)

The PID-based unique directory, together with cleanup before and after the test, ensures proper isolation and addresses the concern from the previous review.



@yaananth yaananth force-pushed the fix/azure-kusto-nested-loop-sigsegv branch from 8d08559 to 55130bb on November 26, 2025 11:15

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

217-286: Consider adding test directory cleanup for reliability.

The test uses a hardcoded buffer directory /tmp/fluent-bit-kusto-test/ without cleanup. Leftover files from previous runs could cause flaky behavior or mask test failures.

Consider adding cleanup before and after the test:

 void flb_test_azure_kusto_buffering_backlog(void)
 {
     int i;
     int ret;
     int bytes;
     char sample[] = "{\"k\":\"v\"}";
     size_t sample_size = sizeof(sample) - 1;
     flb_ctx_t *ctx;
     int in_ffd;
     int out_ffd;
+    
+    /* Clean up any leftover files from previous runs */
+    system("rm -rf /tmp/fluent-bit-kusto-test/");
 
     /* First run: enable buffering and write data to disk */

And at the end:

     flb_stop(ctx);
     flb_destroy(ctx);
+
+    /* Clean up test directory */
+    system("rm -rf /tmp/fluent-bit-kusto-test/");
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae14454 and 8d08559.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🔇 Additional comments (2)
plugins/out_azure_kusto/azure_kusto.c (1)

410-431: LGTM! Correct fix for the nested loop iterator collision.

The introduction of f_tmp as a dedicated temporary iterator for the inner mk_list_foreach_safe loop properly resolves the SIGSEGV. Previously, both loops shared tmp, causing the inner loop to overwrite the outer loop's saved "next" pointer, corrupting iteration state upon returning to the outer loop.

tests/runtime/out_azure_kusto.c (1)

33-33: LGTM! Forward declaration and test list entry properly added.

Also applies to: 42-42


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

244-245: Consider test isolation and cleanup for the buffer directory.

The hardcoded path /tmp/fluent-bit-kusto-test/ may cause issues with parallel test runs or leave stale files between runs. Consider:

  1. Using a unique temp directory (e.g., with mkdtemp pattern or PID suffix).
  2. Cleaning up the buffer directory at test start and/or end.

For a regression test focused on preventing SIGSEGV, this is a minor concern, but cleanup would prevent accumulation of test artifacts.
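A PID suffix is the simpler of the two options. The sketch below builds such a path; make_buffer_dir_path is a hypothetical helper, not the test's actual code, and a PID only guarantees uniqueness among processes running at the same time (mkdtemp() additionally creates the directory atomically with a random suffix).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "<prefix>-<pid>/" into out so that concurrent
 * test processes never share a buffer directory.
 * Returns 0 on success, -1 on an empty prefix or truncated output. */
static int make_buffer_dir_path(char *out, size_t size, const char *prefix)
{
    int n;

    assert(out != NULL && prefix != NULL);
    if (strlen(prefix) == 0) {
        return -1;
    }
    n = snprintf(out, size, "%s-%ld/", prefix, (long) getpid());
    if (n < 0 || (size_t) n >= size) {
        return -1;  /* output would have been truncated */
    }
    return 0;
}
```

The same path would then be used for both phases of the test and removed at the start and end.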

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d08559 and 55130bb.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_kusto/azure_kusto.c
🧰 Additional context used
🧬 Code graph analysis (1)
tests/runtime/out_azure_kusto.c (1)
src/flb_lib.c (10)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_input_set (305-335)
  • flb_output (279-289)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_lib_push (843-870)
🔇 Additional comments (2)
tests/runtime/out_azure_kusto.c (2)

33-33: LGTM!

The function declaration and TEST_LIST registration follow the existing patterns in this file.

Also applies to: 42-42


260-285: Regression test structure looks correct for its purpose.

The two-phase approach (write buffered data, restart to process backlog) correctly exercises the ingest_all_chunks code path where the nested loop SIGSEGV occurred. If the test completes without crashing, the fix is validated.

For enhanced verification, you could optionally add a comment clarifying that the test's success criterion is "no crash" rather than output validation.

@yaananth yaananth force-pushed the fix/azure-kusto-nested-loop-sigsegv branch from 55130bb to ea67b3e on November 26, 2025 11:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/runtime/out_azure_kusto.c (1)

280-283: Consider adding verification that backlog processing occurred.

While the test successfully exercises the buffering/backlog code path (its primary goal is crash prevention), it could be more robust by verifying that:

  • Buffered data was actually loaded from disk
  • Backlog processing completed
  • The buffer directory state changed appropriately

This is optional since the test's main purpose—ensuring no SIGSEGV—is achieved if execution completes successfully.

Example verification (optional):

/* After line 283, verify buffer directory state or add logging check */
/* Note: This would require inspecting flb logs or buffer directory contents */
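One way to check the buffer directory state without relying on Fluent Bit internals is to count the regular files under it between phases. The helper below is a hypothetical sketch; it assumes buffered chunks appear as regular files somewhere under the buffer directory, which this thread does not confirm. Because nftw() offers no user-data pointer, the count lives in a file-scope variable, so the helper is not thread-safe.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static long file_count;

/* Count only regular files; with FTW_PHYS, symlinks are reported as FTW_SL. */
static int count_cb(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void) path;
    (void) sb;
    (void) ftwbuf;
    if (typeflag == FTW_F) {
        file_count++;
    }
    return 0;  /* keep walking */
}

/* Hypothetical helper: number of regular files under dir, or -1 on error. */
static long count_regular_files(const char *dir)
{
    assert(dir != NULL);
    file_count = 0;
    if (nftw(dir, count_cb, 16, FTW_PHYS) != 0) {
        perror("nftw");
        return -1;
    }
    return file_count;
}
```

For example, asserting a non-zero count after the first run would confirm that a backlog was actually written; asserting a smaller count after the second run would only be meaningful if the test environment lets ingestion succeed and the plugin deletes files once ingested.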
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55130bb and ea67b3e.

📒 Files selected for processing (2)
  • plugins/out_azure_kusto/azure_kusto.c (2 hunks)
  • tests/runtime/out_azure_kusto.c (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/out_azure_kusto/azure_kusto.c
🔇 Additional comments (1)
tests/runtime/out_azure_kusto.c (1)

33-33: LGTM!

The function prototype and test list entry follow the established pattern in the file.

Also applies to: 42-42

The ingest_all_chunks() function had nested mk_list_foreach_safe loops
that both used the same 'tmp' variable as the iterator. The macro stores
the 'next' pointer in its second argument for safe iteration during list
modification. When the inner loop overwrote 'tmp', it corrupted the outer
loop's iteration state, causing undefined behavior and a SIGSEGV crash
when processing buffered backlog data on startup.

Fix: Add a dedicated 'f_tmp' variable for the inner loop to prevent
iterator corruption.

Also adds a regression test (buffering_backlog) that exercises the
buffering/backlog restart code path to guard against future regressions.

Signed-off-by: Yash Ananth <[email protected]>
Signed-off-by: Yashwanth Anantharaju <[email protected]>
