Conversation

@edsiper
Member

@edsiper edsiper commented Sep 23, 2025

Fixes #10923

Added a backend hook to invalidate TLS sessions, invoked before socket shutdown so SSL_shutdown never targets a recycled descriptor.
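
For context on why a recycled descriptor is a hazard: POSIX hands out the lowest unused descriptor number, so a number still cached by a stale TLS session can be reassigned to an unrelated socket as soon as the original is closed. The following is a minimal sketch of that reuse, independent of Fluent Bit; the function name is hypothetical and the behavior shown assumes a single-threaded process.

```c
#include <unistd.h>

/*
 * Returns 1 when the descriptor number that was just closed is handed out
 * again by the next allocation, 0 when it is not, and -1 on error.
 */
int fd_number_is_recycled(void)
{
    int fds[2];
    int stale;
    int fresh;

    if (pipe(fds) != 0) {
        return -1;
    }

    /* Pretend a TLS session cached this number before the close. */
    stale = fds[0];
    close(fds[0]);

    /* dup() returns the lowest unused descriptor number. */
    fresh = dup(fds[1]);
    close(fds[1]);

    if (fresh < 0) {
        return -1;
    }
    close(fresh);

    return fresh == stale;
}
```

Any shutdown call issued against the cached number after this point would act on whatever now owns it, which is the situation the invalidation hook is meant to rule out.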

Also introduced a unit test that confirms prepare_destroy_conn invalidates the TLS session, and wired it into the internal test suite when TLS is enabled.


Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Add API to invalidate TLS sessions without destroying their context, and expose backend support for session invalidation.
  • Bug Fixes

    • Invalidate TLS sessions before socket shutdown to avoid races.
    • Prevent TLS shutdown on already-invalid file descriptors to reduce crashes/hangs.
  • Tests

    • Add unit test covering TLS-enabled upstream connection teardown and session invalidation.

@edsiper edsiper added this to the Fluent Bit v4.1 milestone Sep 23, 2025
@coderabbitai

coderabbitai bot commented Sep 23, 2025

Walkthrough

Adds a TLS session invalidation hook and public API, implements invalidation for the OpenSSL backend, invokes session invalidation during upstream connection teardown when TLS is enabled, and adds a unit test exercising upstream TLS teardown.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Public TLS API / include/fluent-bit/tls/flb_tls.h | Adds void (*session_invalidate)(void *) to struct flb_tls_backend and declares int flb_tls_session_invalidate(struct flb_tls_session *session);. |
| TLS Core / src/tls/flb_tls.c | Adds flb_tls_session_invalidate() which validates inputs and forwards to backend session_invalidate if present. |
| OpenSSL Backend / src/tls/openssl.c | Implements tls_session_invalidate(void *), marks session fd invalid, adjusts tls_session_destroy fd guard, and wires .session_invalidate = tls_session_invalidate in backend struct. |
| Upstream Teardown / src/flb_upstream.c | In prepare_destroy_conn, calls flb_tls_session_invalidate() when TLS is enabled and a session exists, before shutdown/close. |
| Unit Tests / tests/internal/CMakeLists.txt, tests/internal/upstream_tls.c | Conditionally include upstream_tls.c when TLS is built; add a test that verifies backend invalidate callback is invoked and connection fields are reset during release. |
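
The "validate inputs and forward to the backend hook if present" pattern summarized above can be sketched roughly as follows. This is not the Fluent Bit source: every demo_ type and function is a hypothetical stand-in, and the return codes (0 and -1) are an assumption, since the thread does not state them.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real backend, context, and session types. */
struct demo_backend {
    void (*session_invalidate)(void *backend_session);  /* optional hook */
};

struct demo_tls {
    struct demo_backend *api;
};

struct demo_session {
    struct demo_tls *tls;
    void *ptr;                 /* backend-private session state */
};

/* Assumed convention: 0 on success, -1 when the input is unusable. */
int demo_session_invalidate(struct demo_session *session)
{
    if (session == NULL || session->tls == NULL || session->tls->api == NULL) {
        return -1;
    }

    /* A backend that does not implement the hook is simply skipped. */
    if (session->tls->api->session_invalidate != NULL) {
        session->tls->api->session_invalidate(session->ptr);
    }

    return 0;
}

/* Counting shim in the spirit of the unit test's callback. */
int demo_invalidate_calls = 0;

void demo_count_invalidate(void *backend_session)
{
    (void) backend_session;
    demo_invalidate_calls++;
}
```

Keeping the hook optional means a backend that has no notion of invalidation needs no changes, at the cost of the caller not learning whether anything was actually invalidated.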

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Upstream
  participant T as TLS core
  participant B as TLS Backend (OpenSSL)

  rect rgba(232,244,253,0.6)
    Note over U: prepare_destroy_conn (TLS-enabled)
    U->>T: flb_tls_session_invalidate(session)
    alt backend supports invalidate
      T->>B: session_invalidate(session)
      B-->>T: session invalidated (fd = -1, mutex released)
    else
      T-->>U: no-op / error
    end
  end

  U->>U: shutdown_connection()
  U->>U: close(fd)
  Note over U: connection moved to destroy queue / marked shutdown
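
The ordering in the diagram (invalidate first, then shut down and close) pairs with a guard on the backend side. A rough sketch of that pairing, assuming a backend session that caches the descriptor: all names are hypothetical, the counter stands in for the real TLS shutdown call, and the mutex handling mentioned in the diagram is left out.

```c
#include <stddef.h>

/* Hypothetical backend session; the real OpenSSL backend state differs. */
struct demo_backend_session {
    int fd;               /* descriptor a graceful shutdown would use */
    int shutdown_calls;   /* counts simulated TLS shutdown attempts */
};

/* Invalidation: forget the descriptor so later teardown cannot touch it. */
void demo_backend_invalidate(void *ptr)
{
    struct demo_backend_session *session = (struct demo_backend_session *) ptr;

    if (session != NULL) {
        session->fd = -1;
    }
}

/* Destroy: attempt the graceful shutdown only while the descriptor is valid. */
void demo_backend_destroy(struct demo_backend_session *session)
{
    if (session == NULL) {
        return;
    }

    if (session->fd >= 0) {
        session->shutdown_calls++;   /* stand-in for the real shutdown call */
        session->fd = -1;
    }
}
```

With this shape, a session invalidated before its socket is closed skips the shutdown attempt entirely, so a later destroy cannot reach a descriptor number that has since been reassigned.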

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

backport to v4.0.x

Suggested reviewers

  • koleini
  • fujimotos

Poem

A rabbit taps keys and gives a wink,
"Invalidate sessions before they sink."
fd set to -1, mutex freed with care,
Connections tidy, no races left there.
Hop—tests pass—cleanup done with flair. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "tls: fix regression and proper shutdown" is a concise, single-sentence summary that directly reflects the main change in the PR (adding a TLS session invalidation hook and ensuring correct shutdown to fix a regression) without extraneous detail or noise. |
| Linked Issues Check | ✅ Passed | The changes implement a backend session_invalidate hook, add flb_tls_session_invalidate, call it from prepare_destroy_conn before socket shutdown, add an OpenSSL invalidation implementation, and include a unit test; these coding changes directly address the primary objective of linked issue [#10923] by preventing SSL_shutdown from operating on recycled descriptors and therefore meet the linked issue's coding requirements. |
| Out of Scope Changes Check | ✅ Passed | All modified files and additions are confined to the TLS subsystem, upstream connection teardown, the OpenSSL backend, and accompanying unit tests; no unrelated subsystems or unrelated public APIs were changed, so there are no apparent out-of-scope changes. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43ce960 and 984a13d.

📒 Files selected for processing (6)
  • include/fluent-bit/tls/flb_tls.h (2 hunks)
  • src/flb_upstream.c (1 hunks)
  • src/tls/flb_tls.c (1 hunks)
  • src/tls/openssl.c (3 hunks)
  • tests/internal/CMakeLists.txt (1 hunks)
  • tests/internal/upstream_tls.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • include/fluent-bit/tls/flb_tls.h
  • src/tls/flb_tls.c
  • tests/internal/CMakeLists.txt
  • src/flb_upstream.c
  • src/tls/openssl.c
  • tests/internal/upstream_tls.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test

@edsiper edsiper force-pushed the tls-regression-reuse branch from b44c142 to cecb120 Compare September 23, 2025 22:00
@edsiper edsiper force-pushed the tls-regression-reuse branch from cecb120 to 7d873fb Compare September 24, 2025 04:12
@edsiper edsiper force-pushed the tls-regression-reuse branch from 7d873fb to 43ce960 Compare September 24, 2025 04:14
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/internal/CMakeLists.txt (2)

58-64: Use list(APPEND) instead of set for appending to lists

More idiomatic and avoids re-specifying the existing variable.

 if(FLB_TLS)
-  set(UNIT_TESTS_FILES
-    ${UNIT_TESTS_FILES}
-    upstream_tls.c
-    )
+  list(APPEND UNIT_TESTS_FILES
+    upstream_tls.c
+  )
 endif()

58-64: Windows build compatibility check

This adds the TLS test on all platforms when FLB_TLS is ON. If upstream_tls.c uses POSIX-only APIs, it may fail on Windows. Confirm it builds on Windows CI; otherwise wrap it similarly to other POSIX-only tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cecb120 and 7d873fb.

📒 Files selected for processing (1)
  • tests/internal/CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
🔇 Additional comments (1)
tests/internal/CMakeLists.txt (1)

58-64: FLB_TLS guard is correct — keep if(FLB_TLS) in tests/internal/CMakeLists.txt

Top-level CMakeLists.txt declares option(FLB_TLS) and FLB_HAVE_TLS is derived from it, so the if(FLB_TLS) guard is correct.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/internal/upstream_tls.c (1)

84-89: Strengthen post-conditions: also verify busy_queue is empty.

After release, the connection should no longer be in busy_queue. This complements the destroy_queue assertion.

     TEST_CHECK(mk_list_size(&queue->destroy_queue) == 1);
     TEST_CHECK(conn.shutdown_flag == FLB_TRUE);
+    TEST_CHECK(mk_list_size(&queue->busy_queue) == 0);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d873fb and 43ce960.

📒 Files selected for processing (2)
  • tests/internal/CMakeLists.txt (1 hunks)
  • tests/internal/upstream_tls.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/internal/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (1)
tests/internal/upstream_tls.c (1)
src/flb_upstream.c (2)
  • flb_upstream_queue_init (219-224)
  • flb_upstream_conn_release (862-947)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
🔇 Additional comments (2)
tests/internal/upstream_tls.c (2)

23-30: LGTM: backend session_invalidate shim is simple and correct.

The callback increments exactly once; clear and side‑effect free.


51-53: Resource handling looks correct.

socket_pair[0] is closed by the release path; socket_pair[1] is explicitly closed by the test. Windows init/cleanup is properly guarded.

If CI has a Windows job, please confirm this test passes there too (event loop closing semantics can differ).

Also applies to: 90-95

Comment on lines +69 to +76
conn.fd = socket_pair[0];
conn.event.fd = conn.fd;
conn.event.status = 0;
conn.stream = (struct flb_stream *) &upstream;
conn.net = &upstream.base.net;
conn.tls_session = &tls_session;
conn.net_error = 0;

⚠️ Potential issue

Set conn.upstream and use upstream.base for conn.stream (fix crash risk).

flb_upstream_conn_release dereferences conn->upstream. Here it’s never set, and conn->stream is pointing to &upstream (not &upstream.base). This can lead to NULL deref/UB at runtime.

Apply this diff:

     conn.fd = socket_pair[0];
     conn.event.fd = conn.fd;
     conn.event.status = 0;
-    conn.stream = (struct flb_stream *) &upstream;
+    conn.upstream = &upstream;
+    conn.stream = &upstream.base;
     conn.net = &upstream.base.net;
     conn.tls_session = &tls_session;
-    conn.net_error = 0;
+    /* -1 is the sentinel used in keepalive path checks for “no error” */
+    conn.net_error = -1;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    conn.fd = socket_pair[0];
-    conn.event.fd = conn.fd;
-    conn.event.status = 0;
-    conn.stream = (struct flb_stream *) &upstream;
-    conn.net = &upstream.base.net;
-    conn.tls_session = &tls_session;
-    conn.net_error = 0;
+    conn.fd = socket_pair[0];
+    conn.event.fd = conn.fd;
+    conn.event.status = 0;
+    conn.upstream = &upstream;
+    conn.stream = &upstream.base;
+    conn.net = &upstream.base.net;
+    conn.tls_session = &tls_session;
+    /* -1 is the sentinel used in keepalive path checks for "no error" */
+    conn.net_error = -1;
🤖 Prompt for AI Agents
In tests/internal/upstream_tls.c around lines 69 to 76, conn->upstream is never
set and conn->stream is incorrectly set to &upstream (instead of the
upstream.base), which risks a NULL deref in flb_upstream_conn_release; set
conn.upstream = &upstream and set conn.stream = (struct flb_stream *)
&upstream.base to ensure the connection references the upstream object and its
base stream correctly.
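
The review comment above turns on the difference between casting a pointer to the containing struct and taking the address of its embedded base. A minimal sketch with hypothetical types: the layout of the real struct flb_upstream is not shown in this thread, and the base member is deliberately placed second here to make the difference visible.

```c
#include <stddef.h>

/* Hypothetical base type embedded in a larger object. */
struct demo_stream {
    int type;
};

struct demo_upstream {
    int flags;                 /* deliberately placed before the base */
    struct demo_stream base;
};

/* Layout-independent: take the address of the embedded member. */
struct demo_stream *demo_stream_of(struct demo_upstream *upstream)
{
    return &upstream->base;
}

/* Byte offset of the base inside the container; 0 only when base comes first. */
size_t demo_base_offset(void)
{
    return offsetof(struct demo_upstream, base);
}
```

Casting the container pointer to the base type lands on the right bytes only while the offset is 0, whereas taking the member address stays correct if fields are ever reordered.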

@edsiper edsiper force-pushed the tls-regression-reuse branch from 43ce960 to 984a13d Compare September 24, 2025 13:24
@edsiper
Member Author

edsiper commented Sep 24, 2025

rebased on top of master, #10930 should fix the processor issue

@vishnuvardhanappscrip

vishnuvardhanappscrip commented Nov 14, 2025

Hi @edsiper @cosmo0920

I am reviewing the fix for Issue #10923 (Azure Kusto output plugin connection failure caused by PR #10895).
I see that PR #10924 resolves the issue and that the milestone is v4.1, but I am unable to determine the exact version where this fix was included.

Could you please confirm the specific release that contains the fix?

We checked the release notes and were unable to see PR #10924 (or any Azure Kusto plugin fixes) explicitly listed in any version’s changelog.
To finalize our internal upgrade plan and documentation, we need to know exactly which Fluent Bit release contains this fix.

Thank you for your help and for the quick turnaround on the original issue.

@patrick-stephens
Collaborator

Looks like you already raised an issue but the commit shows the tag: #11172 (comment)

Development

Successfully merging this pull request may close these issues.

output azure_kusto getting a network error connection to the azure storage account queue in 4.0.10

5 participants