cosmo0920 (Contributor) commented Sep 17, 2025

This change is needed because the lifecycle of the TLS session is not synchronized with the current connection teardown implementation.

We observed the following:

The "TLS is freed too early in the upstream prepare-destroy phase → UAF risk" case exists in the current code base.

Even with Keepalive enabled, the Fluent Bit code base has multiple conditions under which the TLS session is freed during the "prepare destroy" step. This can race with async I/O and cause a use-after-free in ssl_write_internal. Moving the TLS free to the final destroy_conn() phase mitigates this.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Corrected TLS session cleanup during connection teardown, ensuring sessions are destroyed at the final destruction stage. This reduces the risk of resource leaks and improves stability in TLS-enabled environments.
  • Refactor
    • Streamlined the connection destruction flow for consistency and reliability, without changing external behavior or public interfaces.

Signed-off-by: Hiroshi Hatake <[email protected]>
coderabbitai bot commented Sep 17, 2025

Walkthrough

TLS session cleanup was moved from the prepare_destroy_conn path to the final destroy_conn path. Now, destroy_conn frees the TLS session (when enabled and non-NULL) immediately before removing the connection from lists and calling flb_connection_destroy. Non-TLS shutdown and socket close remain in prepare_destroy_conn.
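
The ordering described in this walkthrough can be sketched with a small toy model. This is only an illustration under stated assumptions: `toy_conn`, `toy_tls_session`, and every `toy_*` function are hypothetical stand-ins, not Fluent Bit's real types or functions. The point it shows is that a write arriving between the two phases still finds a valid TLS session, because the session is freed only in the final phase.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not Fluent Bit's real types. */
struct toy_tls_session {
    int bytes_seen;                  /* state touched by a late write */
};

struct toy_conn {
    int fd;                          /* -1 once the socket is closed */
    struct toy_tls_session *tls;     /* NULL once the session is freed */
};

/* Phase 1: close the socket but leave the TLS session allocated. */
static void toy_prepare_destroy(struct toy_conn *c)
{
    c->fd = -1;
}

/* A late async write that still dereferences the TLS session. */
static int toy_pending_write(struct toy_conn *c, int len)
{
    if (c->tls == NULL) {
        return -2;                   /* no session to write through */
    }
    c->tls->bytes_seen += len;       /* safe: session memory is still valid */
    return (c->fd == -1) ? -1 : len; /* fails cleanly on a closed socket */
}

/* Phase 2: free the TLS session (if any), then the connection itself. */
static void toy_destroy(struct toy_conn *c)
{
    if (c->tls != NULL) {
        free(c->tls);
        c->tls = NULL;
    }
    free(c);
}

/* Returns 0 when the late write failed cleanly with the session intact. */
int toy_teardown_demo(void)
{
    struct toy_conn *c = calloc(1, sizeof(*c));
    if (c == NULL) {
        return -1;
    }
    c->fd = 3;
    c->tls = calloc(1, sizeof(*c->tls));
    if (c->tls == NULL) {
        free(c);
        return -1;
    }

    toy_prepare_destroy(c);
    int session_alive = (c->tls != NULL);
    int ret = toy_pending_write(c, 16);
    toy_destroy(c);

    return (session_alive && ret == -1) ? 0 : 1;
}
```

If the session were freed inside `toy_prepare_destroy()` without clearing the pointer, `toy_pending_write()` would dereference freed memory, which is the use-after-free pattern the PR description refers to.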

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Upstream connection teardown: src/flb_upstream.c | Relocated TLS session destruction from prepare_destroy_conn to destroy_conn, guarded by FLB_HAVE_TLS and NULL checks; retained existing non-TLS shutdown and socket close in prepare_destroy_conn; no public API changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Upstream as Upstream Conn
  participant TLS as TLS Session
  participant OS as Socket/OS

  rect rgba(224,240,255,0.5)
    note over Caller,Upstream: Connection teardown (two-phase)
    Caller->>Upstream: prepare_destroy_conn(conn)
    Upstream-->>OS: shutdown_connection(conn)
    Upstream-->>OS: close(socket)
    note right of Upstream: TLS not destroyed here (changed)
  end

  rect rgba(231,255,224,0.5)
    Caller->>Upstream: destroy_conn(conn)
    alt TLS enabled and session exists
      Upstream->>TLS: destroy()
      TLS-->>Upstream: freed
    else TLS disabled or NULL
      note right of Upstream: Skip TLS cleanup
    end
    Upstream-->>Upstream: unlink from lists
    Upstream-->>Caller: flb_connection_destroy(conn)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at teardown’s tune,
TLS now waits till final swoon;
Sockets sleep, the lists grow light,
Then keys unspool in tidy night.
Hop-hop—cleanup’s crisp and done,
Connections set beneath the sun.
Carrot salute: refactors won! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "upstream: Move clearing TLS session from prepare phase to destroy phase" accurately and succinctly describes the main change in the patch (relocating TLS session cleanup from the prepare-destroy step to the final destroy path) and uses a clear subsystem prefix without extraneous detail, so a reviewer scanning history will understand the primary intent. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/flb_upstream.c (1)

570-574: Drop redundant NULL assignment.

flb_tls_session_destroy() already sets session->connection->tls_session = NULL (see src/tls/flb_tls.c). Setting it again here is harmless but noisy.

 #ifdef FLB_HAVE_TLS
-        if (u_conn->tls_session != NULL) {
-            flb_tls_session_destroy(u_conn->tls_session);
-
-            u_conn->tls_session = NULL;
-        }
+        if (u_conn->tls_session != NULL) {
+            flb_tls_session_destroy(u_conn->tls_session);
+        }
 #endif
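
The redundancy noted in this nitpick can be illustrated with a minimal sketch. It assumes only what the comment states, namely that the destroy routine clears the owning connection's `tls_session` pointer through a back-pointer. The `demo_*` names are hypothetical and do not reproduce the actual code in src/tls/flb_tls.c.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_conn;

/* Hypothetical session type with a back-pointer to its owner. */
struct demo_session {
    struct demo_conn *connection;
};

/* Hypothetical connection type holding the session pointer. */
struct demo_conn {
    struct demo_session *tls_session;
};

/* Destroy routine that clears the owner's pointer before freeing,
 * mirroring the behavior described in the review comment. */
static void demo_session_destroy(struct demo_session *s)
{
    if (s->connection != NULL) {
        s->connection->tls_session = NULL;
    }
    free(s);
}

/* Returns 0 when the owner's pointer is already NULL after destroy,
 * without any extra assignment by the caller. */
int demo_redundant_null(void)
{
    struct demo_conn conn = { NULL };
    struct demo_session *s = calloc(1, sizeof(*s));
    if (s == NULL) {
        return -1;
    }
    s->connection = &conn;
    conn.tls_session = s;

    if (conn.tls_session != NULL) {
        demo_session_destroy(conn.tls_session);
    }

    return (conn.tls_session == NULL) ? 0 : 1;
}
```

Under that assumption, a caller-side `tls_session = NULL` after the destroy call changes nothing, which is why the review calls it harmless but noisy.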
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60db310 and 9ae75e7.

📒 Files selected for processing (1)
  • src/flb_upstream.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_upstream.c (1)
src/tls/flb_tls.c (1)
  • flb_tls_session_destroy (751-768)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
🔇 Additional comments (2)
src/flb_upstream.c (2)

568-576: Good fix: defer TLS free to final destruction to avoid UAF in async I/O.

Moving TLS teardown from prepare_destroy_conn() to destroy_conn() aligns the lifecycle with event processing and removes the early-free race. The busy_flag and priority-queue guard before freeing TLS are appropriate.


568-576: No double-free — TLS is destroyed before flb_connection_destroy and cleared by flb_tls_session_destroy.

Both src/flb_upstream.c (destroy_conn) and src/flb_downstream.c call flb_tls_session_destroy(...) before flb_connection_destroy(...); flb_tls_session_destroy() sets session->connection->tls_session = NULL (src/tls/flb_tls.c). The extra u_conn->tls_session = NULL in destroy_conn is redundant but safe. No change required.

@edsiper edsiper merged commit 182f517 into master Sep 17, 2025
56 checks passed
@edsiper edsiper deleted the cosmo0920-keep-tls-connection-after-completion-of-destroy branch September 17, 2025 17:26
edsiper (Member) commented Sep 17, 2025

@cosmo0920 thanks for the PR, let's backport this for v4.0.10


3 participants