-
Notifications
You must be signed in to change notification settings - Fork 1.9k
upstream: Move clearing TLS session from prepare phase to destroy phase #10886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upstream: Move clearing TLS session from prepare phase to destroy phase #10886
Conversation
This is because lifecycle of TLS is not synchronized with the current implementation. Somewhere, we observed: Our observation “TLS is freed too early in the upstream prepare-destroy phase → UAF risk” case is existing in the current code base. So, even with Keepalive enabled, our Fluent Bit code base shows multiple conditions where the TLS session is freed during the “prepare destroy” step, which can race with async I/O and cause a use-after-free in ssl_write_internal. Moving TLS freeing to the final destroy_conn() phase mitigates this. Signed-off-by: Hiroshi Hatake <[email protected]>
WalkthroughTLS session cleanup was moved from the prepare_destroy_conn path to the final destroy_conn path. Now, destroy_conn frees the TLS session (when enabled and non-NULL) immediately before removing the connection from lists and calling flb_connection_destroy. Non-TLS shutdown and socket close remain in prepare_destroy_conn. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Upstream as Upstream Conn
participant TLS as TLS Session
participant OS as Socket/OS
rect rgba(224,240,255,0.5)
note over Caller,Upstream: Connection teardown (two-phase)
Caller->>Upstream: prepare_destroy_conn(conn)
Upstream-->>OS: shutdown_connection(conn)
Upstream-->>OS: close(socket)
note right of Upstream: TLS not destroyed here (changed)
end
rect rgba(231,255,224,0.5)
Caller->>Upstream: destroy_conn(conn)
alt TLS enabled and session exists
Upstream->>TLS: destroy()
TLS-->>Upstream: freed
else TLS disabled or NULL
note right of Upstream: Skip TLS cleanup
end
Upstream-->>Upstream: unlink from lists
Upstream-->>Caller: flb_connection_destroy(conn)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/flb_upstream.c (1)
570-574: Drop redundant NULL assignment.flb_tls_session_destroy() already sets session->connection->tls_session = NULL (see src/tls/flb_tls.c). Setting it again here is harmless but noisy.
#ifdef FLB_HAVE_TLS - if (u_conn->tls_session != NULL) { - flb_tls_session_destroy(u_conn->tls_session); - - u_conn->tls_session = NULL; - } + if (u_conn->tls_session != NULL) { + flb_tls_session_destroy(u_conn->tls_session); + } #endif
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/flb_upstream.c(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/flb_upstream.c (1)
src/tls/flb_tls.c (1)
flb_tls_session_destroy(751-768)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: PR - fuzzing test
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
🔇 Additional comments (2)
src/flb_upstream.c (2)
568-576: Good fix: defer TLS free to final destruction to avoid UAF in async I/O.Moving TLS teardown from prepare_destroy_conn() to destroy_conn() aligns the lifecycle with event processing and removes the early-free race. The busy_flag and priority-queue guard before freeing TLS are appropriate.
568-576: No double-free — TLS is destroyed before flb_connection_destroy and cleared by flb_tls_session_destroy.Both src/flb_upstream.c (destroy_conn) and src/flb_downstream.c call flb_tls_session_destroy(...) before flb_connection_destroy(...); flb_tls_session_destroy() sets session->connection->tls_session = NULL (src/tls/flb_tls.c). The extra u_conn->tls_session = NULL in destroy_conn is redundant but safe. No change required.
|
@cosmo0920 thanks for the PR, let's backport this for v4.0.10 |
This is because lifecycle of TLS is not synchronized with the current implementation.
Somewhere, we observed:
Our observation “TLS is freed too early in the upstream prepare-destroy phase → UAF risk” case is existing in the current code base.
So, even with Keepalive enabled, our Fluent Bit code base shows multiple conditions where the TLS session is freed during the “prepare destroy” step, which can race with async I/O and cause a use-after-free in ssl_write_internal. Moving TLS freeing to the final destroy_conn() phase mitigates this.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit