Conversation

@zhihonl zhihonl commented Jul 11, 2025

Background

CloudWatch introduced a new feature called Explore Related, a UI component that helps users navigate between their telemetry (metrics, logs) and AWS resources. This is done by including a new field, Entity, in the PutLogEvents API. More documentation about this field can be found here: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_Entity.html
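
For orientation, a sketch of what such an Entity object might carry when attached to a PutLogEvents call; the keyAttributes names follow the API reference linked above, while the attribute keys and all values are hypothetical:

/* Illustrative only: an Entity payload for PutLogEvents. Key-attribute
 * names follow the API_Entity reference; the attribute keys and all
 * values here are hypothetical. */
static const char *example_entity =
    "{"
      "\"keyAttributes\": {"
        "\"Type\": \"Service\","
        "\"Name\": \"checkout-service\","
        "\"Environment\": \"eks:demo-cluster/default\""
      "},"
      "\"attributes\": {"
        "\"EKS.Cluster\": \"demo-cluster\","
        "\"K8s.Workload\": \"checkout\""
      "}"
    "}";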

Problem

To enable the Explore Related experience in an AWS EKS cluster, we need additional data such as the workload type (Deployment, DaemonSet, etc.) and the cluster name. There is also an option to associate the name of the service running within the cluster, but Fluent Bit has no existing functionality for that. This information is exposed by the Amazon CloudWatch Observability EKS add-on, which we can query for it.


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Screenshot of how UI interacts with the changes made in this PR

  • Example configuration file for the change

application-log.conf: |
          [INPUT]
            Name                tail
            Tag                 application.*
            Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
            Path                /var/log/containers/*.log
            multiline.parser    docker, cri
            DB                  /var/fluent-bit/state/flb_container.db
            Mem_Buf_Limit       50MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Rotate_Wait         30
            storage.type        filesystem
            Read_from_Head      ${READ_FROM_HEAD}
          
          [INPUT]
            Name                tail
            Tag                 application.*
            Path                /var/log/containers/fluent-bit*
            multiline.parser    docker, cri
            DB                  /var/fluent-bit/state/flb_log.db
            Mem_Buf_Limit       5MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Read_from_Head      ${READ_FROM_HEAD}
          
          [INPUT]
            Name                tail
            Tag                 application.*
            Path                /var/log/containers/cloudwatch-agent*
            multiline.parser    docker, cri
            DB                  /var/fluent-bit/state/flb_cwagent.db
            Mem_Buf_Limit       5MB
            Skip_Long_Lines     On
            Refresh_Interval    10
            Read_from_Head      ${READ_FROM_HEAD}
          
          [FILTER]
            Name                aws
            Match               application.*
            enable_entity       true
          
          [FILTER]
            Name                kubernetes
            Match               application.*
            Kube_URL            https://kubernetes.default.svc:443
            Kube_Tag_Prefix     application.var.log.containers.
            Merge_Log           On
            Merge_Log_Key       log_processed
            K8S-Logging.Parser  On
            K8S-Logging.Exclude Off
            Labels              Off
            Annotations         Off
            Use_Kubelet         On
            Kubelet_Port        10250
            Buffer_Size         0
          
          [OUTPUT]
            Name                cloudwatch_logs
            Match               application.*
            region              ${AWS_REGION}
            log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
            log_stream_prefix   ${HOST_NAME}-
            auto_create_group   true
            extra_user_agent    container-insights
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature
    Need to add documentation about the new configuration values that enable the Explore Related experience for users.

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Pod-to-service association for Kubernetes logs with background refresh, preload-file support, TLS-secured upstream, platform detection (EKS/native), workload discovery, deployment-name parsing, and enriched emitted metadata (service name, environment, name source, platform, cluster, workload).
    • ConfigMap fetch API to support finer-grained pod-associated caching.
  • Configuration

    • New options to enable pod association and configure host/endpoint/port, TTL/refresh interval, preload cache path, TLS (CA/client cert/key, debug, verify), and platform override.

edsiper commented Jul 25, 2025

CI needs to pass before the review

@zhihonl zhihonl force-pushed the kubernetes-entity branch from 5dc266e to 954dc73 on August 1, 2025 19:30
coderabbitai bot commented Aug 1, 2025

Walkthrough

Adds AWS pod-to-service association to the Kubernetes filter: new config options and flb_kube fields (deploymentRegex, aws_pod_service_hash_table, TLS/upstream, platform), resource-type-aware API requests and augmented metadata (aws_entity_*), workload discovery, background thread to load/refresh a pod→service map (preload file or API), and a new kubernetes_aws module with build wiring.

Changes

Cohort / File(s) Summary
Config / Context
plugins/filter_kubernetes/kube_conf.h, plugins/filter_kubernetes/kube_conf.c
New macros and service_attributes struct; many new flb_kube fields for pod association (deploymentRegex, aws_use_pod_association, aws_pod_association_*, aws_pod_service_hash_table, TTLs, refresh interval, preload path, TLS files, platform/set_platform); creation/destruction of aws_pod_service_hash_table; extended destroy cleanup (deploymentRegex, aws_pod_association_tls/upstream, platform).
Metadata & Control Flow
plugins/filter_kubernetes/kube_meta.h, plugins/filter_kubernetes/kube_meta.c
Added cluster/workload fields and lengths; get_meta_info_from_request made resource-type-aware; added get_api_server_configmap; workload discovery helpers (owner chain + deploymentRegex); payload augmentation to emit aws_entity_* fields; cache-key enrichment; platform detection and flb_kube_pod_association_init; meta cleanup updates.
Regex Support
plugins/filter_kubernetes/kube_regex.h, plugins/filter_kubernetes/kube_regex.c
Added DEPLOYMENT_REGEX macro and unconditional initialization of deploymentRegex with error handling (a rough illustration follows this table).
Background Mapping & Config
plugins/filter_kubernetes/kubernetes.c
New background thread update_pod_service_map, mutex and event-loop coordination; cb_kube_init/cb_kube_exit extended to start/stop thread and manage task args; added globals (metadata_mutex, background_thread, evl, task_args); expanded plugin config_map with pod-association options and defaults; pack_map_content signature updated to accept ctx.
AWS Integration Module
plugins/filter_kubernetes/kubernetes_aws.h, plugins/filter_kubernetes/kubernetes_aws.c
New module to load or fetch and parse pod→service map (preload file or API via upstream), detect platform (aws-auth ConfigMap probe), and populate cluster from env; exposes fetch_pod_service_map, determine_platform, get_cluster_from_environment.
Build Wiring
plugins/filter_kubernetes/CMakeLists.txt
Adds kubernetes_aws.c to plugin sources.
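
As a rough illustration of what the Regex Support change does (this is not the PR's actual DEPLOYMENT_REGEX, and the pod-template-hash alphabet below is an assumption): a Deployment-created ReplicaSet is named <deployment>-<pod-template-hash>, so stripping the trailing hash segment recovers the Deployment name.

#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: derive a Deployment name from a ReplicaSet name of the
 * form "<deployment>-<pod-template-hash>". */
static int deployment_from_replicaset(const char *rs, char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[2];
    size_t len;

    /* capture everything before the trailing hash segment */
    if (regcomp(&re, "^(.*)-[bcdfghjklmnpqrstvwxz2456789]+$", REG_EXTENDED)) {
        return -1;
    }
    if (regexec(&re, rs, 2, m, 0) != 0) {
        regfree(&re);
        return -1;
    }
    len = (size_t) (m[1].rm_eo - m[1].rm_so);
    if (len + 1 > cap) {
        regfree(&re);
        return -1;
    }
    memcpy(out, rs + m[1].rm_so, len);
    out[len] = '\0';
    regfree(&re);
    return 0;
}

int main(void)
{
    char dep[128];
    if (deployment_from_replicaset("checkout-7d4b5c9f8d", dep, sizeof(dep)) == 0) {
        printf("%s\n", dep);   /* prints "checkout" */
    }
    return 0;
}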

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant FB as Fluent Bit
  participant KF as Kube Filter
  participant BG as Background Thread
  participant AWS as kubernetes_aws
  participant HT as Pod Service HashTable

  FB->>KF: cb_kube_init(config)
  KF->>KF: flb_kube_pod_association_init(config)
  KF-->>BG: start update_pod_service_map(task_args)
  loop every refresh_interval
    BG->>AWS: fetch_pod_service_map(api_url, mutex)
    AWS->>HT: parse & update pod→service entries (file or API)
    AWS-->>BG: status
    BG->>BG: sleep(refresh_interval)
  end
  FB->>KF: process record (pod metadata)
  KF->>HT: lookup pod→service (aws_pod_service_hash_table)
  KF->>KF: search_workload()/deploymentRegex
  KF-->>FB: emit enriched metadata (aws_entity_*)
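
A minimal C sketch of the refresh loop in the diagram above, assuming the names used in this walkthrough (update_pod_service_map, fetch_pod_service_map, metadata_mutex); the real implementation also coordinates an event loop and shutdown signaling, which are omitted here:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t metadata_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile int exiting = 0;

/* provided by the kubernetes_aws module in this PR; signature assumed */
extern int fetch_pod_service_map(void *ctx);

static void *update_pod_service_map(void *task_args)
{
    void *ctx = task_args;
    const unsigned refresh_interval = 60;  /* seconds; a configurable option */

    while (!exiting) {
        /* serialize writes to the shared pod->service hash table */
        pthread_mutex_lock(&metadata_mutex);
        fetch_pod_service_map(ctx);        /* preload file or API call */
        pthread_mutex_unlock(&metadata_mutex);
        sleep(refresh_interval);
    }
    return NULL;
}
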
sequenceDiagram
  autonumber
  participant KF as Kube Filter
  participant API as K8s API Server
  participant KA as kubernetes_aws

  KF->>KA: determine_platform()
  KA->>API: GET kube-system/aws-auth ConfigMap
  API-->>KA: 200 / not found
  KA-->>KF: platform = eks | k8s
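
A sketch of the probe above under stated assumptions: http_get_status stands in for the plugin's actual authenticated API-server request, and the CLUSTER_NAME environment variable consulted by get_cluster_from_environment is an assumed name.

#include <stdlib.h>

/* assumed helper: performs an authenticated GET against the API server
 * and returns the HTTP status code */
extern int http_get_status(const char *path);

static const char *determine_platform(void)
{
    /* only EKS clusters carry the aws-auth ConfigMap in kube-system */
    int status = http_get_status(
        "/api/v1/namespaces/kube-system/configmaps/aws-auth");
    return (status == 200) ? "eks" : "k8s";
}

static const char *get_cluster_from_environment(void)
{
    return getenv("CLUSTER_NAME");  /* assumed variable name */
}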

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • leonardo-albertovich
  • edsiper
  • koleini
  • fujimotos
  • PettitWesley

Poem

I nibble maps beneath the moonlit log,
I hop through regex roots and thread the fog.
A mutex thumps, a hash table hums along,
TLS burrows keep my secrets safe and strong.
I stash service names and sing — the cluster's song 🐇

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

🔭 Outside diff range comments (1)
tests/runtime/filter_kubernetes.c (1)

386-409: Check return values of file operations in tests.

The test doesn't verify if clear_file and write_log_to_file succeed, which could lead to false positives if file operations fail.

     if(type == KUBE_POD_ASSOCIATION) {
-        clear_file(path);
+        ret = clear_file(path);
+        TEST_CHECK_(ret == 0, "clearing log file");
     }
 
     /* Start the engine */
     ret = flb_start(ctx.flb);
     TEST_CHECK_(ret == 0, "starting engine");
     if (ret == -1) {
         goto exit;
     }
 
     /* Poll for up to 3 seconds or until we got a match */
     for (ret = 0; ret < 3000 && result.nMatched != nExpected; ret++) {
         usleep(1000);
         if (ret == 2000 && type == KUBE_POD_ASSOCIATION) {
-            write_log_to_file(path);
+            int write_ret = write_log_to_file(path, NULL);
+            TEST_CHECK_(write_ret == 0, "writing log entry");
         }
     }
🧹 Nitpick comments (7)
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-pod.meta (1)

1-109: Consider consolidating duplicate test data

This file is nearly identical to options_use-pod-association-enabled.meta except for the pod name. Consider using a template or parameterized approach to reduce duplication and maintenance burden.

plugins/filter_kubernetes/kube_conf.h (2)

159-159: Inconsistent naming convention

The field deploymentRegex uses camelCase while other fields in the struct use snake_case (e.g., api_host, api_port). Consider renaming to deployment_regex for consistency.

-    struct flb_regex *deploymentRegex;
+    struct flb_regex *deployment_regex;

91-100: Consider using flexible buffer sizes

The service_attributes struct uses fixed-size buffers. While the sizes align with AWS limits, consider whether dynamic allocation might be more memory-efficient, especially if many instances are created.
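
For comparison, a hypothetical sketch of the two layouts; the field names mirror the attributes referenced elsewhere in this review, and the sizes are assumptions:

#include <fluent-bit/flb_sds.h>

/* fixed buffers: simple, but every instance pays the full size */
struct service_attributes_fixed {
    char name[256];
    char environment[256];
    char name_source[64];
};

/* dynamic strings: allocated to actual length, freed on destroy */
struct service_attributes_dynamic {
    flb_sds_t name;
    flb_sds_t environment;
    flb_sds_t name_source;
};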

plugins/filter_kubernetes/kubernetes.c (1)

478-478: Fix parameter type mismatch.

The function signature expects a const char * but namespace_kube_buf is already const char *, making the cast unnecessary and potentially hiding type issues.

-                            const char *namespace_kube_buf,
+                            const char *namespace_kube_buf,
plugins/filter_kubernetes/kube_meta.c (3)

832-847: Simplify NULL checks in callback function.

The NULL checks for name, value, and data are defensive but unnecessary in a callback context. If these are NULL, there's a bigger problem with the caller.

Consider simplifying to just check vlen:

 static void cb_results_workload(const char *name, const char *value,
                        size_t vlen, void *data)
 {
-    if (name == NULL || value == NULL ||  vlen == 0 || data == NULL) {
+    if (vlen == 0) {
         return;
     }
 
     struct flb_kube_meta *meta = data;

861-953: LGTM! Well-structured workload detection logic.

The implementation correctly handles the priority-based workload search through owner references. The use of goto for the set_workload label is appropriate here as it reduces code duplication.

Minor suggestions:

  • Consider adding a comment explaining the priority order at the function start
  • Some variable declarations could be moved closer to their usage

1467-1537: LGTM! Comprehensive AWS entity metadata enrichment.

The implementation correctly adds AWS entity fields to the metadata with proper conditional checks and string packing. The code follows the existing pattern in the codebase.

Consider extracting a helper function to reduce repetition:

static void pack_string_field(msgpack_packer *mp_pck, const char *key, size_t key_len,
                              const char *value, size_t value_len) {
    msgpack_pack_str(mp_pck, key_len);
    msgpack_pack_str_body(mp_pck, key, key_len);
    msgpack_pack_str(mp_pck, value_len);
    msgpack_pack_str_body(mp_pck, value, value_len);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b9741c and 954dc73.

⛔ Files ignored due to path filters (19)
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-daemonset_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-deployment_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-pod_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-replicaset_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-daemonset_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-deployment_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-pod_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-replicaset_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/log/options/options_use-pod-association-enabled_fluent-bit.log is excluded by !**/*.log
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-daemonset_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-deployment_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-pod_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-replicaset_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-daemonset_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-deployment_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-pod_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-replicaset_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/out/options/options_use-pod-association-enabled_fluent-bit.out is excluded by !**/*.out
  • tests/runtime/data/kubernetes/servicemap/options_use-pod-association-enabled_fluent-bit.map is excluded by !**/*.map
📒 Files selected for processing (18)
  • plugins/filter_kubernetes/kube_conf.c (4 hunks)
  • plugins/filter_kubernetes/kube_conf.h (3 hunks)
  • plugins/filter_kubernetes/kube_meta.c (22 hunks)
  • plugins/filter_kubernetes/kube_meta.h (2 hunks)
  • plugins/filter_kubernetes/kube_regex.c (1 hunks)
  • plugins/filter_kubernetes/kube_regex.h (1 hunks)
  • plugins/filter_kubernetes/kubernetes.c (5 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-daemonset.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-deployment.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-pod.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-replicaset.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-daemonset.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-deployment.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-pod.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-replicaset.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled.meta (1 hunks)
  • tests/runtime/data/kubernetes/meta/options_use-pod-association-enabled.meta (1 hunks)
  • tests/runtime/filter_kubernetes.c (9 hunks)
🔇 Additional comments (26)
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-replicaset.meta (1)

1-119: LGTM! Well-structured test data for ReplicaSet pod association.

The JSON structure is valid and provides comprehensive test data for a pod owned by a ReplicaSet. The owner reference correctly specifies the controlling ReplicaSet with appropriate metadata fields.

tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled.meta (1)

20-30: LGTM! Proper addition of Deployment owner reference.

The ownerReferences field is correctly structured with appropriate API version, kind, and controller flags for a Deployment resource.

tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-daemonset.meta (1)

1-119: LGTM! Comprehensive test data for DaemonSet pod association.

The JSON manifest is well-structured and provides complete test coverage for a pod owned by a DaemonSet, including proper owner reference metadata.

plugins/filter_kubernetes/kube_conf.c (4)

193-196: LGTM! Pod hash table creation is properly configured.

The pod hash table creation correctly uses the configured TTL value and eviction policy for managing pod-to-service mappings.


214-216: LGTM! Proper cleanup of pod hash table.

The pod hash table cleanup is correctly implemented with null check before destruction.


226-228: LGTM! Deployment regex cleanup is properly handled.

The deployment regex is properly destroyed with appropriate null check.


243-253: LGTM! Comprehensive cleanup of new pod association resources.

All new resources for pod association feature are properly cleaned up:

  • TLS context destruction
  • Upstream connection cleanup
  • Platform string deallocation

The cleanup order and null checks are appropriate.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-deployment.meta (1)

1-126: LGTM! Well-structured test data for deployment pod scenario.

The Kubernetes Pod manifest JSON is properly structured with:

  • Valid metadata including owner references to ReplicaSet (typical for deployment)
  • Complete pod specification with container, volumes, and tolerations
  • Realistic status information for a running pod
  • Appropriate annotations for Prometheus monitoring

This test data effectively supports validation of pod association with deployment workloads.

tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-deployment.meta (1)

1-119: Verify the JSON structure for kubelet enabled scenario.

The JSON structure appears to mix individual Pod format (lines 1-4) with list format (items array). This seems unusual as:

  • Kubelet API typically returns pod lists without outer Pod kind/apiVersion
  • The combination of Pod metadata at root level with items array is non-standard

Please verify this structure matches the expected kubelet API response format for the test scenario.

plugins/filter_kubernetes/kube_meta.h (4)

30-31: LGTM! Consistent field additions for cluster and workload metadata.

The new length and string pointer fields follow the established pattern in the struct, maintaining consistency with existing metadata fields.

Also applies to: 38-39


40-41: LGTM! Proper string pointer declarations for new metadata fields.

The cluster and workload string pointers are appropriately declared and positioned within the struct.

Also applies to: 46-47


61-66: LGTM! Well-defined constants for ConfigMap API and resource types.

The new macro and constants follow existing naming conventions:

  • ConfigMap API path format is consistent with existing API formats
  • Resource type constants are clearly defined and appropriately named

81-81: LGTM! Appropriate function declaration for pod association initialization.

The function declaration follows the established pattern and naming convention for initialization functions in this module.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-replicaset.meta (1)

1-126: LGTM! Appropriate test data for ReplicaSet pod scenario.

The Kubernetes Pod manifest JSON is well-structured with:

  • Valid metadata with owner reference directly to ReplicaSet (line 23)
  • Complete pod specification matching other test files
  • Consistent status information for a running pod
  • Proper differentiation from deployment scenario by having direct ReplicaSet ownership

This effectively tests pod association with ReplicaSet workloads distinct from deployment-managed ReplicaSets.

tests/runtime/data/kubernetes/meta/options_use-pod-association-enabled.meta (1)

4-4: Empty metadata object in Pod List response

The top-level metadata field is empty, which is unusual for a Kubernetes Pod List API response. Consider whether this accurately represents the expected API response structure.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-pod.meta (1)

1-116: Different JSON structure compared to pod-association-enabled.meta

This file contains a single Pod object while options_use-pod-association-enabled.meta contains a Pod List. Ensure this structural difference is intentional for your test scenarios.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-daemonset.meta (1)

19-28: LGTM! Appropriate ownerReferences for DaemonSet

The ownerReferences section correctly identifies this pod as being owned by a DaemonSet, which is appropriate for testing pod association with different workload types.

plugins/filter_kubernetes/kube_conf.h (1)

228-235: LGTM! Comprehensive TLS configuration

The TLS configuration fields for pod association provide proper support for mutual TLS authentication with CA verification, client certificates, and debugging options.

plugins/filter_kubernetes/kubernetes.c (2)

189-273: LGTM! Well-structured error handling.

The function properly handles errors with appropriate cleanup of upstream connections and TLS resources. The TLS certificate rotation logic is well implemented.


125-129: Memory leak on JSON parsing error.

The buffer is allocated by flb_pack_json but not freed before the early return when parsing fails.

         if (ret < 0) {
             flb_plg_warn(ctx->ins, "Could not parse json response = %s",
                      api_buf);
-            flb_free(buffer);
             return;
         }
+        

Move the flb_free(buffer) to line 186 where it's freed in all paths.

Likely an incorrect or invalid review comment.

tests/runtime/filter_kubernetes.c (2)

286-286: Consider implications of debug log level in tests.

Changing from "error" to "debug" will produce more verbose output during test runs, which might make it harder to identify actual test failures.

Is this change intentional for debugging the new pod association feature? Consider using a conditional log level based on an environment variable to allow both verbose and quiet test runs.


491-513: Verify test certificate paths exist or are mocked

The Kubernetes filter tests (tests/runtime/filter_kubernetes.c) use hardcoded absolute paths for TLS fixtures—/tst/ca.crt, /tst/client.crt and /tst/client.key—but I don’t see corresponding files or setup in the repo. Please ensure these files are created or mocked in your test harness (or move them under the existing DPATH test directory) so the tests can reliably locate them.

  • File: tests/runtime/filter_kubernetes.c (around lines 491–513)
  • Hardcoded paths to verify or replace:
    "pod_association_host_server_ca_file", "/tst/ca.crt"
    "pod_association_host_client_cert_file", "/tst/client.crt"
    "pod_association_host_client_key_file", "/tst/client.key"
plugins/filter_kubernetes/kube_meta.c (4)

355-418: LGTM! Good generalization of the function.

The refactoring to accept resource_type and resource_name parameters makes the function more reusable for different Kubernetes resources, which is a good architectural improvement.


1996-2020: LGTM! Well-implemented pod association initialization.

The function properly handles TLS setup and upstream connection creation with appropriate error handling and resource cleanup. The thread-safe configuration is correctly applied.


1827-1891: Verify cache key buffer size calculation.

The cache key construction adds multiple pod association fields. While the size calculation appears correct, it's complex and error-prone.

Consider adding a safety margin or using dynamic allocation:

         pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
                                  meta->podname, meta->podname_len,
                                  &tmp_service_attributes, &tmp_service_attr_size);
 
         if (pod_service_found != -1 && tmp_service_attributes != NULL) {
             if (tmp_service_attributes->name[0] != '\0') {
                 n += tmp_service_attributes->name_len + 1;
             }
             if (tmp_service_attributes->environment[0] != '\0') {
                 n += tmp_service_attributes->environment_len + 1;
             }
             if (tmp_service_attributes->name_source[0] != '\0') {
                 n += tmp_service_attributes->name_source_len + 1;
             }
         }
+        
+        /* Add safety margin for separators and null terminator */
+        n += 10;

2213-2228: LGTM! Clever platform detection logic.

The platform detection based on the presence of the aws-auth ConfigMap in the kube-system namespace is a smart way to differentiate between EKS and native Kubernetes clusters.

Comment on lines 201 to 227
/* Configuration used for enabling pod-to-service name mapping */
int use_pod_association;
char *pod_association_host;
char *pod_association_endpoint;
int pod_association_port;

/*
* TTL is used to check how long should the mapped entry
* remain in the hash table
*/
struct flb_hash_table *pod_hash_table;
int pod_service_map_ttl;
int pod_service_map_refresh_interval;
flb_sds_t pod_service_preload_cache_path;
struct flb_upstream *pod_association_upstream;
/*
* This variable holds the Kubernetes platform type
* Currently checks for EKS or native Kubernetes
*/
char *platform;
/*
* This value is used for holding the platform config
* value. Platform will be overridden with this variable
* if it's set
*/
char *set_platform;

💡 Verification agent

🧩 Analysis chain

Verify thread safety for pod_hash_table access

The pod_hash_table field will likely be accessed from multiple threads (main thread and background fetch thread mentioned in the PR summary). Ensure proper synchronization mechanisms are in place.


🏁 Script executed:

#!/bin/bash
# Search for mutex/lock usage around pod_hash_table access
rg -A 5 -B 5 "pod_hash_table" plugins/filter_kubernetes/

Length of output: 5956


Synchronize pod_hash_table reads in kube_meta.c

While parse_pod_service_map() properly guards writes to ctx->pod_hash_table with pthread_mutex_lock(&metadata_mutex) / pthread_mutex_unlock(&metadata_mutex), the lookups in kube_meta.c occur without any locking. This introduces a data‐race between the background fetch thread (writes) and the pipeline threads (reads).

Please wrap each flb_hash_table_get(ctx->pod_hash_table, …) in plugins/filter_kubernetes/kube_meta.c with the same metadata_mutex lock/unlock pair. For example:

- pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
-                                      meta->podname, meta->podname_len,
-                                      &tmp_service_attributes,
-                                      &tmp_service_attr_size);
+ pthread_mutex_lock(&metadata_mutex);
+ pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
+                                      meta->podname, meta->podname_len,
+                                      &tmp_service_attributes,
+                                      &tmp_service_attr_size);
+ pthread_mutex_unlock(&metadata_mutex);

Locations to update:

  • plugins/filter_kubernetes/kube_meta.c around the two calls to flb_hash_table_get(...) where pod_service_found is assigned.

This change will ensure safe concurrent access to the shared hash table.

Committable suggestion skipped: line range outside the PR's diff.


zhihonl commented Sep 22, 2025

@edsiper The PR build failure doesn't look related to this PR, so I think we should be good. Let me know if there's anything else you need to close out this PR.

@edsiper edsiper merged commit 4bf4d41 into fluent:master Sep 22, 2025
51 of 52 checks passed
edsiper commented Sep 22, 2025

thank you!

SagiROosto pushed a commit to AnyVisionltd/fluent-bit that referenced this pull request Nov 5, 2025