filter_kubernetes: add entity attribute retrieval logics #10586
Conversation
CI needs to pass before the review.
Force-pushed from 5dc266e to 954dc73.
Walkthrough

Adds AWS pod-to-service association to the Kubernetes filter: new config options and flb_kube fields (deploymentRegex, aws_pod_service_hash_table, TLS/upstream, platform), resource-type-aware API requests and augmented metadata (aws_entity_*), workload discovery, a background thread to load and refresh a pod→service map (preload file or API), and a new kubernetes_aws module with build wiring.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant FB as Fluent Bit
participant KF as Kube Filter
participant BG as Background Thread
participant AWS as kubernetes_aws
participant HT as Pod Service HashTable
FB->>KF: cb_kube_init(config)
KF->>KF: flb_kube_pod_association_init(config)
KF-->>BG: start update_pod_service_map(task_args)
loop every refresh_interval
BG->>AWS: fetch_pod_service_map(api_url, mutex)
AWS->>HT: parse & update pod→service entries (file or API)
AWS-->>BG: status
BG->>BG: sleep(refresh_interval)
end
FB->>KF: process record (pod metadata)
KF->>HT: lookup pod→service (aws_pod_service_hash_table)
KF->>KF: search_workload()/deploymentRegex
KF-->>FB: emit enriched metadata (aws_entity_*)
```
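The refresh loop in the diagram above maps naturally onto a detached worker thread. As a rough illustration, here is a minimal C sketch of what `update_pod_service_map` could look like; the `task_args` layout, the `fetch_pod_service_map` signature, and the error handling are assumptions based on the diagram, not the PR's actual code.

```c
#include <pthread.h>
#include <unistd.h>

struct flb_kube;                       /* filter context (opaque here) */

/* assumed signature: refresh ctx->pod_hash_table from preload file or API,
 * taking the metadata mutex internally before mutating the table */
extern int fetch_pod_service_map(struct flb_kube *ctx);

struct task_args {
    struct flb_kube *ctx;              /* owning filter context */
    int refresh_interval;              /* seconds between refreshes */
};

static void *update_pod_service_map(void *data)
{
    struct task_args *args = data;

    while (1) {
        /* on failure, keep the previous map and retry on the next cycle */
        if (fetch_pod_service_map(args->ctx) != 0) {
            /* a real implementation would log a warning here */
        }
        sleep(args->refresh_interval);
    }
    return NULL;
}
```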
```mermaid
sequenceDiagram
autonumber
participant KF as Kube Filter
participant API as K8s API Server
participant KA as kubernetes_aws
KF->>KA: determine_platform()
KA->>API: GET kube-system/aws-auth ConfigMap
API-->>KA: 200 / not found
KA-->>KF: platform = eks | k8s
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 18
🔭 Outside diff range comments (1)
tests/runtime/filter_kubernetes.c (1)
386-409: Check return values of file operations in tests.

The test doesn't verify if `clear_file` and `write_log_to_file` succeed, which could lead to false positives if file operations fail.

```diff
     if(type == KUBE_POD_ASSOCIATION) {
-        clear_file(path);
+        ret = clear_file(path);
+        TEST_CHECK_(ret == 0, "clearing log file");
     }

     /* Start the engine */
     ret = flb_start(ctx.flb);
     TEST_CHECK_(ret == 0, "starting engine");
     if (ret == -1) {
         goto exit;
     }

     /* Poll for up to 3 seconds or until we got a match */
     for (ret = 0; ret < 3000 && result.nMatched != nExpected; ret++) {
         usleep(1000);
         if (ret == 2000 && type == KUBE_POD_ASSOCIATION) {
-            write_log_to_file(path);
+            int write_ret = write_log_to_file(path, NULL);
+            TEST_CHECK_(write_ret == 0, "writing log entry");
         }
     }
```
🧹 Nitpick comments (7)
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-pod.meta (1)
1-109: Consider consolidating duplicate test data.

This file is nearly identical to `options_use-pod-association-enabled.meta` except for the pod name. Consider using a template or parameterized approach to reduce duplication and maintenance burden.

plugins/filter_kubernetes/kube_conf.h (2)
159-159: Inconsistent naming convention.

The field `deploymentRegex` uses camelCase while other fields in the struct use snake_case (e.g., `api_host`, `api_port`). Consider renaming to `deployment_regex` for consistency.

```diff
-    struct flb_regex *deploymentRegex;
+    struct flb_regex *deployment_regex;
```
91-100: Consider using flexible buffer sizes.

The `service_attributes` struct uses fixed-size buffers. While the sizes align with AWS limits, consider whether dynamic allocation might be more memory-efficient, especially if many instances are created.
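If the fixed-size buffers ever become a concern, one alternative (a sketch only, not what the PR does) is to back the struct with Fluent Bit's dynamic string type; the field names below mirror the ones discussed in this review:

```c
#include <fluent-bit/flb_sds.h>

/* hypothetical variant of service_attributes with dynamically sized fields */
struct service_attributes_dyn {
    flb_sds_t name;          /* allocated to the actual value length */
    flb_sds_t environment;
    flb_sds_t name_source;
};
```

Each `flb_sds_t` would need to be created with `flb_sds_create()` and released with `flb_sds_destroy()` in the cleanup path, which is the usual trade-off against fixed buffers.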
plugins/filter_kubernetes/kubernetes.c (1)

478-478: Fix parameter type mismatch.

The function signature expects a `const char *`, but `namespace_kube_buf` is already `const char *`, making the cast unnecessary and potentially hiding type issues.

```diff
-    const char *namespace_kube_buf,
+    const char *namespace_kube_buf,
```
832-847: Simplify NULL checks in callback function.

The NULL checks for `name`, `value`, and `data` are defensive but unnecessary in a callback context. If these are NULL, there's a bigger problem with the caller.

Consider simplifying to just check `vlen`:

```diff
 static void cb_results_workload(const char *name, const char *value,
                                 size_t vlen, void *data)
 {
-    if (name == NULL || value == NULL || vlen == 0 || data == NULL) {
+    if (vlen == 0) {
         return;
     }

     struct flb_kube_meta *meta = data;
```
861-953: LGTM! Well-structured workload detection logic.

The implementation correctly handles the priority-based workload search through owner references. The use of goto for the `set_workload` label is appropriate here as it reduces code duplication.

Minor suggestions:
- Consider adding a comment explaining the priority order at the function start
- Some variable declarations could be moved closer to their usage
1467-1537: LGTM! Comprehensive AWS entity metadata enrichment.

The implementation correctly adds AWS entity fields to the metadata with proper conditional checks and string packing. The code follows the existing pattern in the codebase.

Consider extracting a helper function to reduce repetition:

```c
static void pack_string_field(msgpack_packer *mp_pck,
                              const char *key, size_t key_len,
                              const char *value, size_t value_len)
{
    msgpack_pack_str(mp_pck, key_len);
    msgpack_pack_str_body(mp_pck, key, key_len);
    msgpack_pack_str(mp_pck, value_len);
    msgpack_pack_str_body(mp_pck, value, value_len);
}
```
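A hypothetical call site, assuming the helper above and the `meta` fields added by this PR:

```c
/* each aws_entity_* key/value pair collapses into a single call */
pack_string_field(&mp_pck, "aws_entity_workload",
                  sizeof("aws_entity_workload") - 1,
                  meta->workload, meta->workload_len);
```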
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (19)
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-daemonset_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-deployment_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-pod_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-disabled-replicaset_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-daemonset_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-deployment_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-pod_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-kubelet-enabled-replicaset_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/log/options/options_use-pod-association-enabled_fluent-bit.log` is excluded by `!**/*.log`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-daemonset_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-deployment_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-pod_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-disabled-replicaset_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-daemonset_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-deployment_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-pod_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-kubelet-enabled-replicaset_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/out/options/options_use-pod-association-enabled_fluent-bit.out` is excluded by `!**/*.out`
- `tests/runtime/data/kubernetes/servicemap/options_use-pod-association-enabled_fluent-bit.map` is excluded by `!**/*.map`
📒 Files selected for processing (18)
- `plugins/filter_kubernetes/kube_conf.c` (4 hunks)
- `plugins/filter_kubernetes/kube_conf.h` (3 hunks)
- `plugins/filter_kubernetes/kube_meta.c` (22 hunks)
- `plugins/filter_kubernetes/kube_meta.h` (2 hunks)
- `plugins/filter_kubernetes/kube_regex.c` (1 hunk)
- `plugins/filter_kubernetes/kube_regex.h` (1 hunk)
- `plugins/filter_kubernetes/kubernetes.c` (5 hunks)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-daemonset.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-deployment.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-pod.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-replicaset.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-daemonset.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-deployment.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-pod.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-replicaset.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled.meta` (1 hunk)
- `tests/runtime/data/kubernetes/meta/options_use-pod-association-enabled.meta` (1 hunk)
- `tests/runtime/filter_kubernetes.c` (9 hunks)
🔇 Additional comments (26)
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-replicaset.meta (1)
1-119: LGTM! Well-structured test data for ReplicaSet pod association.

The JSON structure is valid and provides comprehensive test data for a pod owned by a ReplicaSet. The owner reference correctly specifies the controlling ReplicaSet with appropriate metadata fields.
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled.meta (1)
20-30: LGTM! Proper addition of Deployment owner reference.

The ownerReferences field is correctly structured with appropriate API version, kind, and controller flags for a Deployment resource.
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-daemonset.meta (1)
1-119: LGTM! Comprehensive test data for DaemonSet pod association.

The JSON manifest is well-structured and provides complete test coverage for a pod owned by a DaemonSet, including proper owner reference metadata.
plugins/filter_kubernetes/kube_conf.c (4)
193-196: LGTM! Pod hash table creation is properly configured.

The pod hash table creation correctly uses the configured TTL value and eviction policy for managing pod-to-service mappings.
214-216: LGTM! Proper cleanup of pod hash table.

The pod hash table cleanup is correctly implemented with a null check before destruction.
226-228: LGTM! Deployment regex cleanup is properly handled.

The deployment regex is properly destroyed with an appropriate null check.
243-253: LGTM! Comprehensive cleanup of new pod association resources.

All new resources for the pod association feature are properly cleaned up:
- TLS context destruction
- Upstream connection cleanup
- Platform string deallocation
The cleanup order and null checks are appropriate.
tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-deployment.meta (1)
1-126: LGTM! Well-structured test data for deployment pod scenario.

The Kubernetes Pod manifest JSON is properly structured with:
- Valid metadata including owner references to ReplicaSet (typical for deployment)
- Complete pod specification with container, volumes, and tolerations
- Realistic status information for a running pod
- Appropriate annotations for Prometheus monitoring
This test data effectively supports validation of pod association with deployment workloads.
tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-deployment.meta (1)
1-119: Verify the JSON structure for kubelet enabled scenario.

The JSON structure appears to mix individual Pod format (lines 1-4) with list format (items array). This seems unusual because:
- Kubelet API typically returns pod lists without outer Pod kind/apiVersion
- The combination of Pod metadata at root level with items array is non-standard
Please verify this structure matches the expected kubelet API response format for the test scenario.
plugins/filter_kubernetes/kube_meta.h (4)
30-31: LGTM! Consistent field additions for cluster and workload metadata.

The new length and string pointer fields follow the established pattern in the struct, maintaining consistency with existing metadata fields.
Also applies to: 38-39
40-41: LGTM! Proper string pointer declarations for new metadata fields.

The cluster and workload string pointers are appropriately declared and positioned within the struct.
Also applies to: 46-47
61-66: LGTM! Well-defined constants for ConfigMap API and resource types.

The new macro and constants follow existing naming conventions:
- ConfigMap API path format is consistent with existing API formats
- Resource type constants are clearly defined and appropriately named
81-81: LGTM! Appropriate function declaration for pod association initialization.

The function declaration follows the established pattern and naming convention for initialization functions in this module.
tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-replicaset.meta (1)
1-126: LGTM! Appropriate test data for ReplicaSet pod scenario.

The Kubernetes Pod manifest JSON is well-structured with:
- Valid metadata with owner reference directly to ReplicaSet (line 23)
- Complete pod specification matching other test files
- Consistent status information for a running pod
- Proper differentiation from deployment scenario by having direct ReplicaSet ownership
This effectively tests pod association with ReplicaSet workloads distinct from deployment-managed ReplicaSets.
tests/runtime/data/kubernetes/meta/options_use-pod-association-enabled.meta (1)
4-4: Empty metadata object in Pod List response.

The top-level `metadata` field is empty, which is unusual for a Kubernetes Pod List API response. Consider whether this accurately represents the expected API response structure.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-pod.meta (1)
1-116: Different JSON structure compared to pod-association-enabled.meta.

This file contains a single Pod object while `options_use-pod-association-enabled.meta` contains a Pod List. Ensure this structural difference is intentional for your test scenarios.

tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-daemonset.meta (1)
19-28: LGTM! Appropriate ownerReferences for DaemonSet.

The ownerReferences section correctly identifies this pod as being owned by a DaemonSet, which is appropriate for testing pod association with different workload types.
plugins/filter_kubernetes/kube_conf.h (1)
228-235: LGTM! Comprehensive TLS configuration.

The TLS configuration fields for pod association provide proper support for mutual TLS authentication with CA verification, client certificates, and debugging options.
plugins/filter_kubernetes/kubernetes.c (2)
189-273: LGTM! Well-structured error handling.

The function properly handles errors with appropriate cleanup of upstream connections and TLS resources. The TLS certificate rotation logic is well implemented.
125-129: Memory leak on JSON parsing error.

The `buffer` is allocated by `flb_pack_json` but not freed before the early return when parsing fails.

```diff
     if (ret < 0) {
         flb_plg_warn(ctx->ins, "Could not parse json response = %s", api_buf);
+        flb_free(buffer);
         return;
     }
```

Alternatively, move the `flb_free(buffer)` to line 186, where it's freed in all paths.

Likely an incorrect or invalid review comment.
tests/runtime/filter_kubernetes.c (2)
286-286: Consider implications of debug log level in tests.

Changing from "error" to "debug" will produce more verbose output during test runs, which might make it harder to identify actual test failures.
Is this change intentional for debugging the new pod association feature? Consider using a conditional log level based on an environment variable to allow both verbose and quiet test runs.
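For example, a sketch of the environment-variable approach (the variable name `FLB_TEST_LOG_LEVEL` is an invention for illustration, not an existing convention):

```c
#include <stdlib.h>

/* fall back to the quiet default unless the caller asks for verbosity */
const char *log_level = getenv("FLB_TEST_LOG_LEVEL");
flb_service_set(ctx.flb, "Log_Level",
                log_level ? log_level : "error", NULL);
```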
491-513: Verify test certificate paths exist or are mocked.

The Kubernetes filter tests (`tests/runtime/filter_kubernetes.c`) use hardcoded absolute paths for TLS fixtures (`/tst/ca.crt`, `/tst/client.crt`, and `/tst/client.key`), but I don't see corresponding files or setup in the repo. Please ensure these files are created or mocked in your test harness (or move them under the existing `DPATH` test directory) so the tests can reliably locate them.

- File: tests/runtime/filter_kubernetes.c (around lines 491-513)
- Hardcoded paths to verify or replace:
  - `"pod_association_host_server_ca_file", "/tst/ca.crt"`
  - `"pod_association_host_client_cert_file", "/tst/client.crt"`
  - `"pod_association_host_client_key_file", "/tst/client.key"`

plugins/filter_kubernetes/kube_meta.c (4)
355-418: LGTM! Good generalization of the function.

The refactoring to accept `resource_type` and `resource_name` parameters makes the function more reusable for different Kubernetes resources, which is a good architectural improvement.
1996-2020: LGTM! Well-implemented pod association initialization.

The function properly handles TLS setup and upstream connection creation with appropriate error handling and resource cleanup. The thread-safe configuration is correctly applied.
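As context, initialization of this kind typically pairs `flb_tls_create()` with `flb_upstream_create()`. The following is a rough sketch under that assumption, reusing the pod-association field names quoted elsewhere in this review (the `pod_association_tls` field name is an assumption), not the PR's exact code:

```c
/* hypothetical shape of flb_kube_pod_association_init() */
ctx->pod_association_tls = flb_tls_create(FLB_TLS_CLIENT_MODE,
                                          FLB_TRUE,   /* verify */
                                          0,          /* debug  */
                                          ctx->pod_association_host,
                                          NULL,       /* ca_path */
                                          ctx->pod_association_host_server_ca_file,
                                          ctx->pod_association_host_client_cert_file,
                                          ctx->pod_association_host_client_key_file,
                                          NULL);      /* key passwd */
if (!ctx->pod_association_tls) {
    return -1;
}

ctx->pod_association_upstream = flb_upstream_create(config,
                                                    ctx->pod_association_host,
                                                    ctx->pod_association_port,
                                                    FLB_IO_TLS,
                                                    ctx->pod_association_tls);
if (!ctx->pod_association_upstream) {
    /* clean up whatever was created before failing */
    flb_tls_destroy(ctx->pod_association_tls);
    return -1;
}
```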
1827-1891: Verify cache key buffer size calculation.

The cache key construction adds multiple pod association fields. While the size calculation appears correct, it's complex and error-prone.

Consider adding a safety margin or using dynamic allocation:

```diff
     pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
                                            meta->podname, meta->podname_len,
                                            &tmp_service_attributes,
                                            &tmp_service_attr_size);

     if (pod_service_found != -1 && tmp_service_attributes != NULL) {
         if (tmp_service_attributes->name[0] != '\0') {
             n += tmp_service_attributes->name_len + 1;
         }
         if (tmp_service_attributes->environment[0] != '\0') {
             n += tmp_service_attributes->environment_len + 1;
         }
         if (tmp_service_attributes->name_source[0] != '\0') {
             n += tmp_service_attributes->name_source_len + 1;
         }
     }
+
+    /* Add safety margin for separators and null terminator */
+    n += 10;
```
2213-2228: LGTM! Clever platform detection logic.

The platform detection based on the presence of the `aws-auth` ConfigMap in the `kube-system` namespace is a smart way to differentiate between EKS and native Kubernetes clusters.
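A minimal sketch of that check, assuming a helper that performs an authenticated GET against the API server (`kube_api_get` is a stand-in name, not the plugin's real function):

```c
#include <stddef.h>
#include <fluent-bit/flb_mem.h>

struct flb_kube;    /* filter context (opaque here) */

/* hypothetical: returns 0 and fills buf/size on HTTP 200, -1 otherwise */
extern int kube_api_get(struct flb_kube *ctx, const char *path,
                        char **buf, size_t *size);

static const char *determine_platform(struct flb_kube *ctx)
{
    char *buf = NULL;
    size_t size = 0;

    /* the aws-auth ConfigMap normally exists only on EKS clusters */
    if (kube_api_get(ctx,
                     "/api/v1/namespaces/kube-system/configmaps/aws-auth",
                     &buf, &size) == 0) {
        flb_free(buf);
        return "eks";
    }
    return "k8s";
}
```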
plugins/filter_kubernetes/kube_conf.h:

```c
/* Configuration used for enabling pod to service name mapping */
int use_pod_association;
char *pod_association_host;
char *pod_association_endpoint;
int pod_association_port;

/*
 * TTL is used to check how long the mapped entry should
 * remain in the hash table
 */
struct flb_hash_table *pod_hash_table;
int pod_service_map_ttl;
int pod_service_map_refresh_interval;
flb_sds_t pod_service_preload_cache_path;
struct flb_upstream *pod_association_upstream;

/*
 * This variable holds the Kubernetes platform type.
 * Currently checks for EKS or native Kubernetes.
 */
char *platform;

/*
 * This value is used for holding the platform config
 * value. Platform will be overridden with this variable
 * if it's set.
 */
char *set_platform;
```
💡 Verification agent
🧩 Analysis chain
Verify thread safety for pod_hash_table access
The pod_hash_table field will likely be accessed from multiple threads (main thread and background fetch thread mentioned in the PR summary). Ensure proper synchronization mechanisms are in place.
🏁 Script executed:

```bash
#!/bin/bash
# Search for mutex/lock usage around pod_hash_table access
rg -A 5 -B 5 "pod_hash_table" plugins/filter_kubernetes/
```

Length of output: 5956
Synchronize pod_hash_table reads in kube_meta.c
While parse_pod_service_map() properly guards writes to ctx->pod_hash_table with pthread_mutex_lock(&metadata_mutex) / pthread_mutex_unlock(&metadata_mutex), the lookups in kube_meta.c occur without any locking. This introduces a data race between the background fetch thread (writes) and the pipeline threads (reads).
Please wrap each flb_hash_table_get(ctx->pod_hash_table, …) in plugins/filter_kubernetes/kube_meta.c with the same metadata_mutex lock/unlock pair. For example:
```diff
-    pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
-                                           meta->podname, meta->podname_len,
-                                           &tmp_service_attributes,
-                                           &tmp_service_attr_size);
+    pthread_mutex_lock(&metadata_mutex);
+    pod_service_found = flb_hash_table_get(ctx->pod_hash_table,
+                                           meta->podname, meta->podname_len,
+                                           &tmp_service_attributes,
+                                           &tmp_service_attr_size);
+    pthread_mutex_unlock(&metadata_mutex);
```

Locations to update:

- plugins/filter_kubernetes/kube_meta.c, around the two calls to `flb_hash_table_get(...)` where `pod_service_found` is assigned.

This change will ensure safe concurrent access to the shared hash table.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In plugins/filter_kubernetes/kube_meta.c around the lines where
pod_service_found is assigned using flb_hash_table_get, the reads from
ctx->pod_hash_table are not protected by the metadata_mutex, causing a data race
with writes in parse_pod_service_map. To fix this, wrap each flb_hash_table_get
call with pthread_mutex_lock(&metadata_mutex) before the call and
pthread_mutex_unlock(&metadata_mutex) immediately after, ensuring thread-safe
concurrent access to the shared pod_hash_table.
Outdated comments (resolved) on:
- tests/runtime/data/kubernetes/meta/options_use-kubelet-disabled-pod.meta
- tests/runtime/data/kubernetes/meta/options_use-kubelet-enabled-pod.meta
- tests/runtime/data/kubernetes/meta/options_use-pod-association-enabled.meta
Force-pushed from 5bfdfba to 8c1d5b7, then from 8c1d5b7 to 980f69b.
Signed-off-by: Zhihong Lin <[email protected]>
Force-pushed from 980f69b to d8c6162.
@edsiper The PR build failure doesn't look related to this PR, so I think we should be good. Let me know if there's anything else you need for closing out this PR.
thank you!
filter_kubernetes: add entity attribute retrieval logics (#10586) Signed-off-by: Zhihong Lin <[email protected]>
Background
CloudWatch introduced a new feature called Explore Related, a UI component that helps users navigate between their telemetry (metrics, logs) and AWS resources. This is done by including a new field in the PutLogEvents API called Entity. More documentation about this new field can be found here: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_Entity.html
Problem

To enable the Explore Related experience in an AWS EKS cluster, we need additional data such as the workload type (Deployment, DaemonSet, etc.) and the cluster name. There is also an option to associate the name of the service running within the cluster, but Fluent Bit doesn't have existing functionality for that. This information is exposed by the EKS add-on called the Amazon CloudWatch Observability Add-on, which we can query for this information.
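For reference, a configuration enabling the feature might look roughly like the snippet below. The option names are inferred from the test fixtures and struct fields quoted in the review above (the `pod_association_host_*` keys appear verbatim in the tests); the exact key spellings, the host/port values, and the defaults are assumptions and should be checked against the final documentation.

```ini
[FILTER]
    Name                                  kubernetes
    Match                                 kube.*
    use_pod_association                   On
    pod_association_host                  cloudwatch-agent.amazon-cloudwatch
    pod_association_port                  4311
    pod_service_map_ttl                   300
    pod_service_map_refresh_interval      60
    pod_association_host_server_ca_file   /tst/ca.crt
    pod_association_host_client_cert_file /tst/client.crt
    pod_association_host_client_key_file  /tst/client.key
```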
Enter `[N/A]` in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
Screenshot of how UI interacts with the changes made in this PR
Example configuration file for the change
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
Use the `ok-package-test` label to test for all targets (requires a maintainer to do).

Documentation
Documentation needs to be added for the new configuration values that enable the Explore Related experience for users.
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.