feat(kubernetes): add kubernetes_watch input for real-time resource monitoring #624
I've built some simple testing scripts as well.
aronchick
left a comment
I think I got to everything!
@gregfurman I've addressed your feedback - think I got to everything. Just LMK if you need anything else!
3f280f6 to ff966a3
gregfurman
left a comment
Thanks for addressing all my feedback. The only blocking things left to address IMO are the Close method and the use of caching. Looking forward to getting this in!
@gregfurman All feedback addressed and tests passing locally - ready for re-review!
…onitoring

Add a new kubernetes_watch input that uses the Kubernetes Watch API to stream real-time events (ADDED, MODIFIED, DELETED) for cluster resources. This enables event-driven pipelines that react to Kubernetes state changes.

Features:
- Watch standard resources (pods, services, deployments, etc.)
- Support for Custom Resource Definitions (CRDs) via dynamic client
- Per-namespace or cluster-wide monitoring
- Label and field selector filtering
- Automatic reconnection with exponential backoff on watch expiration
- Proper handling of 410 Gone errors (expected watch API behavior)

The implementation includes:
- Flexible authentication (in-cluster, kubeconfig, explicit credentials)
- Resource version tracking for resumable watches
- Graceful shutdown coordination
- Comprehensive test coverage
Replace the static standardResources map with the client-go RESTMapper for dynamic GVR (Group/Version/Resource) resolution.

This approach:
- Automatically supports any resource type the cluster knows about
- Handles CRDs without special configuration needed
- Correctly resolves plural/singular resource names
- Uses cached discovery to minimize API calls

The RESTMapper queries the cluster's discovery API once and caches the results, then resolves resource names to their full GVR on demand.
Use the internal filesystem abstraction (ifs) for reading token files in buildExplicitClient. This allows for proper isolation and testing of filesystem operations.
Move InClusterNamespace to auth_unix.go with Unix-only build tag since it reads /var/run/secrets/kubernetes.io/serviceaccount/namespace which doesn't exist on Windows. Add auth_windows.go stub that returns default.
Change label_selector from string to map for better YAML ergonomics.
Users can now write:
  label_selector:
    app: myapp
    env: prod
Instead of:
  label_selector: "app=myapp,env=prod"
Add LabelSelectorFromMap helper to convert map to Kubernetes format.
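The map-to-string conversion this commit describes might look roughly like the sketch below. Only the helper name LabelSelectorFromMap comes from the commit message; the signature and body are assumptions, not the PR's actual code. Keys are sorted here so the output is deterministic, which the real helper may or may not do.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// LabelSelectorFromMap joins key/value pairs into the comma-separated
// "key=value" form used by Kubernetes selectors.
// Illustrative sketch only; the signature is assumed, not taken from the PR.
func LabelSelectorFromMap(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	// Go map iteration order is randomized, so sort for stable output.
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+labels[k])
	}
	return strings.Join(pairs, ",")
}

func main() {
	selector := LabelSelectorFromMap(map[string]string{"app": "myapp", "env": "prod"})
	fmt.Println(selector) // prints "app=myapp,env=prod"
}
```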
Change field_selector from string to map for consistency with
label_selector. Users can now write:
  field_selector:
    "status.phase": Running
Instead of:
  field_selector: "status.phase=Running"
Add a LintRule to validate that event_types values are one of ADDED, MODIFIED, or DELETED. Update description to document valid values.
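As a rough illustration of the check this lint rule enforces, here is the same validation written as plain Go. It does not use Bento's LintRule mechanism; the function name validateEventTypes and the case-sensitive matching are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// validEventTypes lists the values the commit message names as allowed.
var validEventTypes = map[string]bool{"ADDED": true, "MODIFIED": true, "DELETED": true}

// validateEventTypes is a hypothetical helper: it returns an error naming
// every entry that is not one of ADDED, MODIFIED, or DELETED.
func validateEventTypes(eventTypes []string) error {
	var invalid []string
	for _, t := range eventTypes {
		if !validEventTypes[t] { // case-sensitive match (an assumption)
			invalid = append(invalid, t)
		}
	}
	if len(invalid) > 0 {
		return fmt.Errorf("invalid event_types %q: must be one of ADDED, MODIFIED, DELETED",
			strings.Join(invalid, ", "))
	}
	return nil
}

func main() {
	fmt.Println(validateEventTypes([]string{"ADDED", "DELETED"}))
	fmt.Println(validateEventTypes([]string{"ADDED", "UPDATED"}))
}
```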
…tion Replace custom calculateBackoff function with the standard retries package. This adds configurable max_retries and backoff fields with exponential backoff behavior from cenkalti/backoff library. Remove unused backoff constants and custom implementation.
Replace custom requestContext helper with the built-in SoftStopCtx method from shutdown.Signaller. This eliminates the need for spawning goroutines to monitor the stop channel. Remove now-unused request_context.go file.
Use comma-ok idiom when reading from eventChan to properly detect when the channel is closed and return ErrEndOfInput.
…HasStopped Add WaitGroup to track watch goroutines and ensure they complete before closing the eventChan. Remove unnecessary TriggerHasStopped call which is typically managed by the framework.
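The coordination described here, waiting for every producer goroutine before closing the shared channel, can be sketched as follows. The function runWatchers and the string events are hypothetical; the PR's actual goroutines run Kubernetes watches rather than sending a single message.

```go
package main

import (
	"fmt"
	"sync"
)

// runWatchers starts one goroutine per resource and returns a channel that
// is closed only after every goroutine has returned, so no goroutine can
// send on a closed channel. Hypothetical helper for illustration.
func runWatchers(resources []string) <-chan string {
	eventChan := make(chan string)
	var wg sync.WaitGroup

	for _, r := range resources {
		wg.Add(1)
		go func(resource string) {
			defer wg.Done()
			eventChan <- "ADDED " + resource // stand-in for a real watch loop
		}(r)
	}

	// Close from a separate goroutine once all watchers are done.
	go func() {
		wg.Wait()
		close(eventChan)
	}()
	return eventChan
}

func main() {
	count := 0
	for ev := range runWatchers([]string{"pods", "services"}) {
		fmt.Println(ev)
		count++
	}
	fmt.Println("events:", count)
}
```

Closing the channel before the watchers finish would risk a "send on closed channel" panic, which is the failure mode the WaitGroup guards against.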
Internal packages don't need extensive package-level docs. The component documentation lives in the ConfigSpec descriptions.
Update test case to reflect the label_selector field change from string to map format.
Add kubeconfig_yaml field to allow passing kubeconfig content directly as a string instead of a file path. This enables reading kubeconfig from secrets or environment variables. Uses client-go's clientcmd.NewClientConfigFromBytes() to parse the raw YAML content, with support for context override when specified.
Reverts cosmetic changes to input_file.go and metrics_json_api.go to keep the PR diff focused on kubernetes_watch changes only.
Eliminate the metaKeyCache type and its usage in favor of direct string concatenation for generating metadata keys. Update the metadataDescription function to follow Go naming conventions. This change streamlines the code and improves clarity in how metadata keys are constructed for Kubernetes resources.
- TestCloseTriggersEndOfInput: verifies Close causes Read to return ErrEndOfInput
- TestCloseDrainsEventsBeforeShutdown: verifies events can be read before close
- TestReadReturnsErrEndOfInputOnClosedChannel: verifies closed channel handling
- TestReadRespectsContextCancellation: verifies context cancellation works
- TestConcurrentReadsAndClose: verifies thread-safety with multiple readers
ba0309f to bccd629
@gregfurman tag, you're it. (I think I got everything)
@aronchick Will give this a final look today! Thanks for addressing all my feedback 🙌
gregfurman
left a comment
Looks good! Last, you just need to add the kubernetes implementation to public/components in order to register this plugin to the global bento environment.
Here's an example of this for the mqtt component:
bento/public/components/mqtt/package.go
Lines 1 to 6 in 85332f0
Then we can add this to public/components/all, so that importing github.com/warpstreamlabs/bento/public/components/all automatically registers the component in the global environment, i.e.
bento/public/components/all/package.go
Line 37 in 85332f0
Lastly, run make docs and commit the generated .md files for the kubernetes component that will be used in the bento doc site.
Once done, think it's good to go!
@jem-davies The K8s API adds an extra 24.4 MB to the bento binary size (at least on Darwin), which ends up constituting about 11% of the size. Should we add this package behind the See
My take is that if a user is worried about binary size or, say, CVEs in components they don't use, then we should advise them to create their own distribution where they import select components. Therefore I don't think we should use a build tag requiring users to specifically 'opt in' to a k8s component.
…onitoring (warpstreamlabs#624)

* feat(kubernetes): add kubernetes_watch input for real-time resource monitoring

Add a new kubernetes_watch input that uses the Kubernetes Watch API to stream real-time events (ADDED, MODIFIED, DELETED) for cluster resources. This enables event-driven pipelines that react to Kubernetes state changes.

Features:
- Watch standard resources (pods, services, deployments, etc.)
- Support for Custom Resource Definitions (CRDs) via dynamic client
- Per-namespace or cluster-wide monitoring
- Label and field selector filtering
- Automatic reconnection with exponential backoff on watch expiration
- Proper handling of 410 Gone errors (expected watch API behavior)

The implementation includes:
- Flexible authentication (in-cluster, kubeconfig, explicit credentials)
- Resource version tracking for resumable watches
- Graceful shutdown coordination
- Comprehensive test coverage

---------

Co-authored-by: Greg Furman <gregfurman99@gmail.com>
Summary
Add a new kubernetes_watch input that uses the Kubernetes Watch API to stream real-time events for cluster resources. This enables event-driven pipelines that react to Kubernetes state changes.
Features
Implementation Highlights
- DeferredDiscoveryRESTMapper for dynamic GVR resolution (no static resource list)
Example Configuration
Test Plan
Files Changed (~1,060 lines)
- internal/impl/kubernetes/ - New kubernetes input package
- go.mod, go.sum - k8s client-go dependencies