Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate odigos-instrumentation label to Source CRD #2231

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

damemi
Copy link
Contributor

@damemi damemi commented Jan 14, 2025

This adds a migration for the current odigos-instrumentation label to Source CRDs.

If a workload or namespace has odigos-instrumentation: enabled, the startlangdetection controller will automatically create a Source for it (if one doesn't already exist) and log a deprecation warning.

If a workload or namespace has odigos-instrumentation: disabled, the deleteinstrumentationconfig controller will automatically create a Source with disableInstrumentation: true.

In both cases, if a Source already exists, the controller will update the existing Source. This means that the label can be toggled back and forth.

However, deleting the label will not inherently un-instrument a workload that doesn't have inheriting Namespace instrumentation. We can't tell if the label was deleted without checking the change during the event, which is bad to do. This is the only drop in feature parity from the current label.

@damemi damemi force-pushed the source-label-migration branch 2 times, most recently from 8fda813 to bd21fe1 Compare January 15, 2025 13:37
@damemi damemi force-pushed the source-label-migration branch from fd744ba to 804b47a Compare January 15, 2025 15:36
api/odigos/v1alpha1/source_types.go Show resolved Hide resolved
k8sutils/pkg/source/source.go Outdated Show resolved Hide resolved
k8sutils/pkg/source/source.go Show resolved Hide resolved
k8sutils/pkg/source/source.go Show resolved Hide resolved
Comment on lines 46 to 51
// isDeleteInstrumentationConfigSource returns true if the Source object is relevant to starting language detection.
// This means that the Source must be either:
// 1) A normal (non-excluding) Source AND terminating, or
// 2) An excluding Source AND NOT terminating
// In either of these cases, we want to check if workloads should start to be instrumented.
var SourcePredicates = predicate.Funcs{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does not match the name.
Don't we check in the reconciler for IsActiveSource? since we are not looking at the diff here, maybe worth just removing this predicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, a predicate makes sense to filter out checking irrelevant Sources in the reconciler. We don't need to look at the diff to tell whether this Source is relevant. If anything I think we can remove the same check in the Reconcile function

Otherwise, both startlangdetection and deleteinstrumentationconfig simultaneously try to reconcile() every Source. It's a no-op on the irrelevant controller because of the check, but moving the check to a predicate shifts it further up the chain

@damemi damemi force-pushed the source-label-migration branch from 76fc703 to 773dc66 Compare January 15, 2025 16:26
newReplicas := wNew.AvailableReplicas()
oldReplicas := wOld.AvailableReplicas()

// 1. workload became enabled and has available (running) replicas
if becameEnabled && newReplicas > 0 {
// 1. workload has available (running) replicas
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will increase the amount of events the reconciler handles significantly (from only workloads instrumented by Odigos to all workloads), It doesn't matter fro correctness, I'm just wondering about performance.
In the reconcilers it looks like we'll do a few operations for each workload even those that are not instrumented (and for every update) - maybe we should call sourceutils.IsObjectInstrumentedBySource as early as possible?

Copy link
Contributor Author

@damemi damemi Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally leaning away from calling IsObjectInstrumentedBySource because it means an extra API call before we even get to the reconciler. The concern is API server traffic. And since that's one of the first API calls we make inside the reconciler anyway, that tradeoff seemed like a wash.

But, for non-instrumented workloads, not filtering them would actually mean 3 unnecessary API calls: one to Reconcile(), one to getWorkloadObject(), and then IsObjectInstrumentedBySource. In contrast, we already have the object in the Predicate, so we could filter out non-instrumented workloads with just 1 API call, like you're suggesting.

However, that means an extra unnecessary API call for every instrumented workload. And instrumented workloads are really what we care about. In fact, we hope that instrumented workloads should largely outnumber non-instrumented workloads for Odigos users :)

To put it in numbers, if a cluster has at least 4 instrumented workloads for every 1 non-instrumented workload, calling IsObjectInstrumentedBySource in the predicate will be a net increase to unnecessary API server traffic

So I think the efficiency gain for non-instrumented workloads isn't worth the efficiency loss for workloads we actually care about.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this analysis.
However, I have seen cases of instrumenting single or a few workloads in a cluster for a few reasons. It might be a new client just starting and doesn't want to commit to instrumenting all the applications, or maybe most of the apps are not relevant/ written in unsupported languages.
So, in those cases (which I think are not rare) - we will have some regression in terms of performance. I guess it also depends on how often deployments are updated in the cluster. If some automatic tools are involved I think that can also increase the amount of times the reconciler will trigger.
Not saying this should block or anything, just worth to consider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely a good consideration. I also think that the "extra" API call should usually be using the controller's cache, while the event getting passed to Reconcile() is actual traffic every time. So I think my point above isn't always valid anyway.

I added a basic version of the previous check in b6ad6ac, ptal. Some notes:

  • We don't have a context to use in the requests, so the IsObjectInstrumentedBySource call needs to just use context.Background.
  • We can't retry on an error, so we need to either ignore the error and let the object through (where the controller can retry it -- say a client hiccup) or just block the request entirely. I think the former is safer so we don't miss relevant objects.
  • While we're migrating, we still need to check the label here. That's fine, but I simplified the check in Update to just look at if the new object is instrumented instead of comparing if the old object was not instrumented. There's no equivalent way with Sources to know if the old object didn't have a Source, so I think that's fine.

Let me know what you think

tests/e2e/source/chainsaw-test.yaml Show resolved Hide resolved
k8sutils/pkg/source/source.go Show resolved Hide resolved
tests/e2e/source/chainsaw-test.yaml Show resolved Hide resolved
@damemi damemi force-pushed the source-label-migration branch from b6ad6ac to cc31bbe Compare January 17, 2025 22:49
Copy link
Collaborator

@RonFed RonFed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nits, this looks great 🥇

if !instrumented || !workload.IsObjectLabeledForInstrumentation(e.Object) {
return false
}

return w.AvailableReplicas() > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can move this check before the API calls, to potentially save calls.

Comment on lines 58 to 62
instrumented, _ := sourceutils.IsObjectInstrumentedBySource(context.Background(), i.Client, e.ObjectNew)
if !instrumented || !workload.IsObjectLabeledForInstrumentation(e.ObjectNew) {
return false
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit as above: maybe better to do the replicas check before this.


sourceutils "github.com/odigos-io/odigos/k8sutils/pkg/source"
k8sutils "github.com/odigos-io/odigos/k8sutils/pkg/utils"
"github.com/odigos-io/odigos/k8sutils/pkg/workload"
)

type NamespacesReconciler struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe worth marking this as deprecated reconciler
just making sure, we are going to remove it once we fully remove support for labels, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should still have workload/namespace reconcilers. Since Sources are detached from the workloads, a workload can be redeployed or come up after the Source (this is by design). If that happens, we should trigger on the event for that workload too. Maybe we could deprecate the Namespace reconciler, since Sources need to exist in a namespace. But if we ever support cross-namespace Sources we'll have similar uses for it. It's also just good to have in case we miss a Source event

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants