Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use Node affinity instead of device #2339

Merged
merged 15 commits into from
Feb 4, 2025

Conversation

tamirdavid1
Copy link
Collaborator

No description provided.

@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch 2 times, most recently from 897e515 to 6060cf2 Compare January 29, 2025 09:17
@tamirdavid1 tamirdavid1 changed the title feat: init commit feat: Use Node affinity instead of device Jan 29, 2025
@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch from a0e2a8b to 5e669ee Compare January 29, 2025 10:38
@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch from 2d93917 to a1864e9 Compare January 29, 2025 12:19
Copy link
Contributor

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the overall approach lgtm, can you please update the e2es to check that the affinity is added and removed when expected? (doesn't need to be in every test, just one add/remove should be good)

Also, is the label ever cleaned up when Odigos is uninstalled?

corev1 "k8s.io/api/core/v1"
)

func AddOdigletInstalledAffinity(pod *corev1.Pod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this is only called by the instrumentor right now, is there a plan for it to be used in other places? If not it could probably just be there instead of k8sutils

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@@ -395,6 +395,16 @@ func NewOdigletDaemonSet(ns string, version string, imagePrefix string, imageNam
Args: []string{
"init",
},
Env: []corev1.EnvVar{
{
Name: "NODE_NAME",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider using k8sconsts.NodeNameEnvVar here and also below in the main container

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@RonFed
Copy link
Collaborator

RonFed commented Jan 29, 2025

Looks great, +1 to Mike's comments, especially about cleaning the label
Also we need to think on how this will work on odigos upgrade, since odiglets will do a rollout

@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch from bcf1f5b to 838b26d Compare January 30, 2025 13:05
@tamirdavid1
Copy link
Collaborator Author

the overall approach lgtm, can you please update the e2es to check that the affinity is added and removed when expected? (doesn't need to be in every test, just one add/remove should be good)

Also, is the label ever cleaned up when Odigos is uninstalled?

First of all thanks for the great comments :)

  1. I've added e2e tests to check that the affinity is added (*)
  2. Added cleanup label from node once Odigos uninstalled
  • it will be removed if app not instrumented (and webhook wont run to add it)

Copy link
Contributor

@damemi damemi Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, could you please add an update to one of the uninstrumentation tests to make sure the affinity is removed? Since it's a required affinity, if we don't remove it then user workloads could end up unschedulable (like if odigos is uninstalled)

I think you could test that by basically copying this file (simple-demo-instrumented), asserting that affinity is empty with chainsaw, and then adding that assertion to some place like here in the Source e2e, after we've uninstrumented the workloads

after that, lgtm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently as far as i know we dont have un-instrumentation tests.
and we're adding the affinity on the webhook which run only for pods of instrumented workloads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot that this is on the pods, not the workload. In that case we shouldn't need to worry about the scheduling issue I mentioned. Since when a workload is uninstrumented, it will roll out new pods.

os.Exit(-1)
}

if err := k8snode.AddLabelToNode(nn, k8sconsts.OdigletInstalledLabel, k8sconsts.OdigletInstalledLabelValue); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should specify here if it's community or enterprise.
Since these 2 options will end up with different content in the node fs, and if a community pod get's scheduled on onprem node or vice-versa, it will crash the pod.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've splitted enterprise/community pod labels and also handle the cleanup of them

@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch 3 times, most recently from dea2bc0 to f2f168f Compare February 4, 2025 09:37
@tamirdavid1 tamirdavid1 force-pushed the odiglet-use-node-affinity branch from b7ce6f7 to 1e33641 Compare February 4, 2025 12:35
@tamirdavid1 tamirdavid1 merged commit f2ddc8a into odigos-io:main Feb 4, 2025
43 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants