-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Use Node affinity instead of device #2339
feat: Use Node affinity instead of device #2339
Conversation
897e515
to
6060cf2
Compare
a0e2a8b
to
5e669ee
Compare
2d93917
to
a1864e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the overall approach lgtm, can you please update the e2es to check that the affinity is added and removed when expected? (doesn't need to be in every test, just one add/remove should be good)
Also, is the label ever cleaned up when Odigos is uninstalled?
k8sutils/pkg/pod/pod.go
Outdated
corev1 "k8s.io/api/core/v1" | ||
) | ||
|
||
func AddOdigletInstalledAffinity(pod *corev1.Pod) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like this is only called by the instrumentor right now, is there a plan for it to be used in other places? If not it could probably just be there instead of k8sutils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks !
cli/cmd/resources/odiglet.go
Outdated
@@ -395,6 +395,16 @@ func NewOdigletDaemonSet(ns string, version string, imagePrefix string, imageNam | |||
Args: []string{ | |||
"init", | |||
}, | |||
Env: []corev1.EnvVar{ | |||
{ | |||
Name: "NODE_NAME", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider using k8sconsts.NodeNameEnvVar
here and also below in the main container
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Looks great, +1 to Mike's comments, especially about cleaning the label |
bcf1f5b
to
838b26d
Compare
First of all thanks for the great comments :)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, could you please add an update to one of the uninstrumentation tests to make sure the affinity is removed? Since it's a required affinity, if we don't remove it then user workloads could end up unschedulable (like if odigos is uninstalled)
I think you could test that by basically copying this file (simple-demo-instrumented
), asserting that affinity is empty with chainsaw, and then adding that assertion to some place like here in the Source e2e, after we've uninstrumented the workloads
after that, lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently as far as i know we dont have un-instrumentation tests.
and we're adding the affinity on the webhook which run only for pods of instrumented workloads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot that this is on the pods, not the workload. In that case we shouldn't need to worry about the scheduling issue I mentioned. Since when a workload is uninstrumented, it will roll out new pods.
odiglet/odiglet.go
Outdated
os.Exit(-1) | ||
} | ||
|
||
if err := k8snode.AddLabelToNode(nn, k8sconsts.OdigletInstalledLabel, k8sconsts.OdigletInstalledLabelValue); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should specify here if it's community or enterprise.
Since these 2 options will end up with different content in the node fs, and if a community pod get's scheduled on onprem node or vice-versa, it will crash the pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've splitted enterprise/community pod labels and also handle the cleanup of them
dea2bc0
to
f2f168f
Compare
…rading oss>enterprise
b7ce6f7
to
1e33641
Compare
No description provided.