-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SGX: set EPC limits via NRI annotations #1582
base: main
Are you sure you want to change the base?
Conversation
66288ca
to
56c0a9f
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #1582 +/- ##
==========================================
- Coverage 51.94% 51.68% -0.26%
==========================================
Files 42 42
Lines 4911 4941 +30
==========================================
+ Hits 2551 2554 +3
- Misses 2210 2237 +27
Partials 150 150 ☔ View full report in Codecov by Sentry. |
@@ -32,10 +32,22 @@ spec: | |||
- name: sgx-provision | |||
mountPath: /dev/sgx_provision | |||
readOnly: true | |||
- name: nri-sgx-epc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the node selector for the NRI plugin could also including the intel.feature.node.kubernetes.io/sgx
label that adds the NFD add-on to SGX enabled nodes?
This label is already used for the NFD NodeFeatureRule
(see here). This more specific node selector could be an optional kustomize
patch (or added to the epc-nfd overlay) and perhaps used for the entire DaemonSet
? This would ensure that the components only run on the nodes on which they are required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can address this in the next rev. The current draft just pulls the NRI bits + the SGX webhook modifications and does not add support for any other deployment options just yet.
I was also checking if NFD can check "NRI is enabled" but that does not seem to be available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in case an additional node selector would be added, you would like to add it to all deployment options and not just for the one based on the kustomize files (described here)?
Should I create an official feature request for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This more specific node selector could be an optional
kustomize
patch (or added to the epc-nfd overlay) and perhaps used for the entireDaemonSet
?
I could add intel.feature.node.kubernetes.io/sgx
to the epc-nfd
overlay by default. The "base" daemonSet is kept minimal and I'm likely going to drop the NRI bits to an overlay in the final version to make it user "opt-in".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ffuerste I made some updates to this. let me know how it looks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply.
I really like the implementation of the NFD label as a kustomize patch as part of the overlay for the intel-sgx-plugin DaemonSet
.
Also, I think the defining the NFD label as a nodeSelector
for the new cAdvisor DaemonSet
is a pretty good idea since you named the directory for the resource files sgx_epc_metrics
. This clearly points to the use case of providing metrics on EPC consumption.
However, maybe we should add a short comment to the label defined as the nodeSelector
in the patches? E.g. # intel.feature.node.kubernetes.io/sgx: 'true' # added to SGX enabled nodes by NFD add-on
. Maybe this is not necessary for the epc-nfd
overlay, but I think it's beneficial for the cAdvisor resource file, because currently the dependency to the NFD add-on is not specified anywhere.
Moreover, maybe the new cAdvisor component should be mentioned in the README of the sgx plugin? Or you don't want to add it there, because it's maybe only needed until the cAdvisor in the kubelet is supporting the misc metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I think it's beneficial for the cAdvisor resource file, because currently the dependency to the NFD add-on is not specified anywhere.
I can add the comment my next rebase. I'm not sure the label is going to be needed in the final version because cAdvisor could be used for cluster telemetry in general. Unless of course the desire is to only get the EPC+memory metrics... Let's see.
Moreover, maybe the new cAdvisor component should be mentioned in the README of the sgx plugin? Or you don't want to add it there, because it's maybe only needed until the cAdvisor in the kubelet is supporting the misc metrics?
I will add it later once I have better visibility on the PR readiness (which depends on runc 1.2 release I believe). We are most likely going to need the "standalone" deployment for a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you don't want to add it there, because it's maybe only needed until the cAdvisor in the kubelet is supporting the misc metrics?
I do not think kubelet will be adding metrics, as there are plans on reducing them:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I had checked with the KEP author on what the best approach for me would be and the suggestion was to start with runc+cAdvisor side of things. Longer run we'd need misc
to be part of runtimes' metrics which is the future anyway but for now we'd use a standalone cAdvisor.
77acf3e
to
bc4654f
Compare
c876e22
to
a817f63
Compare
4cbfdfc
to
69ddd56
Compare
7b3235e
to
0b70ce4
Compare
cb3cd37
to
ce38f8d
Compare
Signed-off-by: Mikko Ylinen <[email protected]>
Signed-off-by: Mikko Ylinen <[email protected]>
Signed-off-by: Mikko Ylinen <[email protected]>
Signed-off-by: Mikko Ylinen <[email protected]>
Signed-off-by: Mikko Ylinen <[email protected]>
No description provided.