-
Notifications
You must be signed in to change notification settings - Fork 935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics: Differentiate between restricted and unrestricted certificates #13214
Conversation
4ccf8b6
to
fced0e7
Compare
fced0e7
to
26de3e4
Compare
26de3e4
to
9ab4a81
Compare
By the way, I was trying to see what is the current behavior when using OIDC on the metrics endpoint but it doesn't return any metric samples, only their descriptions. For me it only works when using certificates, so OIDC metrics looks broken. |
please open separate issue and see if @markylaing can help with this on his return. |
With the changes in this PR it's working again. Somehow the filtering of samples got corrupted, but it looks to be fixed with 691b57f. |
I have updated the table to see which permissions are now checked when accessing the metrics endpoint. |
Cool! So Does this PR fix #13217 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good thanks!
Will await @markylaing thoughts too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed one thing, please can you add a test for checking TLS cert access to metrics (restricted/unrestricted) to catch any future regressions like this
967e2f2
to
6dd789f
Compare
No, it's just another problem with returning metrics. But it's fixed in this PR. And there seems to be one more. The internal server metrics don't show up in the response if the cache interval of 8 seconds expired. @gabrielmougard have you seen the same behavior and maybe fixed it with the changes in #13222? |
6dd789f
to
39d1e94
Compare
Yes I see it too. I'm trying to come up with a fix on #13222 |
@roosterfish regarding the internal server metrics, I made them appear in the cache but I had to tie them to a project (I chose diff --git a/lxd/api_metrics.go b/lxd/api_metrics.go
index 66999f11f7..2f28e911b7 100644
--- a/lxd/api_metrics.go
+++ b/lxd/api_metrics.go
@@ -101,7 +101,7 @@ func metricsGet(d *Daemon, r *http.Request) response.Response {
metricSet := metrics.NewMetricSet(nil)
var projectNames []string
-
+ var intMetrics *metrics.MetricSet
err := s.DB.Cluster.Transaction(r.Context(), func(ctx context.Context, tx *db.ClusterTx) error {
// Figure out the projects to retrieve.
if projectName != "" {
@@ -119,9 +119,8 @@ func metricsGet(d *Daemon, r *http.Request) response.Response {
}
}
- // Add internal metrics.
- metricSet.Merge(internalMetrics(ctx, s.StartTime, tx))
-
+ // register internal metrics.
+ intMetrics = internalMetrics(ctx, s.StartTime, tx)
return nil
})
if err != nil {
@@ -304,6 +303,10 @@ func metricsGet(d *Daemon, r *http.Request) response.Response {
updatedProjects := []string{}
for project, entries := range newMetrics {
+ if project == "default" {
+ entries.Merge(intMetrics) // internal metrics are always considered new and associated with the default project.
+ }
+
metricsCache[project] = metricsCacheEntry{
expiry: time.Now().Add(cacheDuration),
metrics: entries,
@@ -402,28 +405,28 @@ func internalMetrics(ctx context.Context, daemonStartTime time.Time, tx *db.Clus
runtime.ReadMemStats(&ms)
- out.AddSamples(metrics.GoAllocBytes, metrics.Sample{Value: float64(ms.Alloc)})
- out.AddSamples(metrics.GoAllocBytesTotal, metrics.Sample{Value: float64(ms.TotalAlloc)})
- out.AddSamples(metrics.GoBuckHashSysBytes, metrics.Sample{Value: float64(ms.BuckHashSys)})
- out.AddSamples(metrics.GoFreesTotal, metrics.Sample{Value: float64(ms.Frees)})
- out.AddSamples(metrics.GoGCSysBytes, metrics.Sample{Value: float64(ms.GCSys)})
- out.AddSamples(metrics.GoHeapAllocBytes, metrics.Sample{Value: float64(ms.HeapAlloc)})
- out.AddSamples(metrics.GoHeapIdleBytes, metrics.Sample{Value: float64(ms.HeapIdle)})
- out.AddSamples(metrics.GoHeapInuseBytes, metrics.Sample{Value: float64(ms.HeapInuse)})
- out.AddSamples(metrics.GoHeapObjects, metrics.Sample{Value: float64(ms.HeapObjects)})
- out.AddSamples(metrics.GoHeapReleasedBytes, metrics.Sample{Value: float64(ms.HeapReleased)})
- out.AddSamples(metrics.GoHeapSysBytes, metrics.Sample{Value: float64(ms.HeapSys)})
- out.AddSamples(metrics.GoLookupsTotal, metrics.Sample{Value: float64(ms.Lookups)})
- out.AddSamples(metrics.GoMallocsTotal, metrics.Sample{Value: float64(ms.Mallocs)})
- out.AddSamples(metrics.GoMCacheInuseBytes, metrics.Sample{Value: float64(ms.MCacheInuse)})
- out.AddSamples(metrics.GoMCacheSysBytes, metrics.Sample{Value: float64(ms.MCacheSys)})
- out.AddSamples(metrics.GoMSpanInuseBytes, metrics.Sample{Value: float64(ms.MSpanInuse)})
- out.AddSamples(metrics.GoMSpanSysBytes, metrics.Sample{Value: float64(ms.MSpanSys)})
- out.AddSamples(metrics.GoNextGCBytes, metrics.Sample{Value: float64(ms.NextGC)})
- out.AddSamples(metrics.GoOtherSysBytes, metrics.Sample{Value: float64(ms.OtherSys)})
- out.AddSamples(metrics.GoStackInuseBytes, metrics.Sample{Value: float64(ms.StackInuse)})
- out.AddSamples(metrics.GoStackSysBytes, metrics.Sample{Value: float64(ms.StackSys)})
- out.AddSamples(metrics.GoSysBytes, metrics.Sample{Value: float64(ms.Sys)})
+ out.AddSamples(metrics.GoAllocBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.Alloc)})
+ out.AddSamples(metrics.GoAllocBytesTotal, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.TotalAlloc)})
+ out.AddSamples(metrics.GoBuckHashSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.BuckHashSys)})
+ out.AddSamples(metrics.GoFreesTotal, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.Frees)})
+ out.AddSamples(metrics.GoGCSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.GCSys)})
+ out.AddSamples(metrics.GoHeapAllocBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapAlloc)})
+ out.AddSamples(metrics.GoHeapIdleBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapIdle)})
+ out.AddSamples(metrics.GoHeapInuseBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapInuse)})
+ out.AddSamples(metrics.GoHeapObjects, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapObjects)})
+ out.AddSamples(metrics.GoHeapReleasedBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapReleased)})
+ out.AddSamples(metrics.GoHeapSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.HeapSys)})
+ out.AddSamples(metrics.GoLookupsTotal, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.Lookups)})
+ out.AddSamples(metrics.GoMallocsTotal, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.Mallocs)})
+ out.AddSamples(metrics.GoMCacheInuseBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.MCacheInuse)})
+ out.AddSamples(metrics.GoMCacheSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.MCacheSys)})
+ out.AddSamples(metrics.GoMSpanInuseBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.MSpanInuse)})
+ out.AddSamples(metrics.GoMSpanSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.MSpanSys)})
+ out.AddSamples(metrics.GoNextGCBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.NextGC)})
+ out.AddSamples(metrics.GoOtherSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.OtherSys)})
+ out.AddSamples(metrics.GoStackInuseBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.StackInuse)})
+ out.AddSamples(metrics.GoStackSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.StackSys)})
+ out.AddSamples(metrics.GoSysBytes, metrics.Sample{Labels: map[string]string{"project": "default"}, Value: float64(ms.Sys)})
return out
}
Right now the cache is keyed by project and these internal metrics are 'global' so this solution is a bit misleading, but it shows the right data.. What do you thiink about this approach? |
Signed-off-by: Julian Pelizäus <[email protected]>
This way we can apply multiple filters on internal server and project metrics at once. Additionally this ensures that entity URLs are always called with the right type. Otherwise this will raise a warning for every sample. Signed-off-by: Julian Pelizäus <[email protected]>
973ef27
to
61ea52d
Compare
Jup, tested this now a few times with installation going from julian@thinkpad:~$ snap install lxd --channel 5.20/stable
lxd (5.20/stable) 5.20-f3dd836 from Canonical✓ installed
julian@thinkpad:~$ lxc config trust add metrics.crt --type metrics
julian@thinkpad:~$ lxc config trust show a80f90615040 | grep restricted
restricted: false
julian@thinkpad:~$ snap refresh lxd --channel latest/edge
2024-03-28T15:55:45+01:00 INFO Waiting for "snap.lxd.daemon.service" to stop.
lxd (edge) git-e6f9af1 from Canonical✓ refreshed
julian@thinkpad:~$ lxc config trust show a80f90615040 | grep restricted
restricted: true The PR is now ready for a final review. |
// Filter all metrics for view permissions. | ||
// Internal server metrics without labels are compared against the server entity. | ||
metricSet.FilterSamples(func(labels map[string]string) bool { | ||
if labels["project"] != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? What if the label does have a non-empty project, but the user doesn't have a project permission, but does have server permission, this would return false?
Doesn't having the server permission mean they can see all metrics (including the project ones)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay tricky one, thanks for rising this as I have found another potential security bug in my changes. The unrestricted metrics certificate got just too much permission.
But the way the checking is done here is right.
In case of the TLS driver if the certificate is unrestricted, the driver permits can_view_metrics
by default (see latest push where I went back to the original changes).
In case of OIDC this is allowed as the can_view_metrics
on the server
type supersedes the can_view_metrics
on the project
type. This way the check still returns true
even if there is no explicit can_view_metrics
entitlement set for the project (see the changed OpenFGA model).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so its resolved internally in each driver that the user has implicit project level permissions if its assigned at the server level. Makes sense thanks
…erts Signed-off-by: Julian Pelizäus <[email protected]>
…tityTypeCertificateMetricsUnrestricted This effectively replaces identityTypeCertificateMetrics with identityTypeCertificateMetricsRestricted and adds the new type identityTypeCertificateMetricsUnrestricted. By replacing identityTypeCertificateMetrics with identityTypeCertificateMetricsRestricted we ensure that restricted metrics certificates that were falsely moved to identityTypeCertificateMetrics are still considered to be secure. On the other side this marks unrestricted certificates as restricted by default. This has to be manually modified by unrestricting the respective metrics certficiate. Signed-off-by: Julian Pelizäus <[email protected]>
61ea52d
to
d20f3d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Mhhh, I just saw when adding new metrics certs they are |
Seems like we need more tests too. |
Signed-off-by: Julian Pelizäus <[email protected]>
…cted and unrestricted type Signed-off-by: Julian Pelizäus <[email protected]>
…TypeCertificateMetricsUnrestricted Signed-off-by: Julian Pelizäus <[email protected]>
…cted and IdentityTypeCertificateMetricsUnrestricted Signed-off-by: Julian Pelizäus <[email protected]>
…tyTypeCertificateMetricsUnrestricted Signed-off-by: Julian Pelizäus <[email protected]>
This allows granting the can_view_metrics entitlement on specific projects so that requests to the metrics endpoint containing the project query parameter only return the respective project's metrics. Signed-off-by: Julian Pelizäus <[email protected]>
Ensure that restricted certificates are mapped to IdentityTypeCertificateMetricsRestricted and vice versa. Signed-off-by: Julian Pelizäus <[email protected]>
Signed-off-by: Julian Pelizäus <[email protected]>
d20f3d1
to
eb99d83
Compare
Ok it was a display bug, it's still the right type in the database. |
eb99d83
to
b1345ea
Compare
Additionally check the responses if the endpoint gets queried using the project parameter as this is producing different results for restricted and unrestricted metrics certificates. Signed-off-by: Julian Pelizäus <[email protected]>
b1345ea
to
6503560
Compare
The test is now extended to check for the restriction level too. |
Fixes #13200.
Overview
This PR adds two new identity types
IdentityTypeCertificateMetricsRestricted
andIdentityTypeCertificateMetricsUnrestricted
. They effectively replaceIdentityTypeCertificateMetrics
which doesn't track if a metrics certificate is restricted or not.Additionally the respective DB patch is modified to map restricted certificates to their right identity type and vice versa.
Furthermore the logic for checking metrics samples is now modified to require looping the set of samples only once which should improve the performance.
Permissions
The following permissions checking is now in place when accessing the
/1.0/metrics
endpoint:/1.0/metrics
can_view_metrics
server
/1.0/metrics?project=xyz
can_view_metrics
project
/1.0/metrics
can_view_metrics
server
/1.0/metrics?project=xyz
can_view_metrics
project
*1: If the certificate is unrestricted also return the internal server metrics. This is too ensure backwards compatibility with what is in place currently (
5.20
).*2: This doesn't specifically check if there is another permission to allow viewing the projects instances. Having
can_view_metrics
on the project level supersedes any more fine grained entitlements on instances. If the user also hascan_view_metrics
onserver
return the internal server metrics too.