[2/n] Silo-level max device token TTL #8214

david-crespo · 2025-05-24T20:52:38Z

Closes #2302. Built on top of #8137, but doesn't really rely on much in there. Implements RFD 570.

- Add a `silo_settings` table with a nullable device token expiration (seconds) column. A null setting means no max.
  - Also populate this table with null maxes for all existing silos
- Follow the behavior of the quotas table: create a silo settings entry on silo create, and delete it on silo delete
- View and update settings endpoints at `/v1/settings`, with all necessary plumbing
  - Following RFD 570, these are silo-scoped endpoints, only accessible to users inside the silo, as opposed to, for example, fleet operators being able to set it for any silo
- Users don't set token expiration directly yet. At token create time, if the current silo has its max set, the token gets an expiration timestamp based on that max; otherwise, the token does not expire
- Test that trying to set a negative or 0 TTL returns a 400
- Fix authz tests and verify endpoint tests
- Bikeshed endpoint paths, operation IDs, property names, etc.
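The token-create rule described above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the Nexus code: `token_time_expires` and `is_expired` are hypothetical names, times are plain Unix seconds (`u64`) rather than the timestamp type the datastore uses, and treating the exact expiration instant as already expired is a choice made for the sketch.

```rust
/// Hypothetical helper: compute a new token's expiration time.
///
/// `now_secs` is the creation time in Unix seconds. `silo_max_ttl_secs` is
/// the silo's max device token TTL; `None` models the null setting ("no max").
fn token_time_expires(now_secs: u64, silo_max_ttl_secs: Option<u32>) -> Option<u64> {
    // No max configured: the token never expires.
    // Otherwise it expires `max` seconds after creation (saturating to avoid overflow).
    silo_max_ttl_secs.map(|max| now_secs.saturating_add(u64::from(max)))
}

/// Hypothetical check at authentication time. A token without an expiration
/// is never expired; otherwise it is expired once `now` reaches `time_expires`
/// (an assumption of this sketch).
fn is_expired(time_expires: Option<u64>, now_secs: u64) -> bool {
    match time_expires {
        Some(t) => now_secs >= t,
        None => false,
    }
}
```

Keeping "no max" as `None` all the way through means it is never confused with a sentinel value such as 0, which fits with rejecting a TTL of 0 outright.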

david-crespo · 2025-05-25T16:00:52Z

nexus/types/src/external_api/params.rs

+// and differently for quotas. Maybe the best thing would be to make them all
+// non-nullable on SiloQuotasUpdate. I vaguely remember the latter being the
+// direction we wanted to go in general anyway. Can't find the issue where it
+// was discussed.


something to think about

david-crespo · 2025-05-25T16:02:07Z

nexus/src/app/silo.rs

+        // TODO: modify seems fine, but look into why policy has its own
+        // separate permission
+        let (.., authz_silo) =
+            silo_lookup.lookup_for(authz::Action::Modify).await?;


I was imitating the policy endpoint, but I don't see why we would need to do an auth check here if the datastore function does the same check.

david-crespo · 2025-05-25T16:27:34Z

nexus/tests/config.test.toml

 # List of authentication schemes to support.
 [authn]
-schemes_external = ["spoof", "session_cookie"]
+schemes_external = ["spoof", "session_cookie", "access_token"]


This is necessary to make tokens actually work in the integration tests, so we can test that they work and that they expire.

ahl

If I understand correctly, a silo administrator can modify the token expiration for their own silo, but a central admin could not modify the policy for another silo. And there's no way to set a policy that would apply (e.g. by default) to all silos (existing and new). This seems like a decision worth validating with prospective users. For example, if I were operating the control plane, responsible for the product at large, I would want to be able to set a policy that applied to all silos.

The name "settings" is very general; I'd argue far too general. I'd suggest more specific, discoverable naming.

What configuration happens at the silo level today? What do we imagine happening in the future? With the proposed API, it seems very easy to accidentally remove token expiration: since `device_token_max_ttl_seconds` is optional, I think `oxide settings --some-other-setting 7` would accidentally remove that setting.

As an administrator, I would expect to be able to create a silo with an expiration policy (i.e. atomically). That doesn't seem to be possible.

david-crespo · 2025-05-27T18:35:09Z

The solution we've discussed for fleet admins setting silo-level TTLs (not yet incorporated into the RFD) is setting a fleet-level setting and then allowing or disallowing each silo's admins to override it. That is unlikely to make it into v15 (though it's possible) but it seems reasonable for v16.

Will work on the other stuff.
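The fleet-level approach mentioned above is not part of this PR, but its resolution rule could look roughly like this. Everything here is hypothetical: the `FleetTokenPolicy` type, the `effective_max_ttl` function, and especially the assumption that, when overrides are allowed, a silo with no max of its own falls back to the fleet value.

```rust
/// Hypothetical fleet-level policy (not part of this PR).
struct FleetTokenPolicy {
    /// Fleet-wide max device token TTL in seconds; `None` means no max.
    max_ttl_secs: Option<u32>,
    /// Whether silo admins may override the fleet value for their silo.
    allow_silo_override: bool,
}

/// Resolve the max TTL that applies to new tokens in one silo.
fn effective_max_ttl(fleet: &FleetTokenPolicy, silo_max_ttl_secs: Option<u32>) -> Option<u32> {
    if fleet.allow_silo_override {
        // Assumption: the silo's own value wins if set; otherwise use the fleet default.
        silo_max_ttl_secs.or(fleet.max_ttl_secs)
    } else {
        // Overrides disallowed: the fleet value always applies.
        fleet.max_ttl_secs
    }
}
```

Under this assumption a silo admin could tighten or loosen the fleet default only when allowed, and could not remove a fleet max by clearing the silo setting.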

david-crespo · 2025-05-29T23:05:33Z

Here's what I'm thinking.

`/v1/settings` too generic

I'd like to change it but I'm having a hard time coming up with alternatives. The only setting right now is token TTL. The plan was to also add configurable web session idle and absolute TTL, but that may be less important so we might not do it right away. I can't think of any other settings of this kind we will want to add. /v1/auth-settings is vague and weird. /v1/token-settings doesn't fit well with session TTLs. The fact that this naming is so hard is some kind of technical design smell.

I don't love that it doesn't make clear that it's silo-scoped, but that's how our other silo-scoped endpoints look:

Optional TTL field is too easy to accidentally unset

We discussed this in chat, @ahl is going to add a Nullable type that we can use in place of Option that will let us get a nullable field that is required to be present.
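The difference between `Option` and the proposed `Nullable` comes down to how a missing field is treated. The sketch below models a JSON field's three possible states with a hypothetical `JsonField` enum instead of using serde, so it only illustrates the semantics; it is not the actual `Nullable` type added in this PR.

```rust
/// Hypothetical model of how a field can appear in a JSON request body.
#[derive(Debug, PartialEq)]
enum JsonField<T> {
    /// The key is missing from the body entirely.
    Absent,
    /// The key is present with the value `null`.
    Null,
    /// The key is present with a concrete value.
    Value(T),
}

/// `Option<T>` semantics: a missing key and an explicit `null` both become
/// `None`, so leaving the field out silently clears the setting.
fn decode_optional<T>(field: JsonField<T>) -> Option<T> {
    match field {
        JsonField::Absent | JsonField::Null => None,
        JsonField::Value(v) => Some(v),
    }
}

/// `Nullable<T>`-style semantics: `null` is still valid, but the key must be
/// present, so omitting it is rejected (e.g. as a 400) instead of unsetting.
fn decode_nullable<T>(field: JsonField<T>, name: &str) -> Result<Option<T>, String> {
    match field {
        JsonField::Absent => Err(format!("missing field `{name}`")),
        JsonField::Null => Ok(None),
        JsonField::Value(v) => Ok(Some(v)),
    }
}
```

serde's derived `Deserialize` behaves like `decode_optional` for `Option<T>` fields: a missing `Option` field deserializes to `None`.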

Fleet admin can't change silo settings or create silo with setting

We could add this to silo create pretty easily, but I think I want to punt on adding endpoints that would let the fleet admin change these settings because a) they can make it work as silo admins even though it's unwieldy, and b) I don't know how the fleet admin API should look. Like I said, it may not just look like them updating the setting directly — maybe we want to let them set defaults that can be overridden by silo admins.

ahl · 2025-05-30T18:16:24Z

> Here's what I'm thinking.
>
> /v1/settings too generic
>
> I'd like to change it but I'm having a hard time coming up with alternatives.

I think we need something that indicates the entity to which the settings apply, i.e., the silo.

> Optional TTL field is too easy to accidentally unset
>
> We discussed this in chat, @ahl is going to add a Nullable type that we can use in place of Option that will let us get a nullable field that is required to be present.

I should have asked this while we were figuring out the Nullable thing, but let's also think about how this manifests in the CLI. We don't want it to be easy to unset this value there either! Alternatively: is this the right API? Do we want PATCH? Do we want a different API for each "setting"?

> Fleet admin can't change silo settings or create silo with setting
>
> We could add this to silo create pretty easily, but I think I want to punt on adding endpoints that would let the fleet admin change these settings because a) they can make it work as silo admins even though it's unwieldy, and b) I don't know how the fleet admin API should look. Like I said, it may not just look like them updating the setting directly; maybe we want to let them set defaults that can be overridden by silo admins.

Have we validated how customers want to use this?

david-crespo · 2025-05-31T00:32:46Z

The 10 other silo-scoped things in my screenshot don't say silo either, so I don't want to get out of step with those. I'm thinking /v1/auth-settings and then maybe in the next release we should move them all under /v1/silo/...?
It should be required in the CLI too. We often talk about PATCH, but I don't think this PR is a good time to figure it out.
@qdzlug said regarding my summary (copied below) of what I have and what I'm punting on: "I’m pretty sure that will work as a good start for customers." He has been engaged with the two customers who want this feature most urgently. At the end of the summary I mention the specifics they are interested in.

> token expiration MVP stack is ready for review. it's roughly an implementation of part of RFD 570, though I'm deviating a bit and I plan to update the RFD to match the implementation. these are aiming to satisfy some hard requirements from at least two customers
>
> - Add id column on tokens and sessions #8137
> - Silo level setting for nullable max token TTL #8214
> - Users can list and delete their own tokens #8227
> - User can optionally request a specific TTL at token create time #8231
>
> a couple of notable missing features that I'm hoping to squeak by without in v15. I could be persuaded to add them, but I would rather punt on deciding what the APIs for this look like until v16
>
> - only silo admin can set max TTL on their own silo; no operator-level access to this setting yet
> - users can only delete their own tokens; silo admin cannot list and expire other users' tokens yet
>
> It's not ideal, but I think this should be enough to unblock the two customers, one of whom is more interested in the silo-level max TTL, while the other is more interested in short per-token TTLs for CI.

ahl · 2025-05-31T05:08:39Z

> The 10 other silo-scoped things in my screenshot don't say silo either, so I don't want to get out of step with those. I'm thinking /v1/auth-settings and then maybe in the release we should move them all under /v1/silo/...?

`auth-settings` is a helpful clarification.

> It should be required in the CLI too. We often talk about PATCH but I don't think this PR is a good time to figure it out

Agreed. I'm not sure what progenitor does with this. Certainly it's not going to do the right thing in the SDK... sadly. I think I should make the Nullable thing into its own tiny crate that progenitor can rely on.

> @qdzlug said regarding my summary (copied below) of what I have and what I'm punting on: "I’m pretty sure that will work as a good start for customers." He has been engaged with the two customers who want this feature most urgently.

Understood.


david-crespo · 2025-06-03T21:21:51Z

nexus/types/src/external_api/views.rs

-    pub device_token_max_ttl_seconds: Option<i64>,
+    /// Maximum lifetime of a device token in seconds. If set to null, users
+    /// will be able to create tokens that do not expire.
+    pub device_token_max_ttl_seconds: Option<u32>,


Went with u32 here because it feels maybe overkill to have to validate that this is NonZero when it comes out of the DB.
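The trade-off described here (validate once on the way in, then read a plain `u32` back out) can be sketched as follows. The function name is hypothetical, and the sketch assumes the update body's TTL arrives as an `i64` and that an `Err` is turned into a 400 by the caller; the actual param type in this PR may differ.

```rust
use std::num::NonZeroU32;

/// Hypothetical validation of the requested max TTL on update.
/// `None` (JSON null) clears the max. Otherwise the value must be a positive
/// number of seconds that fits in a `u32`; 0 and negatives are rejected.
fn validate_max_ttl(requested: Option<i64>) -> Result<Option<u32>, String> {
    match requested {
        None => Ok(None),
        Some(secs) => u32::try_from(secs)
            .ok()
            // Rejects 0, which would make every token expire immediately.
            .and_then(NonZeroU32::new)
            // Store a plain u32: it is known to be nonzero from here on, so
            // reading it back from the DB needs no NonZero re-validation.
            .map(|ttl| Some(ttl.get()))
            .ok_or_else(|| format!("max TTL must be a positive number of seconds, got {secs}")),
    }
}
```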

david-crespo · 2025-06-03T21:22:12Z

Got everything in df0fae8

Built on top of #8137 and #8214. This is only for a user to list and delete their own tokens. It doesn't quite match RFD 570, which says `/v1/device-tokens` instead of `/v1/me/access-tokens`, but it feels good under `/v1/me`, and after trying to make the UI too, I think "access tokens" is much more intuitive. If I stick with this, I will update RFD 570 to match.

~~I'm not sure about the path `/v1/device-tokens`: in the API we call them `Device Access Tokens`. I think `/v1/access-tokens` might be more intuitive because the `device` is sort of an implementation detail, it refers to the OAuth device auth flow, which we are using. In practice, the user just gets a token with the CLI and pastes a code into the web UI and they don't have to think too much about it, so exposing that detail in the name might not be worth it.~~ Went with `/v1/me/access-tokens`.

- [x] Basic token list and delete
- [x] Basic integration tests
- [x] Finalize endpoint paths
- [x] Figure out authz story
  - Went with restricting datastore functions to current actor for now
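The authz choice above (scoping the datastore functions to the current actor) can be illustrated with an in-memory stand-in. `TokenStore`, its methods, and the use of bare `u64` IDs are hypothetical; the real code queries the database. The sketch also assumes that deleting someone else's token should look the same as deleting a nonexistent one.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the token table.
struct TokenStore {
    /// Token ID -> ID of the user who owns it.
    owners: HashMap<u64, u64>,
}

impl TokenStore {
    /// List only the tokens owned by the current actor.
    fn list_for_actor(&self, actor: u64) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .owners
            .iter()
            .filter(|(_, owner)| **owner == actor)
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable(); // HashMap iteration order is unspecified
        ids
    }

    /// Delete a token only if the current actor owns it. Another user's token
    /// is reported the same way as a missing one, so nothing leaks about it.
    fn delete_for_actor(&mut self, actor: u64, token_id: u64) -> Result<(), &'static str> {
        if self.owners.get(&token_id) == Some(&actor) {
            self.owners.remove(&token_id);
            Ok(())
        } else {
            Err("not found")
        }
    }
}
```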

Closes #8147. Built on #8137, #8214, and #8227. This is pretty straightforward, I think. The user gives us a TTL in seconds at token request time, and we store it on the request row. When they come back in the later step to confirm the code and generate the token, we retrieve the TTL, validate that it is less than the silo max (if one is set), and use it to generate the `time_expires` timestamp, which cannot be changed later.

One slightly surprising bit is that we can't validate the TTL against the silo max at initial request time, because we don't know which silo the user is associated with until the confirm step. So we probably want to make sure we handle TTL validation errors nicely in the web console, because I think that's where they will show up.
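The confirm-step logic described above might look like the following sketch. `confirm_time_expires` is a hypothetical name; it assumes times in Unix seconds, that a TTL exactly equal to the silo max is accepted (the description only says "less than"), and that a request with no TTL falls back to the silo max, as in #8214.

```rust
/// Hypothetical confirm-step logic. `requested_ttl_secs` was stored on the
/// request row at the initial step; the silo max is only known now, once the
/// user (and therefore the silo) has been identified.
fn confirm_time_expires(
    now_secs: u64,
    requested_ttl_secs: Option<u32>,
    silo_max_ttl_secs: Option<u32>,
) -> Result<Option<u64>, String> {
    let ttl = match (requested_ttl_secs, silo_max_ttl_secs) {
        // Requested TTL exceeds the silo max: reject (surfaced as an error to the user).
        (Some(req), Some(max)) if req > max => {
            return Err(format!("requested TTL {req}s exceeds silo max of {max}s"));
        }
        // Requested TTL within the max, or no max set: use what was requested.
        (Some(req), _) => Some(req),
        // No TTL requested: fall back to the silo max, if any.
        (None, max) => max,
    };
    // time_expires is computed once here and never changed afterward.
    Ok(ttl.map(|t| now_secs.saturating_add(u64::from(t))))
}
```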

…e required to be present (#9046)

Optional fields on a request body can be left out of the JSON completely. In the case of instance update (and probably many other endpoints), this is a problem because passing `null` and passing nothing have the same effect, namely unsetting that field if it is already set in the DB. And because the fields are optional, the generated client types do not enforce that they are present (at least when creating the JSON by hand, and also in TypeScript, which distinguishes between optional and nullable at the type level).

It is easy to do this by accident: while working on [this](oxidecomputer/console#2900), I [discovered a bug](oxidecomputer/console#2900 (comment)) in the console where we accidentally unset the auto restart policy on an instance when resizing it.

This PR changes `Option` to `Nullable`, which I added in #8214 to fix this problem: `null` remains a valid value, but the client is no longer allowed to leave that field out of the body.

If we don't want to do this, I can live with it, because in the console I have [made a helper](oxidecomputer/console@60ca75e) that enforces that all fields are explicitly present. But I do think the status quo is very error-prone.

https://github.com/oxidecomputer/omicron/blob/c6989117c72ecf88d4e3dc8a597d9fe703152a93/common/src/api/external/mod.rs#L3564-L3570

david-crespo changed the title ~~Silo-level max token expiration~~ Silo-level max device token TTL May 24, 2025

david-crespo changed the title ~~Silo-level max device token TTL~~ [2/n] Silo-level max device token TTL May 24, 2025

david-crespo marked this pull request as ready for review May 25, 2025 15:56

david-crespo commented May 25, 2025


david-crespo requested a review from ahl May 27, 2025 17:04

ahl reviewed May 27, 2025


This was referenced May 27, 2025

[3/n] /v1/me/access-tokens list and delete #8227

Merged

[4/n] Let user request a specific TTL for a token #8231

Merged

david-crespo force-pushed the token-session-ids branch from 6fbe688 to a4d0d74 May 30, 2025 16:33

david-crespo force-pushed the silo-token-settings branch from f68b227 to 3c36e26 May 30, 2025 16:56

david-crespo force-pushed the token-session-ids branch from a4d0d74 to 6f16cfe May 31, 2025 00:00

david-crespo force-pushed the silo-token-settings branch from 3c36e26 to 26d25ab May 31, 2025 00:00

david-crespo force-pushed the silo-token-settings branch 2 times, most recently from d2fc15e to 14c544c June 3, 2025 00:01

Base automatically changed from token-session-ids to main June 3, 2025 16:04

david-crespo added 8 commits June 3, 2025 11:05

add silo settings table with max token ttl

b03b0a0

make token expiration do something

791d946

silo-scoped silo settings endpoints

a4456aa

sonnet 4 generated a full working test from the stub

072e934

exercise get silo settings endpoint

a82ad21

migration to create settings entries for existing silos

ac70bc4

disallow 0 max ttl

ece0895

add endpoints to unauthorized test

09df45d

david-crespo added 6 commits June 3, 2025 11:09

Nullable<T> for nullable but not optional, fix setting null

7dd0f13

better comments on Nullable

329fe8e

/v1/settings -> /v1/auth-settings

efa26e7

global rename silo_settings -> silo_auth_settings

b520a82

cargo fmt ugh

5d8f01f

SiloSettings -> SiloAuthSettings

4dd9321

david-crespo requested a review from hawkw June 3, 2025 17:11

david-crespo force-pushed the silo-token-settings branch from 14c544c to 4dd9321 June 3, 2025 17:19

hawkw reviewed Jun 3, 2025


use SqlU32 for model, remove unnecessary schemars annotation

df0fae8

david-crespo commented Jun 3, 2025


hawkw approved these changes Jun 3, 2025


david-crespo enabled auto-merge (squash) June 3, 2025 21:46

david-crespo merged commit fc3244d into main Jun 4, 2025
17 checks passed

david-crespo deleted the silo-token-settings branch June 4, 2025 01:34

david-crespo mentioned this pull request Jul 3, 2025

Add effective fleet and silo role to /v1/me response #8515

Closed

david-crespo mentioned this pull request Sep 18, 2025

InstanceUpdate body: change Option fields to Nullable so they're required to be present #9046

Merged
