@david-crespo david-crespo commented May 24, 2025

Closes #2302. Built on top of #8137, but doesn't really rely on much in there. Implementing RFD 570.

  • Add silo_settings table with nullable device token expiration (seconds) column. Null setting means no max
    • Also populate this table with null maxes for all existing silos
  • Follow behavior of quotas table: create silo settings entry on silo create, delete it on silo delete
  • View and update settings endpoints at /v1/settings with all necessary plumbing
    • Following RFD 570, these are silo-scoped endpoints, only accessible to users inside the silo, as opposed to, for example, fleet operators being able to set it for any silo
  • Users don't set token expiration directly yet — at token create time, if the current silo has its max set, the token gets an expiration timestamp based on that max. Otherwise the token does not expire
  • Test that trying to set a negative or 0 TTL 400s
  • Fix authz tests and verify endpoint tests
  • Bikeshed endpoint paths, operation IDs, property names, etc
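
The expiration rule described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `SiloSettings`, `validate_max_ttl`, and `token_expiration` are hypothetical names, and the `u32` type is an assumption based on the later review discussion.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for a row of the `silo_settings` table.
/// `None` means the silo has no maximum, so tokens do not expire.
struct SiloSettings {
    device_token_max_ttl_seconds: Option<u32>,
}

/// Reject a zero TTL when the setting is updated. A negative value cannot
/// be represented in a `u32`, so it would be rejected before reaching here.
fn validate_max_ttl(ttl: Option<u32>) -> Result<Option<u32>, String> {
    match ttl {
        Some(0) => Err("TTL must be greater than zero".to_string()),
        other => Ok(other),
    }
}

/// Compute the expiration timestamp for a token created at `now`.
/// Returns `None` when the silo has no maximum set.
fn token_expiration(settings: &SiloSettings, now: SystemTime) -> Option<SystemTime> {
    settings
        .device_token_max_ttl_seconds
        .map(|secs| now + Duration::from_secs(u64::from(secs)))
}
```

With a maximum of 3600 seconds, a token created at time `t` would expire at `t + 1h`; with a null maximum, `token_expiration` returns `None` and the token never expires.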

@david-crespo david-crespo changed the title from "Silo-level max token expiration" to "Silo-level max device token TTL" May 24, 2025
@david-crespo david-crespo changed the title from "Silo-level max device token TTL" to "[2/n] Silo-level max device token TTL" May 24, 2025
@david-crespo david-crespo marked this pull request as ready for review May 25, 2025 15:56
// and differently for quotas. Maybe the best thing would be to make them all
// non-nullable on SiloQuotasUpdate. I vaguely remember the latter being the
// direction we wanted to go in general anyway. Can't find the issue where it
// was discussed.

something to think about

// TODO: modify seems fine, but look into why policy has its own
// separate permission
let (.., authz_silo) =
    silo_lookup.lookup_for(authz::Action::Modify).await?;

I was imitating the policy endpoint, but I don't see why we would need to do an auth check here if the datastore function does the same check.

# List of authentication schemes to support.
[authn]
-schemes_external = ["spoof", "session_cookie"]
+schemes_external = ["spoof", "session_cookie", "access_token"]

This is necessary to make tokens actually work in the integration tests, so we can test that they work and that they expire.

@david-crespo david-crespo requested a review from ahl May 27, 2025 17:04

@ahl ahl left a comment

If I understand correctly, a silo administrator can modify the token expiration for their own silo, but a central admin could not modify the policy for another silo. And there's no way to set a policy that would apply (e.g. by default) to all silos (existing and new). This seems like a decision worth validating with prospective users. For example, if I were operating the control plane, responsible for the product at large, I would want to be able to set a policy that applied to all silos.

The name "settings" is very general; I would suggest it is far too general. I'd suggest more specific, discoverable naming.

What configuration happens at the silo level today? What do we imagine happening in the future? With the proposed API, it seems very easy to accidentally remove token expiration: `oxide settings --some-other-setting 7`. Since `device_token_max_ttl_seconds` is optional, I think that would accidentally remove that setting.

As an administrator, I would expect to be able to create a silo with an expiration policy (i.e. atomically). That doesn't seem to be possible.

david-crespo commented May 27, 2025

The solution we've discussed for fleet admins setting silo-level TTLs (not yet incorporated into the RFD) is setting a fleet-level setting and then allowing or disallowing each silo's admins to override it. That is unlikely to make it into v15 (though it's possible) but it seems reasonable for v16.

Will work on the other stuff.

@david-crespo

Here's what I'm thinking.

/v1/settings too generic

I'd like to change it but I'm having a hard time coming up with alternatives. The only setting right now is token TTL. The plan was to also add configurable web session idle and absolute TTL, but that may be less important so we might not do it right away. I can't think of any other settings of this kind we will want to add. /v1/auth-settings is vague and weird. /v1/token-settings doesn't fit well with session TTLs. The fact that this naming is so hard is some kind of technical design smell.

I don't love that it doesn't make clear that it's silo-scoped, but that's how our other silo-scoped endpoints look:

[Screenshot: list of the other silo-scoped endpoints]

Optional TTL field is too easy to accidentally unset

We discussed this in chat, @ahl is going to add a Nullable type that we can use in place of Option that will let us get a nullable field that is required to be present.
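
To make the difference concrete, here is a rough sketch of the two behaviors. It does not show the actual `Nullable` type or any serde integration; `Field`, `parse_optional`, and `parse_required_nullable` are hypothetical names that model how a key can appear in a JSON body.

```rust
/// How a field can appear in a JSON request body (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Field<T> {
    /// The key is absent from the body.
    Missing,
    /// The key is present with the value `null`.
    Null,
    /// The key is present with a concrete value.
    Value(T),
}

/// `Option`-style handling: a missing key and an explicit `null` are
/// indistinguishable, so both clear the setting.
fn parse_optional(field: Field<u32>) -> Result<Option<u32>, String> {
    match field {
        Field::Missing | Field::Null => Ok(None),
        Field::Value(v) => Ok(Some(v)),
    }
}

/// Required-but-nullable handling: `null` is still a valid way to clear the
/// setting, but leaving the key out of the body is rejected.
fn parse_required_nullable(field: Field<u32>) -> Result<Option<u32>, String> {
    match field {
        Field::Missing => Err("missing field `device_token_max_ttl_seconds`".to_string()),
        Field::Null => Ok(None),
        Field::Value(v) => Ok(Some(v)),
    }
}
```

Under the first function, an update body that omits the key silently removes the TTL; under the second, the same body is an error and the caller has to write `null` on purpose.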

Fleet admin can't change silo settings or create silo with setting

We could add this to silo create pretty easily, but I think I want to punt on adding endpoints that would let the fleet admin change these settings because a) they can make it work as silo admins even though it's unwieldy, and b) I don't know how the fleet admin API should look. Like I said, it may not just look like them updating the setting directly — maybe we want to let them set defaults that can be overridden by silo admins.
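
For discussion purposes only, one possible shape of the "fleet default that silo admins may override" idea is sketched below. Nothing here exists in the API or in RFD 570; `FleetPolicy`, `SiloPolicy`, `SiloOverride`, and `effective_max_ttl` are all hypothetical names.

```rust
/// Hypothetical fleet-wide default. `None` means no maximum.
struct FleetPolicy {
    default_max_ttl_seconds: Option<u32>,
}

/// What the silo admin has chosen, if anything.
enum SiloOverride {
    /// The silo admin has not set anything; use the fleet default.
    Inherit,
    /// The silo admin set a value; `None` means "no maximum".
    Set(Option<u32>),
}

/// Hypothetical per-silo state.
struct SiloPolicy {
    /// Whether the fleet admin allows this silo's admins to override.
    override_allowed: bool,
    override_value: SiloOverride,
}

/// Resolve the maximum TTL that applies to tokens created in the silo.
fn effective_max_ttl(fleet: &FleetPolicy, silo: &SiloPolicy) -> Option<u32> {
    match (silo.override_allowed, &silo.override_value) {
        // The override only counts when the fleet admin permits it.
        (true, SiloOverride::Set(max)) => *max,
        _ => fleet.default_max_ttl_seconds,
    }
}
```

One open question this sketch leaves unanswered is whether a silo override should be allowed to be looser than the fleet default or only stricter.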

ahl commented May 30, 2025

> Here's what I'm thinking.
>
> /v1/settings too generic
>
> I'd like to change it but I'm having a hard time coming up with alternatives.

I think we need something that indicates the entity to which the settings apply, i.e. the silo.

> Optional TTL field is too easy to accidentally unset
>
> We discussed this in chat, @ahl is going to add a Nullable type that we can use in place of Option that will let us get a nullable field that is required to be present.

I should have asked this as we were figuring out the Nullable thing, but let's also think about how this manifests in the CLI. We don't want it to be easy there to unset this value either! Alternatively: is this the right API? Do we want PATCH? Do we want a different API for each "setting"?

> Fleet admin can't change silo settings or create silo with setting
>
> We could add this to silo create pretty easily, but I think I want to punt on adding endpoints that would let the fleet admin change these settings because a) they can make it work as silo admins even though it's unwieldy, and b) I don't know how the fleet admin API should look. Like I said, it may not just look like them updating the setting directly; maybe we want to let them set defaults that can be overridden by silo admins.

Have we validated how customers want to use this?

david-crespo commented May 31, 2025

  1. The 10 other silo-scoped things in my screenshot don't say silo either, so I don't want to get out of step with those. I'm thinking /v1/auth-settings and then maybe in the next release we should move them all under /v1/silo/...?
  2. It should be required in the CLI too. We often talk about PATCH but I don't think this PR is a good time to figure it out
  3. @qdzlug said regarding my summary (copied below) of what I have and what I'm punting on: "I’m pretty sure that will work as a good start for customers." He has been engaged with the two customers who want this feature most urgently. At the end of the summary I mention the specifics they are interested in.

token expiration MVP stack is ready for review. it's roughly an implementation of part of RFD 570 though I'm deviating a bit and I plan to update the RFD to match the implementation. these are aiming to satisfy some hard requirements from at least two customers

  1. Add id column on tokens and sessions #8137
  2. Silo level setting for nullable max token TTL #8214
  3. Users can list and delete their own tokens #8227
  4. User can optionally request a specific TTL at token create time #8231

a couple of notable missing features that I'm hoping to squeak by without in v15. I could be persuaded to add them, but I would rather punt on deciding what the APIs for this look like until v16

  1. only silo admin can set max TTL on their own silo — no operator-level access to this setting yet
  2. users can only delete their own tokens — silo admin cannot list and expire other users' tokens yet

It's not ideal, but I think this should be enough to unblock the two customers, one of whom is more interested in the silo-level max TTL and the other is more interested in per-token short TTLs for CI.

ahl commented May 31, 2025

> 1. The 10 other silo-scoped things in my screenshot don't say silo either, so I don't want to get out of step with those. I'm thinking /v1/auth-settings and then maybe in the next release we should move them all under /v1/silo/...?

auth-settings is a helpful clarification.

> 2. It should be required in the CLI too. We often talk about PATCH but I don't think this PR is a good time to figure it out

Agreed. I'm not sure what progenitor does with this. Certainly it's not going to do the right thing in the SDK... sadly. I think I should make the Nullable thing into its own tiny crate that progenitor can rely on.

> 3. @qdzlug said regarding my summary (copied below) of what I have and what I'm punting on: "I’m pretty sure that will work as a good start for customers." He has been engaged with the two customers who want this feature most urgently.

Understood.

@david-crespo david-crespo force-pushed the silo-token-settings branch 2 times, most recently from d2fc15e to 14c544c Compare June 3, 2025 00:01
Base automatically changed from token-session-ids to main June 3, 2025 16:04
@david-crespo david-crespo requested a review from hawkw June 3, 2025 17:11
@david-crespo david-crespo force-pushed the silo-token-settings branch from 14c544c to 4dd9321 Compare June 3, 2025 17:19
-pub device_token_max_ttl_seconds: Option<i64>,
+/// Maximum lifetime of a device token in seconds. If set to null, users
+/// will be able to create tokens that do not expire.
+pub device_token_max_ttl_seconds: Option<u32>,

Went with u32 here because it feels maybe overkill to have to validate that this is NonZero when it comes out of the DB.
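
As a rough illustration of the trade-off, the two conversions might look like this. It is a sketch only: it assumes the column is read as `Option<i64>`, and `ttl_from_db` and `ttl_from_db_nonzero` are hypothetical helpers, not functions in this PR.

```rust
use std::convert::TryFrom;
use std::num::NonZeroU32;

/// Convert a nullable integer column into the `Option<u32>` used by the
/// API type. Only range is checked; zero passes through unchanged.
fn ttl_from_db(raw: Option<i64>) -> Result<Option<u32>, String> {
    raw.map(|secs| {
        u32::try_from(secs).map_err(|_| format!("TTL {} is out of range for u32", secs))
    })
    .transpose()
}

/// The stricter alternative: also reject zero on the way out of the
/// database, at the cost of one more fallible step on every read.
fn ttl_from_db_nonzero(raw: Option<i64>) -> Result<Option<NonZeroU32>, String> {
    match ttl_from_db(raw)? {
        None => Ok(None),
        Some(secs) => NonZeroU32::new(secs)
            .map(Some)
            .ok_or_else(|| "TTL must be nonzero".to_string()),
    }
}
```

With plain `u32`, the nonzero rule has to be enforced where the setting is written rather than where it is read.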

@david-crespo

Got everything in df0fae8

@david-crespo david-crespo enabled auto-merge (squash) June 3, 2025 21:46
@david-crespo david-crespo merged commit fc3244d into main Jun 4, 2025
17 checks passed
@david-crespo david-crespo deleted the silo-token-settings branch June 4, 2025 01:34
david-crespo added a commit that referenced this pull request Jun 4, 2025
Built on top of #8137 and #8214.

This is only for a user to list and delete their own tokens. It doesn't
quite match RFD 570, which says `/v1/device-tokens` instead of
`/v1/me/access-tokens`, but it feels good under `/v1/me`, and after
trying to make the UI too, I think "access tokens" is much more
intuitive. If I stick with this, I will update RFD 570 to match.

~~I'm not sure about the path `/v1/device-tokens` — in the API we call
them `Device Access Tokens`. I think `/v1/access-tokens` might be more
intuitive because the `device` is sort of an implementation detail, it
refers to the OAuth device auth flow, which we are using. In practice,
the user just gets a token with the CLI and pastes a code into the web
UI and they don't have to think too much about it, so exposing that
detail in the name might not be worth it.~~ Went with
`/v1/me/access-tokens`.

- [x] Basic token list and delete
- [x] Basic integration tests
- [x] Finalize endpoint paths
- [x] Figure out authz story
  - Went with restricting datastore functions to current actor for now
david-crespo added a commit that referenced this pull request Jun 4, 2025
Closes #8147.

Built on #8137, #8214, and #8227.

This is pretty straightforward, I think. The user gives us a TTL in
seconds at token request time. We store it on the request row. When they
come back in the later step to confirm the code and generate the token,
we retrieve the TTL, validate that it is less than the silo max (if one
is set), and we use it to generate the `time_expires` timestamp, which
cannot be changed later.

One slightly surprising bit is that we can't validate the TTL against
the silo max at initial request time because we don't have any idea what
silo the user is associated with until the confirm step. So we probably
want to make sure we are handling TTL validation errors nicely in the
web console, because I think that's where they will show up.
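
A minimal sketch of the confirm-step check described above, under stated assumptions: the names `TtlError`, `effective_ttl`, and `time_expires` are hypothetical, a requested TTL equal to the silo max is treated as allowed, and a request with no TTL falls back to the silo max as in #8214.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq)]
enum TtlError {
    /// The TTL stored on the request row is larger than the silo allows.
    ExceedsSiloMax { requested: u32, max: u32 },
}

/// Decide the TTL at the confirm step, once the user's silo is known.
fn effective_ttl(requested: Option<u32>, silo_max: Option<u32>) -> Result<Option<u32>, TtlError> {
    match (requested, silo_max) {
        (Some(requested), Some(max)) if requested > max => {
            Err(TtlError::ExceedsSiloMax { requested, max })
        }
        // Requested TTL is within the max, or the silo has no max.
        (Some(requested), _) => Ok(Some(requested)),
        // Nothing requested: use the silo max, which may itself be `None`.
        (None, max) => Ok(max),
    }
}

/// Turn the chosen TTL into a fixed `time_expires` timestamp.
fn time_expires(now: SystemTime, ttl: Option<u32>) -> Option<SystemTime> {
    ttl.map(|secs| now + Duration::from_secs(u64::from(secs)))
}
```

Because the error can only be produced at the confirm step, it would surface in whatever client performs the confirmation rather than in the CLI that made the initial request.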
david-crespo added a commit that referenced this pull request Sep 18, 2025
…e required to be present (#9046)

Optional fields on a request body can be left out completely from the
JSON. In the case of instance update (and probably many other endpoints)
this is a problem because passing `null` and passing nothing have the
same effect, namely unsetting that field if it is already set in the DB.
And because the fields are optional, the generated client types do not
enforce that they are present (at least when creating the JSON by hand
and also in TypeScript, which distinguishes between optional and
nullable at the type level). It is easy to do this by accident: while
working on [this](oxidecomputer/console#2900), I
[discovered a
bug](oxidecomputer/console#2900 (comment))
in the console where we accidentally unset the auto restart policy on an
instance when you resize an instance.

This PR changes `Option` to `Nullable`, which I added in #8214 to fix
this problem: `null` remains a valid value, but the client is no longer
allowed to leave that field out of the body. If we don't want to do
this, I can live with it because in the console, I have [made a
helper](oxidecomputer/console@60ca75e)
that enforces that all fields are explicitly present. But I do think the
status quo is very error-prone.


https://github.com/oxidecomputer/omicron/blob/c6989117c72ecf88d4e3dc8a597d9fe703152a93/common/src/api/external/mod.rs#L3564-L3570
charliepark pushed a commit that referenced this pull request Sep 19, 2025
…e required to be present (#9046)

Development

Successfully merging this pull request may close these issues.

Expiration for device authn access tokens

4 participants