
Conversation

@amrc-benmorrow
Contributor

This is not fully tested yet.

Add an Auth interface to RxClient. This tracks ACLs via change-notify, caching the ACL for a given principal for half an hour. Calls to the existing Auth interface methods will now go via this cache instead of the HTTP request cache.

Update existing Auth clients to use RxClient. In most cases this is all that is required to support change-notify. In some cases additional updates to use the ConfigDB notify interface are also included.
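A rough sketch of the caching behaviour this describes (not the actual RxClient code; the `notify.watch` interface and the `AuthCache`/`acl_for` names are assumptions for illustration):

```javascript
import { ReplaySubject, share, timer } from "rxjs";

const ACL_CACHE_TIME = 30 * 60 * 1000;   // half an hour

class AuthCache {
    constructor (notify) {
        this.notify = notify;    // assumed change-notify WATCH interface
        this.acls = new Map();   // principal -> shared ACL sequence
    }

    // Return a shared, replayed sequence of ACL updates for a
    // principal. The underlying WATCH stays open for half an hour
    // after the last consumer unsubscribes, then is dropped.
    acl_for (principal) {
        let seq = this.acls.get(principal);
        if (!seq) {
            seq = this.notify.watch(`v2/acl/kerberos/${principal}`).pipe(
                share({
                    connector: () => new ReplaySubject(1),
                    resetOnRefCountZero: () => timer(ACL_CACHE_TIME),
                }));
            this.acls.set(principal, seq);
        }
        return seq;
    }
}
```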

@amrc-benmorrow force-pushed the bmz/auth-rx-client branch 4 times, most recently from 91f04a0 to 0ca1421 on October 1, 2025 14:19
@amrc-benmorrow
Contributor Author

Causes failures in cluster-manager on initial install. The failures are caused by subscriptions to the ConfigDB being refused. I thought this was to do with permissions being cached inappropriately in RxClient.Auth, but this doesn't appear to be the cause. The failures go away once auth and configdb have been restarted.

@amrc-benmorrow
Contributor Author

amrc-benmorrow commented Oct 3, 2025

The cause here is that adding a new principal UPN mapping is not causing the permissions for that principal to be correctly recalculated internally. This problem lies entirely inside the Auth service.

The ConfigDB is performing a WATCH on v2/acl/kerberos/sv1clustermanager. Initially this returns a 403 response as the sv1configdb UPN isn't mapped to an account yet. (This is why this only appears on initial install.) This causes the ConfigDB to respond 503 to cluster manager subscriptions, as we can't check ACLs. So far this is all correct. However when the sv1configdb account mapping is created (by service-setup, in this case, but krbkeys would be the same) the WATCH request should update with the ACL information, but instead it sends another 403 response.

This can be reliably reproduced (see the sketch after this list) by:

  • Creating an account with ReadACL permission.
  • Deleting the UPN mapping for that account.
  • Subscribing to WATCH v2/acl/....
  • Recreating the UPN mapping.
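The same steps as a hypothetical script (`create_account`, `delete_upn_mapping` and `create_upn_mapping` are illustrative stand-ins, not the real admin API):

```javascript
const upn = "test@REALM";

// An account with ReadACL permission, initially mapped to this UPN.
const account = await auth.create_account({ permissions: ["ReadACL"] });
await auth.create_upn_mapping(upn, account);
await auth.delete_upn_mapping(upn);

// While the mapping is missing the WATCH correctly reports 403...
notify.watch(`v2/acl/kerberos/${upn}`)
    .subscribe(res => console.log(res.status));

// ...but once the mapping is recreated the WATCH should emit the ACL.
// Instead it sends another 403.
await auth.create_upn_mapping(upn, account);
```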

This overrides the core acl-fetch method in the Auth interface with one
which subscribes to notifications. This gives us immediate updates when
permissions change. We subscribe to the ACL for a given principal for
half an hour; if we have no more requests in that time the subscription
will be dropped.
This gives push authentication.
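
A minimal sketch of that override, reusing the hypothetical `acl_for` cache from the sketch above (`fetch_acl` is the real method name mentioned below; the rest is assumed):

```javascript
import { firstValueFrom } from "rxjs";

class NotifyAuth extends Auth {
    // Resolve from the notify-driven sequence rather than an HTTP
    // request; later emissions keep the cached ACL current.
    fetch_acl (principal) {
        return firstValueFrom(this.acl_for(principal));
    }
}
```
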
Code which had moved from acs-configdb to js-service-api was not moved across correctly.

Skip the Directory for now, as that needs direct access to the raw ACL.

The Directory needs access to the raw ACL, so expose this as a public method. It's cleaner to override a public method in any case.

This should be more reliable, as the Sparkplug notifications do not sync well with the HTTP caching.
The ServiceClient already had correct code to return a 503 when we got
permission denied from the Auth service. Use this instead of overriding
it.

Switch to Response in place of Maybe as this preserves the HTTP response
code.
I even created a sequence function especially for this purpose...
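
To illustrate the difference (a hypothetical sequence, not the actual js-service-api code): a Maybe collapses every failure into "nothing", while a Response keeps the status code, so a failure can be propagated faithfully.

```javascript
// Hypothetical: combine an array of Response-like values into a single
// Response. On failure we keep the offending status code; a Maybe
// would have thrown that information away.
function sequence (responses) {
    const failed = responses.find(r => r.status >= 300);
    if (failed)
        return { status: failed.status };
    return { status: 200, body: responses.map(r => r.body) };
}
```
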
Rationalise our error returns in ServiceClient so that we only return ServiceError for a problem with the remote service. Ordinary parameter errors now throw other errors. This means that a ServiceError thrown during processing of a request should always result in a 503 response.
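
A sketch of the resulting convention, assuming Express-style handlers (the handler itself is illustrative):

```javascript
// ServiceError now always means "a remote service we depend on
// failed", so the only sensible answer to our own client is 503.
function error_handler (err, req, res, next) {
    if (err instanceof ServiceError) {
        res.status(503).end();
        return;
    }
    next(err);    // parameter errors and the like take their own path
}
```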

Remove the hack in Auth.fetch_acl of throwing a ServiceError with a 503 status.

The original logic here did not retry failed updates and left this to the 10-minute reconcile loop. Changing it to retry individual actions causes a retrying update to block all other updates, including an update to delete the cluster.

Group actions by cluster. Attach retry logic to each action and then
switch through the actions for each cluster. This ensures we abandon a
retrying action if we should be doing something else instead.
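
A rough rxjs sketch of that shape, assuming the actions arrive as a stream tagged with their cluster (`actions` and `perform` are illustrative):

```javascript
import { from, groupBy, mergeMap, retry, switchMap, timer } from "rxjs";

const handled = actions.pipe(
    // One independent sub-stream per cluster, so one cluster's
    // retries cannot block another cluster's updates.
    groupBy(action => action.cluster),
    mergeMap(cluster => cluster.pipe(
        // Within a cluster a newer action cancels the current one,
        // even mid-retry; a delete-cluster action wins immediately.
        switchMap(action => from(perform(action)).pipe(
            retry({ delay: () => timer(5000) }))))));
```
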
Instead of duck-typing, use a proper class to distinguish between 'a service we contacted returned an error', 'we need to return a specific error', and other failures which happen to result in an error object with a `status` property.
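
An illustrative sketch of the distinction (not the real class hierarchy):

```javascript
// With an explicit class, `err instanceof ServiceError` is a
// deliberate signal that a remote service failed, where a bare
// `status` property on an error object might mean anything.
class ServiceError extends Error {
    constructor (service, status, message) {
        super(message);
        this.service = service;   // the remote service which failed
        this.status  = status;    // the status it returned
    }
}
```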