add named role support #202

mikemrm · 2023-11-27T22:17:24Z

This adds support for giving roles names, along with tracking additional data such as the creator, created at and updated at.

To support this, a small database package has been created to handle interacting with the backend database and simplifying changes.

In addition to the database package, the http api has changed to support these new features.

All endpoints which return a role will now return the new information if such details exist.
POST /api/v1/resources/:id/roles now requires a name to be provided.
PATCH /api/v1/roles/:id has been added with support for updating an existing roles name and actions.

jnschaeffer

Dropping some early comments here regarding the Helm chart. I'll get to the code next.

jnschaeffer · 2023-12-01T21:31:05Z

chart/permissions-api/templates/job-migrate-database.yaml

@@ -0,0 +1,70 @@
+{{- if has .Values.config.crdb.migrateHook (list "pre-sync" "manual") }}


Do we anticipate needing or wanting to support these three kinds of migration workflows?

manual is perhaps not needed. but pre-sync and init are useful.

pre-sync can't be used during the initial deployment as it will rely on configs and secrets that have to be synced first, and therefore won't be available. so you must use the init method. but after using the init method, you can switch to the pre-sync method which will ensure migrations only execute once per sync.

if having these options is not preferred we could get rid of it and just use init containers.

jnschaeffer · 2023-12-01T21:31:39Z

chart/permissions-api/templates/job-migrate-database.yaml

+  {{- else }}
+  generateName: migrate-database-
+  annotations:
+    argocd.argoproj.io/hook: PreSync


In general I think it's good to not assume any user of this is using Argo specifically.

jnschaeffer · 2023-12-01T21:38:24Z

chart/permissions-api/values.yaml

+    # migrateHook sets when to run database migrations. one of: pre-sync, init, manual
+    #  - pre-sync: hook runs as a job before any other changes are synced.
+    #  - init: is run as an init container to the server deployment and may run multiple times if replica count is high.
+    #  - manual: a migrate-database job will be available to triggered manually
+    migrateHook: "init"


Same comment re: Argo. I do like the concept, though!

jnschaeffer · 2023-12-01T21:38:54Z

chart/permissions-api/values.yaml

+    # password is the auth password to the database
+    password: ""


This seems like something we shouldn't even let users set manually, right?

I usually like to expose all available configuration options, which this is an option that can be set in the crdbx config which is why I added it here.

jnschaeffer · 2023-12-01T21:39:14Z

chart/permissions-api/values.yaml

+    # uri is the raw uri connection string
+    uri: ""


Same here re: sensitive data.

re same as above

jnschaeffer

Some thoughts.

db/migrations.go

internal/types/types.go

internal/database/db.go

internal/query/relations.go

jnschaeffer · 2023-12-04T18:18:29Z

internal/query/relations.go

+	dbTx, err := e.db.DeleteRoleTransaction(ctx, roleResource.ID)
+	if err != nil {
+		// If the role doesn't exist, simply ignore.
+		if !errors.Is(err, database.ErrNoRoleFound) {
+			return err
+		}
+	} else {
+		// Setup rollback in case an error occurs before we commit.
+		defer dbTx.Rollback() //nolint:errcheck // error is logged in function
+	}
+
 	for _, filter := range filters {
 		if err = e.deleteRelationships(ctx, filter); err != nil {
-			return fmt.Errorf("failed to delete role action %s: %w", filter.OptionalResourceId, err)
+			err = fmt.Errorf("failed to delete role action %s: %w", filter.OptionalResourceId, err)
+
+			span.RecordError(err)
+			span.SetStatus(codes.Error, err.Error())
+
+			return err
+		}
+	}


What happens if I attempt to update a role with new actions while it's being deleted? Reading this code, it looks possible that I wind up with relationships in SpiceDB but no corresponding CRDB entries. This might also, ironically, require a SELECT ... FOR UPDATE.

this is now handled properly as we always have a transaction open before on the database. if two changes came in at the same time, one would have to wait until the database released the row.

jnschaeffer

I haven't looked at the tests yet but this looks broadly good. Some minor points below.

jnschaeffer · 2023-12-11T20:22:26Z

internal/api/roles.go

+	// check on the role resource.
+	resource, err := r.engine.GetRoleResource(ctx, roleResource)
+	if err != nil {
+		return echo.NewHTTPError(http.StatusBadRequest, "error getting resource").SetInternal(err)


Does this make sense as a 400? If the resource wasn't found, we probably want to return a 404, and if something else failed then to return a 500.

jnschaeffer · 2023-12-11T20:25:31Z

internal/api/types.go

 	Actions []string        `json:"actions"`
+
+	ResourceID gidx.PrefixedID `json:"resource_id,omitempty"`
+	Creator    gidx.PrefixedID `json:"creator"`


I saw the earlier comment around this got resolved but it looks like I missed this one. As a side note, we could also make this CreatedBy/created_by instead; Creator/creator just feels off.

We also probably want a field denoting who last updated the role too.

whoops, good catch.

jnschaeffer · 2023-12-11T20:26:50Z

internal/query/mock/mock.go

 	}

 	return role, nil
 }

+// UpdateRole returns nothing but satisfies the Engine interface.
+func (e *Engine) UpdateRole(ctx context.Context, actor, roleResource types.Resource, newName string, newActions []string) (types.Role, error) {
+	return types.Role{}, nil


I'm increasingly wary of mock interfaces (even though I'm pretty sure I wrote this one), but for the sake of completeness it likely makes sense to return the updated fields here.

yeah fair nuff. i'll update this to use the mock support for returning values.

internal/query/relations.go

jnschaeffer · 2023-12-11T20:32:00Z

internal/query/relations.go

+	// If the name is the same, then clear the changed name.
+	if role.Name == newName {
+		newName = ""
+	}


We can probably just keep the name and pass it right through. It saves us a little complexity and ultimately almost all of the time is going to be spent in the DB regardless.

good point. i'll update it to instead, if no name is provided (for example, when just updating actions) it will populate the name from the existing record so we can keep the query simple.

jnschaeffer · 2023-12-11T20:33:40Z

internal/query/relations.go

+	if err := e.store.CommitContext(dbCtx); err != nil {
+		span.RecordError(err)
+		span.SetStatus(codes.Error, err.Error())
+
+		logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))
+
+		return types.Role{}, err
+	}


Same question here re: rolling back SpiceDB relationships. I'm not saying we need to roll them back, but behavior here should be defined, preferably as comments.

internal/storage/context.go

jnschaeffer · 2023-12-11T20:39:27Z

internal/storage/errors.go

+
+func pqIsRoleAlreadyExistsError(err error) bool {
+	if pqErr, ok := err.(*pq.Error); ok {
+		return pqErr.Code.Name() == "unique_violation" && pqErr.Constraint == "roles_pkey"


I'm a little suspicious of tying our error detection logic to constraint names in the DB. Is there a better way to do this?

I'm not sure of a better way to handle error detection. Since we own the database and migrations, we know the names of the indexes. We could make these more configurable or make them discovered on start but I'm not sure that's necessary at this point.

Alternatively we could just return less descriptive errors but I would like to avoid that.

jnschaeffer · 2023-12-11T20:39:36Z

internal/storage/errors.go

+
+func pqIsRoleNameTakenError(err error) bool {
+	if pqErr, ok := err.(*pq.Error); ok {
+		return pqErr.Code.Name() == "unique_violation" && pqErr.Constraint == "roles_resource_id_name"


Same here re: constraints.

jnschaeffer · 2023-12-11T21:11:42Z

internal/storage/roles.go

+	// If no name is provided, only update the updated_at timestamp.
+	if name == "" {
+		row = tx.QueryRowContext(ctx, `
+			UPDATE roles SET updated_at = now() WHERE id = $1
+			RETURNING id, name, resource_id, creator_id, created_at, updated_at
+		`, roleID.String())
+	} else {
+		row = tx.QueryRowContext(ctx, `
+			UPDATE roles SET name = $1, updated_at = now() WHERE id = $2
+			RETURNING id, name, resource_id, creator_id, created_at, updated_at
+			`, name, roleID.String(),
+		)
+	}


As before, we can probably make this less complex by just accepting whatever role name we get.

senior · 2023-12-13T16:57:37Z

internal/query/relations.go

+
+		logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))
+
+		// TODO: add spicedb rollback logic.


What happens without this?

The role actions are persisted (and will be visible to the user), but role metadata changes are not. While we could attempt to roll back the changes to SpiceDB, that operation itself could fail, meaning we would be in the same situation regardless.

We should probably spell that out here in the code, though.

Thanks for that clarification, I'll get a version of this added to the code.

jnschaeffer

Logic looks good - mostly some notes on documentation and tests.

jnschaeffer · 2023-12-13T18:23:11Z

internal/query/relations.go

+	old := make(map[string]bool, len(oldActions))
+	new := make(map[string]bool, len(newActions))


No need to change this, but if you want to save a few bytes on the heap, struct{} can be used here along with the _, ok := myMap["some_key"] pattern.

jnschaeffer · 2023-12-13T18:29:50Z

internal/query/relations.go

+
+		logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))
+
+		// TODO: add spicedb rollback logic.


The role actions are persisted (and will be visible to the user), but role metadata changes are not. While we could attempt to roll back the changes to SpiceDB, that operation itself could fail, meaning we would be in the same situation regardless.

We should probably spell that out here in the code, though.

jnschaeffer · 2023-12-13T18:31:30Z

internal/query/relations.go

 	relationships, err := e.readRelationships(ctx, filter)
 	if err != nil {
 		return nil, err
 	}

 	out := relationshipsToRoles(relationships)

+	rolesByID := make(map[string]storage.Role, len(dbRoles))


Maybe a silly question, but why not just have a map[gidx.PrefixedID]storage.Role if we just convert every key to a string?

🤔 that's a good question... i'm not sure why I did that... It definitely doesn't need to be converted. Thanks!

jnschaeffer · 2023-12-13T18:34:31Z

internal/query/relations.go

+		span.RecordError(err)
+		span.SetStatus(codes.Error, err.Error())
+
+		logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))


We should probably document what happens here too if this operation fails. The delete from the DB will fail and be rolled back, but the role will have no associated actions, which is probably fine.

That said, it looks like maybe roles can be deleted even if they have assignments? If that's the case, that's not great, but let's solve that in a follow-up PR rather than make this one bigger since it's not new behavior.

jnschaeffer · 2023-12-13T18:35:29Z

internal/query/relations_test.go

+			Name:  "UpdateMissingRole",
+			Input: gidx.MustNewID(RolePrefix),
+			CheckFn: func(ctx context.Context, t *testing.T, res testingx.TestResult[types.Role]) {
+				assert.Error(t, res.Err)


We probably want to check what the error is here, and not just that an error occurred.

jnschaeffer · 2023-12-13T18:36:04Z

internal/query/relations_test.go

+			Input: role.ID,
+			CheckFn: func(ctx context.Context, t *testing.T, res testingx.TestResult[types.Role]) {
+				assert.NoError(t, res.Err)
+				require.Equal(t, "test2", res.Success.Name)


It'll be good to also check the other attributes here: update time, actions, and so on.

jnschaeffer · 2023-12-13T18:40:48Z

internal/testingx/testing.go

+			if !testCase.Sync {
+				t.Parallel()
+			}


We almost certainly want to ensure our test cases are run concurrently. This is a good way to detect race conditions in Go code.

alexhall

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

alexhall · 2023-12-14T17:34:11Z

internal/storage/migrations/20231122000000_initial_schema.sql

+-- create "roles" table
+CREATE TABLE "roles" (
+  "id" character varying NOT NULL,
+  "name" character varying(64) NOT NULL,


What's the reasoning for setting this limit at the storage level? In my experience, doing this is just setting yourself up for running future database migrations to increase the limit - 64 characters is pretty low for something you intend to expose to users.

64 characters seems to be a common limit. This field is part of a unique index and needs to have a limit to ensure performance.

jnschaeffer · 2024-01-09T19:54:13Z

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

alexhall · 2024-01-09T20:19:51Z

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

In this situation, I wouldn't be writing to SpiceDB until after the write to CRDB succeeds. If the write to SpiceDB fails, that can be retried, and possibly even reconciled in the background, but at least I'd know that I have persisted in CRDB a consistent view of how the roles and actions should be. In other words, treat SpiceDB as a query-optimized view used for evaluating access checks, accept that data will get out of sync between CRDB and SpiceDB, and optimize for your ability to recover from that situation. I don't know how well that aligns with your existing architecture, but that's what jumps out to me at first glance when looking at this PR.

jnschaeffer · 2024-01-12T17:25:24Z

Would it make sense to store the role actions in CRDB and anoint that as a source of truth for role definitions? If I were the one on PagerDuty rotation for this system I'd be pretty nervous about the brittleness of coordinating transactions between different databases and the risk of things getting into an inconsistent state, and having a single source of truth at least gives you a starting point for restoring the system to a known-good state when the need arises.

@alexhall I missed this before, sorry. I might be wrong here, but if we have actions stored in CRDB it doesn't seem like that fixes the problems that come from coordinating transactions between both systems. If a commit fails in CRDB but we already persisted actions to SpiceDB, for example, we'll now be in an inconsistent state (i.e., SpiceDB will have a different view of actions in a role from what CRDB shows). As written, we at least keep the actions in only one place.

In this situation, I wouldn't be writing to SpiceDB until after the write to CRDB succeeds. If the write to SpiceDB fails, that can be retried, and possibly even reconciled in the background, but at least I'd know that I have persisted in CRDB a consistent view of how the roles and actions should be. In other words, treat SpiceDB as a query-optimized view used for evaluating access checks, accept that data will get out of sync between CRDB and SpiceDB, and optimize for your ability to recover from that situation. I don't know how well that aligns with your existing architecture, but that's what jumps out to me at first glance when looking at this PR.

I think this idea has merit, though perhaps for different reasons: storing the actions in CRDB lets us have more flexibility around how we present them (i.e., changing the action names, adding descriptions, etc).

That said, would it be worth putting this in a separate PR? This one is already at ~2k LOC and we probably want to give more consideration to how we define actions in general for reasons beyond DB transactions.

Database package along with migrations and migrate command giving permissions-api it's own database to store details into. Initial support is for Roles Get, Create, Update and Delete. Signed-off-by: Mike Mason <[email protected]>

This integrates the new database which contains role metadata into the query engine as well as updates the http api to expose this new information. Signed-off-by: Mike Mason <[email protected]>

Signed-off-by: Mike Mason <[email protected]>

Updated CreateRole, UpdateRole and DeleteRole to CreateRoleTransaction, UpdateRoleTransaction and DeleteRoleTransaction to make it more clear that a transaction is being started. Additionally, the comments on these methods have been updated to include statements that Commit or Rollback must be called to ensure the database lifts all locks on rows which are affected. Previously, the method names made it appear as though the action was taken and completed. However this could lead to hung connections and rows. The new names make it clear that a new transaction is being started. The returning struct only has two methods Commit and Rollback in addition to the Record attribute resulting in a simple structure. Signed-off-by: Mike Mason <[email protected]>

Signed-off-by: Mike Mason <[email protected]>

…roperly Signed-off-by: Mike Mason <[email protected]>

Since we're working with multiple backends, this allows us to place a lock early and ensure a separate request doesn't conflict with an in-flight change. Signed-off-by: Mike Mason <[email protected]>

Signed-off-by: Mike Mason <[email protected]>

jnschaeffer

Looks good - at this point I think we're at least in a good enough state to end-to-end test this and see how it shakes out before cutting a release.

mikemrm force-pushed the named-roles-database branch 6 times, most recently from 3d57e8a to 80dde1f Compare November 30, 2023 15:11

mikemrm changed the title ~~add permissions-api database and migrations for named roles~~ add named role support Dec 1, 2023

mikemrm force-pushed the named-roles-database branch from ed155e2 to d9d8992 Compare December 1, 2023 21:03

mikemrm marked this pull request as ready for review December 1, 2023 21:23

mikemrm requested review from a team as code owners December 1, 2023 21:23

jnschaeffer requested changes Dec 1, 2023

View reviewed changes

jnschaeffer requested changes Dec 4, 2023

View reviewed changes

mikemrm force-pushed the named-roles-database branch 3 times, most recently from 1f15a23 to 0a6b4ca Compare December 7, 2023 17:09

mikemrm requested a review from jnschaeffer December 7, 2023 17:16

jnschaeffer requested changes Dec 11, 2023

View reviewed changes

mikemrm force-pushed the named-roles-database branch from 11fa1d6 to 291ad8e Compare December 13, 2023 12:47

mikemrm requested a review from jnschaeffer December 13, 2023 12:55

senior reviewed Dec 13, 2023

View reviewed changes

jnschaeffer requested changes Dec 13, 2023

View reviewed changes

alexhall reviewed Dec 14, 2023

View reviewed changes

mikemrm requested a review from jnschaeffer January 11, 2024 21:42

mikemrm force-pushed the named-roles-database branch 2 times, most recently from efd9772 to 784860c Compare January 11, 2024 21:59

add permissions-api database, migrations

351eb6a

Database package along with migrations and migrate command giving permissions-api it's own database to store details into. Initial support is for Roles Get, Create, Update and Delete. Signed-off-by: Mike Mason <[email protected]>

mikemrm added 7 commits January 16, 2024 19:29

implement role metadata database into query engine

aeedd0d

This integrates the new database which contains role metadata into the query engine as well as updates the http api to expose this new information. Signed-off-by: Mike Mason <[email protected]>

add logging and health check

80a9f63

Signed-off-by: Mike Mason <[email protected]>

add support for updating of roles

84d2e1e

Signed-off-by: Mike Mason <[email protected]>

add database changes to chart and support migrations

28f33d7

Signed-off-by: Mike Mason <[email protected]>

implement review suggestions

cadcf2f

Signed-off-by: Mike Mason <[email protected]>

implement second round of review suggestions

15c1247

Signed-off-by: Mike Mason <[email protected]>

mikemrm force-pushed the named-roles-database branch from 784860c to ecd30fe Compare January 16, 2024 19:49

mikemrm added 2 commits January 16, 2024 21:27

add missing rollback comments and ensure tests are checking results p…

97ab799

…roperly Signed-off-by: Mike Mason <[email protected]>

lock role record before updating or deleting

1c56883

Since we're working with multiple backends, this allows us to place a lock early and ensure a separate request doesn't conflict with an in-flight change. Signed-off-by: Mike Mason <[email protected]>

mikemrm force-pushed the named-roles-database branch 6 times, most recently from de5f8a2 to da03164 Compare January 18, 2024 16:13

correct ListRoles to ensure it always lists roles from the database

f2d105f

Signed-off-by: Mike Mason <[email protected]>

mikemrm force-pushed the named-roles-database branch from da03164 to f2d105f Compare January 18, 2024 19:18

jnschaeffer approved these changes Jan 19, 2024

View reviewed changes

mikemrm merged commit 28ffea5 into infratographer:main Jan 19, 2024
4 checks passed

mikemrm deleted the named-roles-database branch January 19, 2024 17:39

		@@ -0,0 +1,70 @@
		{{- if has .Values.config.crdb.migrateHook (list "pre-sync" "manual") }}


		logRollbackErr(e.logger, e.store.RollbackContext(dbCtx))

		// TODO: add spicedb rollback logic.

		old := make(map[string]bool, len(oldActions))
		new := make(map[string]bool, len(newActions))

add named role support #202

add named role support #202

Conversation

mikemrm commented Nov 27, 2023 • edited Loading

jnschaeffer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnschaeffer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnschaeffer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnschaeffer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexhall left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jnschaeffer commented Jan 9, 2024

alexhall commented Jan 9, 2024

jnschaeffer commented Jan 12, 2024

jnschaeffer left a comment

Choose a reason for hiding this comment

mikemrm commented Nov 27, 2023 •

edited

Loading