Rework persistence to not have a table per namespace #613

Closed
zepatrik opened this issue Jun 2, 2021 · 7 comments · Fixed by #638
Labels: blocking (Blocks milestones or other issues or pulls.), corp/m8 (Up for M8 at Ory Corp.), feat (New feature or request.)

Comments

zepatrik (Member) commented Jun 2, 2021

Is your feature request related to a problem? Please describe.

Currently we have one table per namespace. This decision was based on the idea of having storage-specific settings (like additional indexes) per namespace (outlined in #303). The problem is that this does not scale well in terms of maintenance and, in some cases, performance.

Describe the solution you'd like

A much more scalable solution would be to use a single table with a namespace column. We might still be able to move back to separate tables later by implementing another persister, but that is a topic for when the APIs are considered stable.
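The difference between the two layouts can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Keto's persister: the table and column names (namespace_0001, relation_tuples, namespace, object, relation, subject) are hypothetical. With one table per namespace, every new namespace means a schema change; with a namespace column, it only means new rows.

```python
import sqlite3

def per_namespace_layout(conn, namespaces):
    # One table per namespace (hypothetical names): each new namespace
    # requires new DDL, and later migrations must touch every table.
    for i, _ in enumerate(namespaces, start=1):
        conn.execute(
            f"CREATE TABLE namespace_{i:04d} ("
            "object TEXT NOT NULL, relation TEXT NOT NULL, subject TEXT NOT NULL)"
        )

def single_table_layout(conn):
    # One shared table: the namespace is just another column, so adding a
    # namespace needs no schema change, only rows.
    conn.execute(
        "CREATE TABLE relation_tuples ("
        "namespace TEXT NOT NULL, object TEXT NOT NULL, "
        "relation TEXT NOT NULL, subject TEXT NOT NULL)"
    )

def table_count(conn):
    return conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchone()[0]

namespaces = ["projects", "files", "groups"]

per_ns = sqlite3.connect(":memory:")
per_namespace_layout(per_ns, namespaces)

single = sqlite3.connect(":memory:")
single_table_layout(single)
single.executemany(
    "INSERT INTO relation_tuples VALUES (?, ?, ?, ?)",
    [("projects", "project1", "viewer", "user1"),
     ("files", "file1", "owner", "user2")],
)

print(table_count(per_ns))  # 3: grows with every namespace
print(table_count(single))  # 1: independent of the number of namespaces
rows = single.execute(
    "SELECT object FROM relation_tuples WHERE namespace = ?", ("projects",)
).fetchall()
print(rows)  # [('project1',)]
```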

Additional context

from #303

Google Zanzibar §2.3

The paper specifies the following storage parameters:

Storage parameters include sharding settings and an encoding for object IDs that helps Zanzibar optimize storage of integer, string, and other object ID formats.

From my first experience with the system, it would also be helpful to allow configuring database indexes, as clients can use the read API in different ways.

cockroachdb/cockroach#63206

zepatrik added the feat, blocking, and corp/m8 labels on Jun 2, 2021
zepatrik self-assigned this on Jun 2, 2021
zepatrik (Member, Author) commented Jun 8, 2021

One other point that came up:

At some point in time (t1) a new namespace config is added with a relation 'observer'. Relation tuples are added that reference the 'observer' relation (e.g. projects:project1#observer@user1). At some later time (t2) the namespace config is updated, and the 'observer' relation is renamed to a more aptly named 'viewer'. But this update request will fail until all of the references to the 'observer' relation have been renamed.

The problematic part here is that the relation might also appear inside a subject set in a different namespace (e.g. a tuple in another namespace whose subject is projects:project1#observer). This currently prevents us from simply running select count(*) from namespace_0001 where relation = 'observer' to determine whether any old relations remain.
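A minimal sqlite3 sketch of this failure mode, assuming (hypothetically) that namespace projects is stored in namespace_0001, files in namespace_0002, and that subject sets are encoded as the string namespace:object#relation in the subject column. After renaming every observer tuple in the projects table, the count query reports zero, yet a subject set in the files table still refers to projects#observer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical mapping: "projects" -> namespace_0001, "files" -> namespace_0002.
# The subject column holds either a subject ID or an encoded subject set
# "namespace:object#relation" (assumed encoding).
for table in ("namespace_0001", "namespace_0002"):
    conn.execute(f"CREATE TABLE {table} (object TEXT, relation TEXT, subject TEXT)")

# projects:project1#observer@user1
conn.execute("INSERT INTO namespace_0001 VALUES ('project1', 'observer', 'user1')")
# files:file1#viewer@projects:project1#observer (a subject set that points at
# the 'observer' relation of another namespace's table)
conn.execute(
    "INSERT INTO namespace_0002 VALUES ('file1', 'viewer', 'projects:project1#observer')"
)

def remaining_in_own_table(conn):
    # The check from the comment: only inspects the namespace's own table.
    return conn.execute(
        "SELECT count(*) FROM namespace_0001 WHERE relation = 'observer'"
    ).fetchone()[0]

def remaining_in_subject_sets(conn):
    # References hidden in encoded subjects of other tables can only be found
    # by string matching, one table at a time.
    return conn.execute(
        "SELECT count(*) FROM namespace_0002 WHERE subject LIKE 'projects:%#observer'"
    ).fetchone()[0]

conn.execute("UPDATE namespace_0001 SET relation = 'viewer' WHERE relation = 'observer'")
print(remaining_in_own_table(conn))     # 0: the rename looks complete...
print(remaining_in_subject_sets(conn))  # 1: ...but a subject set still uses 'observer'
```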

One remedy for renaming might be to use relation IDs instead of relation strings, as we already do with namespaces and their IDs. Alternatively, it might make sense not to store the encoded subject set in the subject column, but to store it in decoded form instead.
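The relation-ID remedy could look roughly like the following sqlite3 sketch. The schema is hypothetical and goes beyond the thread only in its details: relation names live in a relations lookup table, and tuples (including the relation part of a subject set, which is also stored decoded here) only hold the ID. Renaming observer to viewer then updates a single row, and every reference follows automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: relation names are stored once; tuples reference them
# by integer ID, including the relation of a (decoded) subject set.
conn.executescript("""
CREATE TABLE relations (
    id        INTEGER PRIMARY KEY,
    namespace TEXT NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (namespace, name)
);
CREATE TABLE relation_tuples (
    namespace           TEXT NOT NULL,
    object              TEXT NOT NULL,
    relation_id         INTEGER NOT NULL REFERENCES relations(id),
    subject_id          TEXT,
    subject_namespace   TEXT,
    subject_object      TEXT,
    subject_relation_id INTEGER REFERENCES relations(id)
);
INSERT INTO relations VALUES (1, 'projects', 'observer');
INSERT INTO relations VALUES (2, 'files', 'viewer');
-- projects:project1#observer@user1
INSERT INTO relation_tuples VALUES ('projects', 'project1', 1, 'user1', NULL, NULL, NULL);
-- files:file1#viewer@projects:project1#observer
INSERT INTO relation_tuples VALUES ('files', 'file1', 2, NULL, 'projects', 'project1', 1);
""")

# The rename touches exactly one row; no tuple has to be rewritten.
conn.execute(
    "UPDATE relations SET name = 'viewer' "
    "WHERE namespace = 'projects' AND name = 'observer'"
)

# Render tuples back into the textual notation to show the rename took effect.
rows = conn.execute("""
SELECT t.namespace || ':' || t.object || '#' || r.name || '@' ||
       COALESCE(t.subject_id,
                t.subject_namespace || ':' || t.subject_object || '#' || sr.name)
FROM relation_tuples t
JOIN relations r ON r.id = t.relation_id
LEFT JOIN relations sr ON sr.id = t.subject_relation_id
ORDER BY t.namespace
""").fetchall()
print(rows)
```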

zepatrik (Member, Author) commented:

An overview of all the features and issues we want to address with this. This is really an exhaustive list, and I am pretty sure there are features Keto will not support any time soon, but they are still worth considering now.

zepatrik (Member, Author) commented Jun 16, 2021

Proposal

Many of the features in the list above could be implemented better and more efficiently if subject sets were not stored in encoded form. Instead, I would like to have the following polymorphic table:

Column              Type
Tuple ID            uuidv4
Network ID          uuidv4
Namespace           string
Object              string
Relation            string
Subject ID          nullstring
Subject Namespace   nullstring
Subject Object      nullstring
Subject Relation    nullstring
Commit Time         timestamp

The primary key would be (Network ID, Namespace, Tuple ID).

Each row has either Subject ID set, or all of Subject Namespace, Subject Object, and Subject Relation set.
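A sketch of the proposed table in SQLite, with the columns above mapped to hypothetical snake_case names. SQLite has no UUID or nullstring types, so IDs are TEXT and nullable strings are nullable TEXT. Because the schema accepts both row shapes, the "subject ID or full subject set" rule is checked in a hypothetical insert_tuple helper. The final query shows how decoded subject sets address the rename problem raised earlier: a single statement finds both direct uses of projects#observer and subject sets pointing at it.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical mapping of the proposed columns; the primary key follows the proposal.
SCHEMA = """
CREATE TABLE relation_tuples (
    tuple_id          TEXT NOT NULL,
    network_id        TEXT NOT NULL,
    namespace         TEXT NOT NULL,
    object            TEXT NOT NULL,
    relation          TEXT NOT NULL,
    subject_id        TEXT,
    subject_namespace TEXT,
    subject_object    TEXT,
    subject_relation  TEXT,
    commit_time       TEXT NOT NULL,
    PRIMARY KEY (network_id, namespace, tuple_id)
)
"""

def is_valid(subject_id, s_ns, s_obj, s_rel):
    # Exactly one subject form: a subject ID, or a complete subject set.
    set_cols = (s_ns, s_obj, s_rel)
    if subject_id is not None:
        return all(c is None for c in set_cols)
    return all(c is not None for c in set_cols)

def insert_tuple(conn, network_id, namespace, obj, relation,
                 subject_id=None, s_ns=None, s_obj=None, s_rel=None):
    # The table itself accepts either shape, so validation lives in code.
    if not is_valid(subject_id, s_ns, s_obj, s_rel):
        raise ValueError("tuple needs a subject ID or a complete subject set")
    conn.execute(
        "INSERT INTO relation_tuples VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), network_id, namespace, obj, relation,
         subject_id, s_ns, s_obj, s_rel,
         datetime.now(timezone.utc).isoformat()),
    )

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
net = str(uuid.uuid4())
insert_tuple(conn, net, "projects", "project1", "observer", subject_id="user1")
insert_tuple(conn, net, "files", "file1", "viewer",
             s_ns="projects", s_obj="project1", s_rel="observer")

# Decoded columns turn "who still references projects#observer?" into one query.
refs = conn.execute(
    "SELECT count(*) FROM relation_tuples WHERE network_id = ? AND ("
    "(namespace = 'projects' AND relation = 'observer') OR "
    "(subject_namespace = 'projects' AND subject_relation = 'observer'))",
    (net,),
).fetchone()[0]
print(refs)  # 2
```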

Alt 1: two tables, one for each type of tuple

Drawbacks compared to the proposal:

  • no single index covering all tuples
  • sorting and pagination within a single query are much more complex and less performant

Advantages:

  • the database only allows valid rows (you can only insert a tuple that is also valid from the code's perspective)
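To illustrate the Alt 1 trade-off, here is a sqlite3 sketch with two hypothetical tables, one per tuple shape. Because every column is NOT NULL, SQLite rejects a half-filled subject set, which is the advantage listed above; but one sorted page over all tuples needs a UNION ALL that pads the columns the other shape lacks before ORDER BY and LIMIT can apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Alt 1 with hypothetical names: one table per tuple shape, all columns NOT NULL.
conn.executescript("""
CREATE TABLE tuples_subject_id (
    namespace TEXT NOT NULL, object TEXT NOT NULL, relation TEXT NOT NULL,
    subject_id TEXT NOT NULL
);
CREATE TABLE tuples_subject_set (
    namespace TEXT NOT NULL, object TEXT NOT NULL, relation TEXT NOT NULL,
    subject_namespace TEXT NOT NULL, subject_object TEXT NOT NULL,
    subject_relation TEXT NOT NULL
);
INSERT INTO tuples_subject_id VALUES ('files', 'a', 'owner', 'user1');
INSERT INTO tuples_subject_id VALUES ('files', 'c', 'owner', 'user2');
INSERT INTO tuples_subject_set VALUES ('files', 'b', 'viewer', 'groups', 'g1', 'member');
""")

# A sorted page across both shapes: union first, padding missing columns
# with NULL, then sort and paginate the combined result.
PAGE = """
SELECT namespace, object, relation, subject_id,
       NULL AS subject_namespace, NULL AS subject_object, NULL AS subject_relation
FROM tuples_subject_id
UNION ALL
SELECT namespace, object, relation, NULL,
       subject_namespace, subject_object, subject_relation
FROM tuples_subject_set
ORDER BY namespace, object
LIMIT ? OFFSET ?
"""
page = [row[1] for row in conn.execute(PAGE, (2, 0))]
print(page)  # ['a', 'b']

# An incomplete subject set is rejected by the schema, not by application code.
rejected = False
try:
    conn.execute(
        "INSERT INTO tuples_subject_set VALUES "
        "('files', 'd', 'viewer', 'groups', NULL, 'member')"
    )
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```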

Alt 2: table with the encoded subject as it is now:

Drawbacks compared to the proposal:

  • some of the features above will definitely require querying information inside the encoded subject, something like
    DELETE FROM relation_tuples WHERE subject LIKE 'namespace-to-be-deleted#%'
  • sorting by information inside subject sets is impossible

Advantages:

  • again: the database only allows valid rows
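A sqlite3 sketch of one pitfall of the Alt 2 string matching, assuming subject sets are encoded as namespace:object#relation (the exact encoding is an assumption here). In SQL LIKE, _ matches any single character, so deleting everything that points into a namespace called my_ns by prefix also removes tuples pointing into myAns unless the pattern is escaped. With decoded columns, the same cleanup is a plain equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encoded (object TEXT, relation TEXT, subject TEXT);
CREATE TABLE decoded (object TEXT, relation TEXT, subject_id TEXT,
                      subject_namespace TEXT, subject_object TEXT,
                      subject_relation TEXT);
""")
# The same three tuples in both layouts (hypothetical subject set encoding
# "namespace:object#relation" in the encoded table).
conn.executemany("INSERT INTO encoded VALUES (?, ?, ?)", [
    ("f1", "viewer", "my_ns:o1#member"),
    ("f2", "viewer", "myAns:o2#member"),
    ("f3", "viewer", "user1"),
])
conn.executemany("INSERT INTO decoded VALUES (?, ?, ?, ?, ?, ?)", [
    ("f1", "viewer", None, "my_ns", "o1", "member"),
    ("f2", "viewer", None, "myAns", "o2", "member"),
    ("f3", "viewer", "user1", None, None, None),
])

# Alt 2: remove everything pointing into namespace "my_ns" by string matching.
# "_" is a LIKE wildcard, so the unrelated "myAns" tuple is deleted as well.
encoded_deleted = conn.execute(
    "DELETE FROM encoded WHERE subject LIKE 'my_ns:%'").rowcount
# Proposal: an exact comparison on a real column.
decoded_deleted = conn.execute(
    "DELETE FROM decoded WHERE subject_namespace = 'my_ns'").rowcount
print(encoded_deleted, decoded_deleted)  # 2 1
```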

aeneasr (Member) commented Jun 16, 2021

What does the original paper say about this?

aeneasr (Member) commented Jun 16, 2021

By the way, full-text search (FTS) indexes are not yet implemented in CockroachDB (CRDB): cockroachdb/cockroach#7821

aeneasr (Member) commented Jun 16, 2021

Multiple indices per query are also only partially supported:

[Screenshot, 2021-06-16 13:59:37, not preserved]

zepatrik (Member, Author) commented Jun 16, 2021

I may not have put it clearly: my preferred proposal is the polymorphic table with decoded subject sets. The other alternatives are just ideas I had but don't like because of their drawbacks.
The paper says:

We store relation tuples of each namespace in a separate database, where each row is identified by primary key (shard ID, object ID, relation, user, commit timestamp). Multiple tuple versions are stored on different rows, so that we can evaluate checks and reads at any timestamp within the garbage collection window. The ordering of primary keys allows us to look up all relation tuples for a given object ID or (object ID, relation) pair.

§ 3.1.1

They don't really talk about how subject sets or subject IDs are handled.
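The prefix property of the primary key quoted above can be illustrated with a sorted Python list standing in for the ordered key space of the underlying store (Zanzibar runs on Spanner; the list is only a stand-in, and the key values are made up). Since rows are ordered by (shard ID, object ID, relation, user, commit timestamp), all tuples of one object, or of one (object ID, relation) pair, form a contiguous range that one seek plus a scan returns, including several timestamped versions of the same tuple.

```python
from bisect import bisect_left

# Made-up keys shaped like the quoted primary key:
# (shard ID, object ID, relation, user, commit timestamp)
keys = sorted([
    (1, "doc:readme", "owner", "user:10", 100),
    (1, "doc:readme", "viewer", "user:11", 105),
    (1, "doc:readme", "viewer", "user:12", 110),
    (1, "doc:readme", "viewer", "user:12", 120),  # newer version of the same tuple
    (1, "doc:specs", "viewer", "user:11", 101),
])

def scan_prefix(keys, prefix):
    # Rows sharing a key prefix are contiguous in key order: seek to the first
    # one (a shorter tuple sorts before its extensions), then scan until the
    # prefix no longer matches.
    i = bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i][:len(prefix)] == prefix:
        out.append(keys[i])
        i += 1
    return out

by_object = scan_prefix(keys, (1, "doc:readme"))
by_object_relation = scan_prefix(keys, (1, "doc:readme", "viewer"))
print(len(by_object))           # 4: every relation of doc:readme
print(len(by_object_relation))  # 3: only the (doc:readme, viewer) pair
```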


2 participants