Rework persistence to not have a table per namespace #613

zepatrik · 2021-06-02T10:00:32Z

Is your feature request related to a problem? Please describe.

Currently we have one table per namespace. This decision was based on the idea of having storage-specific settings (like additional indexes) per namespace (outlined in #303). The problem is that this does not scale well in terms of maintenance and, in some cases, performance.

Describe the solution you'd like

A much more scalable solution will be to use a single table that has a namespace column. We might still be able to move back to separate tables by implementing another persister, but that is a topic for when the APIs are considered stable.

Additional context

from #303

Google Zanzibar §2.3

The paper specifies the following storage parameters:

Storage parameters include sharding settings and an encoding for object IDs that helps Zanzibar optimize storage of integer, string, and other object ID formats.

From my first experience with the system, it will be helpful to allow configuring database indexes as well, as the read API can be used in different ways by a client.

cockroachdb/cockroach#63206

zepatrik · 2021-06-08T15:06:07Z

One other point that came up:

At some point in time (t1) a new namespace config is added with a relation 'observer'. Relation tuples are added that reference the 'observer' relation (e.g. projects:project1#observer@user1). At some later time (t2) the namespace config is updated, and the 'observer' relation is renamed to a more aptly named 'viewer'. But this update request will fail until all of the references to the 'observer' relation have been renamed.

The problematic part here is that we might have the relation inside a subject set in a different namespace. This prevents us (currently) from just using select count(*) from namespace_0001 where relation = 'observer' to determine whether there are still old relations.

One remedy for renaming might be to use relation IDs instead of relation strings like we do with namespaces and their IDs. Alternatively it might make sense to not store the encoded subject set in the subject column, but instead store it decoded somehow.
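To make the rename problem concrete, here is a minimal sketch of what finding every reference to a relation involves while the subject set is stored encoded. It assumes the `namespace:object#relation` encoding from the example tuple above; the `decode_subject` and `count_relation_references` helpers and the sample data are hypothetical and not taken from the Keto code base.

```python
from typing import NamedTuple, Optional


class SubjectSet(NamedTuple):
    namespace: str
    object: str
    relation: str


def decode_subject(subject: str) -> Optional[SubjectSet]:
    """Return the decoded subject set, or None for a plain subject ID.

    Assumed encoding: "namespace:object#relation".
    """
    head, sep, relation = subject.rpartition("#")
    if not sep:
        return None
    namespace, sep, obj = head.partition(":")
    if not sep:
        return None
    return SubjectSet(namespace, obj, relation)


def count_relation_references(tuples, namespace, relation):
    """Count tuples that use the relation directly or inside a subject set."""
    count = 0
    for t_namespace, _obj, t_relation, subject in tuples:
        if t_namespace == namespace and t_relation == relation:
            count += 1
            continue
        decoded = decode_subject(subject)
        if decoded and decoded.namespace == namespace and decoded.relation == relation:
            count += 1
    return count


# (namespace, object, relation, subject) rows, spread over two namespaces.
tuples = [
    ("projects", "project1", "observer", "user1"),
    ("files", "file1", "read", "projects:project1#observer"),
    ("files", "file2", "read", "user2"),
]

print(count_relation_references(tuples, "projects", "observer"))
```

The second row lives in the `files` namespace, so a count restricted to the `projects` table would miss it. Every subject also has to be parsed in application code, which is the part a decoded storage format would move into plain column comparisons.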

zepatrik · 2021-06-14T17:49:11Z

An overview of all the features and issues we want to address with this. This is really an exhaustive list, and I am pretty sure that there are features Keto will not support any time soon, but they are still worth considering now.

above-mentioned CRDB issues: not having an unbounded number of tables
above described changes of namespace configuration: easily determine all tuples containing relation within one namespace
allow deletion by some query, possibly across namespaces: Bulk deletion of relation tuples #599
different kinds of check requests: e.g. Check whether a relation exists to any object #555 Allow narrowed ACL evaluation in check requests #323
custom sorting in the list API: Add examples for a typical web application #546
list all objects a subject has access to, including indirect access: Add examples for a typical web application #546 and others
snapshot tokens (aka zookies) might require staleness bound checks during query evaluation: Provide Consistency Guarantees using Snapshot Tokens #517
sharding/avoiding database hot spots: Database sharding #306

zepatrik · 2021-06-16T11:18:23Z

Proposal

The above list has a lot of features that we might be able to implement better and with better performance by not storing subject sets in their encoded form. Instead, I would like to have the following polymorphic table:

Tuple ID	Network ID	Namespace	Object	Relation	Subject ID	Subject Namespace	Subject Object	Subject Relation	Commit Time
uuidv4	uuidv4	string	string	string	nullstring	nullstring	nullstring	nullstring	timestamp

The primary key would be (Network ID, Namespace, Tuple ID)

Each row either has the Subject ID set, or Subject Namespace, Subject Object, and Subject Relation.
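As a rough illustration of the table above, the following sketch creates it in an in-memory SQLite database. The table name `relation_tuples`, the snake_case column names, and the `insert_tuple` helper are assumptions for the example. The `CHECK` constraint is also an assumption and not part of the proposal: it is one possible way to let the database enforce the either/or rule for the subject columns. Actual DDL for CockroachDB or PostgreSQL would differ in types and details.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE relation_tuples (
    tuple_id          TEXT NOT NULL,
    network_id        TEXT NOT NULL,
    namespace         TEXT NOT NULL,
    object            TEXT NOT NULL,
    relation          TEXT NOT NULL,
    subject_id        TEXT,
    subject_namespace TEXT,
    subject_object    TEXT,
    subject_relation  TEXT,
    commit_time       TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (network_id, namespace, tuple_id),
    -- Assumption, not part of the proposal: either subject_id is set,
    -- or all three subject set columns are set, never a mix.
    CHECK (
        (subject_id IS NOT NULL
            AND subject_namespace IS NULL
            AND subject_object IS NULL
            AND subject_relation IS NULL)
        OR
        (subject_id IS NULL
            AND subject_namespace IS NOT NULL
            AND subject_object IS NOT NULL
            AND subject_relation IS NOT NULL)
    )
)
"""


def insert_tuple(conn, network_id, namespace, obj, relation, *,
                 subject_id=None, subject_set=None):
    """Insert one tuple with either a subject ID or a decoded subject set."""
    s_ns, s_obj, s_rel = subject_set or (None, None, None)
    conn.execute(
        "INSERT INTO relation_tuples (tuple_id, network_id, namespace, object,"
        " relation, subject_id, subject_namespace, subject_object,"
        " subject_relation) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), network_id, namespace, obj, relation,
         subject_id, s_ns, s_obj, s_rel),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
network = str(uuid.uuid4())

insert_tuple(conn, network, "projects", "project1", "observer", subject_id="user1")
insert_tuple(conn, network, "files", "file1", "read",
             subject_set=("projects", "project1", "observer"))

# All references to projects#observer, as a relation or inside a subject set,
# found with plain column comparisons across namespaces.
(count,) = conn.execute(
    "SELECT count(*) FROM relation_tuples WHERE network_id = ?"
    " AND ((namespace = ? AND relation = ?)"
    "   OR (subject_namespace = ? AND subject_relation = ?))",
    (network, "projects", "observer", "projects", "observer"),
).fetchone()
print(count)
```

With the subject set stored decoded, the question of whether anything still references a relation becomes a single query over one table, without parsing strings in application code.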

Alt 1: two tables for each type of tuple

Drawbacks compared to proposal:

not a single index for all tuples
sorting and pagination within a single query is much more complex and less performant

Advantages:

the database only allows valid rows (you can only insert a tuple that is also valid from the code perspective)

Alt 2: table with the encoded subject as it is now:

Drawbacks compared to proposal:

some of the above features will definitely require us to query for information inside the encoded subject, with something similar to

DELETE FROM relation_tuples WHERE subject LIKE 'namespace-to-be-deleted#%'

sorting by information inside of subject sets is impossible
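One practical wrinkle with querying the encoded subject through `LIKE` is that `_` and `%` are wildcards in the pattern, so a namespace name containing them can match more rows than intended unless it is escaped. The sketch below demonstrates this with an in-memory SQLite database. The simplified table, the sample rows, the `like_prefix` helper, and the `namespace:object#relation` encoding are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation_tuples (object TEXT, relation TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO relation_tuples VALUES (?, ?, ?)",
    [
        ("file1", "read", "team_a:dev#member"),
        ("file2", "read", "teamXa:dev#member"),
        ("file3", "read", "user1"),
    ],
)


def like_prefix(namespace):
    """Build a LIKE pattern for a namespace with wildcards escaped by a backslash."""
    escaped = (
        namespace.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return escaped + ":%"


# Naive pattern: the "_" in "team_a" matches any single character.
naive = conn.execute(
    "SELECT object FROM relation_tuples WHERE subject LIKE ? ORDER BY object",
    ("team_a:%",),
).fetchall()

# Escaped pattern: only subjects in the "team_a" namespace match.
escaped = conn.execute(
    "SELECT object FROM relation_tuples WHERE subject LIKE ? ESCAPE '\\'"
    " ORDER BY object",
    (like_prefix("team_a"),),
).fetchall()

print(naive)
print(escaped)
```

Even the escaped pattern cannot tell a subject set apart from a plain subject ID that happens to start with `team_a:`, which is an ambiguity that separate subject columns would avoid.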

Advantages:

again: the database only allows valid rows

aeneasr · 2021-06-16T11:54:54Z

What does the original paper say about this?

aeneasr · 2021-06-16T11:56:29Z

By the way, FTS indices are not yet implemented in CRDB: cockroachdb/cockroach#7821

aeneasr · 2021-06-16T11:59:51Z

Multiple indices per query are also only partially supported:

zepatrik · 2021-06-16T15:34:37Z

I might not have put it clearly: my preferred proposal is the polymorphic table with decoded subject sets. The other alternatives are just ideas I had but don't like because of the drawbacks.
The paper says:

We store relation tuples of each namespace in a separate database, where each row is identified by primary key (shardID, object ID, relation, user, commit timestamp). Multiple tuple versions are stored on different rows, so that we can evaluate checks and reads at any timestamp within the garbage collection window. The ordering of primary keys allows us to look up all relation tuples for a given object ID or (object ID, relation) pair.

§ 3.1.1

They don't really talk about how subject sets or subject IDs are handled.
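The point about primary key ordering in the quote can be sketched with a sorted list standing in for the ordered storage. This is only an illustration of prefix lookups on a composite key, not how Zanzibar or Keto implements them. The `prefix_scan` helper and the sample keys are hypothetical.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with the given prefix.

    Because the keys are sorted, all matches are contiguous: binary search
    finds the first candidate, then we read until the prefix stops matching.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if key[:len(prefix)] != prefix:
            break
        matches.append(key)
    return matches


# Keys in the order (shard ID, object ID, relation, user, commit timestamp).
keys = sorted([
    (0, "doc1", "owner", "alice", 100),
    (0, "doc1", "viewer", "bob", 101),
    (0, "doc1", "viewer", "bob", 105),  # a later version of the same tuple
    (0, "doc2", "viewer", "carol", 102),
])

by_object = prefix_scan(keys, (0, "doc1"))
by_object_relation = prefix_scan(keys, (0, "doc1", "viewer"))
print(len(by_object), len(by_object_relation))
```

The same reasoning explains why this key order does not help with lookups by user alone: the user sits behind the object ID and relation in the key, so matching rows are not contiguous.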

Closes #613

zepatrik added feat New feature or request. blocking Blocks milestones or other issues or pulls. corp/m8 Up for M8 at Ory Corp. labels Jun 2, 2021

zepatrik self-assigned this Jun 2, 2021

zepatrik mentioned this issue Jun 15, 2021

Namespace migrations are not applied on DSN memory #446

Closed

This was referenced Jun 21, 2021

Migration path for v0.7 #628

Closed

Change SQL pagination to not use offset #633

Closed

zepatrik added a commit that referenced this issue Jun 29, 2021

refactor: persistence table structure

6ab8998

Closes #613

zepatrik mentioned this issue Jun 29, 2021

refactor: persistence table structure #638

Merged

6 tasks

zepatrik closed this as completed in #638 Aug 10, 2021
