diff --git a/content/en/docs/user-guides/configuration-advanced/_index.md b/content/en/docs/user-guides/configuration-advanced/_index.md index 22409a173..93e1c5d63 100644 --- a/content/en/docs/user-guides/configuration-advanced/_index.md +++ b/content/en/docs/user-guides/configuration-advanced/_index.md @@ -1,5 +1,5 @@ --- title: Advanced Configuration description: User guides covering advanced configuration concepts -weight: 3 ---- \ No newline at end of file +weight: 5 +--- diff --git a/content/en/docs/user-guides/configuration-advanced/createlookupvindex.md b/content/en/docs/user-guides/configuration-advanced/createlookupvindex.md index 727bf4ef5..2494d3dc0 100644 --- a/content/en/docs/user-guides/configuration-advanced/createlookupvindex.md +++ b/content/en/docs/user-guides/configuration-advanced/createlookupvindex.md @@ -5,7 +5,7 @@ aliases: ['/docs/user-guides/createlookupvindex/'] --- {{< info >}} -This guide follows on from the Get Started guides. Please make sure that you have an [Operator](../../../get-started/operator), [local](../../../get-started/local) or [Helm](../../../get-started/helm) installation ready. Make sure you are at the point where you have the sharded keyspace called `customer` setup. +This guide follows on from the Get Started guides. Please make sure that you have an [Operator](../../../get-started/operator) or [local](../../../get-started/local) installation ready. Make sure you are at the point where you have the sharded keyspace called `customer` setup. {{< /info >}} **CreateLookupVindex** is a new VReplication workflow in Vitess 6. It is used to create **and** backfill a lookup Vindex automatically for a table that already exists, and may have a significant amount of data in it already. @@ -324,18 +324,13 @@ mysql> select sku, hex(keyspace_id) from corder_lookup; +-----------+------------------+ ``` -Basically, this shows exactly what we expected. Now, we can clean up the -VReplication streams. 
Note these commands will clean up all VReplication -streams on these tablets. You may want to filter by `id` if there are other -streams running: +Basically, this shows exactly what we expected. Now, we have to clean up +the artifacts of the backfill. The `ExternalizeVindex` command will delete +the vreplication streams and also clear the `write_only` flag from the +vindex, indicating that it is no longer backfilling. ```sh -$ vtctlclient -server localhost:15999 VReplicationExec zone1-0000000300 "delete from _vt.vreplication" -+ -+ -$ vtctlclient -server localhost:15999 VReplicationExec zone1-0000000400 "delete from _vt.vreplication" -+ -+ +$ vtctlclient -server localhost:15999 ExternalizeVindex customer.corder_lookup ``` Next, to confirm the lookup Vindex is doing what we think it should, we can @@ -475,3 +470,6 @@ mysql> select sku, hex(keyspace_id) from corder_lookup; We added a new row to the `corder` table, and now we have a new row in the lookup table. +### ExternalizeVindex + +Once the backfill is done, diff --git a/content/en/docs/user-guides/configuration-basic/_index.md b/content/en/docs/user-guides/configuration-basic/_index.md index 12443759a..8faa6e968 100644 --- a/content/en/docs/user-guides/configuration-basic/_index.md +++ b/content/en/docs/user-guides/configuration-basic/_index.md @@ -1,5 +1,5 @@ --- title: Configuration description: User guides covering basic configuration concepts -weight: 1 ---- \ No newline at end of file +weight: 2 +--- diff --git a/content/en/docs/user-guides/migration/_index.md b/content/en/docs/user-guides/migration/_index.md index fb917bbe3..faf9ba602 100644 --- a/content/en/docs/user-guides/migration/_index.md +++ b/content/en/docs/user-guides/migration/_index.md @@ -1,5 +1,5 @@ --- title: Migration description: User guides covering migration to Vitess -weight: 2 ---- \ No newline at end of file +weight: 3 +--- diff --git a/content/en/docs/user-guides/operating-vitess/_index.md 
b/content/en/docs/user-guides/operating-vitess/_index.md index 9b481b632..9b91e1356 100644 --- a/content/en/docs/user-guides/operating-vitess/_index.md +++ b/content/en/docs/user-guides/operating-vitess/_index.md @@ -2,5 +2,5 @@ title: Operational description: User guides for covering operational aspects of Vitess description: User guides covering operational aspects of Vitess -weight: 4 ---- \ No newline at end of file +weight: 5 +--- diff --git a/content/en/docs/user-guides/sql/_index.md b/content/en/docs/user-guides/sql/_index.md index e79cbe08c..67307d95a 100644 --- a/content/en/docs/user-guides/sql/_index.md +++ b/content/en/docs/user-guides/sql/_index.md @@ -1,5 +1,5 @@ --- title: SQL Statement Analysis description: User guides covering analyzing SQL statements -weight: 3 ---- \ No newline at end of file +weight: 4 +--- diff --git a/content/en/docs/user-guides/vschema-guide/_index.md b/content/en/docs/user-guides/vschema-guide/_index.md new file mode 100644 index 000000000..c60d42ebf --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/_index.md @@ -0,0 +1,5 @@ +--- +title: VSchema and Query Serving +description: Configuring VSchema for serving queries +weight: 1 +--- diff --git a/content/en/docs/user-guides/vschema-guide/advanced-vschema.md b/content/en/docs/user-guides/vschema-guide/advanced-vschema.md new file mode 100644 index 000000000..1f4bb9df5 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/advanced-vschema.md @@ -0,0 +1,135 @@ +--- +title: Advanced VSchema Properties +weight: 11 +--- + +With the exception of Multi-Column Vindexes, advanced VSchema Properties do not have DDL constructs. They can only be updated through `vtctld` CLI commands. + +## Multi-Column Vindexes + +Multi-Column Vindexes are useful in the following two use cases: + +* Grouping customers by their regions so they can be hosted in specific geographical locations. This may be required for compliance, and also to achieve better performance. 
+* For a multi-tenant system, grouping all rows of a tenant in a separate set of shards. This limits the fan out of queries if searching only for rows that are related to a single tenant. + +In both cases the leading column is the region or tenant, and is used to form the first few bits of the `keyspace_id`. The second column is used for the bits that follow. Since Vitess shards by keyrange, this approach will naturally group all rows of a region or tenant within the same shard, or within a group of consecutive shards. Since each shard is its own MySQL cluster, these can then be deployed to different regions as needed. + +Please refer to [Region-based Sharding](../../configuration-advanced/region-sharding) for an example on how to use the `region_json` vindex. + +Currently, the Vindex gets used for assigning a `keyspace_id` at the time of insert and at the time of resharding. Additional vindexes need to be added to the table for routing query constructs that contain WHERE clauses. + +Vitess does not have the capability to route a query based on multiple values of a multi-column vindex in a where clause yet. This feature will be added soon. + +#### Alternate approach + +You have the option to pre-combine the region and id bits into a single column and use that as an input for a single column vindex. This approach achieves the same goals as a multi-column vindex. Moreover, you avoid having to define additional vindexes for query routing. + +The downside of this approach is that it is harder to migrate an id to a different region. + +## Reference Tables + +Sharded databases often need the ability to join their tables with smaller “reference” tables. For example, the `product` table could be seen as a reference table. Other use cases are tables that map static information like zipcode to city, etc. + +Joining against these tables across keyspaces results in cross-shard joins that may not be very efficient or fast. 
+ +Vitess allows you to create a table in a sharded keyspace as a reference table. This means that it will treat the table as having an identical set of rows across all shards. A query that joins a sharded table against such reference tables is then performed locally within each shard. + +A reference table should not have any vindex, and is defined in the VSchema as a reference type: + +```json +{ + "sharded": true, + "tables": { + "zip_detail": { "type": "reference" } + } +} +``` + +It may become a challenge to keep a reference table correctly updated across all shards. Vitess supports the [Materialize](../../migration/materialize) feature that allows you to maintain the original table in an unsharded keyspace and automatically propagate changes to that table in real-time across all shards. + +## Column List + +The VSchema allows you to specify the list of columns along with their types for every table. This allows Vitess to make optimization decisions where necessary. + +For example, specifying that a column contains text allows VTGate to request further collation specific information (`weight_string`) if additional sorting is needed after collecting results from all shards. 
+ +For example, issuing this query against `customer` would fail: + +```text +mysql> select customer_id, uname from customer order by uname; +ERROR 1105 (HY000): vtgate: http://sougou-lap1:12345/: types are not comparable: VARCHAR vs VARCHAR +``` + +However, we can modify the VSchema as follows: + +```json + "customer": { + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }], + "auto_increment": { + "column": "customer_id", + "sequence": "product.customer_seq" + }, + "columns": [{ + "name": "uname", + "type": "VARCHAR" + }] + } +``` + +Re-issuing the same query will now succeed: + +```text +mysql> select customer_id, uname from customer order by uname; ++-------------+---------+ +| customer_id | uname | ++-------------+---------+ +| 1 | alice | +| 2 | bob | +| 3 | charlie | +| 4 | dan | +| 5 | eve | ++-------------+---------+ +5 rows in set (0.00 sec) +``` + +Specifying columns against tables also allows VTGate to resolve ambiguous naming of columns against the right tables. + +#### Authoritative List + +If you have listed all columns of a table in the VSchema, you can add the `column_list_authoritative` flag to the table: + +```json + "customer": { + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }], + "auto_increment": { + "column": "customer_id", + "sequence": "product.customer_seq" + }, + "columns": [{ + "name": "uname", + "type": "VARCHAR" + }], + "column_list_authoritative": true + } +``` + +This flag causes VTGate to automatically expand expressions like `select *` or insert statements that don’t specify the column list. + +The caveat about using this feature is that you have to keep this column list in sync with the underlying schema. + +In the future, Vitess will allow you to pull this information from the vttablets and automatically keep it up-to-date. + +## Routing Rules + +Routing Rules are an advanced method of redirecting queries meant for one table to another. 
They are just pointers and are analogous to symbolic links in a file system. You should generally not have to use routing rules in Vitess. + +Workflows like `MoveTables` use routing rules to make the target tables appear to exist, and they manage the traffic switch from source to target by manipulating these rules. + +For more information, please refer to the [Routing Rules](../../../reference/features/schema-routing-rules) section. + diff --git a/content/en/docs/user-guides/vschema-guide/img/vschema1.png b/content/en/docs/user-guides/vschema-guide/img/vschema1.png new file mode 100644 index 000000000..f85138de2 Binary files /dev/null and b/content/en/docs/user-guides/vschema-guide/img/vschema1.png differ diff --git a/content/en/docs/user-guides/vschema-guide/img/vschema2.png b/content/en/docs/user-guides/vschema-guide/img/vschema2.png new file mode 100644 index 000000000..85ec159f2 Binary files /dev/null and b/content/en/docs/user-guides/vschema-guide/img/vschema2.png differ diff --git a/content/en/docs/user-guides/vschema-guide/lookup-as-primary.md b/content/en/docs/user-guides/vschema-guide/lookup-as-primary.md new file mode 100644 index 000000000..cc84f6378 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/lookup-as-primary.md @@ -0,0 +1,146 @@ +--- +title: Lookup as Primary Vindex +weight: 10 +--- + +It is likely that a customer order goes through a life cycle of events. This would best be represented in a separate `corder_event` table that will contain a `corder_id` column as a foreign key into `corder.corder_id`. It would also be beneficial to co-locate the event rows with their associated order. + +Just like we shared the `hash` vindex between `customer` and `corder`, we can share `corder_keyspace_idx` between `corder` and `corder_event`. We can also make it the Primary Vindex for `corder_event`. When an order is created, the lookup row for it is also created. 
Subsequently, an insert into `corder_event` will request the vindex to compute the `keyspace_id` for that `corder_id`, and that will succeed because the lookup entry for it already exists. This is where the significance of the owner table comes into play: The owner table creates the entries, whereas other tables only read those entries. + +Inserting a `corder_event` row without creating a corresponding `corder` entry will result in an error. This behavior is in line with the traditional foreign key constraint enforced by relational databases. + +Sharing the lookup vindex also has the additional benefit of saving space because we avoid creating separate entries for the new table. + +We start with creating the sequence table in the `product` keyspace. + +Schema: + +```sql +create table corder_event_seq(id bigint, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence'; +insert into corder_event_seq(id, next_id, cache) values(0, 1, 3); +``` + +VSchema: + +```json + "corder_event_seq": { "type": "sequence" } +``` + +We then create the `corder_event` table in `customer`: + +```sql +create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), primary key(corder_id, corder_event_id)); +``` + +In the VSchema, there is no need to create a vindex because we are going to reuse the existing one: + +```json + "corder_event": { + "column_vindexes": [{ + "column": "corder_id", + "name": "corder_keyspace_idx" + }], + "auto_increment": { + "column": "corder_event_id", + "sequence": "product.corder_event_seq" + } + } +``` + +Alternate VSchema DDL: + +```sql +alter vschema add sequence product.corder_event_seq; +alter vschema on customer.corder_event add vindex corder_keyspace_idx(corder_id); +alter vschema on customer.corder_event add auto_increment corder_event_id using product.corder_event_seq; +``` + +We can now insert rows in `corder_event` against rows in `corder`: + +```text +mysql> insert into corder(customer_id, product_id, oname) 
values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +Query OK, 5 rows affected (0.04 sec) + +mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered'); +Query OK, 2 rows affected (0.01 sec) + +mysql> insert into corder_event(corder_id, ename) values(6, 'expect failure'); +ERROR 1105 (HY000): vtgate: http://sougou-lap1:12345/: execInsertSharded: getInsertShardedRoute: could not map [INT64(6)] to a keyspace id +``` + +As expected, inserting a row for a non-existent order results in an error. + +### Reversible Vindexes + +In Vitess, it is insufficient for a table to only have a Lookup Vindex. This is because it is not practical to reshard such a table. The overhead of performing a lookup before redirecting every row event to a new shard would be prohibitively expensive. + +To overcome this limitation, we must add a column with a non-lookup vindex, also known as a Functional Vindex, to the table. By rule, the Primary Vindex computes the keyspace id of the row. This means that the value of the new column must be one that yields the same keyspace id. + +A Reversible Vindex is one that can back-compute the column value from a given keyspace id. If such a vindex is used for this new column, then Vitess will automatically perform this work and fill in the correct value for it. Vindex properties such as Functional and Reversible are listed in the [Vindexes Reference](../../../features/vindexes). + +In other words, adding a column with a vindex that is both Functional and Reversible allows Vitess to auto-fill the values, thereby avoiding any impact on the application logic. + +The `binary` vindex is one that yields the input value itself as the `keyspace_id`, and is naturally reversible. Using this Vindex will generate the `keyspace_id` as the column value. 
The modified schema for the table will be as follows: + +```sql +create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), keyspace_id varbinary(10), primary key(corder_id, corder_event_id)); +``` + +We create a vindex instantiation for `binary`: + +```json + "binary": { + "type": "binary" + } +``` + +Modify the table VSchema: + +```json + "corder_event": { + "column_vindexes": [{ + "column": "corder_id", + "name": "corder_keyspace_idx" + }, { + "column": "keyspace_id", + "name": "binary" + }], + "auto_increment": { + "column": "corder_event_id", + "sequence": "product.corder_event_seq" + } + } +``` + +Alternate VSchema DDL: + +```sql +alter vschema on customer.corder_event add vindex `binary`(keyspace_id) using `binary`; +``` + +Note that `binary` needs to be backticked because it is a keyword. + +After these modifications, we can now observe that the `keyspace_id` column is getting automatically populated: + +```text +mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +Query OK, 5 rows affected (0.01 sec) + +mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered'); +Query OK, 2 rows affected (0.01 sec) + +mysql> select corder_event_id, corder_id, ename, hex(keyspace_id) from corder_event; ++-----------------+-----------+-----------+------------------+ +| corder_event_id | corder_id | ename | hex(keyspace_id) | ++-----------------+-----------+-----------+------------------+ +| 1 | 1 | paid | 166B40B44ABA4BD6 | +| 2 | 5 | delivered | D2FD8867D50D2DFE | ++-----------------+-----------+-----------+------------------+ +2 rows in set (0.00 sec) +``` + +There is no support for backfilling the reversible vindex column yet. This will be added soon. + +{{< info >}} +The original `keyspace_id` for all these rows came from `customer_id`. 
Since `hash` is also a reversible vindex, reversing the `keyspace_id` using `hash` will yield the `customer_id`. We could instead leverage this knowledge to replace `keyspace_id+binary` with `customer_id+hash`. Vitess will auto-populate the correct value. Using this approach may be more beneficial because `customer_id` is a value the application can understand and make use of. +{{< /info >}} diff --git a/content/en/docs/user-guides/vschema-guide/non-unique-lookup.md b/content/en/docs/user-guides/vschema-guide/non-unique-lookup.md new file mode 100644 index 000000000..59526f9f7 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/non-unique-lookup.md @@ -0,0 +1,194 @@ +--- +title: Non-Unique Lookup Vindexes +weight: 9 +--- + +The `oname` column in `corder` can contain duplicate values. There may be a need in the application to frequently search by this column: + +```sql +select * from corder where oname='gift' +``` + +To prevent this query from resulting in a full scatter, we will need to create a lookup vindex for it. But this time, it will need to be non-unique. However, the fact that duplicates are allowed leads to a complication with the lookup table approach. Let us look at the insert query: + +```sql +insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +``` + +We see that `customer_id 1` has two rows where the `oname` is `gift`. If we try to create entries for those two in a lookup table, they would be identical: + +```text ++-----------+--------------+ +| oname | hex(keyspace_id) | ++-----------+--------------+ +| gift | 166B40B44ABA4BD6 | (corder_id=1) +| gift | 166B40B44ABA4BD6 | (corder_id=2) ++-----------+--------------+ +``` + +To disambiguate this situation, non-unique lookup vindexes require you to add additional columns to the lookup table. They are typically the Primary Key of the main table. 
For the sake of demonstration, let us create this as a sharded table in the `customer` keyspace: + +```sql +create table oname_keyspace_idx(oname varchar(128), corder_id bigint, keyspace_id varbinary(10), primary key(oname, corder_id)); +``` + +Note that the primary key includes the `oname` column as well as the `corder_id` column. + +Because `oname` is a text column, the recommended Primary Vindex for it would be `unicode_loose_md5`, which also requires a vindex instantiation: + +“vindexes” section: + +```json + "unicode_loose_md5": { + "type": "unicode_loose_md5" + } +``` + +“tables” section: + +```json + "oname_keyspace_idx": { + "column_vindexes": [{ + "column": "oname", + "name": "unicode_loose_md5" + }] + } +``` + +The lookup vindex should reference these new columns as follows: + +```json + "oname_keyspace_idx": { + "type": "consistent_lookup", + "params": { + "table": "customer.oname_keyspace_idx", + "from": "oname,corder_id", + "to": "keyspace_id" + }, + "owner": "corder" + } +``` + +{{< info >}} +This Vindex could also be seen as a multi-column Unique Lookup Vindex: For a given pair of `oname,corder_id` as input, the result can only yield a single `keyspace_id`. However, the `consistent_lookup` vindex functionality only supports resolution using the first column `oname`. In the future, we may add the ability to use both columns as input if they are present in the `where` clause. This may result in the merger of `consistent_lookup` with a multi-column version of `consistent_lookup_unique` that can also perform non-unique lookups on a subset of the inputs. 
+{{< /info >}} + +Finally, we tie the associated columns in `corder` to the vindex: + +```json + "corder": { + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }, { + "column": "corder_id", + "name": "corder_keyspace_idx" + }, { + "columns": ["oname", "corder_id"], + "name": "oname_keyspace_idx" + }], + "auto_increment": { + "column": "corder_id", + "sequence": "product.corder_seq" + } + } +``` + +Alternate VSchema DDL: + +```sql +alter vschema on customer.oname_keyspace_idx add vindex unicode_loose_md5(oname) using unicode_loose_md5; +alter vschema on customer.corder add vindex oname_keyspace_idx(oname,corder_id) using consistent_lookup with owner=`corder`, table=`customer.oname_keyspace_idx`, from=`oname,corder_id`, to=`keyspace_id`; +``` + +We can now look at the effects of this change: + +```text +mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +Query OK, 5 rows affected (0.03 sec) + +mysql> use `customer:-80`; +Database changed +mysql> select oname, corder_id, hex(keyspace_id) from oname_keyspace_idx; ++-------+-----------+------------------+ +| oname | corder_id | hex(keyspace_id) | ++-------+-----------+------------------+ +| gift | 1 | 166B40B44ABA4BD6 | +| gift | 2 | 166B40B44ABA4BD6 | +| work | 3 | 06E7EA22CE92708F | ++-------+-----------+------------------+ +3 rows in set (0.00 sec) + +mysql> use `customer:80-`; +Database changed +mysql> select oname, corder_id, hex(keyspace_id) from oname_keyspace_idx; ++----------+-----------+------------------+ +| oname | corder_id | hex(keyspace_id) | ++----------+-----------+------------------+ +| personal | 4 | 4EB190C9A2FA169C | +| personal | 5 | D2FD8867D50D2DFE | ++----------+-----------+------------------+ +2 rows in set (0.00 sec) +``` + +We can see that the lookup table is following its own sharding scheme and distributing its rows according to the value of the `oname` column. 
+ +Deleting one of the `corder` rows results in the corresponding lookup row being deleted: + +```text +mysql> delete from corder where corder_id=1; +Query OK, 1 row affected (0.00 sec) + +mysql> select oname, corder_id, hex(keyspace_id) from oname_keyspace_idx where oname='gift'; ++-------+-----------+------------------+ +| oname | corder_id | hex(keyspace_id) | ++-------+-----------+------------------+ +| gift | 2 | 166B40B44ABA4BD6 | ++-------+-----------+------------------+ +1 row in set (0.00 sec) +``` + +{{< info >}} +You would typically have to create a MySQL non-unique index on `oname` for queries to work efficiently. While these vindexes and indexes improve read performance, the trade-off is that they also increase storage requirements and amplify writes when inserting rows. +{{< /info >}} + +### CreateLookupVindex + +To create such a lookup vindex on a real Vitess cluster, you can use the following instructions: + +Save the following JSON into a file, say `oname_keyspace_idx.json`: + +```json +{ + "sharded": true, + "vindexes": { + "oname_keyspace_idx": { + "type": "consistent_lookup", + "params": { + "table": "customer.oname_keyspace_idx", + "from": "oname,corder_id", + "to": "keyspace_id" + }, + "owner": "corder" + } + }, + "tables": { + "corder": { + "column_vindexes": [{ + "columns": ["oname", "corder_id"], + "name": "oname_keyspace_idx" + }] + } + } +} +``` + +And issue the vtctlclient command: + +```sh +$ vtctlclient -server localhost:15999 CreateLookupVindex -tablet_types=REPLICA customer "$(cat oname_keyspace_idx.json)" +``` + +The workflow will automatically create the necessary Primary Vindex entries for `oname_keyspace_idx` knowing that it is sharded. + +After the backfill is done, you should clean up the workflow. 
More detailed instructions are available in the [CreateLookupVindex Reference](../../configuration-advanced/createlookupvindex) diff --git a/content/en/docs/user-guides/vschema-guide/overview.md b/content/en/docs/user-guides/vschema-guide/overview.md new file mode 100644 index 000000000..00cfd877d --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/overview.md @@ -0,0 +1,34 @@ +--- +title: Overview +weight: 2 +--- + +One of the goals for Vitess is to provide a unified view for a large number of MySQL clusters distributed across multiple data centers and regions. + +Vitess achieves this goal by allowing the application to connect to any VTGate server, and that server gives you the semblance of being connected to a single MySQL server. The metadata that maps the logical view to the physical MySQL servers is stored in the topology. + +In this logical view, a Vitess keyspace is the equivalent of a MySQL database. In many cases, this is a one-to-one mapping where a keyspace directly corresponds to a physical MySQL server with a single database. However, a Vitess keyspace can also be sharded. If so, a single keyspace would map to multiple MySQL servers behind the scenes. + +The topology is typically spread across multiple Topo Servers: The Global Topo server contains global information, like the list of keyspaces, shards and cells. This information gets deployed into cell-specific topo servers. Each cell-specific Topo Server contains additional information about vttablets and MySQL servers running in that cell. With this architecture, an outage in one cell does not affect other cells. + +The topo also stores a VSchema for each keyspace. For an unsharded keyspace, the vschema is a simple list of table names. If a keyspace is sharded, then it must contain additional metadata about the sharding scheme for each table, and how they relate to each other. 
When a query is received by VTGate, the information in the vschema is used to make decisions about how to serve the query. In some cases, it will result in the query being routed to a single shard. In other cases, it could result in the query being sent to all shards, etc. + +This guide explains how to build vschemas for Vitess keyspaces. + +### Demo + +To illustrate the various features of the VSchema, we will make use of the [demo app](https://github.com/vitessio/vitess/tree/master/examples/demo). After installing Vitess, you can launch this demo by running `go run demo.go`. Following this, you can visit http://localhost:8000 to view the tables, issue arbitrary queries, and view their effects. + +Alternatively, you can also connect to Vitess using a MySQL client: `mysql -h 127.0.0.1 -P 12348`. + +The demo models a set of tables that are similar to those presented in the [Getting Started](../../../get-started/local) guide, but with more focus on the VSchema. + +Note that the demo brings up a test process called vtcombo (instead of a real Vitess cluster), which is functionally equivalent to all the components of Vitess, but within a single process. + +You can also use the demo app to follow along the steps of this user guide. If so, you can start by emptying out the files under `schema/product` and `schema/customer`, and incrementally making the changes presented in the steps that follow. + +### VSchema DDL + +The demo describes the VSchema JSON syntax. Many of the changes can be executed by issuing special DDL commands that Vitess understands. Wherever applicable, we have provided the equivalent DDL construct you could apply if you were running a live system. All the DDLs are also listed in the `vschema_ddls.sql` file. + +It is generally recommended that you get familiar with the JSON syntax as it will be useful for troubleshooting if something does not work as intended. 
diff --git a/content/en/docs/user-guides/vschema-guide/pictorial.md b/content/en/docs/user-guides/vschema-guide/pictorial.md new file mode 100644 index 000000000..f3380f47b --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/pictorial.md @@ -0,0 +1,14 @@ +--- +title: Pictorial Summary +weight: 12 +--- + +The following two diagrams highlight some of the relationships that exist between VSchema elements and the MySQL tables. + +### product and customer + +![vschema1](../img/vschema1.png) + +### corder + +![vschema2](../img/vschema2.png) diff --git a/content/en/docs/user-guides/vschema-guide/sequences.md b/content/en/docs/user-guides/vschema-guide/sequences.md new file mode 100644 index 000000000..7b5322659 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/sequences.md @@ -0,0 +1,95 @@ +--- +title: Sequences +weight: 6 +--- + +The sharded `customer` table we created did not have an auto-increment column. The Vitess Sequence feature can be used to emulate the same behavior as MySQL’s auto-increment. A Vitess sequence is a single-row unsharded table that keeps track of ids issued so far. Additionally, a configurable number of values can be cached by vttablet to minimize round trips into MySQL. + +We will create the sequence table in the unsharded `product` keyspace as follows: + +```sql +create table customer_seq(id bigint, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence'; +insert into customer_seq(id, next_id, cache) values(0, 1, 3); +``` + +Note the special comment `vitess_sequence`. This comment tells vttablet that the table is a sequence table. + +The table needs to be pre-populated with a single row where: +* `id` must always be 0 +* `next_id` should be set to the next (starting) value of the sequence +* `cache` is the number of values to cache before updating the table for the next value. This value should be set to a fairly large number like 1000. We have set the value to `3` mainly to demonstrate how the feature works. 
+ +Since this is a special table, we have to inform the vschema by giving it a `sequence` type. + +```json + "customer_seq": { "type": "sequence" } +``` + +Once set up this way, you can use the special `select next` syntax to generate values from this sequence: + + +```text +mysql> select next 2 values from customer_seq; ++---------+ +| nextval | ++---------+ +| 1 | ++---------+ +1 row in set (0.00 sec) + +mysql> select next 1 values from customer_seq; ++---------+ +| nextval | ++---------+ +| 3 | ++---------+ +1 row in set (0.00 sec) +``` + +The construct returns the first of the N values generated. + +However, this is insufficient to emulate MySQL’s auto-increment behavior. To achieve this, we have to inform the VSchema that the `customer_id` column should use this sequence to generate values if no value is specified. This is done by adding the following section to the `customer` table: + +```json + "auto_increment": { + "column": "customer_id", + "sequence": "product.customer_seq" + } +``` + +Alternate VSchema DDL: + +```sql +alter vschema add sequence product.customer_seq; +alter vschema on customer.customer add auto_increment customer_id using product.customer_seq; +``` + +With this, you can insert into `customer` without specifying the `customer_id`: + +```text +mysql> insert into customer(uname) values('alice'),('bob'),('charlie'),('dan'),('eve'); +Query OK, 5 rows affected (0.03 sec) + +mysql> use `customer:-80`; +Database changed +mysql> select * from customer; ++-------------+---------+ +| customer_id | uname | ++-------------+---------+ +| 1 | alice | +| 2 | bob | +| 3 | charlie | +| 5 | eve | ++-------------+---------+ +4 rows in set (0.00 sec) + +mysql> use `customer:80-`; +Database changed +mysql> select * from customer; ++-------------+-------+ +| customer_id | uname | ++-------------+-------+ +| 4 | dan | ++-------------+-------+ +1 row in set (0.00 sec) +``` diff --git a/content/en/docs/user-guides/vschema-guide/sharded.md 
b/content/en/docs/user-guides/vschema-guide/sharded.md new file mode 100644 index 000000000..e0a2c83bf --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/sharded.md @@ -0,0 +1,130 @@ +--- +title: Sharded Keyspace +weight: 5 +--- + +A sharded keyspace allows you to split a large database into smaller parts by distributing the rows of each table into different shards. In Vitess, each shard is assigned a `keyrange`. Every row has a keyspace id, and this value decides the shard in which the row lives. For key-value stores, the keyspace id is dictated by the value of the key, also known as the sharding key. In Vitess, this is known as the Primary Vindex. But it differs from a sharding key in the following ways: + +* Any column or a set of columns can be chosen to be the primary vindex. +* The Vindex also decides the sharding function that controls how the data is distributed. +* The sharding function is pluggable, allowing for user-defined sharding schemes. + +Vitess provides many predefined vindex types. The most popular ones are: +* `hash`: for numbers +* `unicode_loose_md5`: for text columns +* `binary_md5`: for binary columns + +In our example, we are going to designate `customer` as a sharded keyspace, and create a `customer` table in it. The schema for the table is as follows: + +```sql +create table customer(customer_id bigint, uname varchar(128), primary key(customer_id)); +``` + +In the VSchema, we need to designate which column should be the Primary Vindex, and choose the vindex type for it. The `customer_id` column seems to be the natural choice. Since it is a number, we will choose `hash` as the vindex type: + +```json +{ + "sharded": true, + "vindexes": { + "hash": { + "type": "hash" + } + }, + "tables": { + "customer": { + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }] + } + } +} +``` + +In the above section, we are instantiating a vindex named `hash` from the vindex type `hash`. 
Such instantiations are listed in the `vindexes` section of the vschema. The tables are expected to refer to the instantiated name. There are a few reasons why this additional level of indirection is necessary: +* As we will see later, vindexes can be instantiated with different input parameters. In such cases, they have to have their own distinct names. +* Vindexes can be shared by tables, and this has special meaning. We will cover this in a later section. +* Vindexes can also be referenced as if they were tables and can be used to compute the keyspace id for a given input. + +The `column_vindexes` section is a list. This is because a table can have multiple vindexes. If so, the first vindex in the list must be the Primary Vindex. More information about vindexes can be found in the [Vindex Reference](../../../reference/features/vindexes). + +Alternate VSchema DDL: + +```sql +alter vschema on customer.customer add vindex hash(customer_id) using hash; +``` + +The DDL creates the `hash` vindex under the `vindexes` section, the `customer` table under the `tables` section, and associates the `customer_id` column to `hash`. For sharded keyspaces, the only way to create a table is using the above construct. This is because a primary vindex is mandatory for sharded tables. + +{{< info >}} +Every sharded table must have a Primary Vindex. A Primary Vindex must be instantiated from a vindex type that is Unique. `hash`, `unicode_loose_md5` and `binary_md5` are unique vindex types. +{{< /info >}} + +The demo brings up the customer as two shards: `-80` and `80-`. For a `hash` vindex, input values of 1, 2 and 3 fall in the `-80` range, and 4 falls in the `80-` range. 
Restarting the demo with the updated configs should allow you to perform the following: + +```text +mysql> insert into customer(customer_id,uname) values(1,'alice'),(4,'dan'); +Query OK, 2 rows affected (0.00 sec) + +mysql> use `customer:-80`; +Database changed +mysql> select * from customer; ++-------------+-------+ +| customer_id | uname | ++-------------+-------+ +| 1 | alice | ++-------------+-------+ +1 row in set (0.00 sec) + +mysql> use `customer:80-`; +Database changed +mysql> select * from customer; ++-------------+-------+ +| customer_id | uname | ++-------------+-------+ +| 4 | dan | ++-------------+-------+ +1 row in set (0.00 sec) +``` + +You will notice that we used a special shard targeting construct: `use customer:-80`. Vitess allows you to use this hidden database name to bypass its routing logic and directly send queries to a specific shard. Using this construct, we are able to verify that the rows went to different shards. + +At the time of insert, the Primary Vindex is used to compute and assign a keyspace id to each row. This keyspace id gets used to decide where the row will be stored. Although a keyspace id is not explicitly stored anywhere, it must be seen as an unchanging property of that row, as if there was an invisible column for it. + +Consequently, you cannot make changes to a row that can cause the keyspace id to change. Such a change will be supported in the future through a shard move operation. Trying to change the value of a Primary Vindex results in an error: + +```text +mysql> update customer set customer_id=2 where customer_id=1; +ERROR 1235 (HY000): vtgate: http://sougou-lap1:12345/: unsupported: You can't update primary vindex columns. 
Invalid update on vindex: hash +``` + +A Primary Vindex can also be used to find rows if referenced in a where clause: + +```text +mysql> select * from customer where customer_id=1; ++-------------+-------+ +| customer_id | uname | ++-------------+-------+ +| 1 | alice | ++-------------+-------+ +1 row in set (0.00 sec) +``` + +If you run the above query in the demo app, the panel on the bottom right will show that the query was executed only on one shard. + +On the other hand, the query below will get sent to all shards because there is no where clause: + +```text +mysql> select * from customer; ++-------------+-------+ +| customer_id | uname | ++-------------+-------+ +| 4 | dan | +| 1 | alice | ++-------------+-------+ +2 rows in set (0.01 sec) +``` + +{{< info >}} +There is no implicit or predictable ordering for rows that are gathered from multiple shards. If a specific order is required, the query must include an `order by` clause. +{{< /info >}} diff --git a/content/en/docs/user-guides/vschema-guide/sharding-guidelines.md b/content/en/docs/user-guides/vschema-guide/sharding-guidelines.md new file mode 100644 index 000000000..13dd68c87 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/sharding-guidelines.md @@ -0,0 +1,79 @@ +--- +title: Sharding Guidelines +weight: 3 +--- + +The following guidelines are not set in stone. They mainly establish a framework for making decisions. + +### Why + +There was a time when sharding used to be a line that one should avoid crossing for as long as possible. However, with Vitess considerably reducing the pain of sharding, we can look at leveraging some of its benefits much sooner than when a machine runs out of capacity: + +* Smaller blast radius: If a shard goes down, the outage affects a smaller percentage of the users. +* Improved resource utilization: It is difficult to pack large instances of servers efficiently across machines. 
It is much easier to utilize the capacity of the existing hardware when shard sizes are relatively small. Orchestration systems like Kubernetes further facilitate such utilization. +* Reduced contention: MySQL itself runs a lot better when instance sizes are small. There is less internal contention, replicas tend to keep up more easily with their master, etc. +* Improved maintenance: Operations like schema deployment can happen in parallel across all shards and finish much sooner than usual. + +There is a general worry that the complexity of deployment increases with the number of servers. However, this becomes a static cost once the necessary automation and monitoring are in place. + +### Why not + +There are also reasons why you may want to avoid sharding. The main reason is that you may introduce inefficiencies by splitting data that would have been better off if it stayed together. Sharding may also be unnecessary if your database is extremely small. + +However, if you reach a point where the data is starting to grow, sharding may become inevitable. + +### Moving Tables + +Typically, the first step you may perform is to split your database by moving some tables onto other databases. In Vitess parlance, we call this splitting off keyspaces. The [MoveTables](../../migration/move-tables) workflow allows you to perform this with minimal impact to the application. + +### Resharding + +Beyond a certain point, it may not make sense to separate tables that are strongly related to each other. This is when resharding comes into play. Choosing the “sharding key” is often intuitively obvious. + +If you analyze the query pattern in the application, the query with the highest QPS will dictate the sharding key (or Primary Vindex). In our example below, we will be choosing `customer_id` as the Primary Vindex for the `customer` table. + +If there are queries with other where clauses on the same table, those would be candidates for secondary lookup vindexes.
+ +### Joins + +The next criteria to take into account are joins. If you are performing joins across tables, it will be beneficial to keep those rows together. In our example, we will be keeping the rows of the order table along with their customer. This grouping will allow us to efficiently perform operations like reading all orders placed by a customer. + +### Transactions + +It is important to keep transactions to be within one shard. Vitess currently does not guarantee atomicity for transactions that go across shards. Grouping related rows together usually results in transactions also falling within the same shard. + +But there are situations where this may not be possible. If so, you can see if it will be possible to avoid this problem by grouping data differently. However, this may not be possible either. A well known use case is one where customers send each other money. + +In such situations, you can look at refactoring the application such that transactions are broken into smaller single shard transactions. + +Vitess will be adding support for distributed transactions soon. + +### Large Tenants + +If your application is tenant-based, it is possible that a single tenant may grow so big that they may not fit in one shard. If so, it is likely that the application is using a different key that has a higher cardinality than the tenant id. + +The question to ask oneself is: if the tenant were a single application by themselves, what would be their sharding key, and then shard by that key instead of the tenant id. + +Vitess has started rolling out support for multi-column Vindexes. Once this feature is fully done, you will be able to shard by the tenant id and a secondary key. The two-column sharding approach will allow you to group all data for a given tenant into a smaller set of shards rather than a random distribution. 
This may be beneficial for security or compliance reasons, in case the tenant would want their data to be physically isolated from other tenants. + +### Region Sharding + +The Vitess multi-column Vindex feature also allows you to locate data associated with regions in different geographical locations. For more details, see [Region-based Sharding](../../configuration-advanced/region-sharding). + +### Many-to-Many relationships + +Sharding works well only if the relationship between data is hierarchical (one-to-one or one-to-many). If a table has foreign keys into multiple other tables, you have to choose one based on the strongest relationship and group the rows by that foreign key. The rest of the relationships will incur cross-shard overheads. + +If more than one relationship is critically strong, you can look at the [Materialization](../../../reference/vreplication/materialize) feature which allows you to create a materialized view of the table that is sharded using the other relationship. The application will write to the source, and the other view will be automatically updated. + +### Course Correction + +It may happen that the original sharding decision is not working out. It may also be possible that the application evolves in such a way that the sharding decision you previously made does not make sense any more. + +In such cases, the [MoveTables](../../migration/move-tables) feature can be used to change the sharding key of existing tables. + +Essentially, Vitess removes the fear of making the wrong sharding decision because you can always change your mind later. + +### Thumb Rule + +Although a Vitess shard can grow to many terabytes, we have seen that 250GB is the sweet spot. If your data size approaches this limit, it is time to think about splitting your data. 
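
As a rough illustration of this thumb rule, the arithmetic can be sketched in Python. The function name `suggested_shard_count` is hypothetical, and the result is only a starting point for planning under the 250GB guideline above, not a sizing tool.

```python
import math

SWEET_SPOT_GB = 250  # per-shard target taken from the thumb rule above


def suggested_shard_count(total_data_gb, target_gb=SWEET_SPOT_GB):
    """Return the smallest shard count that keeps each shard at or below target_gb."""
    if total_data_gb <= 0:
        raise ValueError("total_data_gb must be positive")
    return max(1, math.ceil(total_data_gb / target_gb))
```

For example, 1100GB of data gives `ceil(1100 / 250)`, which is 5 shards, assuming the data distributes evenly across them.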
diff --git a/content/en/docs/user-guides/vschema-guide/shared-vindexes.md b/content/en/docs/user-guides/vschema-guide/shared-vindexes.md new file mode 100644 index 000000000..2175b0781 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/shared-vindexes.md @@ -0,0 +1,172 @@ +--- +title: Shared Vindexes and Foreign Keys +weight: 7 +--- + +Let us now look at creating the `corder` table that will contain orders placed by the customers. It will be beneficial to group the rows of the orders in the same shard as that of the customer that placed the orders. Doing things this way will allow for simpler join queries between `customer` and `corder`. There will also be transactional benefits: any transaction that also updates the customer row along with an order will be a single shard transaction. + +To make this happen in Vitess, all you have to do is specify that `corder.customer_id` uses the `hash` vindex, which is the same one used by `customer.customer_id`. + +This is one situation where a Primary Vindex conceptually differs from a traditional database Primary Key. Whereas a Primary Key makes a row unique, a Vitess Primary Vindex only yields a Unique value. But multiple rows with the same Primary Vindex value can exist. + +In other words, the Primary Vindex column need not be the primary key, or unique within MySQL. This is convenient for the `corder` table because we want customers to place multiple orders. In this case, all orders placed by a customer will have the same `customer_id`. The Primary Vindex for those will yield the same keyspace id as that of the customer. Therefore, all the rows for that customer’s orders will end up in the same shard along with the customer row. + +Since `corder` rows will need to have their own unique identifier, we also need to create a separate sequence for it in the product keyspace. 
+ +```sql +create table corder_seq(id bigint, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence'; +insert into corder_seq(id, next_id, cache) values(0, 1, 3); +``` + +VSchema: + +```json + "corder_seq": { "type": "sequence" } +``` + +We create the `corder` table as follows: + +```sql +create table corder(corder_id bigint, customer_id bigint, product_id bigint, oname varchar(128), primary key(corder_id)); +``` + +VSchema: +```json + "corder": { + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }], + "auto_increment": { + "column": "corder_id", + "sequence": "product.corder_seq" + } + } +``` + +Alternate VSchema DDL: + +```sql +alter vschema on customer.corder add vindex hash(customer_id); +alter vschema add sequence product.corder_seq; +alter vschema on customer.corder add auto_increment corder_id using product.corder_seq; +``` + +Inserting into `corder` yields the following results: + +```text +mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +Query OK, 5 rows affected (0.03 sec) + +mysql> use `customer:-80`; +Database changed +mysql> select * from corder; ++-----------+-------------+------------+----------+ +| corder_id | customer_id | product_id | oname | ++-----------+-------------+------------+----------+ +| 1 | 1 | 1 | gift | +| 2 | 1 | 2 | gift | +| 3 | 2 | 1 | work | +| 4 | 3 | 2 | personal | ++-----------+-------------+------------+----------+ +4 rows in set (0.00 sec) + +mysql> use `customer:80-`; +Database changed +mysql> select * from corder; ++-----------+-------------+------------+----------+ +| corder_id | customer_id | product_id | oname | ++-----------+-------------+------------+----------+ +| 5 | 4 | 1 | personal | ++-----------+-------------+------------+----------+ +1 row in set (0.00 sec) +``` + +As expected, orders are created in the same shard as their customer. 
Selecting orders by their customer id goes to a single shard: + +```text +mysql> select * from corder where customer_id=1; ++-----------+-------------+------------+-------+ +| corder_id | customer_id | product_id | oname | ++-----------+-------------+------------+-------+ +| 1 | 1 | 1 | gift | +| 2 | 1 | 2 | gift | ++-----------+-------------+------------+-------+ +2 rows in set (0.00 sec) +``` + +Joining `corder` with `customer` also goes to a single shard. This is also referred to as a local join: + +```text +mysql> select c.uname, o.oname, o.product_id from customer c join corder o on c.customer_id = o.customer_id where c.customer_id=1; ++-------+-------+------------+ +| uname | oname | product_id | ++-------+-------+------------+ +| alice | gift | 1 | +| alice | gift | 2 | ++-------+-------+------------+ +2 rows in set (0.01 sec) +``` + +Performing the join without a `customer_id` constraint still results in a local join, but the query is scattered across all shards: + +```text +mysql> select c.uname, o.oname, o.product_id from customer c join corder o on c.customer_id = o.customer_id; ++---------+----------+------------+ +| uname | oname | product_id | ++---------+----------+------------+ +| alice | gift | 1 | +| alice | gift | 2 | +| bob | work | 1 | +| charlie | personal | 2 | +| dan | personal | 1 | ++---------+----------+------------+ +5 rows in set (0.00 sec) +``` + +However, adding a join with `product` results in a cross-shard join for the product part of the query: + +```text +mysql> select c.uname, o.oname, p.pname from customer c join corder o on c.customer_id = o.customer_id join product p on o.product_id = p.product_id; ++---------+----------+----------+ +| uname | oname | pname | ++---------+----------+----------+ +| alice | gift | monitor | +| alice | gift | keyboard | +| bob | work | monitor | +| charlie | personal | keyboard | +| dan | personal | monitor | ++---------+----------+----------+ +5 rows in set (0.01 sec) +``` + +Although the
underlying work performed by Vitess is not visible here, you can see it in the bottom right panel if using the demo app. Alternatively, you can also stream this information with the following command: + +```text +curl localhost:12345/debug/querylog +[verbose output not shown] +``` + +### Foreign Keys + +More generically stated: If a table has a foreign key into another table, then Vitess can ensure that the related rows live in the same shard by making them share a common Unique Vindex. + +In cases where you choose to group rows based on their foreign key relationships, you have the option to enforce those constraints within each shard at the MySQL level. You can also configure cascade deletes as needed. However, overuse of foreign key constraints is generally discouraged in MySQL. + +Foreign key constraints across shards or keyspaces are not supported in Vitess. For example, you cannot specify a foreign key between `corder.product_id` and `product.product_id`. + +### Many-to-Many relationships + +In the case where a table has relationships with multiple other tables, you can only choose one of those relationships for shard grouping. All other relationships will end up being cross-shard, and will incur cross-shard penalties. + +If a table has strong relationships with multiple other tables, and if performance becomes a challenge choosing either way, you can explore the [VReplication Materialization](../../../reference/vreplication/materialize) feature that allows you to materialize a table both ways. + +### Enforcing Uniqueness + +To enforce global uniqueness for a row in a sharded table, you have to have: +* A Unique Vindex on the column +* A Unique constraint at the MySQL level + +A Primary Vindex coupled with a Primary Key constraint makes a row globally unique. + +A Unique Vindex can also be specified for a non-unique column. 
In such cases, it is likely that you will be using that column in a where clause, and will require a secondary non-unique index on it at the MySQL level. diff --git a/content/en/docs/user-guides/vschema-guide/unique-lookup.md b/content/en/docs/user-guides/vschema-guide/unique-lookup.md new file mode 100644 index 000000000..63bcf1acd --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/unique-lookup.md @@ -0,0 +1,166 @@ +--- +title: Unique Lookup Vindexes +weight: 8 +--- + +Certain application features may require you to point-select orders by their id with a query like this: + +```sql +select * from corder where corder_id=1; +``` + +However, issuing this query to Vitess will cause it to scatter this query across all shards because there is no way to know which shard contains that order id. This would be inefficient if the QPS of this query or the number of shards is too high. + +Vitess supports the concept of lookup vindexes, also known as cross-shard indexes. You can instruct Vitess to create and manage a lookup vindex for the `corder_id` column. Such a vindex needs to maintain a mapping from `corder_id` to the `keyspace_id` of the row, which will be stored in a lookup table. + +This lookup table can be created in any keyspace, and it may or may not be sharded. In this particular case, we are going to create the table in the unsharded product keyspace even though the lookup vindex itself is going to be in the `customer` keyspace: + +```sql +create table corder_keyspace_idx(corder_id bigint not null auto_increment, keyspace_id varbinary(10), primary key(corder_id)); +``` + +The primary key is `corder_id`. The unique constraint on `corder_id` makes the Lookup Vindex unique: for a given `corder_id` as input, at most one `keyspace_id` can be produced. It is not necessary to name the column as `corder_id`, but it is less confusing to do so. 
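
The following Python sketch illustrates the idea of routing through a unique lookup table. It is a simplified model, not Vitess code: `LOOKUP`, `shard_for` and `route` are hypothetical names, the two shards `-80` and `80-` match the demo setup, and the sample keyspace ids are copied from the demo output of `corder_keyspace_idx` in this guide.

```python
# Hypothetical in-memory copy of the lookup table: corder_id -> keyspace_id.
LOOKUP = {
    1: bytes.fromhex("166B40B44ABA4BD6"),
    3: bytes.fromhex("06E7EA22CE92708F"),
    5: bytes.fromhex("D2FD8867D50D2DFE"),
}

SPLIT_POINT = bytes.fromhex("80")  # boundary between shards -80 and 80-


def shard_for(keyspace_id):
    """Pick the shard whose keyrange contains the keyspace id."""
    # Byte strings compare lexicographically, like keyrange boundaries.
    return "-80" if keyspace_id < SPLIT_POINT else "80-"


def route(corder_id):
    """Return the shards that need to be queried for a given corder_id."""
    keyspace_id = LOOKUP.get(corder_id)
    if keyspace_id is None:
        return []  # no lookup row: no shard can contain a matching row
    return [shard_for(keyspace_id)]
```

Because the lookup is unique, `route` returns at most one shard, which is what lets a point select on `corder_id` avoid a full scatter.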
+ +Since the table is not sharded, we have a trivial VSchema addition: + +```json + "corder_keyspace_idx": {} +``` + +We can now instantiate the Lookup Vindex in the VSchema of the `customer` keyspace: + +```json + "corder_keyspace_idx": { + "type": "consistent_lookup_unique", + "params": { + "table": "product.corder_keyspace_idx", + "from": "corder_id", + "to": "keyspace_id" + }, + "owner": "corder" + } +``` + +* The vindex is given a distinctive name `corder_keyspace_idx` because of its specific input parameters. +* The vindex type is `consistent_lookup_unique`. We expect this lookup vindex to yield at most one keyspace id for a given input. The `consistent` qualifier is explained below. +* The `params` section of a Vindex is a set of key-value strings. Each vindex expects a different set of parameters depending on the implementation. A lookup vindex requires the following three parameters: + * `table` should be the name of the lookup table. It is recommended that it is fully qualified. + * The `from` and `to` fields must reference the column names of the lookup table. +* The `owner` field indicates that `corder` is responsible for populating the lookup table and keeping it up-to-date. This means that an insert into `corder` will result in a corresponding lookup row being inserted in the lookup table, etc. Lookup vindexes can also be shared, but they can have only one owner each. We will later see an example about how to share lookup vindexes. + +{{< info >}} +Since `corder_keyspace_idx` and `corder` are in different keyspaces, any change that affects the lookup column results in a distributed transaction between the `customer` shard and the `product` keyspace. Usually, a two-phase commit (2PC) protocol would need to be used for the distributed transaction. However, the `consistent` lookup vindex utilizes a special algorithm that orders the commits in such a way that a commit failure resulting in partial commits does not result in inconsistent data. 
This avoids the extra overheads associated with 2PC. +{{< /info >}} + +Finally, we must associate `corder.corder_id` with the lookup vindex: + +```json + "column_vindexes": [{ + "column": "customer_id", + "name": "hash" + }, { + "column": "corder_id", + "name": "corder_keyspace_idx" + }] +``` + +Note that `corder_id` comes after `customer_id`, implying that `customer_id` is the Primary Vindex for this table. + +Alternate VSchema DDL: + +```sql +alter vschema add table product.corder_keyspace_idx; +alter vschema on customer.corder add vindex corder_keyspace_idx(corder_id) using consistent_lookup_unique with owner=`corder`, table=`product.corder_keyspace_idx`, from=`corder_id`, to=`keyspace_id`; +``` + +{{< info >}} +An owned lookup vindex (even if unique) cannot be a Primary Vindex because it creates an association against a keyspace id after one has been assigned to the row. The job of computing the keyspace id must therefore be performed by a different unique vindex. +{{< /info >}} + +Bringing up the demo application again, you can now see the lookup table being automatically populated when rows are inserted in `corder`: + +```text +mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal'); +Query OK, 5 rows affected (0.00 sec) + +mysql> select corder_id, hex(keyspace_id) from corder_keyspace_idx; ++-----------+------------------+ +| corder_id | hex(keyspace_id) | ++-----------+------------------+ +| 1 | 166B40B44ABA4BD6 | +| 2 | 166B40B44ABA4BD6 | +| 3 | 06E7EA22CE92708F | +| 4 | 4EB190C9A2FA169C | +| 5 | D2FD8867D50D2DFE | ++-----------+------------------+ +5 rows in set (0.01 sec) +``` + +And then, issuing a query like `select * from corder where corder_id=1` results in two single-shard round-trips instead of a full scatter. + +### Reversible Vindexes + +Looking at the rows in `corder_keyspace_idx` reveals a few things.
We can now see the actual keyspace id values that were previously invisible. We can also see that two different inputs, `1` and `2`, map to the same keyspace id `166B40B44ABA4BD6`. In other words, a unique vindex does not necessarily guarantee that two different values yield different keyspace ids. Here, this happens because there are two order rows for customer id `1`. + +Vindexes that do have a one-to-one correspondence between the input value and keyspace id, like `hash`, are known as reversible vindexes: Given a keyspace id, the input value can be back-computed. This property will be used in a later example. + +### Backfill + +Creating a lookup vindex after the main table already contains rows does not automatically backfill the lookup table for the existing entries. Only newer inserts cause automatic population of the lookup table. This backfill can be set up using the `CreateLookupVindex` command covered below. + +### Checklist + +Creating a unique lookup Vindex is an elaborate process. It is good to use the following checklist if this is done manually: +* Create the lookup table as sharded or unsharded. Make the `from` column the primary key. +* Create a VSchema entry for the lookup table. If sharded, assign a Primary Vindex for the `from` column. +* Create the lookup vindex in the VSchema of the sharded keyspace: + * Give it a distinct name + * Specify the type as `consistent_lookup_unique` + * Under `params`: specify the properties of the lookup table + * Specify the `owner` as the main table +* Associate the column of the owner table with the new Vindex. + +### CreateLookupVindex command + +vtctld supports the [CreateLookupVindex](../../configuration-advanced/createlookupvindex) command that can perform all the above steps as well as the backfill. + +{{< warning >}} +This will not work against the `vtcombo` based demo app because it does not support vreplication. You can only try this against a real Vitess cluster.
+{{< /warning >}} + +Save the following json into a file, say `corder_keyspace_idx.json`: + +```json +{ + "sharded": true, + "vindexes": { + "corder_keyspace_idx": { + "type": "consistent_lookup_unique", + "params": { + "table": "product.corder_keyspace_idx", + "from": "corder_id", + "to": "keyspace_id" + }, + "owner": "corder" + } + }, + "tables": { + "corder": { + "column_vindexes": [{ + "column": "corder_id", + "name": "corder_keyspace_idx" + }] + } + } +} +``` + +And issue the vtctlclient command: + +```sh +$ vtctlclient -server localhost:15999 CreateLookupVindex -tablet_types=REPLICA customer "$(cat corder_keyspace_idx.json)" +``` + +The workflow automatically infers the schema and vschema for the lookup table and creates it. It also sets up the necessary VReplication streams to backfill the lookup table. + +After the backfill is done, you should clean up the workflow. More detailed instructions are available in the [CreateLookupVindex Reference](../../configuration-advanced/createlookupvindex). diff --git a/content/en/docs/user-guides/vschema-guide/unsharded.md b/content/en/docs/user-guides/vschema-guide/unsharded.md new file mode 100644 index 000000000..299c51bb4 --- /dev/null +++ b/content/en/docs/user-guides/vschema-guide/unsharded.md @@ -0,0 +1,85 @@ +--- +title: Unsharded Keyspace +weight: 4 +--- + +We are going to start with configuring the `product` table in the unsharded keyspace `product`. The schema file should be as follows: + +```sql +create table product(product_id bigint auto_increment, pname varchar(128), primary key(product_id)); +``` + +`product_id` is the primary key for product, and it is also configured to use MySQL’s `auto_increment` feature that allows you to automatically generate unique values for it. + +We also need to create a VSchema for the `product` keyspace and specify that `product` is a table in the keyspace: + +```json +{ + "sharded": false, + "tables": { + "product": {} + } +} +``` + +The json states that the keyspace is not sharded.
The product table is specified in the “tables” section of the json. This is because there are other sections that we will introduce later. + +For unsharded keyspaces, no additional metadata is needed for regular tables. So, their entry is empty. + +Alternate VSchema DDL: + +```sql +alter vschema add table product.product; +``` + +{{< info >}} +If `product` is the only keyspace in the cluster, a vschema is unnecessary. Vitess treats single keyspace clusters as a special case and optimistically forwards all queries to that keyspace even if there is no table metadata present in the vschema. But it is best practice to provide a full vschema to avoid future complications. +{{< /info >}} + +Bringing up the cluster will allow you to access the `product` table. You can now insert rows into the table: + +```text +$ mysql -h 127.0.0.1 -P 12348 +[snip] +mysql> insert into product(pname) values ('monitor'), ('keyboard'); +Query OK, 2 rows affected (0.00 sec) + +mysql> select * from product; ++------------+----------+ +| product_id | pname | ++------------+----------+ +| 1 | monitor | +| 2 | keyboard | ++------------+----------+ +2 rows in set (0.00 sec) +``` +The insert does not specify values for `product_id`, because we are relying on MySQL’s `auto_increment` feature to populate it. + +You will notice that we did not connect to the `product` database or issue a `use` statement to select it. This is the ‘unspecified’ mode supported by Vitess. As long as a table name can be uniquely identified from the vschemas, Vitess will automatically direct the query to the correct keyspace. + +You can also connect or specify keyspaces as if they were MySQL databases. 
The following constructs are valid: + +```text +mysql> select * from product.product; ++------------+----------+ +| product_id | pname | ++------------+----------+ +| 1 | monitor | +| 2 | keyboard | ++------------+----------+ +2 rows in set (0.00 sec) + +mysql> use product; +Reading table information for completion of table and column names +You can turn off this feature to get a quicker startup with -A + +Database changed +mysql> select * from product; ++------------+----------+ +| product_id | pname | ++------------+----------+ +| 1 | monitor | +| 2 | keyboard | ++------------+----------+ +2 rows in set (0.01 sec) +```