4 changes: 2 additions & 2 deletions content/en/docs/user-guides/configuration-advanced/_index.md
@@ -1,5 +1,5 @@
---
title: Advanced Configuration
description: User guides covering advanced configuration concepts
weight: 3
---
weight: 5
---
@@ -5,7 +5,7 @@ aliases: ['/docs/user-guides/createlookupvindex/']
---

{{< info >}}
This guide follows on from the Get Started guides. Please make sure that you have an [Operator](../../../get-started/operator), [local](../../../get-started/local) or [Helm](../../../get-started/helm) installation ready. Make sure you are at the point where you have the sharded keyspace called `customer` setup.
This guide follows on from the Get Started guides. Please make sure that you have an [Operator](../../../get-started/operator) or [local](../../../get-started/local) installation ready. Make sure you have reached the point where the sharded keyspace called `customer` is set up.
{{< /info >}}

**CreateLookupVindex** is a new VReplication workflow in Vitess 6. It is used to create **and** backfill a lookup Vindex automatically for a table that already exists, and may have a significant amount of data in it already.
@@ -324,18 +324,13 @@ mysql> select sku, hex(keyspace_id) from corder_lookup;
+-----------+------------------+
```

Basically, this shows exactly what we expected. Now, we can clean up the
VReplication streams. Note these commands will clean up all VReplication
streams on these tablets. You may want to filter by `id` if there are other
streams running:
Basically, this shows exactly what we expected. Now we have to clean up
the artifacts of the backfill. The `ExternalizeVindex` command deletes
the VReplication streams and also clears the `write_only` flag from the
Vindex, indicating that it is no longer backfilling.

```sh
$ vtctlclient -server localhost:15999 VReplicationExec zone1-0000000300 "delete from _vt.vreplication"
+
+
$ vtctlclient -server localhost:15999 VReplicationExec zone1-0000000400 "delete from _vt.vreplication"
+
+
$ vtctlclient -server localhost:15999 ExternalizeVindex customer.corder_lookup
```

Next, to confirm the lookup Vindex is doing what we think it should, we can
@@ -475,3 +470,6 @@ mysql> select sku, hex(keyspace_id) from corder_lookup;
We added a new row to the `corder` table, and now we have a new row in the
lookup table.

### ExternalizeVindex

Once the backfill is done,
4 changes: 2 additions & 2 deletions content/en/docs/user-guides/configuration-basic/_index.md
@@ -1,5 +1,5 @@
---
title: Configuration
description: User guides covering basic configuration concepts
weight: 1
---
weight: 2
---
4 changes: 2 additions & 2 deletions content/en/docs/user-guides/migration/_index.md
@@ -1,5 +1,5 @@
---
title: Migration
description: User guides covering migration to Vitess
weight: 2
---
weight: 3
---
4 changes: 2 additions & 2 deletions content/en/docs/user-guides/operating-vitess/_index.md
@@ -2,5 +2,5 @@
title: Operational
description: User guides for covering operational aspects of Vitess
description: User guides covering operational aspects of Vitess
weight: 4
---
weight: 5
---
4 changes: 2 additions & 2 deletions content/en/docs/user-guides/sql/_index.md
@@ -1,5 +1,5 @@
---
title: SQL Statement Analysis
description: User guides covering analyzing SQL statements
weight: 3
---
weight: 4
---
5 changes: 5 additions & 0 deletions content/en/docs/user-guides/vschema-guide/_index.md
@@ -0,0 +1,5 @@
---
title: VSchema and Query Serving
description: Configuring VSchema for serving queries
weight: 1
---
135 changes: 135 additions & 0 deletions content/en/docs/user-guides/vschema-guide/advanced-vschema.md
@@ -0,0 +1,135 @@
---
title: Advanced VSchema Properties
weight: 11
---

With the exception of Multi-Column Vindexes, advanced VSchema properties do not have DDL constructs. They can only be updated through `vtctld` CLI commands.

## Multi-Column Vindexes

Multi-Column Vindexes are useful in the following two use cases:

* Grouping customers by their regions so they can be hosted in specific geographical locations. This may be required for compliance, and also to achieve better performance.
* For a multi-tenant system, grouping all rows of a tenant in a separate set of shards. This limits query fan-out when searching only for rows that are related to a single tenant.

In both cases the leading column is the region or tenant, and is used to form the first few bits of the `keyspace_id`. The second column is used for the bits that follow. Since Vitess shards by keyrange, this approach will naturally group all rows of a region or tenant within the same shard, or within a group of consecutive shards. Since each shard is its own MySQL cluster, these can then be deployed to different regions as needed.
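To make the bit layout concrete, here is a minimal Go sketch of how a two-column vindex could compose a `keyspace_id`: one leading byte taken from the region, followed by seven bytes derived from the id. The one-byte prefix width, the `regionCode` map, and the use of SHA-256 for the id bytes are hypothetical choices for illustration only; this is not the algorithm of `region_json` or any other Vitess vindex. The sketch also checks which half of a hypothetical two-shard split (`-80`, `80-`) a row lands in.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Hypothetical region codes; a real deployment defines its own mapping.
var regionCode = map[string]byte{
	"us": 0x10,
	"eu": 0x90,
}

// composeKeyspaceID builds an 8-byte keyspace_id: byte 0 comes from the
// region (leading column), bytes 1-7 from a digest of the id (second column).
func composeKeyspaceID(region string, id uint64) ([]byte, error) {
	code, ok := regionCode[region]
	if !ok {
		return nil, fmt.Errorf("unknown region %q", region)
	}
	var idBytes [8]byte
	binary.BigEndian.PutUint64(idBytes[:], id)
	sum := sha256.Sum256(idBytes[:])
	ksid := make([]byte, 8)
	ksid[0] = code
	copy(ksid[1:], sum[:7])
	return ksid, nil
}

// inKeyRange reports whether ksid falls in [start, end). An empty start or
// end means "open on that side", mirroring shard names like "-80" and "80-".
func inKeyRange(ksid, start, end []byte) bool {
	if bytes.Compare(ksid, start) < 0 {
		return false
	}
	return len(end) == 0 || bytes.Compare(ksid, end) < 0
}

func main() {
	for _, id := range []uint64{1, 2, 3} {
		ksid, _ := composeKeyspaceID("eu", id)
		// Every "eu" keyspace_id starts with 0x90, so all of them fall in 80-.
		fmt.Printf("eu/%d -> %x in 80-: %v\n", id, ksid, inKeyRange(ksid, []byte{0x80}, nil))
	}
}
```

Because the region byte dominates the comparison, the id bits only decide the position of a row within its region's keyrange, never which region's keyrange it belongs to.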

Please refer to [Region-based Sharding](../../configuration-advanced/region-sharding) for an example of how to use the `region_json` vindex.

Currently, the Vindex is used only to assign a `keyspace_id` at insert time and during resharding. To route queries that contain WHERE clauses, additional vindexes need to be added to the table.

Vitess cannot yet route a query based on the values of multiple columns of a multi-column vindex in a WHERE clause. This feature will be added soon.

### Alternate approach

You have the option to pre-combine the region and id bits into a single column and use that as an input for a single column vindex. This approach achieves the same goals as a multi-column vindex. Moreover, you avoid having to define additional vindexes for query routing.

The downside of this approach is that it is harder to migrate an id to a different region.
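A minimal Go sketch of the pre-combined column, assuming a hypothetical layout where the top 8 bits of a 64-bit id hold the region and the low 56 bits hold the per-region id. The text does not say which single-column vindex to pair this with; the sketch assumes one that preserves the leading bits of its input (modeled here as a plain big-endian encoding), since a scrambling vindex such as `hash` would destroy the region grouping. `packID`, `unpackID`, and `bigEndianKeyspaceID` are illustrative names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// regionBits is a hypothetical width for the region prefix.
const regionBits = 8

// idMask selects the low 56 bits that hold the per-region id.
const idMask = (uint64(1) << (64 - regionBits)) - 1

// packID pre-combines a region code and an id into one column value.
func packID(region uint8, id uint64) (uint64, error) {
	if id > idMask {
		return 0, fmt.Errorf("id %d does not fit in %d bits", id, 64-regionBits)
	}
	return uint64(region)<<(64-regionBits) | id, nil
}

// unpackID splits a combined value back into its region and id parts.
func unpackID(v uint64) (region uint8, id uint64) {
	return uint8(v >> (64 - regionBits)), v & idMask
}

// bigEndianKeyspaceID stands in for a single-column vindex that keeps the
// leading bits of its input, so the region byte leads the keyspace_id.
func bigEndianKeyspaceID(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

func main() {
	v, _ := packID(0x90, 12345)
	region, id := unpackID(v)
	fmt.Printf("packed=%#x region=%#x id=%d ksid=%x\n", v, region, id, bigEndianKeyspaceID(v))

	// Moving id 12345 to another region changes the stored value itself,
	// which is why migrating an id between regions is harder.
	moved, _ := packID(0x10, 12345)
	fmt.Println("value changed:", moved != v)
}
```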

## Reference Tables

Sharded databases often need the ability to join their tables with smaller “reference” tables. For example, the `product` table could be seen as a reference table. Other examples are tables that map static information, such as zip codes to cities.

Joining against these tables across keyspaces results in cross-shard joins that may not be very efficient or fast.

Vitess allows you to create a table in a sharded keyspace as a reference table. This means that it will treat the table as having an identical set of rows across all shards. A query that joins a sharded table against such reference tables is then performed locally within each shard.

A reference table should not have any vindex, and is defined in the VSchema as a reference type:

```json
{
  "sharded": true,
  "tables": {
    "zip_detail": { "type": "reference" }
  }
}
```

Keeping a reference table correctly updated across all shards can be challenging. Vitess supports the [Materialize](../../migration/materialize) feature, which lets you maintain the original table in an unsharded keyspace and automatically propagate changes to that table to all shards in real time.

## Column List

The VSchema allows you to specify the list of columns along with their types for every table. This allows Vitess to make optimization decisions where necessary.

For example, specifying that a column contains text allows VTGate to request further collation-specific information (`weight_string`) if additional sorting is needed after collecting results from all shards.

Without this information, issuing the following query against `customer` fails:

```text
mysql> select customer_id, uname from customer order by uname;
ERROR 1105 (HY000): vtgate: http://sougou-lap1:12345/: types are not comparable: VARCHAR vs VARCHAR
```

However, we can modify the VSchema as follows:

```json
"customer": {
  "column_vindexes": [{
    "column": "customer_id",
    "name": "hash"
  }],
  "auto_increment": {
    "column": "customer_id",
    "sequence": "product.customer_seq"
  },
  "columns": [{
    "name": "uname",
    "type": "VARCHAR"
  }]
}
```

Re-issuing the same query will now succeed:

```text
mysql> select customer_id, uname from customer order by uname;
+-------------+---------+
| customer_id | uname   |
+-------------+---------+
|           1 | alice   |
|           2 | bob     |
|           3 | charlie |
|           4 | dan     |
|           5 | eve     |
+-------------+---------+
5 rows in set (0.00 sec)
```
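Why collation information matters here: each shard returns its rows already ordered by MySQL, and VTGate then merges those streams, so the merge must compare values the same way the collation does. The Go sketch below does a simple k-way merge of pre-sorted shard results using a comparison key. `strings.ToLower` is a hypothetical stand-in for the weight that `weight_string` would return under a case-insensitive collation, and the mixed-case names are made-up data; comparing the raw bytes instead would put `Bob` before `alice`.

```go
package main

import (
	"fmt"
	"strings"
)

type row struct {
	customerID int
	uname      string
}

// weight approximates a case-insensitive collation weight for ASCII names only.
func weight(s string) string { return strings.ToLower(s) }

// mergeSorted merges per-shard results that are each already ordered by
// uname, always taking the row with the smallest weight next.
func mergeSorted(shards [][]row) []row {
	idx := make([]int, len(shards))
	var out []row
	for {
		best := -1
		for s, rows := range shards {
			if idx[s] >= len(rows) {
				continue // this shard is exhausted
			}
			if best == -1 || weight(rows[idx[s]].uname) < weight(shards[best][idx[best]].uname) {
				best = s
			}
		}
		if best == -1 {
			return out // every shard is exhausted
		}
		out = append(out, shards[best][idx[best]])
		idx[best]++
	}
}

func main() {
	shard1 := []row{{1, "alice"}, {4, "Dan"}}
	shard2 := []row{{2, "Bob"}, {3, "charlie"}, {5, "eve"}}
	for _, r := range mergeSorted([][]row{shard1, shard2}) {
		fmt.Println(r.customerID, r.uname)
	}
}
```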

Specifying a table's columns also allows VTGate to resolve ambiguous column names to the right tables.

### Authoritative List

If you have listed all columns of a table in the VSchema, you can add the `column_list_authoritative` flag to the table:

```json
"customer": {
  "column_vindexes": [{
    "column": "customer_id",
    "name": "hash"
  }],
  "auto_increment": {
    "column": "customer_id",
    "sequence": "product.customer_seq"
  },
  "columns": [{
    "name": "uname",
    "type": "VARCHAR"
  }],
  "column_list_authoritative": true
}
```

This flag causes VTGate to automatically expand expressions like `select *` or insert statements that don’t specify the column list.
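As a rough illustration of what this expansion means, here is a Go sketch that rewrites the two statement shapes using an authoritative column list. The `tableColumns` map, its assumption that `customer` has exactly the columns `customer_id` and `uname`, and the `expandStar`/`expandInsert` helpers are all hypothetical; VTGate works on a parsed query, not on strings like this.

```go
package main

import (
	"fmt"
	"strings"
)

// tableColumns is a hypothetical in-memory view of authoritative column lists.
var tableColumns = map[string][]string{
	"customer": {"customer_id", "uname"},
}

// expandStar turns "select * from <table>" into an explicit column list.
func expandStar(table string) (string, error) {
	cols, ok := tableColumns[table]
	if !ok {
		return "", fmt.Errorf("no authoritative column list for %q", table)
	}
	return fmt.Sprintf("select %s from %s", strings.Join(cols, ", "), table), nil
}

// expandInsert adds the column list to an insert that omits it. A value
// count that does not match the list is rejected; this is one symptom of the
// VSchema list drifting from the real schema.
func expandInsert(table string, values []string) (string, error) {
	cols, ok := tableColumns[table]
	if !ok {
		return "", fmt.Errorf("no authoritative column list for %q", table)
	}
	if len(values) != len(cols) {
		return "", fmt.Errorf("column count mismatch: %d columns, %d values", len(cols), len(values))
	}
	return fmt.Sprintf("insert into %s(%s) values (%s)",
		table, strings.Join(cols, ", "), strings.Join(values, ", ")), nil
}

func main() {
	q, _ := expandStar("customer")
	fmt.Println(q) // select customer_id, uname from customer

	ins, _ := expandInsert("customer", []string{"6", "'frank'"})
	fmt.Println(ins)
}
```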

The caveat is that you have to keep this column list in sync with the underlying schema.

In the future, Vitess will allow you to pull this information from the vttablets and automatically keep it up to date.

## Routing Rules

Routing Rules are an advanced method of redirecting queries meant for one table to another. They are just pointers and are analogous to symbolic links in a file system. You should generally not have to use routing rules in Vitess.

Workflows like `MoveTables` use routing rules to make the target tables addressable and to switch traffic from source to target by manipulating these rules.
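The symbolic-link analogy can be sketched in a few lines of Go. This is a simplified, hypothetical model: `routingRules`, `resolve`, and the `commerce`/`customer` keyspace names are illustrative, and real routing rules are configured in the VSchema rather than in code. The two rule sets mimic a MoveTables-style workflow before and after the traffic switch.

```go
package main

import "fmt"

// routingRules maps a qualified table name to the table that actually serves
// it, much like a symbolic link.
type routingRules map[string]string

// resolve follows the rule for table if there is one; otherwise the table
// resolves to itself.
func (r routingRules) resolve(table string) string {
	if target, ok := r[table]; ok {
		return target
	}
	return table
}

func main() {
	// Before switching traffic: queries naming the target still reach the source.
	rules := routingRules{"customer.corder": "commerce.corder"}
	fmt.Println(rules.resolve("customer.corder")) // commerce.corder

	// After switching: the pointer is flipped so the source name reaches the target.
	rules = routingRules{"commerce.corder": "customer.corder"}
	fmt.Println(rules.resolve("commerce.corder")) // customer.corder
	fmt.Println(rules.resolve("customer.corder")) // customer.corder
}
```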

For more information, please refer to the [Routing Rules](../../../reference/features/schema-routing-rules) section.

146 changes: 146 additions & 0 deletions content/en/docs/user-guides/vschema-guide/lookup-as-primary.md
@@ -0,0 +1,146 @@
---
title: Lookup as Primary Vindex
weight: 10
---

A customer order is likely to go through a life cycle of events. This is best represented in a separate `corder_event` table containing a `corder_id` column as a foreign key into `corder.corder_id`. It would also be beneficial to co-locate the event rows with their associated order.

Just like we shared the `hash` vindex between `customer` and `corder`, we can share `corder_keyspace_idx` between `corder` and `corder_event`. We can also make it the Primary Vindex for `corder_event`. When an order is created, its lookup row is also created. Subsequently, an insert into `corder_event` asks the vindex to compute the `keyspace_id` for that `corder_id`, which succeeds because the lookup entry already exists. This is where the owner table matters: the owner table creates the entries, whereas other tables only read them.

Inserting a `corder_event` row without creating a corresponding `corder` entry will result in an error. This behavior is in line with the traditional foreign key constraint enforced by relational databases.

Sharing the lookup vindex also has the additional benefit of saving space because we avoid creating separate entries for the new table.

We start by creating the sequence table in the `product` keyspace.

Schema:

```sql
create table corder_event_seq(id bigint, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
insert into corder_event_seq(id, next_id, cache) values(0, 1, 3);
```

VSchema:

```json
"corder_event_seq": { "type": "sequence" }
```

We then create the `corder_event` table in `customer`:

```sql
create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), primary key(corder_id, corder_event_id));
```

In the VSchema, there is no need to create a vindex because we are going to reuse the existing one:

```json
"corder_event": {
  "column_vindexes": [{
    "column": "corder_id",
    "name": "corder_keyspace_idx"
  }],
  "auto_increment": {
    "column": "corder_event_id",
    "sequence": "product.corder_event_seq"
  }
}
```

Alternate VSchema DDL:

```sql
alter vschema add sequence product.corder_event_seq;
alter vschema on customer.corder_event add vindex corder_keyspace_idx(corder_id);
alter vschema on customer.corder_event add auto_increment corder_event_id using product.corder_event_seq;
```

We can now insert rows into `corder_event` that reference rows in `corder`:

```text
mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal');
Query OK, 5 rows affected (0.04 sec)

mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered');
Query OK, 2 rows affected (0.01 sec)

mysql> insert into corder_event(corder_id, ename) values(6, 'expect failure');
ERROR 1105 (HY000): vtgate: http://sougou-lap1:12345/: execInsertSharded: getInsertShardedRoute: could not map [INT64(6)] to a keyspace id
```

As expected, inserting a row for a non-existent order results in an error.

### Reversible Vindexes

In Vitess, it is insufficient for a table to have only a Lookup Vindex, because such a table cannot practically be resharded: performing a lookup before redirecting every row event to a new shard would be prohibitively expensive.

To overcome this limitation, we must add to the table a column with a non-lookup vindex, also known as a Functional Vindex. By rule, the Primary Vindex computes the keyspace id of the row, so the value of this new column must yield the same keyspace id.

A Reversible Vindex is one that can back-compute the column value from a given keyspace id. If such a vindex is used for this new column, Vitess automatically performs this computation and fills in the correct value. Vindex properties such as Functional and Reversible are listed in the [Vindexes Reference](../../../features/vindexes).

In other words, adding a column with a vindex that is both Functional and Reversible allows Vitess to auto-fill the values, thereby avoiding any impact to the application logic.

The `binary` vindex yields the input value itself as the `keyspace_id`, so it is naturally reversible: using it for the new column makes Vitess store the row's `keyspace_id` as the column value. The modified schema for the table will be as follows:

```sql
create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), keyspace_id varbinary(10), primary key(corder_id, corder_event_id));
```

We create a vindex instance of type `binary`:

```json
"binary": {
  "type": "binary"
}
```

Modify the table VSchema:

```json
"corder_event": {
  "column_vindexes": [{
    "column": "corder_id",
    "name": "corder_keyspace_idx"
  }, {
    "column": "keyspace_id",
    "name": "binary"
  }],
  "auto_increment": {
    "column": "corder_event_id",
    "sequence": "product.corder_event_seq"
  }
}
```

Alternate VSchema DDL:

```sql
alter vschema on customer.corder_event add vindex `binary`(keyspace_id) using `binary`;
```

Note that `binary` needs to be backticked because it is a keyword.

After these modifications, we can now observe that the `keyspace_id` column is getting automatically populated:

```text
mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal');
Query OK, 5 rows affected (0.01 sec)

mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered');
Query OK, 2 rows affected (0.01 sec)

mysql> select corder_event_id, corder_id, ename, hex(keyspace_id) from corder_event;
+-----------------+-----------+-----------+------------------+
| corder_event_id | corder_id | ename     | hex(keyspace_id) |
+-----------------+-----------+-----------+------------------+
|               1 |         1 | paid      | 166B40B44ABA4BD6 |
|               2 |         5 | delivered | D2FD8867D50D2DFE |
+-----------------+-----------+-----------+------------------+
2 rows in set (0.00 sec)
```

There is no support for backfilling the reversible vindex column yet. This will be added soon.

{{< info >}}
The original `keyspace_id` for all these rows came from `customer_id`. Since `hash` is also a reversible vindex, reversing the `keyspace_id` using `hash` will yield the `customer_id`. We could instead leverage this knowledge to replace `keyspace_id+binary` with `customer_id+hash`. Vitess will auto-populate the correct value. Using this approach may be more beneficial because `customer_id` is a value the application can understand and make use of.
{{< /info >}}
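A minimal Go sketch of why reversing `hash` works, assuming (as we understand Vitess's implementation, not something this guide states) that the `hash` vindex encrypts the 8-byte big-endian value with Triple DES under an all-zero key. `hashID` and `reverseHash` are hypothetical names. Because block-cipher encryption is a bijection on 8-byte blocks, decrypting a `keyspace_id` recovers the original `customer_id`.

```go
package main

import (
	"crypto/cipher"
	"crypto/des"
	"encoding/binary"
	"fmt"
)

// hashCipher is Triple DES with an all-zero 24-byte key (assumed construction).
var hashCipher = newZeroKeyCipher()

func newZeroKeyCipher() cipher.Block {
	block, err := des.NewTripleDESCipher(make([]byte, 24))
	if err != nil {
		panic(err)
	}
	return block
}

// hashID encrypts the 8-byte big-endian form of id to produce a keyspace_id.
func hashID(id uint64) []byte {
	var in [8]byte
	binary.BigEndian.PutUint64(in[:], id)
	out := make([]byte, 8)
	hashCipher.Encrypt(out, in[:])
	return out
}

// reverseHash decrypts a keyspace_id back into the original id.
func reverseHash(ksid []byte) (uint64, error) {
	if len(ksid) != 8 {
		return 0, fmt.Errorf("keyspace_id must be 8 bytes, got %d", len(ksid))
	}
	var out [8]byte
	hashCipher.Decrypt(out[:], ksid)
	return binary.BigEndian.Uint64(out[:]), nil
}

func main() {
	ksid := hashID(1)
	id, _ := reverseHash(ksid)
	fmt.Printf("customer_id 1 -> keyspace_id %X -> customer_id %d\n", ksid, id)
}
```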