qiancai · Nov 1, 2024
diff --git a/.coderabbit.yaml b/.coderabbit.yaml
@@ -0,0 +1,178 @@
+---
+title: AUTO_RANDOM
+summary: Learn the AUTO_RANDOM attribute.
+aliases: ['/docs/dev/auto-random/','/docs/dev/reference/sql/attributes/auto-random/']
+---
+
+# AUTO_RANDOM <span class="version-mark">New in v3.1.0</span>
+
+## User scenario
+
+Because `AUTO_RANDOM` values are random and unique, `AUTO_RANDOM` is often used in place of [`AUTO_INCREMENT`](/auto-increment.md) to avoid write hotspots on a single storage node caused by TiDB assigning consecutive IDs. If the current `AUTO_INCREMENT` column is a primary key of the `BIGINT` type, you can execute the `ALTER TABLE t MODIFY COLUMN id BIGINT AUTO_RANDOM(5);` statement to switch from `AUTO_INCREMENT` to `AUTO_RANDOM`.
+
+<CustomContent platform="tidb">
+
+For more information about how to handle highly concurrent write-heavy workloads in TiDB, see [Highly concurrent write best practices](/best-practices/high-concurrency-best-practices.md).
+
+</CustomContent>
+
+The `AUTO_RANDOM_BASE` parameter in the [CREATE TABLE](/sql-statements/sql-statement-create-table.md) statement sets the initial value of the auto-increment part of `AUTO_RANDOM`. This option is part of the internal interface, and you can ignore it.
+
+## Basic concepts
+
+`AUTO_RANDOM` is a column attribute that is used to automatically assign values to a `BIGINT` column. Values assigned automatically are **random** and **unique**.
+
+To create a table with an `AUTO_RANDOM` column, you can use the following statements. The `AUTO_RANDOM` column must be included in a primary key, and it must be the first column in the primary key.
+
+```sql
+CREATE TABLE t (a BIGINT AUTO_RANDOM, b VARCHAR(255), PRIMARY KEY (a));
+CREATE TABLE t (a BIGINT PRIMARY KEY AUTO_RANDOM, b VARCHAR(255));
+CREATE TABLE t (a BIGINT AUTO_RANDOM(6), b VARCHAR(255), PRIMARY KEY (a));
+CREATE TABLE t (a BIGINT AUTO_RANDOM(5, 54), b VARCHAR(255), PRIMARY KEY (a));
+CREATE TABLE t (a BIGINT AUTO_RANDOM(5, 54), b VARCHAR(255), PRIMARY KEY (a, b));
+```
+
+You can wrap the keyword `AUTO_RANDOM` in an executable comment. For more details, refer to [TiDB specific comment syntax](/comment-syntax.md#tidb-specific-comment-syntax).
+
+```sql
+CREATE TABLE t (a BIGINT /*T![auto_rand] AUTO_RANDOM */, b VARCHAR(255), PRIMARY KEY (a));
+CREATE TABLE t (a BIGINT PRIMARY KEY /*T![auto_rand] AUTO_RANDOM */, b VARCHAR(255));
+CREATE TABLE t (a BIGINT /*T![auto_rand] AUTO_RANDOM(6) */, b VARCHAR(255), PRIMARY KEY (a));
+CREATE TABLE t (a BIGINT /*T![auto_rand] AUTO_RANDOM(5, 54) */, b VARCHAR(255), PRIMARY KEY (a));
+```
+
+When you execute an `INSERT` statement:
+
+- If you explicitly specify the value of the `AUTO_RANDOM` column, it is inserted into the table as is.
+- If you do not explicitly specify the value of the `AUTO_RANDOM` column, TiDB generates a random value and inserts it into the table.
+
+```sql
+tidb> CREATE TABLE t (a BIGINT PRIMARY KEY AUTO_RANDOM, b VARCHAR(255)) /*T! PRE_SPLIT_REGIONS=2 */ ;
+Query OK, 0 rows affected, 1 warning (0.01 sec)
+
+tidb> INSERT INTO t(a, b) VALUES (1, 'string');
+Query OK, 1 row affected (0.00 sec)
+
+tidb> SELECT * FROM t;
++---+--------+
+| a | b      |
++---+--------+
+| 1 | string |
++---+--------+
+1 row in set (0.01 sec)
+
+tidb> INSERT INTO t(b) VALUES ('string2');
+Query OK, 1 row affected (0.00 sec)
+
+tidb> INSERT INTO t(b) VALUES ('string3');
+Query OK, 1 row affected (0.00 sec)
+
+tidb> SELECT * FROM t;
++---------------------+---------+
+| a                   | b       |
++---------------------+---------+
+|                   1 | string  |
+| 1152921504606846978 | string2 |
+| 4899916394579099651 | string3 |
++---------------------+---------+
+3 rows in set (0.00 sec)
+
+tidb> SHOW CREATE TABLE t;
++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Table | Create Table                                                                                                                                                                                                                                                    |
++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| t     | CREATE TABLE `t` (
+  `a` bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(5) */,
+  `b` varchar(255) DEFAULT NULL,
+  PRIMARY KEY (`a`) /*T![clustered_index] CLUSTERED */
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T! PRE_SPLIT_REGIONS=2 */ |
++-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+1 row in set (0.00 sec)
+
+tidb> SHOW TABLE t REGIONS;
++-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
+| REGION_ID | START_KEY                   | END_KEY                     | LEADER_ID | LEADER_STORE_ID | PEERS               | SCATTERING | WRITTEN_BYTES | READ_BYTES | APPROXIMATE_SIZE(MB) | APPROXIMATE_KEYS | SCHEDULING_CONSTRAINTS | SCHEDULING_STATE |
++-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
+|     62798 | t_158_                      | t_158_r_2305843009213693952 |     62810 |              28 | 62811, 62812, 62810 |          0 |           151 |          0 |                    1 |                0 |                        |                  |
+|     62802 | t_158_r_2305843009213693952 | t_158_r_4611686018427387904 |     62803 |               1 | 62803, 62804, 62805 |          0 |            39 |          0 |                    1 |                0 |                        |                  |
+|     62806 | t_158_r_4611686018427387904 | t_158_r_6917529027641081856 |     62813 |               4 | 62813, 62814, 62815 |          0 |           160 |          0 |                    1 |                0 |                        |                  |
+|      9289 | t_158_r_6917529027641081856 | 78000000                    |     48268 |               1 | 48268, 58951, 62791 |          0 |         10628 |      43639 |                    2 |             7999 |                        |                  |
++-----------+-----------------------------+-----------------------------+-----------+-----------------+---------------------+------------+---------------+------------+----------------------+------------------+------------------------+------------------+
+4 rows in set (0.00 sec)
+```
+
+The `AUTO_RANDOM(S, R)` column value automatically assigned by TiDB has a total of 64 bits:
+
+- `S` is the number of shard bits. The value ranges from `1` to `15`. The default value is `5`.
+- `R` is the total length of the automatic allocation range. The value ranges from `32` to `64`. The default value is `64`.
+
+The structure of an `AUTO_RANDOM` value with a sign bit is as follows:
+
+| Sign bit | Reserved bits | Shard bits | Auto-increment bits |
+|---------|-------------|--------|--------------|
+| 1 bit | `64-R` bits | `S` bits | `R-1-S` bits |
+
+The structure of an `AUTO_RANDOM` value without a sign bit is as follows:
+
+| Reserved bits | Shard bits | Auto-increment bits |
+|-------------|--------|--------------|
+| `64-R` bits | `S` bits | `R-S` bits |
+
+- Whether a value has a sign bit depends on whether the column has the `UNSIGNED` attribute. With `UNSIGNED`, the length of the sign bit is `0`. Otherwise, the length is `1`.
+- The length of the reserved bits is `64-R`. The reserved bits are always `0`.
+- The content of the shard bits is obtained by calculating the hash value of the starting time of the current transaction. To use a different length of shard bits (such as 10), you can specify `AUTO_RANDOM(10)` when creating the table.
+- The value of the auto-increment bits is stored in the storage engine and allocated sequentially. Each time a new value is allocated, the value is incremented by 1. The auto-increment bits ensure that the values of `AUTO_RANDOM` are unique globally. When the auto-increment bits are exhausted, an error `Failed to read auto-increment value from storage engine` is reported when the value is allocated again.
+- Value range: the maximum number of bits for the final generated value = shard bits + auto-increment bits. The range of a signed column is `[-(2^(R-1))+1, (2^(R-1))-1]`, and the range of an unsigned column is `[0, (2^R)-1]`.
+- You can use `AUTO_RANDOM` with `PRE_SPLIT_REGIONS`. When a table is created successfully, `PRE_SPLIT_REGIONS` pre-splits data in the table into the number of Regions as specified by `2^(PRE_SPLIT_REGIONS)`.
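+
+As an illustration of this layout, the following query splits the values from the earlier example into their shard part and auto-increment part. This is only a sketch: it assumes the table `t` created above with the default `AUTO_RANDOM(5)` on a signed `BIGINT` column, so the auto-increment part occupies the low `64-1-5 = 58` bits. The column aliases are illustrative.
+
+```sql
+-- Assumption: S = 5 and R = 64 on a signed column, so there are 58 auto-increment bits.
+SELECT
+    a,
+    a >> 58             AS shard_part,           -- the 5 shard bits
+    a & ((1 << 58) - 1) AS auto_increment_part,  -- the low 58 bits
+    b
+FROM t
+ORDER BY a;
+```
+
+For the three rows shown earlier (`1`, `1152921504606846978`, and `4899916394579099651`), this arithmetic gives shard parts `0`, `4`, and `17`, and auto-increment parts `1`, `2`, and `3`.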
+
+> **Note:**
+>
+> Selection of shard bits (`S`):
+>
+> - Since there is a total of 64 available bits, the shard bits length affects the auto-increment bits length. That is, as the shard bits length increases, the length of auto-increment bits decreases, and vice versa. Therefore, you need to balance the randomness of allocated values and available space.
+> - The best practice is to set the shard bits as `log(2, x)`, in which `x` is the current number of storage engines. For example, if there are 16 TiKV nodes in a TiDB cluster, you can set the shard bits as `log(2, 16)`, that is, `4`. After all Regions are evenly scheduled to each TiKV node, the load of bulk writes can be uniformly distributed to different TiKV nodes to maximize resource utilization.
+>
+> Selection of range (`R`):
+>
+> - Typically, the `R` parameter needs to be set when the numeric type of the application cannot represent a full 64-bit integer.
+> - For example, the range of JSON number is `[-(2^53)+1, (2^53)-1]`. TiDB can easily assign an integer beyond this range to a column defined as `AUTO_RANDOM(5)`, causing unexpected behaviors when the application reads the column. In such cases, you can replace `AUTO_RANDOM(5)` with `AUTO_RANDOM(5, 54)` for signed columns, and replace `AUTO_RANDOM(5)` with `AUTO_RANDOM(5, 53)` for unsigned columns, ensuring that TiDB does not assign integers greater than `9007199254740991` (2^53-1) to the column.
+
+Values allocated implicitly to the `AUTO_RANDOM` column affect `last_insert_id()`. To get the ID that TiDB most recently allocated implicitly, you can use the `SELECT last_insert_id()` statement.
+
+To view the shard bits number of the table with an `AUTO_RANDOM` column, you can execute the `SHOW CREATE TABLE` statement. You can also see the value of the `PK_AUTO_RANDOM_BITS=x` mode in the `TIDB_ROW_ID_SHARDING_INFO` column in the `information_schema.tables` system table. `x` is the number of shard bits.
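+
+The two lookups above might look like the following. This is a sketch that assumes the table `t` from the earlier example lives in a schema named `test`. The schema name and the inserted value are assumptions for illustration.
+
+```sql
+-- Insert a row without specifying `a`, then read back the implicitly allocated ID.
+INSERT INTO t(b) VALUES ('string4');
+SELECT last_insert_id();
+
+-- Check the shard bits recorded for the table (assumes the schema is `test`).
+SELECT TABLE_SCHEMA, TABLE_NAME, TIDB_ROW_ID_SHARDING_INFO
+FROM information_schema.tables
+WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
+```
+
+For a table created with the default shard bits, the `TIDB_ROW_ID_SHARDING_INFO` column is expected to show `PK_AUTO_RANDOM_BITS=5`.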
+
+After creating a table with an `AUTO_RANDOM` column, you can use `SHOW WARNINGS` to view the maximum number of implicit allocations:
+
+```sql
+CREATE TABLE t (a BIGINT AUTO_RANDOM, b VARCHAR(255), PRIMARY KEY (a));
+SHOW WARNINGS;
+```
+
+The output is as follows:
+
+```sql
++-------+------+---------------------------------------------------------+
+| Level | Code | Message                                                 |
++-------+------+---------------------------------------------------------+
+| Note  | 1105 | Available implicit allocation times: 288230376151711743 |
++-------+------+---------------------------------------------------------+
+1 row in set (0.00 sec)
+```
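+
+The number in this warning is consistent with the bit layout described earlier. For the default `AUTO_RANDOM(5)` on a signed `BIGINT` column, the auto-increment part has `64-1-5 = 58` bits, and `2^58 - 1` equals the value shown. The following query is only a quick arithmetic check under that assumption, not a statement of how TiDB computes the message:
+
+```sql
+-- Assumption: R = 64, S = 5, and one sign bit.
+SELECT (1 << (64 - 1 - 5)) - 1 AS available_allocations;
+-- Returns 288230376151711743
+```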
+
+## Implicit allocation rules of IDs
+
+TiDB implicitly allocates values to `AUTO_RANDOM` columns similarly to `AUTO_INCREMENT` columns. They are also controlled by the session-level system variables [`auto_increment_increment`](/system-variables.md#auto_increment_increment) and [`auto_increment_offset`](/system-variables.md#auto_increment_offset). The auto-increment bits (ID) of implicitly allocated values conform to the equation `(ID - auto_increment_offset) % auto_increment_increment == 0`.
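+
+A minimal sketch of this rule, assuming the table `t` from the earlier examples with the default `AUTO_RANDOM(5)` on a signed column. The variable values and inserted strings are illustrative.
+
+```sql
+-- With increment 2 and offset 1, (ID - 1) % 2 == 0 should hold for new auto-increment parts.
+SET @@auto_increment_increment = 2;
+SET @@auto_increment_offset = 1;
+
+INSERT INTO t(b) VALUES ('x'), ('y');
+
+-- Inspect the auto-increment part (low 58 bits) of each value.
+SELECT a & ((1 << 58) - 1) AS auto_increment_part, b FROM t;
+
+-- Restore the usual session value.
+SET @@auto_increment_increment = 1;
+```
+
+Under these settings, the auto-increment parts of the newly inserted rows are expected to be odd numbers.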
+
+## Restrictions
+
+Pay attention to the following restrictions when you use `AUTO_RANDOM`:
+
+- To insert values explicitly, you need to set the value of the `@@allow_auto_random_explicit_insert` system variable to `1` (`0` by default). It is **not** recommended that you explicitly specify a value for the column with the `AUTO_RANDOM` attribute when you insert data. Otherwise, the numeric values that can be automatically allocated for this table might be used up in advance.
+- Specify this attribute **ONLY** for a primary key column of the `BIGINT` type. Otherwise, an error occurs. In addition, when the attribute of the primary key is `NONCLUSTERED`, `AUTO_RANDOM` is not supported even on the integer primary key. For more details about the primary key of the `CLUSTERED` type, refer to [clustered index](/clustered-indexes.md).
+- You cannot use `ALTER TABLE` to modify the `AUTO_RANDOM` attribute, including adding or removing this attribute.
+- You cannot use `ALTER TABLE` to change from `AUTO_INCREMENT` to `AUTO_RANDOM` if the maximum value is close to the maximum value of the column type.
+- You cannot change the column type of the primary key column that is specified with `AUTO_RANDOM` attribute.
+- You cannot specify `AUTO_RANDOM` and `AUTO_INCREMENT` for the same column at the same time.
+- You cannot specify `AUTO_RANDOM` and `DEFAULT` (the default value of a column) for the same column at the same time.
+- When `AUTO_RANDOM` is used on a column, it is difficult to change the column attribute back to `AUTO_INCREMENT` because the auto-generated values might be very large.
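+
+The first restriction can be sketched as follows, assuming the table `t` from the earlier examples. The inserted value `100` is arbitrary.
+
+```sql
+-- Allow explicit values for the AUTO_RANDOM column in this session.
+SET @@allow_auto_random_explicit_insert = 1;
+INSERT INTO t(a, b) VALUES (100, 'explicit');
+
+-- Switch back to the default so that later inserts rely on implicit allocation.
+SET @@allow_auto_random_explicit_insert = 0;
+```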
diff --git a/glossary.md b/glossary.md
@@ -30,6 +30,10 @@ Batch Create Table is a feature introduced in TiDB v6.0.0. This feature is enabl
 
 Baseline Capturing captures queries that meet capturing conditions and create bindings for them. It is used for [preventing regression of execution plans during an upgrade](/sql-plan-management.md#prevent-regression-of-execution-plans-during-an-upgrade).
 
+### BR
+
+BR is the Backup and Restore tool for TiDB. For more information, see [BR Overview](/br/backup-and-restore-overview.md).
+
 ### Bucket
 
 A [Region](#regionpeerraft-group) is logically divided into several small ranges called bucket. TiKV collects query statistics by buckets and reports the bucket status to PD. For details, see the [Bucket design doc](https://github.com/tikv/rfcs/blob/master/text/0082-dynamic-size-region.md#bucket).
@@ -40,6 +44,10 @@ A [Region](#regionpeerraft-group) is logically divided into several small ranges
 
 With the cached table feature, TiDB loads the data of an entire table into the memory of the TiDB server, and TiDB directly gets the table data from the memory without accessing TiKV, which improves the read performance.
 
+### CF
+
+In RocksDB and TiKV, a Column Family (CF) represents a logical grouping of key-value pairs within a database.
+
 ### Coalesce Partition
 
 Coalesce Partition is a way of decreasing the number of partitions in a Hash or Key partitioned table. For more information, see [Manage Hash and Key partitions](/partitioned-table.md#manage-hash-and-key-partitions).
@@ -48,12 +56,80 @@ Coalesce Partition is a way of decreasing the number of partitions in a Hash or
 
 Introduced in TiDB 5.3.0, Continuous Profiling is a way to observe resource overhead at the system call level. With the support of Continuous Profiling, TiDB provides performance insight as clear as directly looking into the database source code, and helps R&D and operation and maintenance personnel to locate the root cause of performance problems using a flame graph. For details, see [TiDB Dashboard Instance Profiling - Continuous Profiling](/dashboard/continuous-profiling.md).
 
+### CTE
+
+A Common Table Expression (CTE) is a feature in the SQL standard that allows for defining a temporary result set using the [`WITH`](/sql-statements/sql-statement-with.md) clause.
+
 ## D
 
+### DDL
+
+Data Definition Language (DDL) is the part of the SQL standard that deals with creating, modifying, and dropping tables, indexes, columns and other database objects.
+
+### DM
+
+Data Migration (DM) is a tool for migrating data from MySQL-compatible databases into TiDB. It reads data from a MySQL source instance and applies it to a TiDB target instance.
+
+For more information, see [DM Overview](/dm/dm-overview.md).
+
+### DML
+
+Data Manipulation Language (DML) is a subset of the SQL standard that deals with inserting, updating, and deleting rows in tables.
+
+### DMR
+
+Development Milestone Release (DMR) is a TiDB version that introduces the latest features but does not offer long-term support.
+
+For more information, see [TiDB Versioning](/releases/versioning.md).
+
+### DR
+
+Disaster Recovery (DR) includes solutions that can be used to recover data and services after a disaster. These solutions typically involve backups and standby clusters.
+
+For more information, see [Overview of TiDB Disaster Recovery Solutions](/dr-solution-introduction.md).
+
+### DXF
+
+Distributed eXecution Framework (DXF) is the framework used by TiDB for accelerating index creation and data import by distributing tasks over all available resources.
+
+For more information, see [DXF Introduction](/tidb-distributed-execution-framework.md).
+
 ### Dynamic Pruning
 
 Dynamic pruning mode is one of the modes that TiDB accesses partitioned tables. In dynamic pruning mode, each operator supports direct access to multiple partitions. Therefore, TiDB no longer uses Union. Omitting the Union operation can improve the execution efficiency and avoid the problem of Union concurrent execution.
 
+## E
+
+### EC2
+
+[Elastic Compute Cloud (EC2)](https://aws.amazon.com/pm/ec2/) is an AWS service that provides scalable compute resources. It can be used with TiUP to deploy and manage a TiDB cluster.
+
+## G
+
+### GA
+
+If a feature is Generally Available (GA), it can be used in production environments. Note that even if a feature is GA in a DMR version, it is recommended to use the feature in production environments in a later LTS version.
+
+### GC
+
+Garbage Collection (GC) is a process that clears obsolete data to free up resources. For information about the TiKV GC process, see the [Garbage Collection overview](/garbage-collection-overview.md).
+
+### GTID
+
+Global Transaction Identifiers (GTIDs) are unique transaction IDs used in MySQL binary logs to track which transactions have been replicated. Data Migration (DM) uses these IDs to ensure consistent replication.
+
+## H
+
+### HTAP
+
+Hybrid Transactional and Analytical Processing (HTAP) is a database feature that enables both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads within the same database. For TiDB, the HTAP feature is provided by using TiKV for row storage and TiFlash for columnar storage.
+
+For more information, see [the definition of HTAP on the Gartner website](https://www.gartner.com/en/information-technology/glossary/htap-enabling-memory-computing-technologies).
+
+### HTTP
+
+Hypertext Transfer Protocol (HTTP) is an application-layer protocol used for transmitting hypermedia documents, such as HTML. It is the foundation of data communication on the web, enabling browsers and servers to request and transfer data in a standardized way. HTTP operates as a request-response protocol: a client sends a request to a server, which then returns a response. Common HTTP methods include GET, POST, PUT, and DELETE, which are used to retrieve, create, update, or delete resources, respectively.
+
 ## I
 
 ### Index Merge