Skip to content

Commit aba6bd2

Browse files
authored
Add docs for Analyze Embedded in DDL and new system variable (#21031)
1 parent cf65d0c commit aba6bd2

File tree

3 files changed

+186
-0
lines changed

3 files changed

+186
-0
lines changed

TOC.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1089,6 +1089,7 @@
10891089
- [外部存储服务的 URI 格式](/external-storage-uri.md)
10901090
- [TiDB Workload Repository](/workload-repository.md)
10911091
- [线上负载与 `ADD INDEX` 相互影响测试](/benchmark/online-workloads-and-add-index-operations.md)
1092+
- [内嵌于 DDL 的 Analyze](/ddl_embedded_analyze.md)
10921093
- 常见问题解答 (FAQ)
10931094
- [FAQ 汇总](/faq/faq-overview.md)
10941095
- [产品 FAQ](/faq/tidb-faq.md)

ddl_embedded_analyze.md

Lines changed: 177 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,177 @@
1+
---
2+
title: 内嵌于 DDL 的 Analyze
3+
summary: 本文介绍内嵌于新建或重组索引的 DDL 中的 Analyze 特性,用于确保新索引的统计信息及时更新。
4+
---
5+
6+
# 内嵌于 DDL 的 Analyze <span class="version-mark">从 v8.5.4 和 v9.0.0 开始引入</span>
7+
8+
本文介绍内嵌于以下两类 DDL 的 Analyze 特性:
9+
10+
- 新建索引的 DDL:[`ADD INDEX`](/sql-statements/sql-statement-add-index.md)
11+
- 重组已有索引的 DDL:[`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md)[`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)
12+
13+
开启该特性后,TiDB 会在新索引对用户可见前自动执行一次 Analyze(统计信息收集),以避免新建或重组索引后因统计信息暂不可用而导致优化器估算不准,从而引起执行计划变更的问题。
14+
15+
## 使用场景
16+
17+
在一些交替执行索引新增或修改的 DDL 操作场景中,已有的稳定查询可能因为新索引缺乏统计信息而出现代价估算偏差,导致优化器生成次优计划。详情可参考 [Issue #57948](https://github.com/pingcap/tidb/issues/57948)
18+
19+
例如:
20+
21+
```sql
22+
CREATE TABLE t (a INT, b INT);
23+
INSERT INTO t VALUES (1, 1), (2, 2), (3, 3);
24+
INSERT INTO t SELECT * FROM t; -- * N times
25+
26+
ALTER TABLE t ADD INDEX idx_a (a);
27+
28+
EXPLAIN SELECT * FROM t WHERE a > 4;
29+
```
30+
31+
```
32+
+-------------------------+-----------+-----------+---------------+--------------------------------+
33+
| id | estRows | task | access object | operator info |
34+
+-------------------------+-----------+-----------+---------------+--------------------------------+
35+
| TableReader_8 | 131072.00 | root | | data:Selection_7 |
36+
| └─Selection_7 | 131072.00 | cop[tikv] | | gt(test.t.a, 4) |
37+
| └─TableFullScan_6 | 393216.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
38+
+-------------------------+-----------+-----------+---------------+--------------------------------+
39+
3 rows in set (0.002 sec)
40+
```
41+
42+
从以上执行计划可以看到,由于新建索引尚未生成统计信息,TiDB 在路径估算时只能依赖启发式规则。除非索引访问路径无需回表且代价显著更低,否则优化器倾向于选择估算更稳定的现有路径,因此上述示例中使用了全表扫描。然而,从数据分布角度来看,`t.a > 4` 实际返回 0 行,如果能使用新建索引 `idx_a`,查询可以快速定位到相关行,从而避免全表扫描。在该示例中,由于 DDL 创建索引后 TiDB 未能及时收集索引统计信息,生成的执行计划不是最优的,但优化器会继续沿用原有计划,因此查询性能不会出现突变或退化。然而,根据 [Issue #57948](https://github.com/pingcap/tidb/issues/57948),在某些情况下,启发式规则可能会导致新旧索引进行不合理的比较,从而裁剪原查询计划依赖的索引,最终 fallback 到全表扫描。
43+
44+
从 v8.5.0 起,TiDB 对索引的启发式比较和统计信息缺失时的行为进行了优化。但在部分复杂场景中,在 DDL 执行过程中内嵌 Analyze 仍是防止执行计划变更的最佳方案。你可以通过系统变量 [`tidb_stats_update_during_ddl`](/system-variables.md#tidb_stats_update_during_ddl-从-v854-和-v900-版本开始引入) 控制在索引创建或重组阶段是否执行内嵌 Analyze。该变量默认值为 `OFF`
45+
46+
## 新建索引 `ADD INDEX` 的 DDL
47+
48+
`tidb_stats_update_during_ddl` 设置为 `ON` 时,执行 [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) 操作将在 Reorg 阶段结束后自动执行内嵌的 Analyze 命令。此 Analyze 命令会在新索引对用户可见前,分析相关新建索引的统计信息,然后再继续执行 `ADD INDEX` 的剩余阶段。
49+
50+
考虑到 Analyze 可能会有一定的耗时,TiDB 会以首次 Reorg 的执行时间为参考设置超时阈值。若 Analyze 超时,`ADD INDEX` 将不再同步等待 Analyze 完成,而是继续执行后续流程,使索引提前对用户可见。这意味着,该新索引的统计信息会在 Analyze 异步完成后更新。
51+
52+
示例:
53+
54+
```sql
55+
CREATE TABLE t (a INT, b INT, c INT);
56+
Query OK, 0 rows affected (0.011 sec)
57+
58+
INSERT INTO t VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
59+
Query OK, 3 rows affected (0.003 sec)
60+
Records: 3 Duplicates: 0 Warnings: 0
61+
62+
SET @@tidb_stats_update_during_ddl = 1;
63+
Query OK, 0 rows affected (0.001 sec)
64+
65+
ALTER TABLE t ADD INDEX idx (a, b);
66+
Query OK, 0 rows affected (0.049 sec)
67+
```
68+
69+
```sql
70+
EXPLAIN SELECT a FROM t WHERE a > 1;
71+
```
72+
73+
```
74+
+------------------------+---------+-----------+--------------------------+----------------------------------+
75+
| id | estRows | task | access object | operator info |
76+
+------------------------+---------+-----------+--------------------------+----------------------------------+
77+
| IndexReader_7 | 4.00 | root | | index:IndexRangeScan_6 |
78+
| └─IndexRangeScan_6 | 4.00 | cop[tikv] | table:t, index:idx(a, b) | range:(1,+inf], keep order:false |
79+
+------------------------+---------+-----------+--------------------------+----------------------------------+
80+
2 rows in set (0.002 sec)
81+
```
82+
83+
```sql
84+
SHOW STATS_HISTOGRAMS WHERE table_name = "t";
85+
```
86+
87+
```
88+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
89+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
90+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
91+
| test | t | | a | 0 | 2025-10-30 20:17:57 | 3 | 0 | 0.5 | 1 | allLoaded | 155 | 0 | 155 | 0 |
92+
| test | t | | idx | 1 | 2025-10-30 20:17:57 | 3 | 0 | 0 | 0 | allLoaded | 182 | 0 | 182 | 0 |
93+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
94+
2 rows in set (0.013 sec)
95+
```
96+
97+
```sql
98+
ADMIN SHOW DDL JOBS 1;
99+
```
100+
101+
```
102+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
103+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS |
104+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
105+
| 151 | test | t | add index | write reorganization | 2 | 148 | 6291456 | 2025-10-29 00:14:47.181000 | 2025-10-29 00:14:47.183000 | NULL | running | analyzing, txn-merge, max_node_count=3 |
106+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
107+
1 rows in set (0.001 sec)
108+
```
109+
110+
`ADD INDEX` 示例来看,当 `tidb_stats_update_during_ddl` 设置为 `ON` 时,在 `ADD INDEX` DDL 执行结束后,可以看到其之后运行的 `EXPLAIN` 查询中,相关索引 `idx` 的统计信息已经被自动收集并加载到内存中(可通过 `SHOW STATS_HISTOGRAMS` 语句的输出结果得到验证)。因此,优化器可以立即在范围扫描(Range Scan)中使用这些统计信息。如果索引的创建或重组以及 Analyze 过程耗时较长,可以通过 `ADMIN SHOW DDL JOBS` 查看 DDL Job 的状态。当输出结果中的 `COMMENTS` 列包含 `analyzing` 时,表示该 DDL Job 正在执行统计信息收集。
111+
112+
## 重组已有索引的 DDL
113+
114+
`tidb_stats_update_during_ddl` 设置为 `ON` 时,执行 [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md)[`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) 操作重组索引时,TiDB 也会在 Reorg 阶段结束后执行内嵌的 Analyze 命令。其机制与 `ADD INDEX` 相同:
115+
116+
- 在索引可见前开始进行统计信息收集。
117+
- 若 Analyze 超时,[`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md)[`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) 将不会同步等待 Analyze 完成,而是继续执行后续流程,使索引提前对用户可见。这意味着,该新索引的统计信息会在 Analyze 异步完成后更新。
118+
119+
示例:
120+
121+
```sql
122+
CREATE TABLE s (a VARCHAR(10), INDEX idx (a));
123+
Query OK, 0 rows affected (0.012 sec)
124+
125+
INSERT INTO s VALUES (1), (2), (3);
126+
Query OK, 3 rows affected (0.003 sec)
127+
Records: 3 Duplicates: 0 Warnings: 0
128+
129+
SET @@tidb_stats_update_during_ddl = 1;
130+
Query OK, 0 rows affected (0.001 sec)
131+
132+
ALTER TABLE s MODIFY COLUMN a INT;
133+
Query OK, 0 rows affected (0.056 sec)
134+
135+
EXPLAIN SELECT * FROM s WHERE a > 1;
136+
```
137+
138+
```
139+
+------------------------+---------+-----------+-----------------------+----------------------------------+
140+
| id | estRows | task | access object | operator info |
141+
+------------------------+---------+-----------+-----------------------+----------------------------------+
142+
| IndexReader_7 | 2.00 | root | | index:IndexRangeScan_6 |
143+
| └─IndexRangeScan_6 | 2.00 | cop[tikv] | table:s, index:idx(a) | range:(1,+inf], keep order:false |
144+
+------------------------+---------+-----------+-----------------------+----------------------------------+
145+
2 rows in set (0.005 sec)
146+
```
147+
148+
```sql
149+
SHOW STATS_HISTOGRAMS WHERE table_name = "s";
150+
```
151+
152+
```
153+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
154+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
155+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
156+
| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 2 | 1 | allLoaded | 158 | 0 | 158 | 0 |
157+
| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 1 | 1 | allLoaded | 155 | 0 | 155 | 0 |
158+
| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | allLoaded | 158 | 0 | 158 | 0 |
159+
| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | allLoaded | 155 | 0 | 155 | 0 |
160+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
161+
4 rows in set (0.008 sec)
162+
```
163+
164+
```sql
165+
ADMIN SHOW DDL JOBS 1;
166+
```
167+
168+
```
169+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
170+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS |
171+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
172+
| 153 | test | s | modify column | write reorganization | 2 | 148 | 12582912 | 2025-10-29 00:26:49.240000 | 2025-10-29 00:26:49.244000 | NULL | running | analyzing |
173+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
174+
1 rows in set (0.001 sec)
175+
```
176+
177+
`MODIFY COLUMN` 示例来看,当 `tidb_stats_update_during_ddl` 设置为 `ON` 时,在 `MODIFY COLUMN` DDL 执行结束后,可以看到其之后运行的 `EXPLAIN` 查询中,相关索引 `idx` 的统计信息已经被自动收集并加载到内存中(可通过 `SHOW STATS_HISTOGRAMS` 语句的输出结果得到验证),因此优化器能够立即在范围扫描(Range Scan)中使用这些统计信息。如果索引的创建或重组以及 Analyze 过程耗时较长,可以通过 `ADMIN SHOW DDL JOBS` 查看 DDL Job 的状态。当输出结果中的 `COMMENTS` 列包含 `analyzing` 时,表示该 DDL Job 正在执行统计信息收集。

system-variables.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1488,6 +1488,14 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1;
14881488
>
14891489
> * 在升级到 v6.5.0 及以上版本时,请确保 TiDB 的 [`temp-dir`](/tidb-configuration-file.md#temp-dir-从-v630-版本开始引入) 路径已正确挂载了 SSD 磁盘,并确保运行 TiDB 的操作系统用户对该目录有读写权限,否则在运行时可能产生不可预知的问题。该参数是 TiDB 的配置参数,设置后需要重启 TiDB 才能生效。因此,在升级前提前进行设置,可以避免再次重启。
14901490

1491+
### `tidb_stats_update_during_ddl` <span class="version-mark">从 v8.5.4 和 v9.0.0 版本开始引入</span>
1492+
1493+
- 作用域:SESSION | GLOBAL
1494+
- 是否持久化到集群:是
1495+
- 是否受 Hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value) 控制:否
1496+
- 默认值:`OFF`
1497+
- 这个变量用于控制是否开启 DDL 内嵌的 Analyze 的行为。开启后,涉及新建索引的 DDL [`ADD INDEX`](/sql-statements/sql-statement-add-index.md),以及重组已有索引的 DDL([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) 和 [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md))将会在索引可见前自动执行统计信息收集。详情请参考[内嵌于 DDL 的 Analyze](/ddl_embedded_analyze.md)。
1498+
14911499
### `tidb_enable_dist_task` <span class="version-mark">从 v7.1.0 版本开始引入</span>
14921500

14931501
- 作用域:GLOBAL

0 commit comments

Comments
 (0)