Commit fe33d98

update the default collation of GBK from gbk_bin to gbk_chinese_ci (pingcap#20818)
1 parent 5685cad commit fe33d98

3 files changed: +22 -45 lines changed

character-set-and-collation.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ SHOW CHARACTER SET;
 +---------+-------------------------------------+-------------------+--------+
 | ascii   | US ASCII                            | ascii_bin         |      1 |
 | binary  | binary                              | binary            |      1 |
-| gbk     | Chinese Internal Code Specification | gbk_bin           |      2 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
 | latin1  | Latin1                              | latin1_bin        |      1 |
 | utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
 | utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |

character-set-gbk.md

Lines changed: 10 additions & 34 deletions
@@ -5,7 +5,9 @@ summary: This document provides details about the TiDB support of the GBK charac
 
 # GBK
 
-Since v5.4.0, TiDB supports the GBK character set. This document provides the TiDB support and compatibility information of the GBK character set.
+Starting from v5.4.0, TiDB supports the GBK character set. This document provides the TiDB support and compatibility information of the GBK character set.
+
+Starting from v6.0.0, TiDB enables the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) by default. The default collation for TiDB GBK character set is `gbk_chinese_ci`, which is consistent with MySQL.
 
 ```sql
 SHOW CHARACTER SET WHERE CHARSET = 'gbk';
@@ -15,7 +17,7 @@ SHOW CHARACTER SET WHERE CHARSET = 'gbk';
 +---------+-------------------------------------+-------------------+--------+
 | Charset | Description                         | Default collation | Maxlen |
 +---------+-------------------------------------+-------------------+--------+
-| gbk     | Chinese Internal Code Specification | gbk_bin           |      2 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
 +---------+-------------------------------------+-------------------+--------+
 1 row in set (0.00 sec)
 ```
@@ -40,48 +42,22 @@ This section provides the compatibility information between MySQL and TiDB.
 
 ### Collations
 
-The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. Unlike MySQL, the default collation of the GBK character set in TiDB is `gbk_bin`. Additionally, because TiDB converts GBK to `utf8mb4` and then uses a binary collation, the `gbk_bin` collation in TiDB is not the same as the `gbk_bin` collation in MySQL.
-
 <CustomContent platform="tidb">
 
-To make TiDB compatible with the collations of MySQL GBK character set, when you first initialize the TiDB cluster, you need to set the TiDB option [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` to enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations). This is the default setting for new deployments.
+The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. The default collation for the GBK character set in TiDB depends on the value of the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap):
+
+- By default, the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is set to `true`, which means that the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) is enabled and the default collation for the GBK character set is `gbk_chinese_ci`.
+- When the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is set to `false`, the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) is disabled, and the default collation for the GBK character set is `gbk_bin`.
 
 </CustomContent>
 
 <CustomContent platform="tidb-cloud">
 
-To make TiDB compatible with the collations of MySQL GBK character set, when you first initialize the TiDB cluster, TiDB Cloud enables the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) by default.
+By default, TiDB Cloud enables the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations) and the default collation for the GBK character set is `gbk_chinese_ci`.
 
 </CustomContent>
 
-After enabling the new framework for collations, if you check the collations corresponding to the GBK character set, you can see that the TiDB GBK default collation is changed to `gbk_chinese_ci`.
-
-```sql
-SHOW CHARACTER SET WHERE CHARSET = 'gbk';
-```
-
-```
-+---------+-------------------------------------+-------------------+--------+
-| Charset | Description                         | Default collation | Maxlen |
-+---------+-------------------------------------+-------------------+--------+
-| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
-+---------+-------------------------------------+-------------------+--------+
-1 row in set (0.00 sec)
-```
-
-```sql
-SHOW COLLATION WHERE CHARSET = 'gbk';
-```
-
-```
-+----------------+---------+----+---------+----------+---------+---------------+
-| Collation      | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
-+----------------+---------+----+---------+----------+---------+---------------+
-| gbk_bin        | gbk     | 87 |         | Yes      |       1 | PAD SPACE     |
-| gbk_chinese_ci | gbk     | 28 | Yes     | Yes      |       1 | PAD SPACE     |
-+----------------+---------+----+---------+----------+---------+---------------+
-2 rows in set (0.00 sec)
-```
+Additionally, because TiDB converts GBK to `utf8mb4` and then uses a binary collation, the `gbk_bin` collation in TiDB is not the same as the `gbk_bin` collation in MySQL.
 
 ### Illegal character compatibility
 
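Reviewer note: the rule introduced in `character-set-gbk.md` (the default GBK collation follows `new_collations_enabled_on_first_bootstrap`) can be checked against the sample `SHOW CHARACTER SET` output with a small script. The following is only a sketch; the function names `expected_gbk_default_collation` and `parse_default_collations` are hypothetical helpers, not part of TiDB or of this commit, and the parser assumes the four-column ASCII table layout shown in the docs.

```python
from typing import Dict


def expected_gbk_default_collation(new_collations_enabled: bool) -> str:
    """Return the default GBK collation described in the updated docs text.

    Hypothetical helper: it only restates the documented rule and does not
    query a TiDB cluster.
    """
    return "gbk_chinese_ci" if new_collations_enabled else "gbk_bin"


def parse_default_collations(table_text: str) -> Dict[str, str]:
    """Map charset name -> default collation from SHOW CHARACTER SET output.

    Assumes the four-column ASCII table format used in the docs samples.
    """
    result: Dict[str, str] = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and the "N rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "Charset":
            continue  # skip the header row and anything malformed
        result[cells[0]] = cells[2]
    return result


# Sample output copied from the updated docs (GBK row only).
SAMPLE = """\
+---------+-------------------------------------+-------------------+--------+
| Charset | Description                         | Default collation | Maxlen |
+---------+-------------------------------------+-------------------+--------+
| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
+---------+-------------------------------------+-------------------+--------+
1 row in set (0.00 sec)
"""

# The documented sample should match the rule for the default configuration.
sample_matches_default = (
    parse_default_collations(SAMPLE).get("gbk")
    == expected_gbk_default_collation(True)
)
```

The same comparison could be run against output captured from a real cluster; with the new framework for collations disabled, the docs text says the value to expect is `gbk_bin`.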
sql-statements/sql-statement-show-character-set.md

Lines changed: 11 additions & 10 deletions
@@ -26,16 +26,17 @@ SHOW CHARACTER SET;
 ```
 
 ```
-+---------+---------------+-------------------+--------+
-| Charset | Description   | Default collation | Maxlen |
-+---------+---------------+-------------------+--------+
-| utf8    | UTF-8 Unicode | utf8_bin          |      3 |
-| utf8mb4 | UTF-8 Unicode | utf8mb4_bin       |      4 |
-| ascii   | US ASCII      | ascii_bin         |      1 |
-| latin1  | Latin1        | latin1_bin        |      1 |
-| binary  | binary        | binary            |      1 |
-+---------+---------------+-------------------+--------+
-5 rows in set (0.00 sec)
++---------+-------------------------------------+-------------------+--------+
+| Charset | Description                         | Default collation | Maxlen |
++---------+-------------------------------------+-------------------+--------+
+| ascii   | US ASCII                            | ascii_bin         |      1 |
+| binary  | binary                              | binary            |      1 |
+| gbk     | Chinese Internal Code Specification | gbk_chinese_ci    |      2 |
+| latin1  | Latin1                              | latin1_bin        |      1 |
+| utf8    | UTF-8 Unicode                       | utf8_bin          |      3 |
+| utf8mb4 | UTF-8 Unicode                       | utf8mb4_bin       |      4 |
++---------+-------------------------------------+-------------------+--------+
+6 rows in set (0.00 sec)
 ```
 
 ```sql