update user manual (#326)
Signed-off-by: haojinming <[email protected]>

haojinming authored Dec 10, 2022
1 parent 5662608 commit 0966ba8
Showing 2 changed files with 4 additions and 250 deletions.
79 changes: 2 additions & 77 deletions br/README-cn.md
@@ -27,84 +27,9 @@ make test // run test cases

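After a successful build, the binary files are generated in the `bin` directory.

## Deploying and Using TiKV-BR
## User Manual

### Recommended Deployment Configuration
- Deploy TiKV-BR on a PD node.
- Use a high-performance SSD network disk mounted on the TiKV-BR node and on all TiKV nodes. A 10-gigabit network card is recommended for the network disk; otherwise bandwidth may become the performance bottleneck during backup and restore.
- TiKV-BR only supports backup and restore of RawKV-mode data in TiKV clusters with a version later than v5.0.0.

### Best Practices
The following are recommended practices when using TiKV-BR for backup and restore:
- Run backups during off-peak hours to minimize the impact on applications.
- TiKV-BR supports restoring to a cluster with a different topology, but a restore heavily affects online applications. Run restores during off-peak hours or with rate limiting (rate-limit).
- Run backup tasks serially. Running different backup tasks in parallel lowers backup performance and also affects online applications.
- Run restore tasks serially. Running different restore tasks in parallel increases Region conflicts and lowers restore performance.
- Mount shared storage, such as NFS, on the backup path given by `-s`. This makes backup files easier to collect and manage.
- When using shared storage, use high-throughput storage hardware, because storage throughput limits backup and restore speed.
- Specify `--checksum=true` to run a round of data verification after a backup or restore finishes. Verification computes the checksum of the backup data and of the data in the TiKV cluster and compares the two. If you need verification, make sure that no data changes and no TTL expires in the TiKV cluster during the entire backup or restore.
- TiKV-BR can be used to migrate cluster data from [`api-version`](https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#api-version-%E4%BB%8E-v610-%E7%89%88%E6%9C%AC%E5%BC%80%E5%A7%8B%E5%BC%95%E5%85%A5) V1 to V2. Specify `--dst-api-version V2` to back up a TiKV cluster with `api-version=1` in V2 format, then restore the backup files to a new TiKV cluster with `api-version=2`.

### TiKV-BR Command-Line Description
A `tikv-br` command consists of sub-commands, options, and parameters. A sub-command is a string without a leading `-` or `--`. An option is a string that starts with `-` or `--`. A parameter is the string that immediately follows a sub-command or option and is passed to it.
#### Backing Up Raw-Mode Cluster Data
To back up Raw-mode data in a TiKV cluster, use the `tikv-br backup raw` command. Run `tikv-br backup raw --help` for usage help.
Example: back up Raw-mode data in the TiKV cluster to the `/tmp/backup` directory.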
```
tikv-br backup raw \
--pd="${PDIP}:2379" \
-s="local:///tmp/backup" \
--dst-api-version v2 \
--log-file="/tmp/br_backup.log" \
--gcttl=5m \
--start="a" \
--end="z" \
--format="raw"
```
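Each part of the command line is explained below:
- `backup`: sub-command of `tikv-br`.
- `raw`: sub-command of `backup`.
- `-s` or `--storage`: path where the backup is saved.
- `"local:///tmp/backup"`: parameter of `-s`; the backup is saved to `/tmp/backup` on the local disk of each TiKV node.
- `--pd`: `PD` service address.
- `"${PDIP}:2379"`: parameter of `--pd`.
- `--dst-api-version`: the `api-version` of the backup files; see [tikv-server config](https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#api-version-%E4%BB%8E-v610-%E7%89%88%E6%9C%AC%E5%BC%80%E5%A7%8B%E5%BC%95%E5%85%A5).
- `v2`: parameter of `--dst-api-version`. The accepted values are `v1`, `v1ttl`, and `v2` (case-insensitive). If `dst-api-version` is not specified, the backup files use the same `api-version` as the TiKV cluster that `--pd` belongs to.
- `gcttl`: how long GC is paused. It can be used to ensure that incremental data is not removed by GC between backing up the existing data and [starting the TiKV-CDC replication task](https://github.com/tikv/migration/blob/main/cdc/manual-cn.md#%E5%88%9B%E5%BB%BA%E5%90%8C%E6%AD%A5%E4%BB%BB%E5%8A%A1). Defaults to 5 minutes.
- `5m`: parameter of `gcttl`, in the form `number + time unit`; for example, `24h` means 24 hours and `60m` means 60 minutes.
- `start`, `end`: the data range to back up, a left-closed, right-open interval `[start, end)`. Defaults to `["", "")`, that is, all data.
- `format`: format of `start` and `end`. Three formats are supported: `raw`, [`hex`](https://zh.wikipedia.org/wiki/%E5%8D%81%E5%85%AD%E8%BF%9B%E5%88%B6), and [`escaped`](https://zh.wikipedia.org/wiki/%E8%BD%AC%E4%B9%89%E5%AD%97%E7%AC%A6).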
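
The left-closed, right-open range `[start, end)` can be illustrated with a small Python sketch. It assumes that keys compare as raw byte strings and that an empty `end` means "no upper bound", which is what the `["", "")` default suggests. `in_backup_range` is a hypothetical helper for illustration, not part of TiKV-BR.

```python
def in_backup_range(key: bytes, start: bytes = b"", end: bytes = b"") -> bool:
    """Return True if key lies in the half-open range [start, end).

    Assumption: an empty end means "no upper bound", which is how the
    default range ["", "") can cover all data.
    """
    if key < start:
        return False
    return end == b"" or key < end


# With --start="a" --end="z": "a" is included, "z" itself is not.
print(in_backup_range(b"a", b"a", b"z"))      # True
print(in_backup_range(b"apple", b"a", b"z"))  # True
print(in_backup_range(b"z", b"a", b"z"))      # False
print(in_backup_range(b"zebra"))              # True (default range)
```

To include keys that begin with `z`, choose an `end` that sorts after them, or leave `end` empty.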

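A progress bar is displayed in the terminal during the backup. When it reaches 100%, the backup is complete.

Specify `--checksum=true` to run a round of data verification when `backup` finishes, comparing the backup data with the cluster data to ensure correctness.

#### Restoring Raw-Mode Backup Data

To restore Raw-mode backup data to a cluster, use the `tikv-br restore raw` command. Run `tikv-br restore raw --help` for usage help.
Example: restore the Raw-mode backup data under `/tmp/backup` to the cluster.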
```
tikv-br restore raw \
--pd "${PDIP}:2379" \
--storage "local:///tmp/backup" \
--log-file restoreraw.log
```
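In the command above, the `--log-file` option writes the `TiKV-BR` log to the `restoreraw.log` file.
A progress bar is displayed in the terminal during the restore. When it reaches 100%, the restore is complete.

Specify `--checksum=true` to run a round of data verification when `restore` finishes, comparing the backup data with the cluster data to ensure correctness.

### Data Verification of Backup Files

TiKV-BR can run `checksum` after a backup or restore of a TiKV cluster completes, to ensure the integrity and correctness of the backup files. Checksum is enabled with `--checksum`.

When checksum is enabled, after the backup or restore completes, TiKV-BR uses the [checksum](https://github.com/tikv/client-go/blob/ffaaf7131a8df6ab4e858bf27e39cd7445cf7929/rawkv/rawkv.go#L584) interface of [client-go](https://github.com/tikv/client-go) to compute the checksum of the valid data in the TiKV cluster and compares it with the checksum stored in the backup files.

In some scenarios, data in the TiKV cluster has a [TTL](https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#enable-ttl). If the TTL of some data expires during backup or restore, the checksum of the TiKV cluster differs from the checksum of the backup files, so enabling `checksum` is not recommended in this scenario. Users can instead use the [scan](https://github.com/tikv/client-go/blob/ffaaf7131a8df6ab4e858bf27e39cd7445cf7929/rawkv/rawkv.go#L492) interface of [client-go](https://github.com/tikv/client-go) to scan the data that needs verification after the restore completes, to confirm that the backup files are correct.

### Security of Backup and Restore Operations

TiKV-BR supports backup and restore on TiKV clusters with [TLS](https://docs.pingcap.com/zh/tidb/dev/enable-tls-between-components) enabled. Users can specify the client certificate with the `--ca`, `--cert`, and `--key` parameters.
For details, see the [TiKV-BR user documentation](https://tikv.org/docs/latest/concepts/explore-tikv-features/backup-restore-cn/).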

## License

175 changes: 2 additions & 173 deletions br/README.md
@@ -11,10 +11,6 @@

<img src="docs/images/tikv-br-architecture.png?sanitize=true" alt="architecture" width="600"/>

## Documentation

*TODO: Add documents link*

## Building

To build binary and run test:
@@ -29,176 +25,9 @@ $ make test // run unit test

When TiKV-BR is built successfully, you can find binary in the `bin` directory.

## Quick Start

```bash
# Using tiup to start a TiKV cluster and record the PD_ADDR
tiup playground --db 0 --pd 1 --kv 3 --monitor

# Using go-ycsb to generate test data.
git clone git@github.com:pingcap/go-ycsb.git
cd go-ycsb; make
./bin/go-ycsb load tikv -P workloads/workloada -p tikv.pd="${PD_ADDR}:2379" \
-p tikv.type="raw" -p recordcount=100000 -p operationcount=100000 --threads 100

# Backup ycsb test data.
bin/tikv-br backup raw \
-s s3://backup-data/2022-09-16/_test/ \
--pd ${PD_ADDR}:2379 \
--log-file backup_test.log

# Restore from the backup.
bin/tikv-br restore raw \
-s s3://backup-data/2022-09-16/_test/ \
--pd ${PD_ADDR}:2379 \
--log-file restore_test.log
```

## Deploy

### Recommended Deployment Configuration
- In production environments, deploy `TiKV-BR` on a node with at least an 8-core CPU and 16 GB of memory. Select an appropriate OS version by following [Linux OS version requirements](https://docs.pingcap.com/tidb/dev/hardware-and-software-requirements#linux-os-version-requirements).

- Save backup data to Amazon S3 or to a network disk mounted on all `TiKV-BR` and `TiKV` nodes.

- Allocate sufficient resources for backup and restoration.

- TiKV-BR only supports raw data backup and restoration in TiKV clusters with version >= `5.0.0`.

TiKV-BR, the TiKV nodes, and the backup storage system should provide network bandwidth greater than the backup speed. If the target cluster is particularly large, the maximum backup and restoration speed is limited by the bandwidth of the backup network.
The backup storage system should also provide sufficient read/write performance (IOPS). Otherwise, IOPS might become a performance bottleneck during backup or restoration.
TiKV nodes need at least two spare CPU cores and spare disk bandwidth (related to the `ratelimit` parameter) for backups. Otherwise, the backup might affect the services running on the cluster.
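
As a back-of-envelope check of the bandwidth requirement, the Python sketch below estimates the aggregate backup traffic when every TiKV node runs at its rate limit. The figures are assumptions for illustration: 3 TiKV nodes (as in the Quick Start playground) and `--ratelimit 128`, which is a per-node limit in MiB/s. `required_bandwidth_gbps` is a hypothetical helper, and real traffic also depends on compression and cluster load.

```python
MIB = 1024 * 1024  # bytes per MiB


def required_bandwidth_gbps(ratelimit_mib_s: float, tikv_nodes: int) -> float:
    """Aggregate backup traffic in Gbit/s if every TiKV node runs at the rate limit."""
    bytes_per_second = ratelimit_mib_s * MIB * tikv_nodes
    return bytes_per_second * 8 / 1e9


# 3 TiKV nodes, each capped at 128 MiB/s
print(f"{required_bandwidth_gbps(128, 3):.2f} Gbit/s")  # 3.22 Gbit/s
```

Under these assumptions a 1-gigabit link to the backup storage would be the bottleneck, while a 10-gigabit link leaves headroom.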

### Best practice
The following are some recommended operations for using `TiKV-BR` for backup and restoration:
- It is recommended that you perform the backup operation during off-peak hours to minimize the impact on applications.
- `TiKV-BR` supports restore on clusters of different topologies. However, the online applications will be greatly impacted during the restore operation. It is recommended that you perform restore during the off-peak hours or use `ratelimit` to limit the rate.
- It is recommended that you execute multiple backup operations serially. Running different backup operations in parallel reduces backup performance and also affects the online application.
- It is recommended that you execute multiple restore operations serially. Running different restore operations in parallel increases Region conflicts and also reduces restore performance.
- `TiKV-BR` can verify checksums between the `TiKV` cluster and the backup files after a backup or restore when `--checksum=true` is set. If checksum is enabled, make sure that no data changes and no `TTL` expires in the `TiKV` cluster during the backup or restore.
- TiKV-BR supports [`api-version`](https://docs.pingcap.com/tidb/stable/tikv-configuration-file#api-version-new-in-v610) conversion from V1 to V2 with `--dst-api-version V2`: back up the APIV1 cluster in V2 format, then restore the backup files to an APIV2 `TiKV` cluster. This is mainly used to upgrade from an APIV1 cluster to an APIV2 cluster.

### TiKV-BR Command Line Description
A tikv-br command consists of sub-commands, options, and parameters.

- Sub-command: a string without a leading `-` or `--`, such as `backup`, `restore`, `raw`, and `help`.
- Option: a string that starts with `-` or `--`.
- Parameter: the string that immediately follows a sub-command or an option and is passed to it.
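
The three token kinds can be made concrete with a short Python sketch that labels each token of a command line. It is a simplification, not how TiKV-BR parses its arguments: it assumes every option takes exactly one parameter (written as `--opt value` or `--opt=value`) and does not handle parameters of sub-commands. `classify` is a hypothetical helper and `127.0.0.1` is a placeholder PD address.

```python
import shlex


def classify(command: str):
    """Label each token as sub-command, option, or parameter.

    Simplifying assumption: every option takes exactly one parameter,
    given either as `--opt value` or as `--opt=value`.
    """
    tokens = shlex.split(command)[1:]  # drop the program name
    labeled = []
    expect_param = False
    for tok in tokens:
        if tok.startswith("-"):
            name, sep, value = tok.partition("=")
            labeled.append(("option", name))
            if sep:
                labeled.append(("parameter", value))
            else:
                expect_param = True
        elif expect_param:
            labeled.append(("parameter", tok))
            expect_param = False
        else:
            labeled.append(("sub-command", tok))
    return labeled


for kind, text in classify('tikv-br backup raw --pd "127.0.0.1:2379" -s="local:///tmp/backup"'):
    print(f"{kind:12} {text}")
```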
#### Backup Raw Data
To back up the cluster raw data, use the `tikv-br backup raw` command. To get help on this command, execute `tikv-br backup raw -h` or `tikv-br backup raw --help`.
For example, back up raw data in the TiKV cluster to the `/backup-data/2022-09-16` directory in S3.

```
export AWS_ACCESS_KEY_ID=${AWS_KEY_ID};
export AWS_SECRET_ACCESS_KEY=${AWS_KEY};
tikv-br backup raw \
--pd "${PDIP}:2379" \
-s "s3://backup-data/2022-09-16/" \
--ratelimit 128 \
--dst-api-version v2 \
--log-file="/tmp/br_backup.log" \
--gcttl=5m \
--start="a" \
--end="z" \
--format="raw"
```
Explanations for some options in the above command are as follows:
- `backup`: Sub-command of `tikv-br`.
- `raw`: Sub-command of `backup`.
- `-s` or `--storage`: Storage of backup files.
- `"s3://backup-data/2022-09-16/"`: Parameter of `-s`, save the backup files in `"s3://backup-data/2022-09-16/"`.
- `--ratelimit`: The maximum speed at which a backup operation is performed on each `TiKV` node.
- `128`: The value of `ratelimit`, unit is MiB/s.
- `--pd`: Service address of `PD`.
- `"${PDIP}:2379"`: Parameter of `--pd`.
- `--dst-api-version`: The `api-version`, please see [tikv-server config](https://docs.pingcap.com/tidb/stable/tikv-configuration-file#api-version-new-in-v610).
- `v2`: Parameter of `--dst-api-version`. The accepted values are `v1`, `v1ttl`, and `v2` (case-insensitive). If `dst-api-version` is not specified, the backup uses the same `api-version` as the TiKV cluster of `--pd`.
- `gcttl`: The pause duration of GC. This makes sure that the incremental data written between the start of the backup and the TiKV-CDC [create changefeed](https://github.com/tikv/migration/blob/main/cdc/README.md#create-a-replication-task) step is NOT deleted by GC. 5 minutes by default.
- `5m`: Parameter of `gcttl`. Its format is `number + unit`, e.g. `24h` means 24 hours, `60m` means 60 minutes.
- `start`, `end`: The backup key range, a left-closed, right-open interval `[start, end)`.
- `format`: Format of `start` and `end`. Supported formats are `raw`, [`hex`](https://en.wikipedia.org/wiki/Hexadecimal) and [`escaped`](https://en.wikipedia.org/wiki/Escape_character).
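
To pass the same key range with `--format="hex"`, the boundary keys have to be written as hexadecimal digits. A minimal sketch using Python's built-in `bytes.hex` and `bytes.fromhex` follows. It assumes that TiKV-BR expects plain hex digits without a `0x` prefix, which should be confirmed with `tikv-br backup raw --help`; `to_hex_key` and `from_hex_key` are hypothetical helpers.

```python
def to_hex_key(raw_key: str) -> str:
    """Encode a raw key as hexadecimal digits (UTF-8 bytes, no prefix)."""
    return raw_key.encode("utf-8").hex()


def from_hex_key(hex_key: str) -> bytes:
    """Decode a hexadecimal key back to its raw bytes."""
    return bytes.fromhex(hex_key)


print(to_hex_key("a"), to_hex_key("z"))  # 61 7a
print(from_hex_key("61"))                # b'a'
```

Under that assumption, `--start="a" --end="z" --format="raw"` would correspond to `--start="61" --end="7a" --format="hex"`.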

A progress bar is displayed in the terminal during the backup. When the progress bar advances to 100%, the backup is complete. The progress bar is displayed as follows:
```
tikv-br backup raw \
--pd "${PDIP}:2379" \
--storage "s3://backup-data/2022-09-16/" \
--log-file backupfull.log
Backup Raw <---------/................................................> 17.12%.
```

After the backup finishes, the result message is displayed as follows:
```
[2022/09/20 18:01:10.125 +08:00] [INFO] [collector.go:67] ["Raw backup success summary"] [total-ranges=3] [ranges-succeed=3] [ranges-failed=0] [backup-total-regions=3] [total-take=5.050265883s] [backup-ts=436120585518448641] [total-kv=100000] [total-kv-size=108.7MB] [average-speed=21.11MB/s] [backup-data-size(after-compressed)=78.3MB]
```
Explanations for the above message are as follows:
- `total-ranges`: Number of ranges that the whole backup task is split into. Equal to `ranges-succeed` + `ranges-failed`.
- `ranges-succeed`: Number of succeeded ranges.
- `ranges-failed`: Number of failed ranges.
- `backup-total-regions`: Number of TiKV Regions covered by the backup.
- `total-take`: The backup duration.
- `backup-ts`: The backup start timestamp. It only takes effect for APIV2 TiKV clusters and can be used as the `start-ts` of `TiKV-CDC` when creating replication tasks. Refer to [Create a replication task](https://github.com/tikv/migration/blob/main/cdc/README.md#create-a-replication-task).
- `total-kv`: Total number of key-value pairs in the backup files.
- `total-kv-size`: Total size of the key-value pairs in the backup files. Note that this is the original size before compression.
- `average-speed`: The backup speed, approximately equal to `total-kv-size` / `total-take`.
- `backup-data-size(after-compressed)`: The backup file size.
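
When the summary needs to be consumed by a script, for example to pick up `backup-ts` for a later TiKV-CDC task, the `[key=value]` fields can be extracted with a regular expression. The Python sketch below parses the sample line shown above; `parse_summary` is a hypothetical helper, and the pattern assumes that keys contain only word characters, `-`, and parentheses, as they do in that sample.

```python
import re

SUMMARY = (
    '[2022/09/20 18:01:10.125 +08:00] [INFO] [collector.go:67] '
    '["Raw backup success summary"] [total-ranges=3] [ranges-succeed=3] '
    '[ranges-failed=0] [backup-total-regions=3] [total-take=5.050265883s] '
    '[backup-ts=436120585518448641] [total-kv=100000] [total-kv-size=108.7MB] '
    '[average-speed=21.11MB/s] [backup-data-size(after-compressed)=78.3MB]'
)


def parse_summary(line: str) -> dict:
    """Collect every [key=value] field of a summary log line into a dict."""
    return dict(re.findall(r"\[([\w()-]+)=([^\]]*)\]", line))


fields = parse_summary(SUMMARY)
total = int(fields["total-ranges"])
succeeded = int(fields["ranges-succeed"])
failed = int(fields["ranges-failed"])

# total-ranges should equal ranges-succeed + ranges-failed
print(total == succeeded + failed)  # True
print(fields["backup-ts"])          # 436120585518448641
```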

#### Restore Raw Data

To restore raw data to the cluster, execute the `tikv-br restore raw` command. To get help on this command, execute `tikv-br restore raw -h` or `tikv-br restore raw --help`.
For example, restore the raw backup files in the `/backup-data/2022-09-16` directory in S3 to the `TiKV` cluster.

```
export AWS_ACCESS_KEY_ID=${AWS_KEY_ID};
export AWS_SECRET_ACCESS_KEY=${AWS_KEY};
tikv-br restore raw \
--pd "${PDIP}:2379" \
--storage "s3://backup-data/2022-09-16/" \
--ratelimit 128 \
--log-file restoreraw.log
```
Explanations for some options in the above command are as follows:

- `--ratelimit`: The maximum speed at which a restoration operation is performed (MiB/s) on each `TiKV` node.
- `--log-file`: Writes the TiKV-BR log to the `restoreraw.log` file.

A progress bar is displayed in the terminal during the restoration. When the progress bar advances to 100%, the restoration is complete. The progress bar is displayed as follows:
```
tikv-br restore raw \
--pd "${PDIP}:2379" \
--storage "s3://backup-data/2022-09-16/" \
--ratelimit 128 \
--log-file restoreraw.log
Restore Raw <---------/...............................................> 17.12%.
```

After the restoration finishes, the result message is displayed as follows:
```
[2022/09/20 18:02:12.540 +08:00] [INFO] [collector.go:67] ["Raw restore success summary"] [total-ranges=3] [ranges-succeed=3] [ranges-failed=0] [restore-files=3] [total-take=950.460846ms] [restore-data-size(after-compressed)=78.3MB] [total-kv=100000] [total-kv-size=108.7MB] [average-speed=114.4MB/s]
```
Explanations for the above message are as follows:
- `total-ranges`: Number of ranges that the whole restoration task is split into. Equal to `ranges-succeed` + `ranges-failed`.
- `ranges-succeed`: Number of succeeded ranges.
- `ranges-failed`: Number of failed ranges.
- `restore-files`: Number of restored files.
- `total-take`: The restoration duration.
- `total-kv`: Total number of restored key-value pairs.
- `total-kv-size`: Total size of the restored key-value pairs. Note that this is the original size before compression.
- `average-speed`: The restoration speed, approximately equal to `total-kv-size` / `total-take`.
- `restore-data-size(after-compressed)`: The restoration file size.
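
The relation `average-speed` ≈ `total-kv-size` / `total-take` can be checked against the restore summary above with a few lines of Python. The helpers `duration_seconds` and `average_speed` are hypothetical and handle only the two duration units that appear in the sample logs (`ms` and `s`).

```python
def duration_seconds(text: str) -> float:
    """Convert a duration such as '950.460846ms' or '5.050265883s' to seconds.

    Only the two units that appear in the sample logs are handled.
    """
    if text.endswith("ms"):
        return float(text[:-2]) / 1000
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text}")


def average_speed(total_kv_size_mb: float, total_take: str) -> float:
    """Approximate average-speed in MB/s as total-kv-size / total-take."""
    return total_kv_size_mb / duration_seconds(total_take)


# Values from the restore summary above
print(f"{average_speed(108.7, '950.460846ms'):.1f} MB/s")  # 114.4 MB/s
```

The result agrees with the reported `average-speed=114.4MB/s`. The reported value is only approximately equal to this ratio, so small differences are expected in other runs.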


### Data Verification of Backup & Restore

TiKV-BR can verify checksums between the TiKV cluster and the backup files after a backup or restoration finishes when `--checksum=true` is set. Checksum uses the [checksum](https://github.com/tikv/client-go/blob/ffaaf7131a8df6ab4e858bf27e39cd7445cf7929/rawkv/rawkv.go#L584) interface in TiKV [client-go](https://github.com/tikv/client-go), which sends checksum requests to all TiKV Regions to calculate the checksum of all **VALID** data. The result is then compared with the checksum of the backup files, which is calculated during the backup process.

In some scenarios, data is stored in TiKV with a [TTL](https://docs.pingcap.com/tidb/stable/tikv-configuration-file#enable-ttl). If data expires during backup and restore, the checksum persisted in the backup files differs from the checksum of the TiKV cluster, so checksum should not be enabled in this scenario. Instead, users can perform a full comparison of all existing non-expired data between the backup cluster and the restore cluster with the [scan](https://github.com/tikv/client-go/blob/ffaaf7131a8df6ab4e858bf27e39cd7445cf7929/rawkv/rawkv.go#L492) interface in [client-go](https://github.com/tikv/client-go).
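
Why TTL expiry breaks the comparison can be seen in a toy model. The Python sketch below treats a cluster as a dictionary and computes an order-independent checksum over its key-value pairs. `toy_checksum` is an illustrative stand-in and is NOT the algorithm used by TiKV or client-go; it only shows that a single expired key changes the result.

```python
import zlib


def toy_checksum(kvs: dict) -> tuple:
    """Order-independent checksum of key-value pairs: (crc-xor, count, bytes).

    Illustrative only; this is NOT the algorithm used by TiKV or client-go.
    """
    crc_xor, total_bytes = 0, 0
    for key, value in kvs.items():
        crc_xor ^= zlib.crc32(key + value)
        total_bytes += len(key) + len(value)
    return crc_xor, len(kvs), total_bytes


backed_up = {b"k1": b"v1", b"k2": b"v2", b"k3": b"v3"}

# Same data after restore: checksums match.
print(toy_checksum(dict(backed_up)) == toy_checksum(backed_up))  # True

# "k3" expired (TTL) before the cluster checksum ran: mismatch.
after_expiry = {k: v for k, v in backed_up.items() if k != b"k3"}
print(toy_checksum(after_expiry) == toy_checksum(backed_up))     # False
```

A scan-based comparison avoids this false alarm because it can restrict the check to the keys that still exist on both sides.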

### Security During Backup & Restoration

TiKV-BR supports TLS when [TLS config](https://docs.pingcap.com/tidb/dev/enable-tls-between-components) is enabled in the TiKV cluster.
## User Manual

Specify the client certificate with the `--ca`, `--cert`, and `--key` options.
For details, see [TiKV-BR User Docs](https://tikv.org/docs/latest/concepts/explore-tikv-features/backup-restore/).

## Contributing
