Merge branch 'master' into cooper-lzy-patch-1
cooper-lzy authored Apr 15, 2022
2 parents 0330b4c + a94958b commit aab09e7
Showing 10 changed files with 321 additions and 13 deletions.
@@ -6,7 +6,7 @@ Nebula Graph consists of three services: the Graph service, the Meta service, and the Storage service

The following figure shows the typical architecture of a Nebula Graph cluster.

![Nebula Graph architecture](https://docs-cdn.nebula-graph.com.cn/docs-2.0/1.introduction/2.nebula-graph-architecture/nebula-graph-architecture-1.png "Nebula Graph architecture")
![Nebula Graph architecture](https://docs-cdn.nebula-graph.com.cn/figures/nebula-graph-architecture_3.png "Nebula Graph architecture")

## Meta service

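2 changes: 1 addition & 1 deletion docs-2.0/20.appendix/0.FAQ.md
@@ -39,7 +39,7 @@ Nebula Graph is under continuous development, so the behavior of features or operations may change

Starting with Nebula Graph 3.0.0, the query statements `LOOKUP`, `GO`, and `FETCH` must use a `YIELD` clause to specify the output. For details, see [YIELD](../3.ngql-guide/8.clauses-and-options/yield.md).

### How to resolve the error message `Zone not enough!`
### How to resolve the error message `Host not enough!`

Starting with version 3.0.0, Storage nodes added in the configuration file cannot be read from or written to directly. The configuration file only registers the Storage nodes with the Meta service. You must run the `ADD HOSTS` command before the Storage nodes can serve reads and writes. For details, see [Manage Storage hosts](../4.deployment-and-installation/manage-storage-host.md).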

19 changes: 19 additions & 0 deletions docs-2.0/3.ngql-guide/4.job-statements.md
@@ -6,6 +6,25 @@

All job management commands require that a graph space be selected first.

## SUBMIT JOB BALANCE DATA

!!! enterpriseonly

Only the Enterprise Edition supports this feature.

The `SUBMIT JOB BALANCE DATA` statement starts a job in the current graph space that distributes partitions evenly. The command returns the job ID.

Example:

```ngql
nebula> SUBMIT JOB BALANCE DATA;
+------------+
| New Job Id |
+------------+
| 28 |
+------------+
```

<!-- balance-3.1
## SUBMIT JOB BALANCE IN ZONE
11 changes: 11 additions & 0 deletions docs-2.0/3.ngql-guide/9.space-statements/4.describe-space.md
@@ -20,3 +20,14 @@ nebula> DESCRIBE SPACE basketballplayer;
| 1 | "basketballplayer" | 10 | 1 | "utf8" | "utf8_bin" | "FIXED_STRING(32)" | "default" | |
+----+--------------------+------------------+----------------+---------+------------+--------------------+-----------+---------+
```

<!--
```ngql
nebula> DESCRIBE SPACE basketballplayer;
+----+--------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+
| ID | Name | Partition Number | Replica Factor | Charset | Collate | Vid Type | Atomic Edge | Zones | Comment |
+----+--------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+
| 1 | "basketballplayer" | 10 | 1 | "utf8" | "utf8_bin" | "FIXED_STRING(32)" | false | "default" | |
+----+--------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+---------+
```
-->
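113 changes: 111 additions & 2 deletions docs-2.0/8.service-tuning/load-balance.md
@@ -2,9 +2,118 @@

Users can use the `BALANCE` statement to balance the distribution of partitions and Raft leaders, or to empty certain Storage servers for maintenance. For details, see [BALANCE](../synchronization-and-migration/2.balance-syntax.md).

!!! compatibility "Legacy version compatibility"
!!! danger

The `BALANCE` command migrates data and balances the partition distribution by creating and running a set of subtasks. **Do not** stop any machine in the cluster or change a machine's IP address until all subtasks are complete. Otherwise, the subsequent subtasks fail.

## Balance partition distribution

!!! enterpriseonly

Only the Enterprise Edition supports balancing the partition distribution.

The `BALANCE DATA` statement starts a job that distributes the partitions of the current graph space evenly across all Storage servers. It migrates data and balances the partition distribution by creating and running a set of subtasks.

### Example

Take scaling out Nebula Graph as an example. After a new Storage host is added to the cluster, no partitions reside on the new host.

1. Run `SHOW HOSTS` to check the partition distribution.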

```ngql
nebula> SHOW HOSTS;
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
| Host | Port | HTTP port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | 19669 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | "3.1.0-ent" |
| "192.168.8.100" | 9779 | 19669 | "ONLINE" | 15 | "basketballplayer:15" | "basketballplayer:15" | "3.1.0-ent" |
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
```

2. Enter the graph space `basketballplayer`, and then run `BALANCE DATA` to distribute all partitions evenly.

```ngql
nebula> USE basketballplayer;
nebula> BALANCE DATA;
+------------+
| New Job Id |
+------------+
| 2 |
+------------+
```

3. Using the returned job ID, run `SHOW JOB <job_id>` to check the job status.

```ngql
nebula> SHOW JOB 2;
+------------------------+------------------------------------------+-------------+---------------------------------+---------------------------------+-------------+
| Job Id(spaceId:partId) | Command(src->dst) | Status | Start Time | Stop Time | Error Code |
+------------------------+------------------------------------------+-------------+---------------------------------+---------------------------------+-------------+
| 2 | "DATA_BALANCE" | "FINISHED" | "2022-04-12T03:41:43.000000000" | "2022-04-12T03:41:53.000000000" | "SUCCEEDED" |
| "2, 1:1" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "2, 1:2" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "2, 1:3" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "2, 1:4" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "2, 1:5" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "2, 1:6" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:43.000000 | "SUCCEEDED" |
| "2, 1:7" | "192.168.8.100:9779->192.168.8.101:9779" | "SUCCEEDED" | 2022-04-12T03:41:43.000000 | 2022-04-12T03:41:53.000000 | "SUCCEEDED" |
| "Total:7" | "Succeeded:7" | "Failed:0" | "In Progress:0" | "Invalid:0" | "" |
+------------------------+------------------------------------------+-------------+---------------------------------+---------------------------------+-------------+
```

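4. Wait until all subtasks are complete and the load balancing process ends, and then run `SHOW HOSTS` to confirm that the partitions are evenly distributed.

!!! Note

`BALANCE DATA` does not balance the leader distribution. To balance leaders, see [Balance leader distribution](#leader).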

```ngql
nebula> SHOW HOSTS;
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
| Host | Port | HTTP port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | 19669 | "ONLINE" | 7 | "basketballplayer:7" | "basketballplayer:7" | "3.1.0-ent" |
| "192.168.8.100" | 9779 | 19669 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:8" | "3.1.0-ent" |
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
```

If any subtask fails, run `RECOVER JOB <job_id>`. If redoing the load balancing still does not solve the problem, ask for help in the [Nebula Graph community](https://discuss.nebula-graph.com.cn/).
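
The confirmation in step 4 can also be scripted against console output that has been copied into a string. The following is a minimal sketch, not part of Nebula Graph: the helper names `parse_table`, `partition_counts`, and `is_balanced` are hypothetical, the "counts differ by at most one" criterion is an assumption about what "evenly distributed" means, and the comma-separated form for multiple graph spaces in `Partition distribution` is assumed rather than taken from this page.

```python
# Sample console output, copied from the SHOW HOSTS result in step 4.
SAMPLE = """
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
| Host            | Port | HTTP port | Status   | Leader count | Leader distribution  | Partition distribution | Version     |
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | 19669     | "ONLINE" | 7            | "basketballplayer:7" | "basketballplayer:7"   | "3.1.0-ent" |
| "192.168.8.100" | 9779 | 19669     | "ONLINE" | 8            | "basketballplayer:8" | "basketballplayer:8"   | "3.1.0-ent" |
+-----------------+------+-----------+----------+--------------+----------------------+------------------------+-------------+
"""


def parse_table(text):
    """Parse a console-style ASCII table into a list of dicts keyed by header."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [c.strip().strip('"') for c in line.strip("|").split("|")]
        rows.append(cells)
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def partition_counts(hosts, space):
    """Return {"ip:port": partition count} for one graph space."""
    counts = {}
    for host in hosts:
        count = 0  # "No valid partition" has no "name:count" pair, so it stays 0
        # Assumption: multiple spaces appear as "space_a:3, space_b:5".
        for item in host["Partition distribution"].split(","):
            name, sep, num = item.strip().partition(":")
            if sep and name == space:
                count = int(num)
        counts[f'{host["Host"]}:{host["Port"]}'] = count
    return counts


def is_balanced(counts):
    """Assumed criterion: per-host counts differ by at most one."""
    return max(counts.values()) - min(counts.values()) <= 1


if __name__ == "__main__":
    counts = partition_counts(parse_table(SAMPLE), "basketballplayer")
    print(counts, "balanced:", is_balanced(counts))
```

Running the same check on the output from step 1 (15 partitions on one host, none on the other) would report the layout as unbalanced.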

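### Stop a load balancing job

To stop a load balancing job, run `STOP JOB <job_id>`.

- If no load balancing job is running, an error is returned.

- If a load balancing job is running, `Job stopped` is returned.

!!! note

- `STOP JOB <job_id>` does not stop subtasks that are already running. It cancels all subsequent subtasks and sets their status to `INVALID`, and then waits for the running subtasks to finish and sets each one to `SUCCEEDED` or `FAILED` according to its result. Users can run `SHOW JOB <job_id>` to check the status of the stopped job.
- After a crash and restart, the job status changes to `QUEUE`. Subtasks that were `INVALID` or `FAILED` are set to `IN_PROGRESS`, and subtasks that were `IN_PROGRESS` or `SUCCEEDED` remain unchanged.

Once all subtasks are finished or stopped, users can run `RECOVER JOB <job_id>` again to restart the job, and the subtasks continue from their previous status.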
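
The status rules in the note above can be written down as a small model. This is only an illustration of the rules as stated on this page, not Nebula Graph code: the function names are hypothetical, and `NOT_STARTED` is a placeholder label (an assumption) for subtasks that have not begun, since the page does not name that state.

```python
# Subtask status after a crash and restart, as described in the note above.
RESTART_TRANSITIONS = {
    "INVALID": "IN_PROGRESS",
    "FAILED": "IN_PROGRESS",
    "IN_PROGRESS": "IN_PROGRESS",  # unchanged
    "SUCCEEDED": "SUCCEEDED",      # unchanged
}


def after_stop(subtask_statuses):
    """Model STOP JOB: running subtasks are left alone, later ones become INVALID.

    "NOT_STARTED" is a hypothetical label for subtasks that have not begun.
    """
    return ["INVALID" if s == "NOT_STARTED" else s for s in subtask_statuses]


def after_restart(subtask_statuses):
    """Model a crash and restart: return (job status, new subtask statuses)."""
    return "QUEUE", [RESTART_TRANSITIONS[s] for s in subtask_statuses]


if __name__ == "__main__":
    stopped = after_stop(["SUCCEEDED", "IN_PROGRESS", "NOT_STARTED"])
    print(stopped)
    print(after_restart(["SUCCEEDED", "FAILED", "INVALID"]))
```

Keeping the transitions in a lookup table makes the asymmetry visible: only `INVALID` and `FAILED` subtasks are picked up again, which is why finished work is not repeated.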

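### Migrate partitions

To migrate the partitions off specified Storage hosts in order to scale in the cluster, use the command `BALANCE DATA REMOVE <ip>:<port> [,<ip>:<port> ...]`.

For example, to migrate the partitions off `192.168.8.100:9779`, run the following command: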

```ngql
nebula> BALANCE DATA REMOVE 192.168.8.100:9779;
nebula> SHOW HOSTS;
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
| Host | Port | HTTP port | Status | Leader count | Leader distribution | Partition distribution | Version |
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
| "192.168.8.101" | 9779 | 19669 | "ONLINE" | 15 | "basketballplayer:15" | "basketballplayer:15" | "3.1.0-ent" |
| "192.168.8.100" | 9779 | 19669 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | "3.1.0-ent" |
+-----------------+------+-----------+----------+--------------+-----------------------+------------------------+-------------+
```

!!! note

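The `BALANCE DATA` command is not supported.
This command only migrates partitions and does not remove the Storage host from the cluster. To remove a Storage host, see [Manage Storage hosts](../4.deployment-and-installation/manage-storage-host.md).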
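
When several hosts are drained at once, the statement text can be assembled from a host list. The following is a rough sketch under stated assumptions: `balance_data_remove` is a hypothetical helper, not a Nebula Graph API, and the unquoted, comma-separated `ip:port` form simply follows the syntax and example shown above. It only builds the string; running it is left to the console or a client.

```python
import ipaddress


def balance_data_remove(hosts):
    """Build a BALANCE DATA REMOVE statement from (ip, port) pairs.

    Hypothetical helper: validates the input and returns statement text only.
    """
    if not hosts:
        raise ValueError("at least one Storage host is required")
    targets = []
    for ip, port in hosts:
        ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
        if not 0 < int(port) < 65536:
            raise ValueError(f"port out of range: {port}")
        targets.append(f"{ip}:{port}")
    return "BALANCE DATA REMOVE " + ", ".join(targets) + ";"


if __name__ == "__main__":
    print(balance_data_remove([("192.168.8.100", 9779)]))
```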

<!-- balance-3.1
!!! danger
@@ -51,7 +51,7 @@ The visual monitoring screen of Dashboard Enterprise Edition helps users grasp the cluster at a glance

In the top navigation bar of Dashboard, click **Cluster Management**, and then click **Cluster Monitoring**->**Monitoring Screen** to open the monitoring screen page.

<!-- Add a screenshot of the monitoring screen -->
![tv-dashboard](https://docs-cdn.nebula-graph.com.cn/figures/screen_2022-04-13_cn.png)

| Screen area | Information displayed |
| ------------ | ------------------------------------------------------------ |
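@@ -53,11 +53,15 @@

### Partition distribution

Select a graph space to view its partition distribution. The page shows the IP addresses and ports of all Storage services, and the number of partitions in each Storage service.
In the upper left, select a graph space:

- View the partition distribution of the selected graph space. The page shows the IP addresses and ports of all Storage services, and the number of partitions in each Storage service.
- Click **Balance Data** to distribute all partitions in the current graph space evenly.
- Click **Balance Data Remove** to migrate all partitions in the specified Storage service to other Storage services. Before the operation, the system prompts the user to select the IP address of the node where the Storage service runs.


<!-- Add balance data
- Click `balance data`,
- Click `balance remove` -->
-->
Click **Details** in the upper right corner to view more information.

### Partition information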