curvefs/monitor: prometheus
1. add curvefs monitor
2. curvefs_tool status-mds show dummy port for dummy mds
3. add port in mountpoints, change mountpoint from {hostname}:{mountpoint} to {hostname}:{port}:{mountpath}
Cyber-SiKu committed Apr 15, 2022
1 parent 9801f3a commit 73b5fc3
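
Item 3 of the commit message changes the mountpoint string to `{hostname}:{port}:{mountpath}`. As a rough illustration of how a consumer might split that string, here is a minimal sketch. The `MountPoint` type, the `parse_mountpoint` helper, and the sample value are hypothetical and are not taken from this commit.

```python
from typing import NamedTuple


class MountPoint(NamedTuple):
    hostname: str
    port: int
    mountpath: str


def parse_mountpoint(value: str) -> MountPoint:
    """Split '{hostname}:{port}:{mountpath}' into its three fields."""
    # Split on the first two colons only, so a mount path that itself
    # contains ':' stays intact.
    parts = value.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected hostname:port:mountpath, got {value!r}")
    hostname, port, mountpath = parts
    if not port.isdigit():
        raise ValueError(f"port is not numeric in {value!r}")
    return MountPoint(hostname, int(port), mountpath)


if __name__ == "__main__":
    # Hypothetical sample value, for illustration only.
    print(parse_mountpoint("client-01:9000:/mnt/curvefs"))
```

A string in the old `{hostname}:{mountpoint}` form has only two fields, so this sketch rejects it with a `ValueError` rather than guessing a port.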
Showing 21 changed files with 6,736 additions and 8 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -97,8 +97,10 @@ curvefs/BUILD_MODE
*.pyc
.facts/
*retry
# monitor
curvefs/monitor/prometheus/target.json

curvefs/docker/curvefs
curvefs/docker/*/curvefs
curvefs/docker/base/*
!curvefs/docker/base/Dockerfile
!curvefs/docker/base/Makefile
@@ -109,3 +111,4 @@ __not_found__
thirdparties/rocksdb/lib/
thirdparties/rocksdb/include/
thirdparties/rocksdb/rocksdb/
thirdparties/rocksdb/*.tar.gz
131 changes: 131 additions & 0 deletions curvefs/monitor/README.md
@@ -0,0 +1,131 @@
# Directory layout

```
monitor
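├── curve-monitor.sh # Control script for Curve cluster monitoring; starts, stops, and restarts the monitoring stack.
├── docker-compose.yml # Compose file that orchestrates the monitoring containers, including the prometheus and grafana containers.
| # Edit this file to set each component's configuration parameters.
├── grafana # Grafana directory
│ ├── dashboards # Holds the JSON files for all Grafana dashboards; Grafana loads them from here to create the dashboards.
| | | # Run update_dashboard.sh to refresh them to the latest versions.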
│ │ ├── etcd.json
│ │ ├── mds.json
│ │ ├── metaserver.json
│ │ └── clinet.json
│ ├── grafana.ini # Grafana startup configuration, mapped to `/etc/grafana/grafana.ini` in the container
│ ├── provisioning # Grafana provisioning directory, mapped to `/etc/grafana/provisioning` in the container
│ │ ├── dashboards
│ │ │ └── all.yml
│ │ └── datasources # Holds the Grafana datasource definition files; Grafana loads them from here to create the datasources.
│ │ └── all.yml
│ └── report # Temporary directory for Grafana daily reports, mapped to `/tmp/report` in the reporter container
│ └── README
├── grafana-report.py
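├── prometheus # Prometheus directory
│ ├── prometheus.yml # Prometheus configuration file
│ └── target.json
├── README.md
├── target.ini # Settings used by the target_json.py script
├── target_json.py # Python script that generates the Prometheus scrape targets; it periodically pulls the targets with curvefs_tool and updates them.
└── update_dashboard.sh # Pulls the latest dashboards from the environment where they are edited in the Grafana UI, to update Grafana in this environment.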
```
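
The tree above notes that `target_json.py` regenerates `prometheus/target.json` for Prometheus to scrape. The sketch below illustrates the general idea using the standard Prometheus file-based service discovery format, a JSON list of `{"targets": [...], "labels": {...}}` groups. It is not the script from this commit: the helper names, job names, and addresses are hypothetical, and in the real script the endpoints come from `curvefs_tool`.

```python
import json
import os
import tempfile


def build_target_groups(endpoints_by_job):
    """Turn {job: [host:port, ...]} into Prometheus file_sd target groups."""
    return [
        {"labels": {"job": job}, "targets": sorted(endpoints)}
        for job, endpoints in sorted(endpoints_by_job.items())
    ]


def write_targets(path, groups):
    """Write the groups via a temp file and rename, so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(groups, tmp, indent=2)
    # mkstemp creates the file with mode 0600; widen it so a container
    # running as another user can still read the result.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    groups = build_target_groups({
        "mds": ["10.0.0.1:6700"],
        "metaserver": ["10.0.0.2:6800", "10.0.0.1:6800"],
    })
    print(json.dumps(groups, indent=2))
```

Writing to a temporary file and renaming it over `target.json` is a design choice of this sketch, made so that Prometheus does not pick up a half-written file while the list is being refreshed.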

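## Usage

The following steps describe deployment without puppet.

### Environment initialization

1. Install the following components on the machine that will host the monitoring system:

node_exporter, docker, docker-compose, jq

* Installing docker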

```
$ curl -fsSL get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh --mirror Aliyun
```

Or install it directly:

```
apt-get install docker-ce
apt-get install docker-ce-cli
```

* docker-compose

```
curl -L https://github.com/docker/compose/releases/download/1.18.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
```

Or install it directly:

```
apt-get install docker-compose
```

* node_exporter

Many nodes may need it, so it can be installed in bulk with a script such as the following:

```
for i in {1..4};
do
scp -P 1046 ~/Downloads/node_exporter-0.18.1.linux-amd64.tar.gz yangyaokai@pubt1-curve$i.yq.163.org:~/
ssh -p 1046 yangyaokai@pubt1-curve$i.yq.163.org "tar zxvf node_exporter-0.18.1.linux-amd64.tar.gz ; cd node_exporter-0.18.1.linux-amd64 ; nohup ./node_exporter >/dev/null 2>log &"
echo $i
done
```

* jq

The update_dashboard.sh script depends on the jq command, which is usually not installed by default.

```
apt-get install jq
```

2. Install node_exporter on the chunkserver machines (optional if machine-level monitoring is already covered by Sentinel).


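### Deploying the monitoring system

* Adjust the configuration

1. Update the relevant settings in target_json.py

2. In update_dashboard.sh, set URL and LOGIN to the matching address and username/password

3. Edit docker-compose.yml, mainly the mapped directory paths

* Start docker-compose

Run the following command in the current directory: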

```curve-monitor.sh start ```

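* Deploy the Grafana daily report

Configure a scheduled job with crontab by adding the following entry:

`30 8 * * * python /etc/curve/monitor/grafana-report.py >> /etc/curve/monitor/cron.log 2>&1`

If the machine has no other cron jobs configured, the following command can be used directly (it replaces the current user's crontab):

`echo "30 8 * * * python /etc/curve/monitor/grafana-report.py >> /etc/curve/monitor/cron.log 2>&1" >> conf && crontab conf && rm -f conf`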
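
The one-line command above only suits machines with no other cron jobs, because loading a file with `crontab` replaces the whole table. A possible alternative is sketched below: it reads the existing table with `crontab -l`, appends the report job only if it is missing, and loads the result back through `crontab -`. The helper names `merge_crontab` and `install` are hypothetical, and this is an illustration rather than part of the commit.

```python
import subprocess

REPORT_JOB = (
    "30 8 * * * python /etc/curve/monitor/grafana-report.py "
    ">> /etc/curve/monitor/cron.log 2>&1"
)


def merge_crontab(existing: str, entry: str) -> str:
    """Return crontab text that contains `entry` once and keeps all other lines."""
    lines = existing.splitlines()
    if entry not in (line.strip() for line in lines):
        lines.append(entry)
    return "\n".join(lines) + "\n"


def install(entry: str = REPORT_JOB) -> None:
    # `crontab -l` exits non-zero when the user has no crontab yet;
    # treat that case as an empty table.
    listing = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    current = listing.stdout if listing.returncode == 0 else ""
    # `crontab -` replaces the user's crontab with what it reads from stdin.
    subprocess.run(["crontab", "-"], input=merge_crontab(current, entry),
                   text=True, check=True)


if __name__ == "__main__":
    # Show the merged table only; call install() to actually apply it.
    print(merge_crontab("", REPORT_JOB), end="")
```

Because `merge_crontab` is a pure function, running it repeatedly on its own output leaves the table unchanged, so the job is never registered twice.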

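#### Integrating with puppet

When integrating with puppet, all configuration files are kept in puppet, and every configuration change must be uploaded to puppet.

The configuration files managed by puppet are: docker-compose.yml, target.ini, grafana.ini, prometheus.yml

After curve-monitor is installed from the package, curve-monitor.sh is copied to /usr/bin, and the monitoring system can be managed with the following commands:

Start: ```curve-monitor.sh start```

Stop: ```curve-monitor.sh stop```

Restart: ```curve-monitor.sh restart```

Puppet installs nearly all of the dependencies listed under environment initialization; only node_exporter has to be installed manually.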
84 changes: 84 additions & 0 deletions curvefs/monitor/curve-monitor.sh
@@ -0,0 +1,84 @@
#!/bin/sh

#
# Copyright (c) 2022 NetEase Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
#

#sh update_dashboard.sh
#echo "update dashboards success!"

WORKDIR=/etc/curvefs/monitor

if [ ! -d "$WORKDIR" ]; then
    echo "${WORKDIR} does not exist"
    exit 1
fi

# Abort if the directory cannot be entered, so chmod never runs elsewhere.
cd "$WORKDIR" || exit 1
chmod -R 777 prometheus
chmod -R 777 grafana

start() {
    echo "==========start==========="

    echo "" > monitor.log

    stdbuf -oL python3 target_json.py >> monitor.log 2>&1 &
    echo "start prometheus targets service success!"

    docker-compose up >> monitor.log 2>&1 &
    echo "start metric system success!"
}

stop() {
    echo "===========stop============"

    docker-compose down

    # Find the PIDs of the background target_json.py process(es).
    ID=$(ps -ef | grep "target_json.py" | grep -v "grep" | awk '{print $2}')
    for id in $ID
    do
        kill -9 "$id"
        echo "killed $id"
    done
}

restart() {
    stop
    echo "sleeping........."
    sleep 3
    start
}

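case "$1" in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'status')
        # No status() function is defined in this script; as a minimal
        # stand-in, list the containers managed by docker-compose.
        docker-compose ps
        ;;
    'restart')
        restart
        ;;
    *)
        echo "usage: $0 {start|stop|status|restart}"
        exit 1
        ;;
esac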
57 changes: 57 additions & 0 deletions curvefs/monitor/docker-compose.yml
@@ -0,0 +1,57 @@
#
# Copyright (c) 2020 NetEase Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
#

version: '2.0'

services:

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/:/etc/prometheus/:rw
      - ./prometheus/data:/prometheus:rw
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'
      - '--storage.tsdb.retention.size=256GB'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.listen-address=:9090'
    network_mode: host

  grafana:
    image: grafana/grafana
    depends_on:
      - prometheus
    network_mode: host
    volumes:
      - ./grafana/data:/var/lib/grafana:rw
      - ./grafana/grafana.ini:/etc/grafana/grafana.ini:rw
    environment:
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=curve

  reporter:
    image: promoon/reporter:latest
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
      - ./grafana/report:/tmp/report:rw
    network_mode: host
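
Once the services in this compose file are up, their health endpoints can be polled to confirm they started. The sketch below is one possible check and is not part of the commit. It uses Prometheus's `/-/ready` endpoint on port 9090 (the `--web.listen-address` set above) and Grafana's `/api/health` endpoint. Port 3000 is Grafana's default and is an assumption here, since `grafana.ini` may override it. The `health_url` and `is_healthy` helpers are hypothetical names.

```python
import urllib.request

# service name -> (port, health path). 9090 comes from the compose file;
# 3000 is Grafana's default port and is an assumption.
ENDPOINTS = {
    "prometheus": (9090, "/-/ready"),
    "grafana": (3000, "/api/health"),
}


def health_url(service: str, host: str = "127.0.0.1") -> str:
    """Build the health-check URL for one of the known services."""
    port, path = ENDPOINTS[service]
    return f"http://{host}:{port}{path}"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from OSError).
        return False


if __name__ == "__main__":
    for name in ENDPOINTS:
        url = health_url(name)
        print(f"{name}: {'up' if is_healthy(url) else 'down'} ({url})")
```

Both services use `network_mode: host`, so the check targets the host's own address rather than a container network name.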