Outputs:
  WebUIUrl:
    Description:
      zh-cn: WebUI访问域名。
      en: URL of WebUI.
    Value:
      Fn::Sub:
        - http://${PublicIp}:7860
        - PublicIp:
            Fn::GetAtt:
              - EcsInstance
              - PublicIp
ROSTemplateFormatVersion: '2015-09-01'
Description:
  zh-cn: 创建ECS实例与GPDB数据库,配置安全组与网络环境,安装ChatGLM模型及依赖,通过WebUI提供服务,自动检查与启动服务,对外暴露7860端口。
  en: Create ECS instances and GPDB databases, configure security groups and network
    environments, install the ChatGLM model and its dependencies, provide services
    via WebUI, automatically monitor and start the service, and expose port 7860 externally.
Parameters:
  SystemDiskCategory:
    AssociationProperty: ALIYUN::ECS::Disk::SystemDiskCategory
    AssociationPropertyMetadata:
      InstanceType: ${InstanceType}
      ZoneId: ${ZoneId}
    Type: String
    Description:
      zh-cn: '<font color=''blue''><b>可选值:</b></font><br>[cloud_efficiency: <font color=''green''>高效云盘</font>]<br>[cloud_ssd: <font color=''green''>SSD云盘</font>]<br>[cloud_essd: <font color=''green''>ESSD云盘</font>]<br>[cloud: <font color=''green''>普通云盘</font>]<br>[ephemeral_ssd: <font color=''green''>本地SSD盘</font>]'
      en: '<font color=''blue''><b>Optional values:</b></font><br>[cloud_efficiency: <font color=''green''>Efficient Cloud Disk</font>]<br>[cloud_ssd: <font color=''green''>SSD Cloud Disk</font>]<br>[cloud_essd: <font color=''green''>ESSD Cloud Disk</font>]<br>[cloud: <font color=''green''>Cloud Disk</font>]<br>[ephemeral_ssd: <font color=''green''>Local SSD Disk</font>]'
    Label:
      zh-cn: 系统磁盘类型
      en: System Disk Category
  InstancePassword:
    ConstraintDescription:
      zh-cn: 长度8-30,必须包含大写字母、小写字母、数字、特殊符号三种;特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;'<>,.?/
      en: 'Length 8-30; must contain three of the following character types: uppercase letters, lowercase letters, digits, and special characters. Special characters include: ()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
    Description:
      zh-cn: 长度8-30,必须包含大写字母、小写字母、数字、特殊符号三个;<br>特殊字符包括:()`~!@#$%^&*_-+=|{}[]:;'<>,.?/
      en: 'Login password of the instance, 8-30 characters long; must contain three of the following character types: uppercase letters, lowercase letters, digits, and special characters.<br>Special characters include: ()`~!@#$%^&*_-+=|{}[]:;''<>,.?/'
    MinLength: '8'
    Label:
      zh-cn: 实例密码
      en: Instance Password
    AllowedPattern: '[0-9A-Za-z\_\-&:;''<>,=%`~!@#\(\)\$\^\*\+\|\{\}\[\]\.\?\/]+$'
    NoEcho: true
    MaxLength: '30'
    Type: String
  AccountName:
    Default: mytest
    Type: String
    Label:
      zh-cn: 数据库账号名称
      en: DB Account
  AccountPassword:
    NoEcho: true
    Type: String
    Label:
      zh-cn: 数据库账号密码
      en: DB Account Password
    AssociationProperty: ALIYUN::RDS::Instance::AccountPassword
  InstanceType:
    AssociationProperty: ALIYUN::ECS::Instance::InstanceType
    AssociationPropertyMetadata:
      ZoneId: ${ZoneId}
    Type: String
    Label:
      zh-cn: 实例类型
      en: Instance Type
  ZoneId:
    AssociationProperty: ALIYUN::ECS::Instance::ZoneId
    Type: String
    Description:
      # Quoted because the text contains ": ", which is not allowed in a plain YAML scalar.
      zh-cn: '可用区ID。<br><b>注: <font color=''blue''>选择可用区前请确认该可用区是否支持创建ECS资源的规格</font></b>'
      en: 'Availability Zone ID.<br><b>Note: <font color=''blue''>Before selecting, please confirm that the Availability Zone supports the chosen ECS instance specification.</font></b>'
    Label:
      zh-cn: 可用区ID
      en: Availability Zone ID
  ADBPGInstanceSpec:
    Type: String
    Label:
      en: DBInstanceSpec
      zh-cn: 实例规格
  ADBPGSegmentStorage:
    Type: Number
    Label:
      en: SegmentStorageSize
      zh-cn: 节点存储容量(G)
    Default: 200
Resources:
  Account:
    Type: ALIYUN::GPDB::Account
    Properties:
      DBInstanceId:
        Ref: DBInstance
      AccountPassword:
        Ref: AccountPassword
      AccountName:
        Ref: AccountName
  EcsSecurityGroup:
    Type: ALIYUN::ECS::SecurityGroup
    Properties:
      SecurityGroupIngress:
        - Priority: 100
          PortRange: 22/22
          NicType: internet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 80/80
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 7860/7860
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: -1/-1
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: icmp
        - Priority: 100
          PortRange: 443/443
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
        - Priority: 100
          PortRange: 3389/3389
          NicType: intranet
          SourceCidrIp: 0.0.0.0/0
          IpProtocol: tcp
      VpcId:
        Ref: EcsVpc
  WaitConditionHandle:
    Type: ALIYUN::ROS::WaitConditionHandle
  EcsVSwitch:
    Type: ALIYUN::ECS::VSwitch
    Properties:
      VpcId:
        Ref: EcsVpc
      CidrBlock: 192.168.1.0/24
      ZoneId:
        Ref: ZoneId
  DBInstance:
    Type: ALIYUN::GPDB::ElasticDBInstance
    Properties:
      SegNodeNum: 4
      InstanceSpec:
        Ref: ADBPGInstanceSpec
      DBInstanceCategory: Basic
      EngineVersion: '6.0'
      ZoneId:
        Ref: ZoneId
      VPCId:
        Ref: EcsVpc
      VSwitchId:
        Ref: EcsVSwitch
      SegStorageType: cloud_essd
      StorageSize:
        Ref: ADBPGSegmentStorage
      DBInstanceMode: StorageElastic
      SecurityIPList:
        Fn::GetAtt:
          - EcsInstance
          - PrivateIp
  WaitCondition:
    Type: ALIYUN::ROS::WaitCondition
    Properties:
      Count: 1
      Handle:
        Ref: WaitConditionHandle
      Timeout: 1800
    DependsOn: EcsInstance
  EcsInstance:
    Type: ALIYUN::ECS::Instance
    Properties:
      SystemDiskCategory:
        Ref: SystemDiskCategory
      VpcId:
        Fn::GetAtt:
          - EcsVpc
          - VpcId
      SecurityGroupId:
        Ref: EcsSecurityGroup
      ImageId: ubuntu_22
      InternetMaxBandwidthOut: 80
      IoOptimized: optimized
      VSwitchId:
        Ref: EcsVSwitch
      Password:
        Ref: InstancePassword
      InstanceType:
        Ref: InstanceType
  EcsVpc:
    Type: ALIYUN::ECS::VPC
    Properties:
      CidrBlock: 192.168.0.0/16
  InstallChatGLM:
    Type: ALIYUN::ECS::RunCommand
    Properties:
      InstanceIds:
        - Ref: EcsInstance
      Type: RunShellScript
      Sync: true
      Timeout: 3600
      CommandContent:
        Fn::Sub: |-
          #!/bin/sh
          cd /root
          echo "---------- Download Data Center Driver For Ubuntu 22.04 ---------- \n" | tee /root/runinit.log
          echo "---------- Begin to download ... @ `date` ---------- \n" | tee -a /root/runinit.log
          wget -O nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb "https://cn.download.nvidia.com/tesla/525.105.17/nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb" | tee -a /root/runinit.log
          echo "---------- Begin to install nvidia & pgdriver ... @ `date` ---------- \n" | tee -a /root/runinit.log
          sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-525.105.17_1.0-1_amd64.deb | tee -a /root/runinit.log
          sudo cp /var/nvidia-driver-local-repo-ubuntu2204-525.105.17/nvidia-driver-local-321ACFBA-keyring.gpg /usr/share/keyrings/
          sudo apt-get update | tee -a /root/runinit.log
          sudo DEBIAN_FRONTEND=noninteractive apt-get install nvidia-driver-525 -y | tee -a /root/runinit.log
          sudo DEBIAN_FRONTEND=noninteractive apt-get install postgresql-server-dev-all -y | tee -a /root/runinit.log
          echo "---------- Check driver ... @ `date` ---------- \n" | tee -a /root/runinit.log
          nvidia-smi | tee -a /root/runinit.log
          echo "---------- pip3.10 upgrade ... @ `date` ---------- \n" | tee -a /root/runinit.log
          pip3.10 install --upgrade pip
          pip3.10 cache purge
          echo "---------- Prepare requirements.txt ... @ `date` ---------- \n" | tee -a /root/runinit.log
          cat > /root/requirements.txt << EOF
          langchain==0.0.146
          transformers==4.27.1
          unstructured[local-inference]
          layoutparser[layoutmodels,tesseract]
          nltk
          sentence-transformers
          beautifulsoup4
          icetk
          cpm_kernels
          faiss-cpu
          accelerate
          gradio==3.28.3
          fastapi
          uvicorn
          peft
          EOF
          echo "---------- pip install ... @ `date` ---------- \n" | tee -a /root/runinit.log
          pip3.10 install -r requirements.txt | tee -a /root/runinit.log
          pip3.10 install psycopg2 | tee -a /root/runinit.log
          pip3.10 install psycopg2cffi | tee -a /root/runinit.log
          pip3.10 install tabulate | tee -a /root/runinit.log
          # printf instead of "echo -e": dash (/bin/sh on Ubuntu) has no -e flag and would print it literally.
          # envCheck() in chatbot.py greps the last line of runinit.log for this marker.
          printf '\n PreRun Completely @ %s ... \n' "$(date '+%Y-%m-%d %H:%M:%S')" | tee -a /root/runinit.log
          cat > /root/chatbot.py <<EOF
          #!/usr/bin/env python
          # -*- coding: utf-8 -*-
          import os, time
          from subprocess import Popen, PIPE
          import argparse
          import logging
          import warnings
          warnings.filterwarnings("ignore")
          logging.basicConfig(level=logging.DEBUG,
                              format='%(asctime)s %(levelname)s %(funcName)s %(message)s',
                              datefmt='%a, %d %b %Y %H:%M:%S',
                              filename='chatbot.log',
                              filemode='w')
          console = logging.StreamHandler()
          console.setLevel(logging.WARN)
          formatter = logging.Formatter('%(asctime)s %(levelname)s %(funcName)s %(message)s')
          console.setFormatter(formatter)
          logging.getLogger('').addHandler(console)
          parser = argparse.ArgumentParser(description='deploy chatGLM.')
          parser.add_argument('-db_connection', '--db_connection', action="store", dest='db_connection',
                              help='input alicloud GPDB connection info.')
          parser.add_argument('-db_name', '--db_name', action="store", dest='db_name',
                              help='input alicloud GPDB name.')
          parser.add_argument('-db_port', '--db_port', action="store", dest='db_port',
                              help='input alicloud GPDB port.')
          parser.add_argument('-db_username', '--db_username', action="store", dest='db_username',
                              help='input alicloud GPDB account username.')
          parser.add_argument('-db_password', '--db_password', action="store", dest='db_password',
                              help='input alicloud GPDB account password.')
          parser.add_argument('-ecs_public_ip', '--ecs_public_ip', action="store", dest='ecs_public_ip',
                              help='input alicloud ECS instance public ip.')
          args = parser.parse_args()
          def LocalShellCmd(cmd, env=None, shell=True):
              # Run a shell command, log its stdout, and fail (AssertionError) on a non-zero exit code.
              p = Popen(
                  cmd,
                  stdin = PIPE,
                  stdout = PIPE,
                  stderr = PIPE,
                  env = env,
                  shell = shell
              )
              stdout, stderr = p.communicate()
              rc = p.wait()
              logging.debug("LocalShellCmd => cmd = [%s] \n stdout => [%s] \n" % (cmd, stdout))
              assert (rc == 0)
              return stdout.strip()
          def envCheck():
              # Verify that the pre-run step finished and that the driver and packages are installed.
              cmd = "tail -n 1 /root/runinit.log | grep 'PreRun Completely' > /dev/null 2>&1"
              LocalShellCmd(cmd)
              cmd = "nvidia-smi > /dev/null 2>&1"
              LocalShellCmd(cmd)
              cmd = "dpkg -l | grep nvidia-driver-525"
              LocalShellCmd(cmd)
              cmd = "dpkg -l | grep postgresql-server-dev-all"
              LocalShellCmd(cmd)
          if __name__ == '__main__':
              print("\n" + "*"*30 + """ 提示:\n
              1)如果脚本执行过程中报错, 可以通过查看 /root/chatbot.log 文件进行自助排错(很简单的)!
              2)如果需要重启 WEBUI 等服务或者查看数据库信息等, 可以参考 /root/env.txt 文件!\n"""+ "*"*30 + "\n")
              print("*"*30 + "Step0: 正在进行环境检查, 比如驱动和安装依赖包等" + "*"*30)
              envCheck()
              print("*"*30 + "Step4: 设置操作系统环境变量,准备下载模型并且启动WEB程序,耗时很长!" + "*"*30)
              # setting os system variables
              os.chdir("/root")
              ecsPubIpAddr = args.ecs_public_ip if args.ecs_public_ip else ""
              os.environ["PG_HOST"] = args.db_connection if args.db_connection else ""
              os.environ["PG_PORT"] = args.db_port if args.db_port else "5432"
              os.environ["PG_USER"] = args.db_username if args.db_username else ""
              os.environ["PG_PASSWORD"] = args.db_password if args.db_password else ""
              os.environ["PG_DATABASE"] = args.db_name if args.db_name else ""
              logging.debug("""ADBPG SYSTEM VARIABLE =>
              export PG_HOST=%s
              export PG_PORT=%s
              export PG_USER=%s
              export PG_PASSWORD=%s
              export PG_DATABASE=%s
              """ % (os.environ["PG_HOST"], os.environ["PG_PORT"], os.environ["PG_USER"], os.environ["PG_PASSWORD"], os.environ["PG_DATABASE"]))
              with open("env.txt", "w") as fw:
                  fw.write("export PG_HOST=%s\n" % os.environ["PG_HOST"])
                  fw.write("export PG_PORT=%s\n" % os.environ["PG_PORT"])
                  fw.write("export PG_USER=%s\n" % os.environ["PG_USER"])
                  fw.write("export PG_PASSWORD=%s\n" % os.environ["PG_PASSWORD"])
                  fw.write("export PG_DATABASE=%s\n" % os.environ["PG_DATABASE"])
                  fw.write("#webui url=> %s:7860\n" % ecsPubIpAddr)
              cmd1 = "cd /root; git clone https://github.com/wangxuqi/langchain-ChatGLM.git ; cd langchain-ChatGLM ; git checkout analyticdb_store"
              cmd2 = "nohup python3.10 /root/langchain-ChatGLM/webui.py > webui.log 2>&1 &"
              print("*"*35 + "Step4.1: 下载langchain代码!" + "*"*30)
              LocalShellCmd(cmd1)
              print("*"*35 + """Step4.2: 开始运行chatGLM模型, 由于模型比较大(17GB左右),下载需要较长的时间, 预计需要耗时15分钟左右,请耐心等待,
              具体进度可以通过 \033[1;5;32;4m tail -f webui.log \033[0m 来查看 ...""" + "*"*30)
              LocalShellCmd(cmd2)
              print("*="*30)
              print("""
              【阿里云不对您在镜像上使用的第三方模型的合法性、安全性、准确性进行任何保证,并不对由此引发的任何损害承担责任;您应自觉遵守在镜像上安装的第三方模型的用户协议、使用规范和相关法律法规,并就使用第三方模型的合法性、合规性自行承担相关责任。】
              环境一切准备就绪,您可以通过浏览器打开\n\t\t\t=>=>=> %s:7860 <=<=<=\n\t 来访问和体验有记忆能力的Chatbot了!!!
              """ % ecsPubIpAddr)
              print("*="*30)
          EOF
          python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
          sleep 30
          # Poll for the WebUI on port 7860; retry the launch up to 10 times before giving up.
          i=1
          while [ $i -le 10 ]
          do
            netstat -ntlp | grep 7860
            if [ $? -eq 0 ];then
              echo 'web service start success.' >> /root/web_service.log
              ${WaitConditionHandle.CurlCli} --data-binary '{"status": "SUCCESS"}'
              break
            else
              echo 'web service start failed.' >> /root/web_service.log
              python3.10 /root/chatbot.py --ecs_public_ip=${EcsInstance.PublicIp} --db_connection=${DBInstance.ConnectionString} --db_port=${DBInstance.Port} --db_username=${Account.AccountName} --db_password=${AccountPassword} --db_name=${Account.AccountName}
              sleep 30
              # POSIX arithmetic: "let" is a bashism and is unavailable in dash, so the counter would never advance.
              i=$((i + 1))
            fi
          done
    DependsOn:
      - Account
      - EcsInstance
Metadata:
  ALIYUN::ROS::Interface:
    ParameterGroups:
      - Parameters:
          - ZoneId
          - InstanceType
          - SystemDiskCategory
          - InstancePassword
        Label:
          default: ECS
      - Parameters:
          - ADBPGInstanceSpec
          - ADBPGSegmentStorage
          - AccountName
          - AccountPassword
        Label:
          default: Database
    TemplateTags:
      - acs:technical-solution:AI:向量数据库构建企业智能知识库-tech_solu_20