mycserverlog (32 lines · 3.22 KB)
/usr/local/lib/python3.11/dist-packages/paramiko/pkey.py:100: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
"cipher": algorithms.TripleDES,
/usr/local/lib/python3.11/dist-packages/paramiko/transport.py:259: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
"class": algorithms.TripleDES,
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
INFO:server:Set CUDA_VISIBLE_DEVICES to 0
INFO:server:http://0.0.0.0:30000, ports: PortArgs(tokenizer_port=10000, router_port=10001, detokenizer_port=10002, nccl_port=10003, migrate_port=10004, model_rpc_ports=[10005, 10006, 10007])
INFO:model_rpc:Use sleep forwarding: False
INFO:model_rpc:schedule_heuristic: fcfs-s
INFO:model_runner:Rank 0: load weight begin.
INFO:model_runner:Rank 0: load weight end.
INFO:model_runner:kv one token size: 32 * 128 * 32 * 2 * 2 = 524288 bytes
INFO:model_runner:kv one token size: 32 * 128 * 32 * 2 * 2 = 524288 bytes
INFO:model_runner:total_cpu_memory_GB : 194.90850448608398, max_total_num_token : 7806, max_cpu_num_token : 278663
INFO:model_rpc:Rank 0: max_total_num_token=7806, max_prefill_num_token=33768, context_len=33768,
INFO:model_rpc:server_args: enable_flashinfer=True, attention_reduce_in_fp32=False, disable_radix_cache=False, disable_regex_jump_forward=False, disable_disk_cache=False,
INFO:sglang.srt.managers.router.radix_cache:using RadixCache
INFO: Started server process [6083]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
INFO:model_rpc:Cache flushed successfully!
INFO:model_rpc:Cache flushed successfully!
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.3921 -> 0.4421
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.4405 -> 0.4905
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.4895 -> 0.5395
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.5393 -> 0.5893
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.5813 -> 0.6313
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.6217 -> 0.6717
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.6598 -> 0.7098
INFO:model_rpc:GPU 0: decode out of memory happened, #retracted_reqs: 1, #new_token_ratio: 0.7012 -> 0.7512
INFO:sglang.srt.managers.router.radix_cache:len(self.cnt_time): 0