baichuan13B load error #106

leiwen83 · 2023-08-28T13:08:52Z

Hi,

I tried to use lightllm with the baichuan13B model but got the error below. I cannot find any TrainingArguments in the code, so is there anything else that needs to be configured? The same checkpoint can be loaded by vLLM and works well.

The launch command is:

python -m lightllm.server.api_server --trust_remote_code --model_dir  /xxxx/  --host 0.0.0.0 --port 8000 --tp 2

Error log:

load model error: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> <class 'AttributeError'>
################
load model error: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> <class 'AttributeError'>
router init state: Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/manager.py", line 257, in start_router_process
    asyncio.run(router.wait_to_model_ready())

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/manager.py", line 62, in wait_to_model_ready
    await asyncio.gather(*init_model_ret)

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 211, in init_model
    await ans

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 189, in _func
    return ans.value

  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/async_.py", line 108, in value
    raise self._obj

_get_exception_class.<locals>.Derived: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'>

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 77, in exposed_init_model
    raise e
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 63, in exposed_init_model
    self.model = Baichuan13bTpPartModel(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/baichuan13b/model.py", line 21, in __init__
    super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/model.py", line 30, in __init__
    super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/basemodel.py", line 37, in __init__
    self._init_weights()
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/basemodel.py", line 70, in _init_weights
    load_hf_weights(
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/layer_weights/hf_load_utils.py", line 25, in load_hf_weights
    weights = torch.load(os.path.join(weight_dir, file_), 'cpu')
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1124, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'>
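To see which classes a checkpoint file would try to import before handing it to torch.load, you can inspect the pickle without executing it. The sketch below is a stdlib-only diagnostic and is not part of lightllm or PyTorch. `checkpoint_globals` and `referenced_globals` are hypothetical helper names. It assumes the zip-based torch.save format, which the traceback above goes through via `opened_zipfile`. It only recognizes the GLOBAL opcode, which torch.save emits with its default pickle protocol 2. Pickles written with protocol 4 or later use STACK_GLOBAL and are not covered.

```python
import pickletools
import zipfile


def referenced_globals(pickle_bytes):
    """List the "module.name" globals a protocol 0-3 pickle would import.

    Only walks the opcode stream with pickletools.genops; nothing is unpickled.
    STACK_GLOBAL (protocol 4+) is deliberately not handled in this sketch.
    """
    found = []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name == "GLOBAL":
            # genops reports the GLOBAL argument as "module name".
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
    return found


def checkpoint_globals(path_or_file):
    """Inspect a zip-format torch.save file (path or file object) safely."""
    if not zipfile.is_zipfile(path_or_file):
        raise ValueError("not a zip-format torch checkpoint")
    with zipfile.ZipFile(path_or_file) as zf:
        # torch.save stores the object graph in "<archive>/data.pkl".
        pkl_name = next((n for n in zf.namelist() if n.endswith("data.pkl")), None)
        if pkl_name is None:
            raise ValueError("no data.pkl found in the archive")
        return referenced_globals(zf.read(pkl_name))
```

If you run `checkpoint_globals` on the checkpoint's training_args.bin, it should show which module the missing TrainingArguments class is looked up in. Genuine weight shards would typically reference only torch and collections classes.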


leiwen83 · 2023-08-28T14:21:22Z

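It can be fixed with the change below, which skips training_args.bin when collecting the .bin weight files. That file is not a weight shard. It holds the training arguments pickled by the Hugging Face Trainer, so torch.load cannot unpickle it without the TrainingArguments class it references.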

diff --git a/lightllm/common/basemodel/layer_weights/hf_load_utils.py b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
index 30be3a5..d9ef3ad 100644
--- a/lightllm/common/basemodel/layer_weights/hf_load_utils.py
+++ b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
@@ -15,7 +15,8 @@ def load_hf_weights(data_type, weight_dir, pre_post_layer=None, transformer_laye
     candidate_files = list(filter(lambda x : x.endswith('.safetensors'), files))
     if len(candidate_files) == 0:
         use_safetensors = False
-        candidate_files = list(filter(lambda x : x.endswith('.bin'), files))
+        candidate_files = list(filter(lambda x : x.endswith('.bin') and not x.endswith("training_args.bin"), files))
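The patched selection logic can be sketched as a standalone function. `filter_weight_files` is a hypothetical name, not lightllm's API. It takes the same `files` list of file names that the diff filters, and it returns the `use_safetensors` flag together with the candidate files. The initial `use_safetensors = True` is an assumption, because the diff context does not show it.

```python
def filter_weight_files(files):
    """Pick checkpoint files to load, following the patched hf_load_utils logic.

    Prefers .safetensors files. Otherwise it falls back to .bin files and skips
    training_args.bin, which holds pickled training arguments, not weights.
    """
    candidate_files = [f for f in files if f.endswith(".safetensors")]
    use_safetensors = True  # assumed default; not visible in the diff context
    if len(candidate_files) == 0:
        use_safetensors = False
        # The fix: exclude the Trainer's training_args.bin from .bin candidates.
        candidate_files = [
            f for f in files
            if f.endswith(".bin") and not f.endswith("training_args.bin")
        ]
    return use_safetensors, candidate_files


files = ["config.json", "pytorch_model-00001-of-00002.bin",
         "pytorch_model-00002-of-00002.bin", "training_args.bin"]
print(filter_weight_files(files))
# (False, ['pytorch_model-00001-of-00002.bin', 'pytorch_model-00002-of-00002.bin'])
```

The patch matches on the `training_args.bin` suffix. This keeps sharded weights such as `pytorch_model-00001-of-00002.bin` and drops only the Trainer's pickled arguments.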
