baichuan13B load error #106

Open
leiwen83 opened this issue Aug 28, 2023 · 1 comment


leiwen83 commented Aug 28, 2023

Hi,

I am trying to use lightllm with the baichuan13B model, but I get the error below. I cannot find any TrainingArguments in the code, so is there anything else that needs to be configured? The same checkpoint can be loaded by vllm and works well.

The launch command is:

python -m lightllm.server.api_server --trust_remote_code --model_dir  /xxxx/  --host 0.0.0.0 --port 8000 --tp 2

Error log:

load model error: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> <class 'AttributeError'>
################
load model error: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'> <class 'AttributeError'>
router init state: Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/manager.py", line 257, in start_router_process
    asyncio.run(router.wait_to_model_ready())

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/manager.py", line 62, in wait_to_model_ready
    await asyncio.gather(*init_model_ret)

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 211, in init_model
    await ans

  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 189, in _func
    return ans.value

  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/async_.py", line 108, in value
    raise self._obj

_get_exception_class.<locals>.Derived: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'>

========= Remote Traceback (1) =========
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/protocol.py", line 359, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "/usr/local/lib/python3.10/dist-packages/rpyc-5.3.1-py3.10.egg/rpyc/core/protocol.py", line 837, in _handle_call
    return obj(*args, **dict(kwargs))
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 77, in exposed_init_model
    raise e
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/router/model_infer/model_rpc.py", line 63, in exposed_init_model
    self.model = Baichuan13bTpPartModel(rank_id, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/baichuan13b/model.py", line 21, in __init__
    super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/model.py", line 30, in __init__
    super().__init__(tp_rank, world_size, weight_dir, max_total_token_num, load_way, mode)
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/basemodel.py", line 37, in __init__
    self._init_weights()
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/basemodel.py", line 70, in _init_weights
    load_hf_weights(
  File "/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/common/basemodel/layer_weights/hf_load_utils.py", line 25, in load_hf_weights
    weights = torch.load(os.path.join(weight_dir, file_), 'cpu')
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1131, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1124, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'TrainingArguments' on <module 'lightllm.server.api_server' from '/usr/local/lib/python3.10/dist-packages/lightllm-1.0.0-py3.10.egg/lightllm/server/api_server.py'>
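
A likely reading of the traceback is that one of the .bin files in the checkpoint directory holds a pickled TrainingArguments object rather than tensors. A pickle stores only the module and class name of such an object, so when torch.load unpickles it inside a different program, the name lookup can fail. The sketch below reproduces that failure mode with the standard pickle module only; the module name `fake_train_script` and the `lr` field are hypothetical stand-ins and are not part of lightllm or the checkpoint.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script that originally defined the class.
train_mod = types.ModuleType("fake_train_script")


class TrainingArguments:
    def __init__(self, lr):
        self.lr = lr


# Make the class look as if it had been defined in fake_train_script.
TrainingArguments.__module__ = "fake_train_script"
TrainingArguments.__qualname__ = "TrainingArguments"
train_mod.TrainingArguments = TrainingArguments
sys.modules["fake_train_script"] = train_mod

# The pickle records the module and class name, not the class body.
payload = pickle.dumps(TrainingArguments(lr=1e-4))

# Simulate loading in a program whose module of that name lacks the class.
del train_mod.TrainingArguments

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the class does not need to exist in lightllm at all; the file that references it simply should not be loaded as a weight file.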

leiwen83 (Author) commented

It can be fixed with the change below, which excludes training_args.bin from the .bin files that are loaded as weights:

diff --git a/lightllm/common/basemodel/layer_weights/hf_load_utils.py b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
index 30be3a5..d9ef3ad 100644
--- a/lightllm/common/basemodel/layer_weights/hf_load_utils.py
+++ b/lightllm/common/basemodel/layer_weights/hf_load_utils.py
@@ -15,7 +15,8 @@ def load_hf_weights(data_type, weight_dir, pre_post_layer=None, transformer_laye
     candidate_files = list(filter(lambda x : x.endswith('.safetensors'), files))
     if len(candidate_files) == 0:
         use_safetensors = False
-        candidate_files = list(filter(lambda x : x.endswith('.bin'), files))
+        candidate_files = list(filter(lambda x : x.endswith('.bin') and not x.endswith("training_args.bin"), files))
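
The selection logic in the patched lines can be sketched as a standalone function. This is a minimal illustration, assuming the same rule as the diff (prefer .safetensors, otherwise take .bin files except training_args.bin); the function name `select_weight_files` and the example file names are hypothetical and do not come from lightllm.

```python
def select_weight_files(files):
    """Return (candidate_files, use_safetensors) for a checkpoint directory listing."""
    # Prefer safetensors shards when any are present.
    candidate_files = [f for f in files if f.endswith(".safetensors")]
    if candidate_files:
        return candidate_files, True
    # Fall back to .bin shards, skipping the non-weight training_args.bin.
    candidate_files = [
        f for f in files
        if f.endswith(".bin") and not f.endswith("training_args.bin")
    ]
    return candidate_files, False


# Hypothetical directory listing for illustration.
listing = [
    "config.json",
    "pytorch_model-00001-of-00003.bin",
    "pytorch_model-00002-of-00003.bin",
    "training_args.bin",
]
selected, use_safetensors = select_weight_files(listing)
print(selected, use_safetensors)
```

Filtering by file name is the smallest change that avoids the error; it does not guard against other non-weight .bin files that a checkpoint directory might contain.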
