/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
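(Side note: as far as I understand this warning, torchrun passes the rank through the environment instead of `--local-rank`, so the script would read it roughly like the sketch below; `local_rank` is just an illustrative variable name, not necessarily what finetune_polyglot.py uses.)

```python
import os

# torchrun sets LOCAL_RANK in the environment instead of passing --local-rank;
# fall back to 0 for a single-process run.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
```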
accelerate active - gpu device : {'': 0}
world size : 1
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
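(If I read this deprecation correctly, the non-deprecated form would look roughly like the sketch below; the model name and device map are taken from my setup, the other values are illustrative.)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit quantization config instead of the deprecated load_in_8bit argument.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/polyglot-ko-12.8b",
    quantization_config=bnb_config,
    device_map={"": 0},            # matches the single-GPU device map in the log
    torch_dtype=torch.float16,
)
```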
Loading checkpoint shards: 100%|████████████████████████████████████| 28/28 [00:09<00:00, 3.05it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'GPTNeoXTokenizerFast'.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
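(My understanding is that this class-mismatch warning goes away if the tokenizer class is resolved from the checkpoint via AutoTokenizer rather than calling GPTNeoXTokenizerFast.from_pretrained directly; a minimal sketch, assuming that is what the script currently does.)

```python
from transformers import AutoTokenizer

# AutoTokenizer resolves to the tokenizer class recorded in the checkpoint
# (PreTrainedTokenizerFast here), avoiding the class-mismatch warning.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/polyglot-ko-12.8b")
```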
trainable params: 6,553,600 || all params: 12,900,157,440 || trainable%: 0.0508
Map: 100%|██████████████████████████████████████████| 150630/150630 [02:42<00:00, 926.18 examples/s]
Map: 100%|██████████████████████████████████████████████| 2000/2000 [00:02<00:00, 936.61 examples/s]
/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/transformers/training_args.py:1494: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
warnings.warn(
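(This one seems to be only a rename; a minimal sketch, with values taken from my command below and otherwise illustrative.)

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ckpt/",
    eval_strategy="steps",   # replaces the deprecated evaluation_strategy
    eval_steps=40,           # matches --eval_steps 40 in my launch command
)
```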
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(The same huggingface/tokenizers fork warning is printed many more times; the repeated copies are omitted here.)
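(To silence the repeated fork warning, I believe the variable it mentions has to be set before the tokenizer is first used; shown in Python so it applies regardless of how the script is launched.)

```python
import os

# Must be set before the Hugging Face tokenizer is used for the first time,
# otherwise the forked worker processes still print the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```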
0%| | 0/1176 [00:00<?, ?it/s]/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
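(The warning asks for use_reentrant to be passed explicitly; through the Trainer this would look roughly like the sketch below. Whether the non-reentrant variant also affects the DDP error further down is an assumption on my side.)

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ckpt/",
    gradient_checkpointing=True,
    # Pass use_reentrant explicitly, as the warning asks; False is the
    # recommended non-reentrant variant.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```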
/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
[rank0]: Traceback (most recent call last):
[rank0]: File "/raid/workspace/jhpark/work/KULLM/KULLM/finetune_polyglot.py", line 291, in <module>
[rank0]: fire.Fire(train)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/fire/core.py", line 143, in Fire
[rank0]: component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/fire/core.py", line 477, in _Fire
[rank0]: component, remaining_args = _CallAndUpdateTrace(
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
[rank0]: component = fn(*varargs, **kwargs)
[rank0]: File "/raid/workspace/jhpark/work/KULLM/KULLM/finetune_polyglot.py", line 275, in train
[rank0]: trainer.train(resume_from_checkpoint=resume_from_checkpoint)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
[rank0]: return inner_training_loop(
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/transformers/trainer.py", line 3324, in training_step
[rank0]: self.accelerator.backward(loss, **kwargs)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/accelerate/accelerator.py", line 2130, in backward
[rank0]: self.scaler.scale(loss).backward(**kwargs)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/_tensor.py", line 525, in backward
[rank0]: torch.autograd.backward(
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]: _engine_run_backward(
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/autograd/function.py", line 301, in apply
[rank0]: return user_fn(self, *args)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 320, in backward
[rank0]: torch.autograd.backward(outputs_with_grad, args_with_grad)
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]: _engine_run_backward(
[rank0]: File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]: return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
[rank0]: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
[rank0]: Parameter at index 79 with name base_model.model.gpt_neox.layers.39.attention.query_key_value.lora_B.default.weight has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.
0%| | 0/1176 [00:02<?, ?it/s]
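(The error message itself points at parameters being reused across reentrant backward passes, i.e. gradient checkpointing under DDP. Besides the use_reentrant sketch above, the other knob I am aware of is DDP's unused-parameter detection; ddp_find_unused_parameters is a real TrainingArguments field, but whether it is the culprit in this run is an assumption, not a confirmed fix.)

```python
from transformers import TrainingArguments

# With LoRA, every trainable parameter participates in each step, so
# unused-parameter detection should not be needed.
training_args = TrainingArguments(
    output_dir="ckpt/",
    ddp_find_unused_parameters=False,
)
```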
E0701 23:20:30.502648 140707882366784 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 3118915) of binary: /home/jhpark/miniconda3/envs/kullm-39/bin/python3
Traceback (most recent call last):
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launch.py", line 198, in <module>
main()
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launch.py", line 194, in main
launch(args)
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launch.py", line 179, in launch
run(args)
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune_polyglot.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
The error below occurred during training; I'd appreciate some help.
/home/jhpark/miniconda3/envs/kullm-39/lib/python3.9/site-packages/torch/distributed/launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
ddp active - gpu device : 0
Training Alpaca-LoRA model with params:
base_model: EleutherAI/polyglot-ko-12.8b
data_path: data/kullm-v2.jsonl
output_dir: ckpt/
batch_size: 128
micro_batch_size: 4
num_epochs: True
learning_rate: True
cutoff_len: 512
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['query_key_value', 'xxx']
train_on_inputs: True
add_eos_token: False
group_by_length: True
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: kullm
============================================================
This is the error that occurs when running the following command under Python 3.9.
python3 -m torch.distributed.launch --master_port=34322 --nproc_per_node 1 finetune_polyglot.py \
--fp16 \
--base_model 'EleutherAI/polyglot-ko-12.8b' \
--data_path data/kullm-v2.jsonl \
--output_dir ckpt/$SAVE_DIR \
--prompt_template_name kullm \
--batch_size 128 \
--micro_batch_size 4 \
--num_epochs $EPOCH \
--learning_rate $LR \
--cutoff_len 512 \
--val_set_size 2000 \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--lora_target_modules "[query_key_value, xxx]" \
--train_on_inputs \
--logging_steps 1 \
--eval_steps 40 \
--weight_decay 0. \
--warmup_steps 0 \
--warmup_ratio 0.1 \
--lr_scheduler_type "cosine" \
--group_by_length
List of installed Python packages:
(kullm-39) jhpark@dgx-a100:/raid/workspace/jhpark/work/KULLM/KULLM$ pip list
Package Version
accelerate 0.31.0
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.7.0
anyio 4.4.0
appdirs 1.4.4
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.1
black 24.4.2
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
contourpy 1.2.1
cycler 0.12.1
datasets 2.20.0
decorator 5.1.1
dill 0.3.8
dnspython 2.6.1
email_validator 2.2.0
exceptiongroup 1.2.1
executing 2.0.1
fastapi 0.111.0
fastapi-cli 0.0.4
ffmpy 0.3.2
filelock 3.15.4
fire 0.6.0
fonttools 4.53.0
frozenlist 1.4.1
fsspec 2024.5.0
gradio 4.37.2
gradio_client 1.0.2
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.4
idna 3.7
importlib_resources 6.4.0
ipython 8.18.1
jedi 0.19.1
Jinja2 3.1.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
loralib 0.1.2
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.0
matplotlib-inline 0.1.7
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
mypy-extensions 1.0.0
networkx 3.2.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
orjson 3.10.5
packaging 24.1
pandas 2.2.2
parso 0.8.4
pathspec 0.12.1
peft 0.11.2.dev0
pexpect 4.9.0
pillow 10.3.0
pip 24.0
platformdirs 4.2.2
prompt_toolkit 3.0.47
psutil 6.0.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 16.1.0
pyarrow-hotfix 0.6
pydantic 2.7.4
pydantic_core 2.18.4
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.7.1
rpds-py 0.18.1
ruff 0.5.0
safetensors 0.4.3
scipy 1.13.1
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
stack-data 0.6.3
starlette 0.37.2
sympy 1.12.1
termcolor 2.4.0
tokenize-rt 5.2.0
tokenizers 0.19.1
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.3.1
tqdm 4.66.4
traitlets 5.14.3
transformers 4.42.3
triton 2.3.1
typer 0.12.3
typing_extensions 4.12.2
tzdata 2024.1
ujson 5.10.0
urllib3 2.2.2
uvicorn 0.30.1
uvloop 0.19.0
watchfiles 0.22.0
wcwidth 0.2.13
websockets 11.0.3
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4
zipp 3.19.2