
Errors when directly calling the "run_exp()" function under the "train" command #7155

Open
Soever opened this issue Mar 4, 2025 · 0 comments
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)

Soever commented Mar 4, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info


  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-4.19.90-2107.6.0.0192.8.oe1.bclinux.x86_64-x86_64-with-glibc2.35
  • Python version: 3.11.11
  • PyTorch version: 2.6.0+cu124 (GPU)
  • Transformers version: 4.48.3
  • Datasets version: 3.2.0
  • Accelerate version: 1.2.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800 80GB PCIe
  • GPU number: 1
  • GPU memory: 79.14GB

Reproduction

```python
from LLaMAFactory.src.llamafactory.train.tuner import run_exp, export_model
from LLaMAFactory.src.llamafactory.extras.misc import is_env_enabled, get_device_count, use_ray
from pathlib import Path
import yaml, os

if __name__ == "__main__":
    config_path = "myconfigFile/llama2_lora_sft.yaml"
    config = yaml.safe_load(Path(config_path).absolute().read_text())
    run_exp(args=config)
    for i in range(10):
        force_torchrun = is_env_enabled("FORCE_TORCHRUN")
        if force_torchrun or (get_device_count() > 1 and not use_ray()):
            print("pass")
        else:
            run_exp(args=config)
```
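For reference, the memory growth between successive `run_exp()` calls can be checked with a small sketch like the one below. `report_cuda_memory` is a hypothetical helper added here only for illustration; `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` are standard PyTorch calls.

```python
import torch

def report_cuda_memory(tag: str) -> None:
    # Print how much CUDA memory is currently allocated by tensors and
    # how much the caching allocator has reserved from the driver.
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

report_cuda_memory("before run_exp")
run_exp(args=config)
report_cuda_memory("after run_exp")  # stays high if the model is not released
```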

myconfigFile/llama2_lora_sft.yaml:

```yaml
### model
model_name_or_path: meta-llama/Llama-2-7b-hf
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset_dir: /LLaMAFactory/data
dataset: alpaca_en_demo
template: llama2
cutoff_len: 2048
max_samples: 100
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama2/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```

Error:

```
Traceback (most recent call last):
  File "/root/autodl-tmp/Code/AgentGym/testIter.py", line 17, in <module>
    run_exp(args=config)
  File "/root/autodl-tmp/Code/AgentGym/LLaMAFactory/src/llamafactory/train/tuner.py", line 93, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/root/autodl-tmp/Code/AgentGym/LLaMAFactory/src/llamafactory/train/tuner.py", line 67, in _training_function
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/autodl-tmp/Code/AgentGym/LLaMAFactory/src/llamafactory/train/sft/workflow.py", line 52, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/Code/AgentGym/LLaMAFactory/src/llamafactory/model/loader.py", line 160, in load_model
    model = load_class.from_pretrained(**init_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/agent311/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/agent311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4245, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/agent311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4815, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/agent311/lib/python3.11/site-packages/transformers/modeling_utils.py", line 873, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/agent311/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 329, in set_module_tensor_to_device
    new_value = value.to(device)
                ^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacity of 79.14 GiB of which 76.75 MiB is free. Process 813498 has 79.05 GiB memory in use. Of the allocated memory 78.51 GiB is allocated by PyTorch, and 42.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
GPU memory usage: when run_exp() is called multiple times, the GPU memory is not released and the usage accumulates with each call.
(screenshot of GPU memory usage)
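From the caller's side, the only cleanup hooks available are the standard PyTorch ones; a minimal sketch is below (my own code, not a LLaMA-Factory API). Note that `torch.cuda.empty_cache()` only returns cached blocks that are no longer referenced, so it may not help if `run_exp()` keeps references to the loaded model internally.

```python
import gc
import torch

def try_release_cuda_memory() -> None:
    # Drop unreachable Python objects first, then ask the CUDA caching
    # allocator to return unused cached blocks to the driver.
    gc.collect()
    torch.cuda.empty_cache()
    # This cannot free tensors that are still referenced somewhere,
    # e.g. a model object kept alive inside the training code.

run_exp(args=config)
try_release_cuda_memory()
```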

Others

How can I release GPU memory when calling run_exp() directly?
When calling run_exp() directly, do I need to clear the GPU memory manually myself?
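One possible workaround (a sketch only, not verified): launch each run_exp() call in a freshly spawned process, so that all CUDA memory is returned to the driver when the child process exits. `run_exp_isolated` and `_train_once` below are hypothetical helper names used for illustration.

```python
import multiprocessing as mp

def _train_once(config: dict) -> None:
    # Import inside the child so CUDA is initialized in that process only.
    from LLaMAFactory.src.llamafactory.train.tuner import run_exp
    run_exp(args=config)

def run_exp_isolated(config: dict) -> None:
    # "spawn" avoids inheriting a CUDA context from the parent process.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=_train_once, args=(config,))
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError(f"training process exited with code {proc.exitcode}")
```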
