
[BUG]: Error during training #85

Open
Chasapas opened this issue Jul 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Chasapas

Python Version

Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]

Error:

TypeError: DataArgs.__init__() got an unexpected keyword argument 'no_eval'

RuntimeError: Couldn't instantiate class <class 'finetune.data.args.DataArgs'> using init args dict_keys(['data', 'no_eval']): DataArgs.__init__() got an unexpected keyword argument 'no_eval'
[2024-07-25 08:25:20,114] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1444114) of binary: /home/path/venv/bin/python
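The TypeError here is the standard Python dataclass behavior when a parsed config dict carries a key the target class does not declare. A minimal stand-in illustrates it (this `DataArgs` is a hypothetical simplification, not the actual class from finetune.data.args):

```python
from dataclasses import dataclass

# Hypothetical stand-in for finetune.data.args.DataArgs; the real class
# has more fields, but the failure mode is the same.
@dataclass
class DataArgs:
    data: str = ""

# Passing a key the dataclass does not declare reproduces the error.
try:
    DataArgs(**{"data": "output.jsonl", "no_eval": True})
except TypeError as e:
    print(e)  # unexpected keyword argument 'no_eval'
```

So the traceback indicates the YAML loader forwarded `no_eval` to `DataArgs.__init__()`, which has no such parameter.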

Pip Freeze

absl-py==2.1.0
annotated-types==0.7.0
attrs==23.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
grpcio==1.65.1
idna==3.7
Jinja2==3.1.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
Markdown==3.6
MarkupSafe==2.1.5
mistral_common==1.3.3
mpmath==1.3.0
networkx==3.3
numpy==1.25.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
protobuf==4.25.3
pydantic==2.6.1
pydantic_core==2.16.2
PyYAML==6.0.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rpds-py==0.19.0
safetensors==0.4.3
sentencepiece==0.2.0
simple_parsing==0.1.5
six==1.16.0
sympy==1.13.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.4
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
Werkzeug==3.0.3
xformers==0.0.24

Reproduction Steps

torchrun --nproc-per-node 2 --master_port $RANDOM -m train mistral-7b-v0.3/7B.yaml

Expected Behavior

Normal training as described

Additional Context

data:
  data: "path/mistral-finetune/mistral-7b-v0.3/output.jsonl" # Path to your general training
  no_eval: True
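Given the traceback, the likely issue is that `no_eval` is nested under `data:`, so it gets forwarded to `DataArgs`. A sketch of the adjusted YAML, assuming `no_eval` is a top-level training option as in the example config (placement assumed, paths unchanged):

```yaml
data:
  data: "path/mistral-finetune/mistral-7b-v0.3/output.jsonl" # Path to your general training

no_eval: True # top-level, not nested under data (assumed placement)
```

Make sure `no_eval` appears only once in the file; a second declaration elsewhere in the example YAML can cause the same conflict.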

Suggested Solutions

No response

@Chasapas Chasapas added the bug Something isn't working label Jul 25, 2024
@one-and-only

Did you remove the no_eval declaration that is already included in the #other section of the example YAML? That solved this error for me.
