How to avoid errors on over-long data during `swift sample`? #3073

Open
DogeWatch opened this issue Feb 12, 2025 · 2 comments
Labels
bug Something isn't working

Comments


Describe the bug
The code is reused from rft.py; sample is launched from the command line:

# Build the `swift sample` command; conda_prefix, device, device_count, model,
# model_type, dataset and iter are defined earlier in rft.py.
sample_cmd = (f'{conda_prefix} CUDA_VISIBLE_DEVICES={device} swift sample '
                      f'--model {model} --model_type {model_type} '
                      f'--dataset {" ".join(dataset)} '
                      f'--data_range {device} {device_count} '
                      f'--max_length 8192 '
                      # f'--system "You are a math model, you should **think step by step** carefully, '
                      # f'and always consider the basic math principles to avoid making calculating mistakes.'
                      # f'Give the final answer wrapped with \\boxed{{}}" '
                      f'--load_args false '
                      f'--sampler_engine lmdeploy '
                      f'--max_new_tokens 1024 '
                      f'--override_exist_file true '
                      f'--num_sampling_per_gpu_batch_size 2 '
                      f'--num_return_sequences 4 '
                      f'--cache_files sample_output/iter_{iter}_proc_{device}_cache.jsonl '
                      f'--output_file iter_{iter}_proc_{device}_cache.jsonl '
                      f'--top_p 1.0 '
                      f'--temperature 1.0 ')

max_length=8192 is specified there, yet at runtime an over-long sample still triggers an error.

[Screenshot of the error message omitted.] The question is where this max_length (32768) comes from, and how such over-long data can be truncated or dropped.
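
A minimal pre-filter sketch (an assumption, not something confirmed in this thread: the JSONL layout with a "messages" field, the file names, and the model path are placeholders) that drops rows whose tokenized length exceeds max_length before running swift sample:

import json
from transformers import AutoTokenizer

MAX_LENGTH = 8192
tokenizer = AutoTokenizer.from_pretrained("path/to/model")  # same model passed to swift sample

# Keep only rows whose concatenated message content fits within MAX_LENGTH tokens.
with open("dataset.jsonl") as fin, open("dataset.filtered.jsonl", "w") as fout:
    for line in fin:
        row = json.loads(line)
        text = "".join(m["content"] for m in row.get("messages", []))
        if len(tokenizer(text).input_ids) <= MAX_LENGTH:
            fout.write(line)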

Your hardware and system info
swift==v3.1.0

@DogeWatch (Author) commented Feb 12, 2025

Additionally, I launched with --truncation_strategy delete, but it seems to have no effect; the same error is still raised.
I also see this line in the log: [TM][WARNING] [LlamaTritonModel] max_context_token_num is not set, default to 32768. I am not sure whether that is related.
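
For context, 32768 is TurboMind's default for max_context_token_num when it is not set explicitly. A minimal sketch of how a context cap is set through lmdeploy's own Python API (the model path is a placeholder, and whether swift sample forwards such a setting is an assumption, not something stated in this thread):

from lmdeploy import pipeline, TurbomindEngineConfig

# Cap the session length (prompt + generated tokens) at 8192 instead of the default.
pipe = pipeline(
    'path/to/model',
    backend_config=TurbomindEngineConfig(session_len=8192),
)
print(pipe(['Hello']))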

@Jintao-Huang (Collaborator) commented

One moment, please.

@Jintao-Huang added the bug (Something isn't working) label on Feb 12, 2025