Skip to content

Conversation

anton-l
Copy link
Member

@anton-l anton-l commented Nov 19, 2024

The cleanup seems to be called from multiple processes in data parallel mode, so this just ensures there's no error due to the already deleted model object.

@NathanHB
Copy link
Member

Thanks for the PR ! I'm pretty sure that in data // mode, each process has it's own model so it should not be an issue. Did you run into an issue were the model was already deleted ?
Also, I think the cleanup is only ran on the first process

@anton-l
Copy link
Member Author

anton-l commented Nov 19, 2024

@NathanHB yes, I catch an error there due to the model already being None. It only happens with vllm,data_parallel_size=2 and up, no issues if I disable data parallelism.

@NathanHB
Copy link
Member

Ohh I did not take vllm // into account, it indeed works deifferently, good catch !

@clefourrier clefourrier merged commit 85c0d9f into huggingface:main Nov 20, 2024
2 checks passed
@anton-l anton-l deleted the vllm_cleanup branch November 20, 2024 13:00
JoelNiklaus pushed a commit to JoelNiklaus/lighteval that referenced this pull request Nov 25, 2024
hynky1999 pushed a commit that referenced this pull request Nov 29, 2024
Co-authored-by: Nathan Habib <[email protected]>
hynky1999 pushed a commit that referenced this pull request May 22, 2025
Co-authored-by: Nathan Habib <[email protected]>
hynky1999 pushed a commit that referenced this pull request May 22, 2025
Co-authored-by: Nathan Habib <[email protected]>
NathanHB added a commit that referenced this pull request Sep 19, 2025
Co-authored-by: Nathan Habib <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants