
Conversation

@ArthurZucker (Collaborator) commented Aug 29, 2023

What does this PR do?

Fixes #25813, which reports that RoPE is slow. This is a follow-up to #22785, where RoPE was improved for the Llama model.
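For context, the Llama-style speedup referenced here comes from precomputing a sin/cos cache once and applying the rotation with cheap elementwise ops at each forward pass. A minimal NumPy sketch of the idea (illustrative only; these function names are not the actual transformers implementation):

```python
import numpy as np

def rope_cache(seq_len, dim, base=10000.0):
    # Precompute the sin/cos tables once, instead of rebuilding them per call.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    t = np.arange(seq_len)
    freqs = np.outer(t, inv_freq)                   # (seq_len, dim/2)
    emb = np.concatenate([freqs, freqs], axis=-1)   # (seq_len, dim)
    return np.cos(emb), np.sin(emb)

def rotate_half(x):
    # Pair each feature with its counterpart in the other half of the vector.
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rope(x, cos, sin):
    # Rotate x by the position-dependent angles; at position 0 this is the identity.
    return x * cos + rotate_half(x) * sin
```

Because the rotation is norm-preserving and the tables are cached, the per-step cost reduces to two multiplies and an add per element.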

@HuggingFaceDocBuilderDev commented Aug 29, 2023

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker ArthurZucker changed the title Faster rotary embedding for GPTNeoX [GPTNeoX] Faster rotary embedding for GPTNeoX (based on llama changes) Aug 30, 2023
@ArthurZucker ArthurZucker marked this pull request as ready for review August 31, 2023 14:15
@amyeroberts (Contributor) left a comment


Thanks for adding this!

My only question is about the dtype casting in Idefics. Could you run the slow tests and check the effect on model outputs when rope scaling is used?

@gante (Contributor) left a comment


LGTM (and nice touch adding the copied from!)

@LysandreJik (Member) commented

cc @StellaAthena FYI, this PR should greatly speed up the RoPE embeddings of the GPTNeoX model, similarly to how it was done for the Llama model.

Let us know if you want to review/have any comments!

@ArthurZucker (Collaborator, Author) commented

cc @Narsil: this touches buffers that will no longer be persistent, so I'll wait for you in case this conflicts with TGI.
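For reference, non-persistent buffers are the PyTorch mechanism under discussion: a cache registered with `persistent=False` stays on the module but is excluded from `state_dict()`, so it is rebuilt at load time rather than read from (or written to) the checkpoint. A minimal sketch, assuming a hypothetical module (not the actual GPTNeoX code):

```python
import torch
from torch import nn

class RotaryEmbedding(nn.Module):
    def __init__(self, dim, max_positions=2048, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_positions).float()
        freqs = torch.outer(t, inv_freq)      # (max_positions, dim/2)
        emb = torch.cat([freqs, freqs], dim=-1)
        # persistent=False keeps the caches out of state_dict(), so older
        # checkpoints that lack these keys still load without errors.
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)
```

The trade-off is that consumers which serialize or inspect raw state dicts (such as serving stacks like TGI) must not expect these keys to be present, which is presumably why the ping above asks about conflicts.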

@ArthurZucker ArthurZucker merged commit 253f9a3 into huggingface:main Oct 5, 2023


Development

Successfully merging this pull request may close these issues.

GPTNeoXRotaryEmbedding has a defect when using sin/cos cache

5 participants