
Conversation

@baskrahmer
Contributor

What does this PR do?

What the title says :)
To make the tests with the key-value cache work, the input format had to be changed to be compatible with the Bloom cache format. Perhaps this could be automated by adding it to Transformers directly (for the generate method this is done in huggingface/transformers#20213).
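For context, a minimal sketch of the layout conversion in question, assuming the standard cache layout `[batch_size, num_heads, seq_length, head_dim]`. The helper name is hypothetical, not this PR's code (Transformers ships similar conversion helpers on the Bloom model for the generate path, per the linked PR):

```python
import torch

def to_bloom_cache(past_key_values):
    """Hypothetical helper: reshape a standard KV cache into Bloom's layout.

    Standard layout (per layer): key and value of shape
        [batch_size, num_heads, seq_length, head_dim]
    Bloom layout (per layer):
        key:   [batch_size * num_heads, head_dim, seq_length]
        value: [batch_size * num_heads, seq_length, head_dim]
    """
    converted = []
    for key, value in past_key_values:
        batch_size, num_heads, seq_length, head_dim = key.shape
        converted.append((
            # fuse batch and head dims, then swap seq_length/head_dim for the key
            key.reshape(batch_size * num_heads, seq_length, head_dim).transpose(1, 2),
            value.reshape(batch_size * num_heads, seq_length, head_dim),
        ))
    return tuple(converted)
```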

Fixes #1175 (BetterTransformer support for BLOOM)

This is a duplicate of PR #1187, but I messed up the rebase there, so I closed that one ;)

Before submitting

  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@nlpcat

nlpcat commented Aug 4, 2023

Can I ask when this PR will be merged to optimize BLOOM? PyTorch just added a memory-efficient attention optimization that supports attention bias: pytorch/pytorch#104310
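For reference, a hedged sketch of what that PyTorch change enables: passing an additive attention bias (such as BLOOM's ALiBi bias) to `scaled_dot_product_attention` so the memory-efficient backend can be used. All shapes and values below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: [batch, num_heads, seq_len, head_dim]
q = torch.randn(2, 8, 128, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 8, 128, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 8, 128, 64, dtype=torch.float16, device="cuda")

# Additive attention bias, e.g. ALiBi: [batch, num_heads, seq_len, seq_len]
alibi_bias = torch.randn(2, 8, 128, 128, dtype=torch.float16, device="cuda")

# With pytorch/pytorch#104310, the memory-efficient kernel can take an
# arbitrary attn_mask/bias instead of falling back to the math path.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=alibi_bias)
```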


@fxmarty
Contributor

fxmarty commented Aug 7, 2023

Thank you, I'll make sure it is merged this week!

Contributor

@fxmarty fxmarty left a comment


LGTM, thanks a lot for the addition @baskrahmer! It's working smoothly.

About untangling batch_size from self.num_heads: was it necessary for numerical equivalence?

@fxmarty fxmarty merged commit 456b28f into huggingface:main Aug 11, 2023
@baskrahmer
Contributor Author

@fxmarty thanks for the review. IIRC, the batch size is untangled because Bloom's key/value cache is structured a bit differently from that of other autoregressive language models.
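A minimal sketch of what that structural difference implies, with illustrative names and sizes (assumptions, not the PR's code): since Bloom fuses batch and heads into the cache's leading dimension, the batch size is not a standalone dimension and has to be derived from it.

```python
import torch

num_heads, head_dim, seq_length = 8, 64, 16
# Bloom key layout: [batch_size * num_heads, head_dim, seq_length]
past_key = torch.randn(2 * num_heads, head_dim, seq_length)

# batch_size must be recovered from the fused leading dimension
batch_size = past_key.shape[0] // num_heads
assert past_key.shape[0] == batch_size * num_heads
```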

