Conversation

@younesbelkada (Contributor)

What does this PR do?

While testing out #26414, I realised the current tests silently pass because they only check the immediately next predicted token.
For small models, FA2 is quite flaky, so I have decided not to increase max_new_tokens in test_flash_attn_2_generate_padding_right and test_flash_attn_2_generate_left_padding, and to instead add a separate test that runs generate with use_cache and a relatively large max_new_tokens, to catch caching issues when porting models to FA2.

cc @LysandreJik
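
For reference, a minimal sketch of the kind of cache-exercising test described above. The checkpoint id, prompt, and attn_implementation loading flag are illustrative assumptions, not the exact code added in this PR:

```python
# Hypothetical sketch, not the actual test added in this PR: the checkpoint id,
# prompt, and attn_implementation flag are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def test_flash_attn_2_generate_use_cache():
    model_id = "tiny-random-llama"  # placeholder; CI would use a tiny test model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("Hi, today I am going to", return_tensors="pt").to("cuda")

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    ).to("cuda")

    # Checking only the first generated token would pass even if the KV cache
    # were mishandled; greedily generating many tokens with use_cache=True
    # forces repeated cache reads/writes and surfaces such bugs.
    _ = model.generate(**inputs, max_new_tokens=30, use_cache=True, do_sample=False)
```

Generating greedily for many steps makes a cache bug show up as a diverging continuation instead of silently passing a one-token check.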

@HuggingFaceDocBuilderDev commented Sep 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik (Member) left a comment

LGTM, thanks for your PR @younesbelkada!

@LysandreJik LysandreJik merged commit 153755e into huggingface:main Sep 27, 2023
@younesbelkada younesbelkada deleted the flash-add-use-cache-tests branch September 27, 2023 10:22