BetterTransformer support training & autocast for all archs #1225

Merged
fxmarty merged 8 commits into huggingface:main from fxmarty:training-support-bt
Jul 26, 2023

Conversation

@fxmarty (Contributor) commented on Jul 25, 2023

WIP: support training for (almost) all architectures.

For now, though, for encoders we pass attention_mask to SDPA, so it will dispatch only to the math path. I tried to use NestedTensor (pytorch/pytorch#105913) without much success.
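
To illustrate the dispatch constraint, here is a minimal sketch, assuming a CUDA device and the PyTorch 2.0-era `torch.backends.cuda.sdp_kernel` context manager (the shapes and mask are made up for illustration):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# A padding-style boolean mask (True = attend), broadcastable to
# (batch, heads, query_len, key_len), similar to what an encoder's
# attention_mask turns into.
mask = torch.ones(2, 1, 1, 128, device="cuda", dtype=torch.bool)
mask[1, :, :, 100:] = False  # pretend the second sequence is padded past 100

# The flash kernel rejects a non-null attn_mask, so restricting SDPA to it fails:
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash kernel unavailable with a mask:", e)

# The math backend accepts the mask, so that is where the call ends up dispatching:
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```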

We should probably be more flexible and allow using xformers / HazyResearch's flash-attention, which do support either custom masks or indexing. Or, for training, simply ignore the mask.
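
For reference, a sketch of the xformers route mentioned above (this uses xformers' public API, not anything added by this PR; the lengths and shapes are illustrative). A block-diagonal mask built from per-sequence lengths avoids padding altogether:

```python
import torch
from xformers.ops import fmha, memory_efficient_attention

heads, head_dim = 8, 64
seqlens = [100, 128]  # two sequences of different lengths, no padding
total = sum(seqlens)

# xformers expects (batch, seqlen, heads, head_dim); with a block-diagonal
# mask, the sequences are concatenated along the sequence axis with batch=1.
q = torch.randn(1, total, heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

mask = fmha.BlockDiagonalMask.from_seqlens(seqlens)
out = memory_efficient_attention(q, k, v, attn_bias=mask)
```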

cc @younesbelkada

Fixes #1081 #952 #971

There are still some tests to add / make pass, and the documentation to make more precise.
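
A minimal sketch of the usage this PR unlocks, training under autocast after `BetterTransformer.transform` (the checkpoint, optimizer settings, and toy batch are illustrative, not taken from the PR):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.bettertransformer import BetterTransformer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Swap supported modules for their BetterTransformer equivalents.
model = BetterTransformer.transform(model)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

inputs = tokenizer(
    ["hello world", "a longer, padded example sentence"],
    padding=True,
    return_tensors="pt",
).to("cuda")
labels = torch.tensor([0, 1], device="cuda")

# Mixed-precision forward pass; previously this raised for BetterTransformer models.
# (A GradScaler would normally accompany fp16 autocast; omitted for brevity.)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(**inputs, labels=labels).loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```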

@fxmarty fxmarty requested a review from younesbelkada July 26, 2023 13:37

Development

Successfully merging this pull request may close these issues.

Autocast is not supported for BetterTransformer integration.
