Enable Falcon FP8 inference #94

Merged
mandy-li merged 5 commits into habana-main from schoi/falcon_180b_quant_PR on Mar 13, 2024
Conversation

@schoi-habana

Enables quantization for falcon-7b, falcon-40b, and falcon-180b with --reuse_cache.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@schoi-habana schoi-habana marked this pull request as ready for review March 6, 2024 18:43
@schoi-habana schoi-habana requested a review from a user March 6, 2024 18:43

@mandy-li mandy-li left a comment

Sun, please add the Falcon-180B FP8 inference command to the README.

@vivekgoe, please review the code and have your team test fine-tuning (FT) with this PR.

Comment thread examples/text-generation/quantization_config/maxabs_measure_falcon.json Outdated
Comment thread optimum/habana/transformers/models/falcon/modeling_falcon.py Outdated

@vivekgoe vivekgoe left a comment

@schoi-habana, since the changes in modeling_falcon.py are large, it is difficult to go over each of them. I suggest asking QA to check Falcon-40B and Falcon-180B fine-tuning with this PR to avoid any regressions.

Another general comment: for the model classes you are overriding, please add a comment stating which transformers commit they are derived from, and list the differences from that base commit. This will be useful when the model is rebased onto new transformers versions in the future.
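
A minimal sketch of what such a provenance note might look like, assuming a docstring convention. The class names, the placeholder commit field, and the listed difference are hypothetical and are not taken from modeling_falcon.py:

```python
class FalconAttentionBase:
    """Stand-in for an upstream transformers class (hypothetical name)."""

    def forward(self, hidden_states):
        return hidden_states


class GaudiFalconAttentionSketch(FalconAttentionBase):
    """
    Derived from: <upstream transformers commit, to be filled in by the author>
    Differences from the base:
    - (illustrative) forward() accepts an optional reuse_cache argument
    """

    def forward(self, hidden_states, reuse_cache=False):
        # Illustrative only: the real override logic lives in modeling_falcon.py.
        return super().forward(hidden_states)
```

Keeping the note in the class docstring means it travels with the override and can be read through `__doc__` during a later rebase.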

@schoi-habana schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from deb2e75 to c852094 Compare March 8, 2024 19:54
@schoi-habana schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from c852094 to af025a7 Compare March 8, 2024 23:48
@mandy-li

@libinta, since your team worked on Falcon-40B FT enablement, please review the code and test Falcon-40B FT as well. Thanks.

Comment thread examples/text-generation/README.md
@schoi-habana schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from ecac5fa to 7b9852a Compare March 11, 2024 23:31
@mandy-li mandy-li merged commit d984ded into habana-main Mar 13, 2024
schoi-habana pushed a commit that referenced this pull request Mar 19, 2024
squash of the following 5 commits

enable Falcon FP8 inference

added example command in readme, code cleanup

resolve issues in finetuning

enable non reuse cache flow for fp8

revert non reuse_cache flow for training due to perf drop
@astachowiczhabana

huggingface#831

4 participants