Enable Falcon FP8 inference by schoi-habana · Pull Request #94 · HabanaAI/optimum-habana-fork

schoi-habana · 2024-03-06T18:43:49Z

Enables FP8 quantization for falcon-7b, falcon-40b, and falcon-180b with --reuse_cache.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

mandy-li

Sun, please add the Falcon-180B FP8 inference command to the README.

@vivekgoe, please review the code and have your team test fine-tuning (FT) with this PR.

vivekgoe

@schoi-habana, since the changes in modeling_falcon.py are large, it is difficult to go over each of them. I suggest asking QA to check Falcon-40B and Falcon-180B fine-tuning with this PR to avoid any regressions.

Another general comment: for the model classes you are overriding, please add a comment stating which transformers commit they are derived from and listing the differences from that base commit. This will be useful when the model is rebased onto newer transformers versions in the future.
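One way the provenance comment requested above could look is sketched here. This is only an illustration, not code from this PR: the class names `UpstreamAttention` and `PatchedAttention`, the helper `provenance_notes`, and the listed difference are all hypothetical, and the commit reference is deliberately left as a placeholder.

```python
import inspect


class UpstreamAttention:
    """Stand-in for an upstream model class (hypothetical)."""

    def forward(self, hidden_states, use_cache=False):
        return hidden_states


class PatchedAttention(UpstreamAttention):
    """Override of UpstreamAttention (hypothetical example).

    Derived from: transformers commit <commit-sha> (placeholder, fill in the real base)
    Differences from base:
    - adds a `reuse_cache` argument to `forward` (illustrative difference only)
    """

    def forward(self, hidden_states, use_cache=False, reuse_cache=False):
        # The extra argument is accepted here but otherwise unused in this sketch.
        return super().forward(hidden_states, use_cache=use_cache)


def provenance_notes(cls):
    """Return the 'Derived from:' line and the bulleted differences from a class docstring."""
    doc = inspect.getdoc(cls) or ""
    lines = [line.strip() for line in doc.splitlines()]
    derived = next((line for line in lines if line.startswith("Derived from:")), None)
    diffs = [line[2:] for line in lines if line.startswith("- ")]
    return derived, diffs
```

Keeping the base commit and the differences in the docstring means they travel with the class, and a small helper like `provenance_notes(PatchedAttention)` could be used during a rebase to list which overrides need to be re-checked.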

mandy-li · 2024-03-11T17:07:32Z

@libinta, since your team worked on Falcon-40B FT enablement, please review the code and test Falcon-40B FT as well. Thanks.

Squash of the following 5 commits: enable Falcon FP8 inference; added example command in readme, code cleanup; resolve issues in finetuning; enable non reuse cache flow for fp8; revert non reuse_cache flow for training due to perf drop.

astachowiczhabana · 2024-06-07T14:23:33Z

huggingface#831

enable Falcon FP8 inference

c72250a

schoi-habana marked this pull request as ready for review March 6, 2024 18:43

schoi-habana requested review from bhargaveede, libinta, mandy-li, ssarkar2 and vivekgoe as code owners March 6, 2024 18:43

schoi-habana requested a review from a user March 6, 2024 18:43

mandy-li reviewed Mar 7, 2024

Comment thread examples/text-generation/quantization_config/maxabs_measure_falcon.json Outdated

Comment thread optimum/habana/transformers/models/falcon/modeling_falcon.py Outdated

schoi-habana requested a review from bgoldberg-habana March 7, 2024 21:49

added example command in readme, code cleanup

63fd6b2

vivekgoe reviewed Mar 8, 2024

schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from deb2e75 to c852094 Compare March 8, 2024 19:54

resolve issues in finetuning

af025a7

schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from c852094 to af025a7 Compare March 8, 2024 23:48

mandy-li requested changes Mar 11, 2024

Comment thread examples/text-generation/README.md

enable non reuse cache flow for fp8

7b9852a

schoi-habana force-pushed the schoi/falcon_180b_quant_PR branch from ecac5fa to 7b9852a Compare March 11, 2024 23:31

revert non reuse_cache flow for training due to perf drop

8a18736

mandy-li approved these changes Mar 13, 2024

mandy-li merged commit d984ded into habana-main Mar 13, 2024

mandy-li merged 5 commits into habana-main from schoi/falcon_180b_quant_PR
