
Support loading hugging face checkpoint #1165

Merged

regisss merged 8 commits into huggingface:main from HabanaAI:dev/ulivne/upstream_1_17_load_cp on Aug 8, 2024

Conversation

@ulivne (Contributor) commented Jul 28, 2024

  • Support loading checkpoint with INC

  • load_cp explanation

  • Add torch_dtype bf16 for the model (see the sketch below)

  • Support getting quantized weight tensor


Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
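
A minimal sketch of the "torch_dtype bf16" item from the bullets above, using the standard transformers from_pretrained API; the checkpoint name is a placeholder, not a model from this PR:

```python
# Minimal sketch of loading a model directly in bf16 via torch_dtype.
# "my-org/my-model" is a placeholder, not a checkpoint from this PR.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",
    torch_dtype=torch.bfloat16,  # weights are loaded as bfloat16
)
```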
@ulivne ulivne requested review from libinta and mandy-li as code owners July 28, 2024 10:22
@ulivne ulivne requested a review from a user July 28, 2024 10:22
@ulivne ulivne requested a review from regisss as a code owner July 28, 2024 10:22
@libinta added the synapse 1.17_dependency label (PR not backward compatible; can be merged only when Synapse 1.17 is available) on Jul 28, 2024
```python
        **model_kwargs,
    )
elif args.load_cp:
    from neural_compressor.torch.quantization import load
```
A Contributor commented on this diff:

@ulivne sounds like neural_compressor is missing from the requirements; consider adding it!

@ulivne (Contributor Author) replied:

> @ulivne sounds like neural_compressor is missing from the requirements; consider adding it!

neural_compressor is installed automatically as part of the Habana software stack. It replaces habana_quantization_toolkit, which was also not part of the requirements.
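
For orientation, a hedged sketch of how the args.load_cp branch shown above might invoke the INC load API; the format and device keyword values, and the exact call shape, are assumptions for illustration rather than the PR's actual code:

```python
# Hypothetical sketch of the --load_cp path. The keyword values below
# (format="huggingface", device="hpu") are assumptions for illustration,
# not taken verbatim from the diff.
from neural_compressor.torch.quantization import load

model = load(
    model_name_or_path=args.model_name_or_path,  # quantized HF checkpoint
    format="huggingface",
    device="hpu",
    **model_kwargs,
)
```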

vidyasiv pushed a commit to emascarenhas/optimum-habana that referenced this pull request on Aug 1, 2024
vidyasiv added a commit to emascarenhas/optimum-habana that referenced this pull request on Aug 2, 2024: Support loading hugging face checkpoint huggingface#1165
@ulivne (Contributor Author) commented Aug 6, 2024

Removed the test, as it uses a Hugging Face model and we are not sure we can use it in our open-source code (in terms of license and Intel regulations).

@regisss (Collaborator) left a comment

I think load_cp is not explicit enough. Transformers has load_in_4bit, which I think would be better here.

@libinta You'll let me know when this PR can be merged.

@ulivne (Contributor Author) commented Aug 8, 2024

> I think load_cp is not explicit enough. Transformers has load_in_4bit, which I think would be better here.
> @libinta You'll let me know when this PR can be merged.

This flag triggers the Neural Compressor load API, which also supports loading high-precision models and is planned to support loading fp8 models in the future. It is correct that currently only 4-bit is supported for Gaudi; however, I think we should keep the load_cp name to avoid changing it later when we add loading of fp8 models.

@regisss (Collaborator) commented Aug 8, 2024

> > I think load_cp is not explicit enough. Transformers has load_in_4bit, which I think would be better here.
> > @libinta You'll let me know when this PR can be merged.
>
> This flag triggers the Neural Compressor load API, which also supports loading high-precision models and is planned to support loading fp8 models in the future. It is correct that currently only 4-bit is supported for Gaudi; however, I think we should keep the load_cp name to avoid changing it later when we add loading of fp8 models.

load_in_4bit is just a proposal but honestly, if I'm a user and I don't know the arguments, load_cp doesn't tell me anything about what it does. I would prefer something more explicit like load_quantized_model, or, even better, automatically detecting whether this is a 4/8-bit checkpoint.
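
The auto-detection regisss suggests could plausibly key off the quantization_config entry that transformers writes into a quantized checkpoint's config.json; a minimal sketch under that assumption (the helper name is hypothetical):

```python
# Sketch of auto-detecting a quantized checkpoint, assuming the standard
# transformers convention of a "quantization_config" key in config.json.
# is_quantized_checkpoint is a hypothetical helper, not from this PR.
import json
import os

def is_quantized_checkpoint(model_path: str) -> bool:
    config_path = os.path.join(model_path, "config.json")
    if not os.path.isfile(config_path):
        return False
    with open(config_path) as f:
        config = json.load(f)
    return "quantization_config" in config
```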

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss regisss merged commit 3ea3145 into huggingface:main Aug 8, 2024
```python
parser.add_argument(
    "--load_cp",
    action="store_true",
    help="Whether to load model from hugging face checkpoint.",
)
```

A Collaborator commented on this diff:

one line missing here

regisss pushed a commit that referenced this pull request Aug 8, 2024
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>