add support for TinyLlama model by tjs-intel · Pull Request #25 · HabanaAI/optimum-habana-fork

tjs-intel · 2024-02-05T19:35:14Z

What does this PR do?

Reopening #19 because #15 does not work for TinyLlama/TinyLlama-1.1B-Chat-v1.0. This model does not have the checkpoint index files.

ValueError: Can't find a checkpoint index (pytorch_model.bin.index.json or model.safetensors.index.json) in /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/77e23968eed12d195bd46c519aa679cc22a27ddc.

The TinyLlama model only has checkpoints in the form of model.safetensors. This checkpoint needs to be included in the list of checkpoints that is passed to DeepSpeed in order for the model to function properly when initialized with DeepSpeed.

This PR adds the safetensors checkpoints to the list of checkpoints passed to DeepSpeed.

Note: This change requires an upstream commit in microsoft/DeepSpeed to be merged downstream into HabanaAI/DeepSpeed before DeepSpeed can load the safetensors format.
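
For readers following along, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `collect_checkpoint_files` is hypothetical, and the only file names assumed are the two index names from the error message plus the single-file `model.safetensors` / `pytorch_model.bin` layouts. Sharded checkpoints are resolved through the index's `weight_map`; when no index exists, a single checkpoint file is accepted instead of raising.

```python
import json
from pathlib import Path

# Index files named in the ValueError above.
INDEX_NAMES = ("model.safetensors.index.json", "pytorch_model.bin.index.json")
# Single-file layouts (assumption: these are the unsharded fallbacks to accept).
SINGLE_FILE_NAMES = ("model.safetensors", "pytorch_model.bin")


def collect_checkpoint_files(model_dir):
    """Return the checkpoint file paths found in a model snapshot directory.

    Hypothetical helper: the returned list is what would be handed on to
    DeepSpeed as the list of checkpoints.
    """
    model_dir = Path(model_dir)

    # Sharded checkpoint: the index maps each weight name to a shard file.
    for index_name in INDEX_NAMES:
        index_path = model_dir / index_name
        if index_path.is_file():
            with index_path.open() as f:
                weight_map = json.load(f)["weight_map"]
            # Several weights share a shard, so de-duplicate the file names.
            return sorted({str(model_dir / shard) for shard in weight_map.values()})

    # Unsharded checkpoint: no index exists, only one weights file.
    for name in SINGLE_FILE_NAMES:
        single_file = model_dir / name
        if single_file.is_file():
            return [str(single_file)]

    raise ValueError(f"Can't find a checkpoint index or a single checkpoint file in {model_dir}.")
```

With a snapshot directory that contains only `model.safetensors`, as described for TinyLlama, the helper returns a one-element list rather than failing on the missing index.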

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

tjs-intel · 2024-02-05T19:43:05Z

@vivekgoe I needed to reopen #19 because #15 did not actually resolve my issue with the TinyLlama model

tjs-intel · 2024-02-05T20:54:52Z

I tested this and it is working with tiiuae/falcon-7b, FYI @schoi-habana

tjs-intel · 2024-02-06T18:10:13Z

Tested and working with NousResearch/Nous-Hermes-2-SOLAR-10.7B, upstage/SOLAR-10.7B-Instruct-v1.0, tiiuae/falcon-7b and TinyLlama/TinyLlama-1.1B-Chat-v1.0

vivekgoe · 2024-02-13T07:24:49Z

@schoi-habana @mandy-li since this is related to #15 can you please help review this and merge it if it looks ok to you?

mandy-li · 2024-02-13T19:56:24Z

LGTM. @tjs-intel, please rebase your fork repo to get the latest code changes from the oh-fork habana-main branch

tjs-intel · 2024-02-13T20:04:26Z

@mandy-li done

Co-authored-by: Sun Choi <schoi@habana.ai>

astachowiczhabana · 2024-06-07T14:19:17Z

huggingface#831

* Problem: output of the _sample function was filled with padding tokens for the BART model.
* Cause: the BART model uses the same token for decoder_start_token_id and for end of string (EOS). See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json. Because of that, the mechanism that fills model output with padding tokens after the EOS token was replacing the whole response with padding.
* Solution: skip the EOS check for the first token in the padding-filling loop.
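
The fix described in the referenced commit can be illustrated with a small, framework-free sketch. It is not the actual `_sample` code: `pad_after_eos` is a hypothetical helper operating on a plain list of token ids, and the ids in the usage note are chosen only so that the start token and EOS share a value, as in the case described.

```python
def pad_after_eos(sequence, eos_token_id, pad_token_id):
    """Replace every token after the first real EOS with the padding token.

    Position 0 is skipped in the EOS check: when the decoder start token has
    the same id as EOS, treating it as EOS would pad out the whole response.
    """
    padded = list(sequence)
    eos_seen = False
    for i in range(1, len(padded)):  # start at 1: never test the first token
        if eos_seen:
            padded[i] = pad_token_id
        elif padded[i] == eos_token_id:
            eos_seen = True
    return padded
```

For example, with `eos_token_id=2` and `pad_token_id=1`, the sequence `[2, 0, 10, 11, 2, 7, 8]` keeps its leading `2` and its content, and only the two tokens after the second `2` are replaced by padding.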

* Fix clip test
* Skip falcon tests
* Fix clip test
* [SW-209062] Disable default sdpa in Albert (#23)

  Transformers' default sdpa implementation caused a performance drop in Albert. Adding Albert to the list of models which don't yet have an sdpa implementation in Gaudi and use eager attention.
* [SW-209210] skip first token in EOS check. (#25) (#27)
  * Problem: output of the _sample function was filled with padding tokens for the BART model.
  * Cause: the BART model uses the same token for decoder_start_token_id and for end of string (EOS). See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json. Because of that, the mechanism that fills model output with padding tokens after the EOS token was replacing the whole response with padding.
  * Solution: skip the EOS check for the first token in the padding-filling loop.
* Update CODEOWNERS
* Adding labels clone as workaround to avoid crash (#28)
* [SW-0] Fix style

---------

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Bhargav <beede@habana.ai>


tjs-intel changed the title ~~add support for safetensors when reading checkpoints~~ add support for TinyLlama model Feb 5, 2024

schoi-habana mentioned this pull request Feb 6, 2024

enable falcon-180b inference #15

Merged

3 tasks

tjs-intel force-pushed the support-safetensors branch from 37b54e3 to cb63ac7 Compare February 6, 2024 17:51

tjs-intel requested review from bhargaveede, ssarkar2 and vivekgoe as code owners February 6, 2024 17:51

tjs-intel changed the base branch from habana-main to schoi/falcon_180b February 6, 2024 17:51

tjs-intel mentioned this pull request Feb 8, 2024

add support for TinyLlama model huggingface/optimum-habana#693

Merged

3 tasks

tjs-intel force-pushed the support-safetensors branch from cb63ac7 to d7fb355 Compare February 8, 2024 16:25

tjs-intel changed the base branch from schoi/falcon_180b to habana-main February 8, 2024 16:26

vivekgoe requested review from mandy-li, piotrbocian and schoi-habana February 13, 2024 07:23

Add support for safetensors and sharded checkpoints

c501ebf

tjs-intel force-pushed the support-safetensors branch from d7fb355 to c501ebf Compare February 13, 2024 20:03

mandy-li approved these changes Feb 13, 2024


mandy-li merged commit f4e0239 into HabanaAI:habana-main Feb 13, 2024

tjs-intel deleted the support-safetensors branch February 13, 2024 20:21

bhargaveede pushed a commit that referenced this pull request Feb 19, 2024

Add support for safetensors and sharded checkpoints (#25)

a2398d5

Co-authored-by: Sun Choi <schoi@habana.ai>

bhargaveede pushed a commit that referenced this pull request Feb 19, 2024

Add support for safetensors and sharded checkpoints (#25)

e0d1de5

Co-authored-by: Sun Choi <schoi@habana.ai>
