add support for TinyLlama model #25

Merged
mandy-li merged 1 commit into HabanaAI:habana-main from tjs-intel:support-safetensors
Feb 13, 2024

@tjs-intel

What does this PR do?

Reopening #19 because #15 does not work for TinyLlama/TinyLlama-1.1B-Chat-v1.0. This model does not have the checkpoint index files.

ValueError: Can't find a checkpoint index (pytorch_model.bin.index.json or model.safetensors.index.json) in /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/77e23968eed12d195bd46c519aa679cc22a27ddc.

The TinyLlama model only ships its weights as a single model.safetensors file. This file needs to be included in the list of checkpoints that is passed to DeepSpeed in order for the model to function properly when initialized with DeepSpeed.

This PR adds the safetensors checkpoints to the list of checkpoints passed to DeepSpeed.

Note: This change requires an upstream commit in microsoft/DeepSpeed to be merged downstream to HabanaAI/DeepSpeed in order for DeepSpeed to support the provided safetensors format.
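
For readers who want to see the idea in code, here is a minimal sketch of the fallback described above. It is not the code from this PR: the helper names `collect_checkpoint_files` and `write_checkpoints_json`, the glob patterns, and the shape of the JSON written for DeepSpeed (`type`, `checkpoints`, `version`) are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Index file names taken from the ValueError quoted in the PR description.
INDEX_NAMES = ("pytorch_model.bin.index.json", "model.safetensors.index.json")


def collect_checkpoint_files(model_dir):
    """Return the weight files found in a local model snapshot directory.

    Hypothetical helper: prefers a checkpoint index when one exists and
    otherwise falls back to unsharded *.bin and *.safetensors files.
    """
    model_dir = Path(model_dir)

    for index_name in INDEX_NAMES:
        index_path = model_dir / index_name
        if index_path.is_file():
            with open(index_path) as f:
                weight_map = json.load(f)["weight_map"]
            # Several tensors map to the same shard, so deduplicate.
            return sorted({str(model_dir / shard) for shard in weight_map.values()})

    # No index file (the TinyLlama case): pick up single-file checkpoints.
    # Assumption: every *.bin / *.safetensors file here is a weight file.
    files = sorted(str(p) for p in model_dir.glob("*.bin"))
    files += sorted(str(p) for p in model_dir.glob("*.safetensors"))
    if not files:
        raise ValueError(f"No checkpoint files found in {model_dir}")
    return files


def write_checkpoints_json(model_dir, json_path):
    """Write the checkpoint list to a JSON file for the inference engine.

    The key names below are an assumed layout, not a confirmed schema.
    """
    data = {
        "type": "ds_model",
        "checkpoints": collect_checkpoint_files(model_dir),
        "version": 1.0,
    }
    with open(json_path, "w") as f:
        json.dump(data, f)
    return data
```

With a snapshot directory that contains only model.safetensors, `collect_checkpoint_files` returns a one-element list pointing at that file instead of raising the "Can't find a checkpoint index" error.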

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@tjs-intel tjs-intel changed the title from "add support for safetensors when reading checkpoints" to "add support for TinyLlama model" Feb 5, 2024
@tjs-intel
Author

@vivekgoe I needed to reopen #19 because #15 did not actually resolve my issue with the TinyLlama model

@tjs-intel
Author

I tested this and it is working with tiiuae/falcon-7b, FYI @schoi-habana

@schoi-habana schoi-habana mentioned this pull request Feb 6, 2024
3 tasks
@tjs-intel tjs-intel force-pushed the support-safetensors branch from 37b54e3 to cb63ac7 Compare February 6, 2024 17:51
@tjs-intel tjs-intel changed the base branch from habana-main to schoi/falcon_180b February 6, 2024 17:51
@tjs-intel
Author

Tested and working with NousResearch/Nous-Hermes-2-SOLAR-10.7B, upstage/SOLAR-10.7B-Instruct-v1.0, tiiuae/falcon-7b and TinyLlama/TinyLlama-1.1B-Chat-v1.0

@tjs-intel tjs-intel force-pushed the support-safetensors branch from cb63ac7 to d7fb355 Compare February 8, 2024 16:25
@tjs-intel tjs-intel changed the base branch from schoi/falcon_180b to habana-main February 8, 2024 16:26
@vivekgoe

@schoi-habana @mandy-li since this is related to #15 can you please help review this and merge it if it looks ok to you?

@mandy-li

LGTM. @tjs-intel, please rebase your fork to get the latest code changes from the oh-fork habana-main branch

@tjs-intel
Author

@mandy-li done

@mandy-li mandy-li merged commit f4e0239 into HabanaAI:habana-main Feb 13, 2024
@tjs-intel tjs-intel deleted the support-safetensors branch February 13, 2024 20:21
bhargaveede pushed a commit that referenced this pull request Feb 19, 2024
Co-authored-by: Sun Choi <schoi@habana.ai>
@astachowiczhabana

huggingface#831

astachowiczhabana pushed a commit that referenced this pull request Nov 22, 2024
* Problem: the output of the _sample function was filled with padding
   tokens for the BART model.

 * Cause: the BART model uses the same token as decoder_start_token_id
   and as end of string (EOS).
   See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
   Because of that, the mechanism that fills the model output with
   padding tokens after the EOS token was replacing the whole
   response with padding.

 * Solution: skip the EOS check for the first token in the
   padding-filling loop.
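
The fix described in this commit message can be illustrated with a small, framework-free sketch. This is not the actual `_sample` code: `pad_after_eos` is a hypothetical helper, it works on a plain Python list rather than a tensor, and the token ids in the usage note are illustrative values.

```python
def pad_after_eos(sequence, eos_token_id, pad_token_id):
    """Replace every token after the first real EOS with padding.

    Position 0 is skipped on purpose: it holds decoder_start_token_id,
    which for BART-style configs can equal eos_token_id. Checking it
    would make the whole response look finished and pad everything.
    """
    result = list(sequence)
    for i in range(1, len(result)):  # start at 1 to skip the decoder start token
        if result[i] == eos_token_id:
            # Everything after the first genuine EOS becomes padding.
            for j in range(i + 1, len(result)):
                result[j] = pad_token_id
            break
    return result
```

For example, with `eos_token_id=2` and `pad_token_id=1`, the sequence `[2, 0, 100, 200, 2, 300, 400]` becomes `[2, 0, 100, 200, 2, 1, 1]`. A loop that also checked position 0 would have padded everything after the start token.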
astachowiczhabana added a commit that referenced this pull request Nov 22, 2024
* Fix clip test

* Skip falcon tests

* Fix clip test

* [SW-209062] Disable default sdpa in Albert (#23)

Transformers' default sdpa implementation caused performance
drop in Albert. Adding Albert to the list of models which don't
yet have sdpa implementation in Gaudi and use eager attention.

* [SW-209210] skip first token in EOS check. (#25) (#27)

* Problem: the output of the _sample function was filled with padding
   tokens for the BART model.

 * Cause: the BART model uses the same token as decoder_start_token_id
   and as end of string (EOS).
   See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
   Because of that, the mechanism that fills the model output with
   padding tokens after the EOS token was replacing the whole
   response with padding.

 * Solution: skip the EOS check for the first token in the
   padding-filling loop.

* Update CODEOWNERS

* Adding labels clone as workaround to avoid crash (#28)

* [SW-0] Fix style

---------

Co-authored-by: Urszula Golowicz <urszula.golowicz@intel.com>
Co-authored-by: Marcin Łapiński <mlapinskix@habana.ai>
Co-authored-by: Bhargav <beede@habana.ai>
astachowiczhabana pushed a commit that referenced this pull request Nov 28, 2024
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025
xinyu-intel pushed a commit that referenced this pull request Mar 4, 2025