Implement MambaForSequenceClassification #31155
base: main
Conversation
Referring to #29552, "there's a test specific to sequence classification that expects all the unfrozen params to be initialized in the range [0.0, 1.0] and the initialized values for the mixer don't satisfy that assertion." This results in a test failure even though the classifier head is initialized properly.
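For context, the check referred to is roughly of the following shape (a paraphrased sketch, not copied from the transformers test suite; `model` is any instance under test and the exact rounding may differ):

```python
# Rough sketch of the common initialization check: every trainable parameter is expected
# to have a mean of 0.0 or 1.0 right after init, which MambaMixer parameters such as
# A_log and the dt projection deliberately do not satisfy.
for name, param in model.named_parameters():
    if param.requires_grad:
        mean = round(param.data.mean().item(), 9)
        assert mean in (0.0, 1.0), f"{name} initialized with mean {mean}"
```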
Thanks for opening a PR! 🤗
Could you rebase on main and make sure the CIs are green? 🤗
Of course! It should be good to merge now.
@ArthurZucker Pending review!
🤗
""", | ||
MAMBA_START_DOCSTRING, | ||
) | ||
class MambaForSequenceClassification(MambaPreTrainedModel): |
Why not just use `# Copied from` here?
`# Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Mamba, LLAMA->MAMBA, self.transformer->self.model, transformer_outputs->model_outputs`
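For reference, `# Copied from` is the transformers consistency convention: `make fix-copies` (via `utils/check_copies.py`) regenerates the annotated code from the referenced class, applying the listed string replacements, and CI fails if the copy drifts. A minimal sketch of what the suggestion would look like (base class assumed from the quoted diff):

```python
from transformers.models.mamba.modeling_mamba import MambaPreTrainedModel


# Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with Llama->Mamba, LLAMA->MAMBA, self.transformer->self.model, transformer_outputs->model_outputs
class MambaForSequenceClassification(MambaPreTrainedModel):
    ...  # the body would then be kept in sync with LlamaForSequenceClassification automatically
```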
I'm not sure if I understand your comment. The forward methods of Mamba and Llama for sequence classification seem different. Could you please elaborate? 🤗
@ArthurZucker Now that #32080 is merged, can we do a final review for this one too? Also, I would like to add Mamba2ForSequenceClassification to this PR as well, so we have both Mamba models with classification capabilities. Then I would be able to release the EHRMamba model on HuggingFace.
Hey, any update on this?
```python
labels: Optional[torch.LongTensor] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
use_cache: Optional[bool] = None,
```
I think there is a `**kwargs` missing in the forward function, lines 842-843.
Good catch! I suppose it won't be necessary (as opposed to `MambaForCausalLM`), but having it is good. I will add it in a commit now.
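A minimal sketch of the resulting signature (the parameter list is taken from the quoted diff, with `input_ids` assumed; other parameters and the body are elided):

```python
from typing import Optional

import torch
from transformers.models.mamba.modeling_mamba import MambaPreTrainedModel


class MambaForSequenceClassification(MambaPreTrainedModel):
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        use_cache: Optional[bool] = None,
        **kwargs,  # accept and ignore extra keyword arguments, mirroring MambaForCausalLM
    ):
        ...
```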
@Adibvafa have you tried running this? Whenever the model gets to an evaluation step I get the error below. ERROR:
Hey @Jellymoon, the Mamba model works as expected during the training loop. However, it fails during the evaluation loop. I found that it is necessary to set `use_cache=False`, otherwise it raises an error. cc: @Adibvafa

```python
model = MambaForSequenceClassification.from_pretrained(
    model_path,
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    use_cache=False,  # needed when evaluating/training Mamba for sequence classification, otherwise it raises an error
)
```
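An equivalent workaround, assuming a standard transformers config object, is to flip the flag after loading (same `model_path`/`id2label` names as in the snippet above):

```python
model = MambaForSequenceClassification.from_pretrained(model_path, num_labels=len(id2label))
model.config.use_cache = False  # disables cache handling during the Trainer's evaluation loop
```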
I noticed that the training speed (fine-tuning) is very slow compared to other HF transformer models. Can something be improved here?
Ran locally. Loaded, fine-tuned, and did batch inference. Works as expected.
I will take a look. Thank you for bringing this up! @Jellymoon @mohith7548
Do you have mamba-ssm installed? Is it slow in classification specifically or in Mamba in general?
@Adibvafa, I have
Amazing!
@Adibvafa, a bug in Mamba? or
I successfully ran the Mamba model with the new changes you made to the code. Any chance that this will also support the Mamba2 model?
I think the low-precision bug possibly refers to #32691. The huge amount of memory in the slow path is to be expected though, and is one of the reasons why the kernel exists (i.e., to avoid materializing certain tensors). Nothing you can really do about this, tbh.
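One quick way to tell whether the slow path is the culprit is to check for the optional kernel packages; without `mamba-ssm` and `causal-conv1d`, transformers falls back to the much slower pure-PyTorch implementation (a small sketch, assuming the usual import names for these packages):

```python
def mamba_fast_path_available() -> bool:
    # The fused kernels live in the optional mamba-ssm and causal-conv1d packages;
    # if either import fails, Mamba runs on the slow, memory-hungry fallback path.
    try:
        import mamba_ssm  # noqa: F401
        import causal_conv1d  # noqa: F401
    except ImportError:
        return False
    return True


print("fast Mamba kernels available:", mamba_fast_path_available())
```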
```python
test_pruning = False
test_head_masking = False  # Mamba does not have attention heads
test_model_parallel = False
test_mismatched_shapes = False  # MambaMixer follows a different initialization
```
This seems a bit weird to me 🤔 Disabling the `test_mismatched_shapes` flag shouldn't be needed imo. Could you add `get_input_embeddings` and `set_input_embeddings` methods for the ForSeqClassification class and see if it fixes those tests?
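A minimal sketch of the suggested accessors, assuming the classification model wraps the backbone as `self.backbone = MambaModel(config)` (as `MambaForCausalLM` does):

```python
from transformers.models.mamba.modeling_mamba import MambaPreTrainedModel


class MambaForSequenceClassification(MambaPreTrainedModel):
    ...

    def get_input_embeddings(self):
        # Delegate to the backbone so resize_token_embeddings and related utilities keep working.
        return self.backbone.get_input_embeddings()

    def set_input_embeddings(self, new_embeddings):
        return self.backbone.set_input_embeddings(new_embeddings)
```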
What does this PR do?
Adds the `MambaForSequenceClassification` model based on the `MambaModel` backbone. We recently published EHRMamba, a state-of-the-art foundation model for Electronic Health Records. This model is built on the same architecture, and we will release the trained weights using the `MambaForSequenceClassification` class.
https://vectorinstitute.github.io/EHRMamba
Fixes #30431
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case: Mamba Models - Missing MambaForSequenceClassification #30431
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
As discussed in #30431, @ArthurZucker could you take a look? 😊
Notes
This implementation closely follows GPT2ForSequenceClassification, except that the last hidden states are pooled before being passed to the classifier, which improves efficiency.
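A rough sketch of that difference (names are illustrative; `classifier` stands for the classification head): GPT2ForSequenceClassification applies the head to every position and then selects the logit at the last non-padded token, whereas pooling first means the head only runs on one vector per sequence.

```python
import torch
import torch.nn as nn


def pooled_classification_logits(hidden_states, input_ids, pad_token_id, classifier: nn.Linear):
    # hidden_states: (batch, seq_len, hidden_size) from the Mamba backbone.
    # Find the index of the last non-padding token for each sequence.
    if pad_token_id is None:
        sequence_lengths = -1  # no padding info: just take the final position
    else:
        sequence_lengths = (input_ids != pad_token_id).long().sum(-1) - 1
    batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
    pooled = hidden_states[batch_idx, sequence_lengths]  # (batch, hidden_size)
    return classifier(pooled)                            # (batch, num_labels)
```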