
Conversation


@ydshieh ydshieh commented Jun 3, 2025

What does this PR do?

All tests pass on A10.

One fails on T4:

tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_fp16

(On T4, it raises NotImplementedError: Cannot copy out of meta tensor; no data!)

I decided to just skip it on T4.

@ydshieh ydshieh requested a review from vasqu June 3, 2025 16:59
Comment on lines +548 to +555
("cuda", 7): [
' I will be talking about the “Theory of Relativity” by Albert Einstein.\nThe',
' I will be talking about the importance of the internet in our lives.\nThe internet is a global',
],
("cuda", 8): [
' I am going to talk about the “Theory of Relativity” by Albert Einstein.\n',
' I will be talking about the importance of the internet in our lives.\nThe internet is a global'
],
Collaborator Author

@gante I guess it's normal that, with inputs_embeds, we don't have the prompt included, right?

Contributor

Imo, that doesn't seem correct. It would be weird to expect different behaviour here, since we generate from the same "prompt". That might be a regression somewhere.

Collaborator Author

I am not an expert (that is why I pinged @gante), but I think it's normal. For the prompt part, we only pass embeddings, not the token ids, and we can't recover the token ids from the embeddings. That is why the output only contains the generated part.
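To make the point concrete, here is a toy sketch (not the actual transformers code, and `toy_generate` is a made-up stand-in for `model.generate`): an embedding lookup is not invertible in general, so when only embedding vectors are passed in, the decoder has no prompt ids to prepend to its output.

```python
# Toy illustration of why generating from embeddings cannot return the prompt:
# the "model" only ever sees embedding vectors for the prompt, so the returned
# ids can contain generated tokens only.

VOCAB = ["<pad>", "hello", "world", "foo", "bar"]
EMBED = {i: [float(i), float(i) * 2.0] for i in range(len(VOCAB))}  # toy lookup table

def toy_generate(input_ids=None, inputs_embeds=None, max_new_tokens=3):
    """Toy greedy decoder: each step appends token id (last_id + 1) % vocab_size."""
    if input_ids is not None:
        embeds = [EMBED[i] for i in input_ids]
        out_ids = list(input_ids)  # prompt ids are known -> prepend them
    else:
        embeds = list(inputs_embeds)
        out_ids = []               # prompt ids unknown -> start empty
    last = int(embeds[-1][0])      # stand-in for the model's next-token choice
    for _ in range(max_new_tokens):
        nxt = (last + 1) % len(VOCAB)
        out_ids.append(nxt)
        last = nxt
    return out_ids

prompt = [1, 2]  # "hello world"
with_ids = toy_generate(input_ids=prompt)
with_embeds = toy_generate(inputs_embeds=[EMBED[i] for i in prompt])
print(with_ids)     # prompt ids followed by the new ids
print(with_embeds)  # only the new ids
```

Both calls produce the same continuation; only the `input_ids` variant can echo the prompt back.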

Contributor

It would be nice to know if this test passed on the original commit, or at least whether the same issue (no prompt) occurred with input embeds back then.

Both behaviours are justified imo.

Collaborator Author

It was already failing when the test was written 😢

(That's why I always say it's important to use run-slow on PRs.)

Collaborator Author

You are right, my bad, I moved too fast and got the wrong results.

It's on

https://huggingface.slack.com/archives/C06LR9PQA00/p1725987868413609?thread_ts=1725987848.555519&cid=C06LR9PQA00

I will check the commit from the day before to see what happened.

Collaborator Author

I confirmed that this test was already failing when it was added on 2025/06/19: I checked out that commit and ran the test.

(Around that time, several CI runs were triggered manually on different commits, so it's hard to verify on the Slack channels.)

Contributor

Sorry to be so picky, but does it also fail on (at least) no prefix being returned 👀 If yes, I think this is fine to merge.

Collaborator Author

Yes, same failure reason. You're not being picky, it's fine. I am also happy to wait for Joao's response. No urgency to merge at all :-)

Collaborator Author

Gentle ping @gante

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@vasqu vasqu left a comment

I would wait for Joao on the input embeds side because that seems really weird to me. Otherwise, mostly nits.

def tearDown(self):
    cleanup(torch_device, gc_collect=True)

# On T4, get `NotImplementedError: Cannot copy out of meta tensor; no data!`
Contributor

Might it be too little memory or some CPU offloading issue?

Collaborator Author

Yeah, I guess so too. But if I remove device_map="auto" and add .to(torch_device),

model = AutoModelForCausalLM.from_pretrained(self.model_id, torch_dtype=torch.float16).to(torch_device)

the generation runs without OOM.

So it's a bit strange to me why "auto" would cause a problem.

Contributor

Maybe someone to cc here? It's not a deal breaker, but it might be interesting to investigate if someone wants to / has the time.

Collaborator Author

I will open an issue and ping

Collaborator Author

ydshieh commented Jun 19, 2025

Although it is not urgent, let's merge. If @gante has any comments that would lead to further changes, we could do them in a follow-up PR.

Keeping failing tests around has some risk: if other merged commits cause trouble, it won't be detected, which makes further fixes more difficult to handle.

(I have been in these situations several times ... trust me, it's painful.)

@ydshieh ydshieh enabled auto-merge (squash) June 19, 2025 11:49
@ydshieh ydshieh disabled auto-merge June 19, 2025 11:50
@ydshieh ydshieh merged commit 5d26a38 into main Jun 19, 2025
15 checks passed
@ydshieh ydshieh deleted the fix_FalconMamba branch June 19, 2025 11:50