
Conversation

@manueldeprada (Contributor) commented Jul 7, 2025

Fix #38725

Coming from #38725: e18f233 previously attempted to fix the default attention mask issue that appeared with #34464, but the slow test tests/models/moshi/test_modeling_moshi.py::MoshiIntegrationTests::test_moshika_greedy_unconditional_fp16 was still failing.

History from git bisect:

Setting cache_implementation="dynamic" makes the test pass, but the sliding window cache should not behave differently. I believe this is due to the depth decoder using a sliding window of 8 by default, but the audio side is still confusing to me.
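
As a hedged illustration of the workaround described above (the checkpoint and prompt below are placeholders, not the ones used by the failing Moshi test), the cache backend can be forced through the generation config:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint and prompt; the failing test loads a Moshi checkpoint instead.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Hello", return_tensors="pt")

# Force the dynamic cache; for Moshi this replaces the default sliding-window
# cache and, per the report above, makes the slow test pass.
model.generation_config.cache_implementation = "dynamic"
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0]))
```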

This PR is not a fix: the modeling code should be changed to accommodate what I highlight in the diff.

cc @eustlb @ydshieh

@manueldeprada (Contributor, Author) commented:

run-slow: moshi

github-actions bot commented Jul 8, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/moshi']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eustlb (Contributor) left a comment:


Thanks for the update @manueldeprada, there is still some work ahead to fix Moshi...
Is this PR still relevant concerning the changes to cache_utils.py and generation/utils.py? If not, can you resolve the merge conflicts?

Comment on lines 1863 to 1886

```diff
     def _prepare_attention_mask_for_generation(
         self,
-        input_ids: torch.LongTensor,
+        inputs_tensor: torch.Tensor,
         generation_config: GenerationConfig,
-        kwargs: dict[str, Any],
+        model_kwargs: dict[str, Any],
     ) -> torch.LongTensor:
-        pad_token_id = generation_config.pad_token_id
-        eos_token_id = generation_config.eos_token_id
-
-        default_attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)
-        if pad_token_id is None:
-            return default_attention_mask
-
-        is_pad_token_in_inputs = (pad_token_id is not None) and torch.isin(input_ids, pad_token_id).any()
-        is_pad_token_not_equal_to_eos_token_id = (eos_token_id is None) or ~torch.isin(
-            eos_token_id, pad_token_id
-        ).any()
-        can_infer_attention_mask = is_pad_token_in_inputs * is_pad_token_not_equal_to_eos_token_id
-        attention_mask_from_padding = input_ids.ne(pad_token_id).long()
-
-        attention_mask = (
-            attention_mask_from_padding * can_infer_attention_mask + default_attention_mask * ~can_infer_attention_mask
+        return super()._prepare_attention_mask_for_generation(
+            inputs_tensor=inputs_tensor,
+            generation_config=generation_config,
+            model_kwargs={},
         )
-        return attention_mask
```

can't we just remove _prepare_attention_mask_for_generation override here?
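
For context, a minimal self-contained sketch of the padding-based mask inference that the override above reimplements (simplified from the removed code, not the exact library implementation). If the override is dropped, GenerationMixin's own version of this inference is what Moshi would rely on:

```python
import torch

def infer_attention_mask(input_ids, pad_token_id, eos_token_id):
    # All-ones mask is the fallback when nothing can be inferred.
    default_mask = torch.ones_like(input_ids)
    if pad_token_id is None:
        return default_mask
    # Only trust pad positions if the pad token actually occurs in the inputs
    # and is distinct from the EOS token (otherwise trailing EOS would be masked).
    pad_in_inputs = (input_ids == pad_token_id).any()
    pad_is_not_eos = eos_token_id is None or pad_token_id != eos_token_id
    if pad_in_inputs and pad_is_not_eos:
        return (input_ids != pad_token_id).long()
    return default_mask

ids = torch.tensor([[0, 0, 5, 6, 7]])  # left-padded with pad_token_id = 0
print(infer_attention_mask(ids, pad_token_id=0, eos_token_id=2))
# tensor([[0, 0, 1, 1, 1]])
```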

```diff
         slicing = torch.arange(max_cache_len, device=value_states.device)
         current_seq_len = cache_position[-1] + 1  # Use last position to determine current length
-        to_shift = current_seq_len > max_cache_len
+        to_shift = current_seq_len >= max_cache_len
```

This will break other models, no?
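
To make the boundary concrete, here is a toy check (plain arithmetic, not the cache code itself, and max_cache_len = 4 is an arbitrary choice, presumably standing in for the hunk above in cache_utils.py): ">" and ">=" only disagree when the sequence exactly fills the window, so ">=" starts rolling the sliding window one step earlier.

```python
import torch

max_cache_len = 4  # arbitrary toy window size
for seq_len in (3, 4, 5):
    cache_position = torch.arange(seq_len)
    current_seq_len = cache_position[-1] + 1  # same expression as in the hunk above
    print(seq_len, bool(current_seq_len > max_cache_len), bool(current_seq_len >= max_cache_len))
# 3 False False
# 4 False True   <- the only case where the two conditions differ
# 5 True True
```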

```diff
         # - different models have a different cache name expected by the model (default = "past_key_values")
         # - `max_length`, prepared above, is used to determine the maximum cache length
-        max_cache_length = generation_config.max_length - 1
+        max_cache_length = generation_config.max_length
```

This will break other models, no?
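
For scale, a back-of-the-envelope view of this second off-by-one (the numbers below are made up): max_length bounds the full sequence, and the last generated token never has to be written back into the cache, which is the usual rationale for the existing max_length - 1 allocation.

```python
prompt_len, max_new_tokens = 5, 15
max_length = prompt_len + max_new_tokens              # how generate() derives max_length from max_new_tokens
print("cache slots, current code:", max_length - 1)   # 19
print("cache slots, this PR:", max_length)            # 20
```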


Development

Successfully merging this pull request may close these issues.

MoshiIntegrationTests started to fail after #34464
