New dynamic cache for sliding window attention #34352

Cyrilvallez · 2024-10-23T16:21:25Z

What does this PR do?

This supersedes #33619 to introduce a new DynamicSlidingWindowCache.

This cache behaves exactly the same way as DynamicCache and, contrary to SlidingWindowCache (the static variant), lets us correctly continue generation from an existing cache instance (fully filled or not) with more than one new token added to the sequence (e.g. for prefix caching).
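As a rough illustration of the behaviour described above, here is a pure-Python sketch of one layer of a dynamically growing sliding window cache. The class name `ToySlidingWindowCache`, its `update` method, and the use of plain lists in place of key/value tensors are assumptions made for illustration only; this is not the implementation from this PR.

```python
class ToySlidingWindowCache:
    """Toy single-layer cache; list items stand in for per-token key/value states."""

    def __init__(self, sliding_window):
        self.sliding_window = sliding_window
        self.states = []  # grows dynamically, never beyond `sliding_window` entries

    def update(self, new_states):
        """Append one or more new token states and return the states visible to attention."""
        # The new tokens attend over the cached states plus all new states.
        # ASSUMPTION: the attention mask restricts each query to its own window.
        visible = self.states + list(new_states)
        # Only the most recent `sliding_window` states are kept for the next call.
        self.states = visible[-self.sliding_window:]
        return visible


cache = ToySlidingWindowCache(sliding_window=4)
cache.update(["t0", "t1", "t2"])  # prefill shorter than the window
cache.update(["t3", "t4", "t5"])  # several new tokens at once, e.g. prefix caching
print(cache.states)
```

In this sketch the cache can be extended by any number of tokens at any point, whether or not the window is already full, which is the property the static variant lacks.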

In order for this to work I had to do the following main modifications:

I introduced get_past_seen_tokens(), which should replace get_seq_length() (almost) everywhere. Once the cache is full, the two no longer provide the same information, so we need to distinguish between them to correctly recreate cache_position.
For every cache class except DynamicSlidingWindowCache, get_past_seen_tokens() and get_seq_length() return the same value, so there is no issue there.
For every generative model using a sliding window, I modified the 4D causal mask creation for the new cache class for both sdpa and eager attention, and correctly sliced the 2D mask for FA2.
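The difference between the two counters, and its effect on cache_position and the mask, can be sketched in pure Python. The helper names `cache_position` and `sliding_window_mask` are hypothetical, and the window convention used here (a query at position q sees the keys in the range (q - window, q]) is an assumption; the actual models may differ by one.

```python
def cache_position(past_seen_tokens, num_new_tokens):
    """Absolute positions of the new tokens in the full sequence."""
    return list(range(past_seen_tokens, past_seen_tokens + num_new_tokens))


def sliding_window_mask(query_positions, key_positions, sliding_window):
    """Boolean mask: True where a query may attend to a key."""
    # ASSUMPTION: causal attention limited to the last `sliding_window` keys,
    # counting the query's own position.
    return [
        [q - sliding_window < k <= q for k in key_positions]
        for q in query_positions
    ]


sliding_window = 4
past_seen_tokens = 10  # what get_past_seen_tokens() would report
seq_length = 4         # what get_seq_length() would report once the window is full

# Two new tokens arrive; their positions must come from the tokens seen so far.
positions = cache_position(past_seen_tokens, 2)
wrong_positions = cache_position(seq_length, 2)

# Keys available to attention: the 4 cached positions plus the 2 new ones.
key_positions = list(range(past_seen_tokens - seq_length, past_seen_tokens + 2))
mask = sliding_window_mask(positions, key_positions, sliding_window)

print(positions, wrong_positions)
for row in mask:
    print(row)
```

Deriving the positions from the stored length instead would place the new tokens at positions 4 and 5, which is why the two values have to be kept apart once the window is full.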

Once generate no longer returns the legacy cache format (tuples) by default, we can make this class the default for generative models with a sliding window. We cannot do it before then because we would lose the information about past seen tokens when the cache is full.
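To make the information loss concrete, here is a small sketch assuming the legacy format is a tuple holding one (keys, values) pair per layer, with plain lists standing in for tensors. The variable names are illustrative only.

```python
# One layer whose window is full: 4 stored states, although 10 tokens were processed.
past_seen_tokens = 10
keys = ["k6", "k7", "k8", "k9"]
values = ["v6", "v7", "v8", "v9"]

# Legacy format: one (keys, values) pair per layer and nothing else.
legacy_cache = ((keys, values),)

# The stored length is all that can be recovered from the tuples.
recovered_length = len(legacy_cache[0][0])
print(recovered_length, past_seen_tokens)
```

A cache rebuilt from these tuples would have no way of knowing that 10 tokens had already been seen.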

It also sets a nice precedent for supporting the same features in the static variant in a subsequent PR.

Cyrilvallez · 2024-11-01T09:47:28Z

Ok @ArthurZucker it has been a long time on the side but it's now ready for final review!

Cyrilvallez · 2024-11-01T16:19:44Z

Slow tests for Mistral are all good (4 failing but similar on main)

ArthurZucker · 2024-11-25T10:34:01Z

Hey! As we discussed offline, not super super sure we need this. The big blocker for me is that you are changing a lot of model forward methods, which is something we try to avoid!

ArthurZucker added the run-benchmark label Oct 23, 2024

Cyrilvallez force-pushed the sliding-window branch 4 times, most recently from 456eda6 to 21b472c Compare October 24, 2024 17:23

Cyrilvallez marked this pull request as ready for review October 24, 2024 18:54

Cyrilvallez changed the title ~~Sliding window~~ New dynamic cache for sliding window attention Oct 24, 2024

Cyrilvallez added 23 commits November 1, 2024 10:23

Add new dynamic cache

96f0100

Add cache by default in generate for models supporting it

568f807

Add to __init__ and correct typo

be62f53

Correct output if prefill larger than sliding window + compatibility

52b920d

Add legacy format handling

0c0836e

style

b087839

add docs

91a1fee

fix import

a662de2

Update dummy_pt_objects.py

ce203ea

Update test

1719984

style

05d1053

update cache conversion in test

cb2de20

style

7dfc86d

Allow the cache to support new states of more than 1 token, even after prefill stage

cc13c4c

Update cache_utils.py

1887bda

maybe change test

ad86620

revert tests diffs

a6f0d8d

define get_seen_tokens

5394c77

Modify all current .get_seq_length names

25b8f80

style

daae19a

trigger CIs

6b7cb5a

Add tests

9cc7077

Update test_utils.py

e217859

Cyrilvallez added 23 commits November 1, 2024 10:29

fix missed conflict

d240402

Apply to other models

a7ae24d

Add required arg in prepare_inoput

e0de263

Update test_utils.py

8bc872a

Update test_utils.py

5ae270b

Fix kv_seq_length and rotary_seq_length

5e248fa

up

4c04f89

up

cba5ae4

up

d200e77

up

2e40fd3

CIs

50731b5

improve sdpa is_causal escape

bbd6069

make fix-copies

6ae7ec0

add check for models with sliding window

efc1131

Update modeling_git.py

6478126

style

2055eda

Update modeling_mimi.py

35dd895

Update utils.py

fb231cf

replace get_seq_length

8affc79

Update test_utils.py

bc5b036

CIs

09bab35

CIs

6c80731

Update modeling_longt5.py

5efa057

Cyrilvallez force-pushed the sliding-window branch from 00b66b5 to 5efa057 Compare November 1, 2024 09:30

Update skip test for moshi

0f83f21

Cyrilvallez closed this Aug 12, 2025

Cyrilvallez deleted the sliding-window branch August 12, 2025 12:34
