Merged
Conversation
nastya236 (Collaborator) approved these changes on Mar 10, 2026 and left a comment:
Looks great, thank you! I left one comment for myself :)
  def logit_bias_processor(_, logits):
-     logits[:, indices] += values
-     return logits
+     return logits.at[:, indices].add(values)
Collaborator
If I remember correctly, you mentioned once that .at[].add() is needed to use Scatter::Sum instead of Gather + Add + Scatter::None, right?
Member
Author
Exactly what you wrote. I changed it because the Gather -> Add -> Scatter pattern can be significantly less efficient when the Gather or Add depends on the src of the Scatter: that dependency breaks donation, so the Scatter ends up doing a copy.
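To make the two patterns in this thread concrete, here is a minimal sketch that uses NumPy as a stand-in for MLX (an assumption made only so the example runs anywhere; NumPy has no lazy graph and no buffer donation, so it cannot show the copy itself). The function names, shapes, and values are hypothetical. It spells out the Gather -> Add -> Scatter sequence next to a single scatter-add, and checks that both give the same result when the indices are unique:

```python
import numpy as np


def bias_gather_add_scatter(logits, indices, values):
    """Old pattern: Gather the columns, Add the bias, Scatter the result back."""
    out = logits.copy()
    gathered = out[:, indices]       # Gather: reads from the scatter target
    updated = gathered + values      # Add: depends on the gathered data
    out[:, indices] = updated        # Scatter (overwrite) into the same array
    return out


def bias_scatter_add(logits, indices, values):
    """New pattern: one scatter-add that accumulates directly into the target."""
    out = logits.copy()
    # np.add.at is NumPy's closest analogue of logits.at[:, indices].add(values)
    np.add.at(out, (slice(None), indices), values)
    return out


# Hypothetical data: a batch of 2 rows over a 6-token vocabulary.
logits = np.zeros((2, 6), dtype=np.float32)
indices = np.array([1, 4])
values = np.array([0.5, -2.0], dtype=np.float32)

old = bias_gather_add_scatter(logits, indices, values)
new = bias_scatter_add(logits, indices, values)
```

In the first function the scatter's target is also an input to the gather, which is the dependency the author describes as breaking donation. The second form has no separate read of the target before the write.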
Aaryaman3 pushed a commit to Aaryaman3/mlx-lm that referenced this pull request on Mar 12, 2026
lyonsno added a commit to lyonsno/mlx-lm that referenced this pull request on Mar 28, 2026
Semantic merge of upstream/main (through 4d3af3c) into dev.

Key changes:
- LRUPromptCache moved from server.py to cache.py (upstream ml-explore#1019). Dev's rewind preflight/fail-closed logic integrated into the new PromptTrie-based LRUPromptCache.fetch_nearest_cache.
- Presence/frequency penalties (upstream ml-explore#971).
- Better caching with CacheOrder and PromptTrie (upstream ml-explore#911, ml-explore#1019).
- Harden can_trim_prompt_cache against missing is_trimmable.
- Adapt behavioral tests from dev's refcounting model to upstream's deepcopy-on-fetch model.

Safety contracts preserved: fail-closed on partial rewind, preflight deepcopy avoidance, exact entry preservation.

All 247 tests pass (+ 88 subtests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Different versions of the repetition penalty.