
Presence and frequency penalties#971

Merged
angeloskath merged 2 commits into main from
presence-frequency
Mar 10, 2026


@angeloskath
Member

Different versions of the repetition penalty.
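For readers unfamiliar with the terms, here is a minimal pure-Python sketch of how presence and frequency penalties are commonly defined: every token that has already been generated has its logit reduced by a flat presence penalty plus a frequency penalty scaled by its count. The function name `apply_penalties`, its parameters, and the additive formula are assumptions for illustration; they are not taken from this PR's diff, and the merged code may differ.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    presence_penalty=0.0, frequency_penalty=0.0):
    """Return a penalized copy of `logits` (hypothetical helper).

    logits: list of floats, one per vocabulary entry.
    generated_tokens: list of token ids produced so far.
    """
    counts = Counter(generated_tokens)
    out = list(logits)  # leave the caller's list untouched
    for token, count in counts.items():
        # Presence: flat cost for having appeared at all.
        # Frequency: cost that grows with the number of appearances.
        out[token] -= presence_penalty + frequency_penalty * count
    return out


penalized = apply_penalties(
    [1.0, 2.0, 3.0, 4.0],
    generated_tokens=[1, 1, 3],
    presence_penalty=0.5,
    frequency_penalty=0.25,
)
print(penalized)  # [1.0, 1.0, 3.0, 3.25]
```

Token 1 appeared twice, so it loses 0.5 + 2 * 0.25 = 1.0; token 3 appeared once and loses 0.75; tokens that never appeared are unchanged.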

@angeloskath angeloskath requested a review from nastya236 March 9, 2026 04:16
Collaborator

@nastya236 nastya236 left a comment


Looks great, thank you! I left one comment for myself :)

 def logit_bias_processor(_, logits):
-    logits[:, indices] += values
-    return logits
+    return logits.at[:, indices].add(values)
Collaborator


If I remember correctly, you mentioned once that .at[].add() is needed to use Scatter::Sum instead of Gather + Add + Scatter::None, right?

Member Author


Exactly what you wrote. I changed it because the Gather -> Add -> Scatter pattern can be significantly less efficient when the Gather or Add depends on the src of the Scatter: that dependency breaks donation, so the Scatter ends up doing a copy.
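As a rough illustration of the two patterns discussed in this thread, the following pure-Python model contrasts a read-add-overwrite update (Gather, Add, Scatter) with a single accumulate-in-place update (a scatter-sum). The function names are hypothetical and the lists stand in for arrays; this is a sketch of the semantics only and does not model MLX internals or buffer donation. Both functions copy their input purely to stay side-effect free.

```python
def gather_add_scatter(logits, indices, values):
    """Model of `logits[indices] += values` as three separate steps."""
    gathered = [logits[i] for i in indices]              # Gather
    summed = [g + v for g, v in zip(gathered, values)]   # Add
    out = list(logits)
    for i, s in zip(indices, summed):                    # Scatter (overwrite)
        out[i] = s
    return out


def scatter_sum(logits, indices, values):
    """Model of `logits.at[indices].add(values)` as one accumulating step."""
    out = list(logits)
    for i, v in zip(indices, values):
        out[i] += v  # accumulate directly; no separate read of the source
    return out


base = [0.0, 1.0, 2.0, 3.0]

# With unique indices the two patterns agree.
print(gather_add_scatter(base, [1, 3], [0.5, -1.0]))  # [0.0, 1.5, 2.0, 2.0]
print(scatter_sum(base, [1, 3], [0.5, -1.0]))         # [0.0, 1.5, 2.0, 2.0]

# With a repeated index, the overwrite keeps only the last write,
# while the scatter-sum accumulates every contribution.
print(gather_add_scatter(base, [1, 1], [2.0, 3.0]))   # [0.0, 4.0, 2.0, 3.0]
print(scatter_sum(base, [1, 1], [2.0, 3.0]))          # [0.0, 6.0, 2.0, 3.0]
```

In the three-step version the source array is read by the Gather and written by the Scatter, which is the dependency described above; the single-step version only ever writes.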

@angeloskath angeloskath merged commit 4a21ffd into main Mar 10, 2026
2 checks passed
@angeloskath angeloskath deleted the presence-frequency branch March 10, 2026 05:26
Aaryaman3 pushed a commit to Aaryaman3/mlx-lm that referenced this pull request Mar 12, 2026
lyonsno added a commit to lyonsno/mlx-lm that referenced this pull request Mar 28, 2026
Semantic merge of upstream/main (through 4d3af3c) into dev. Key changes:

- LRUPromptCache moved from server.py to cache.py (upstream ml-explore#1019).
  Dev's rewind preflight/fail-closed logic integrated into the new
  PromptTrie-based LRUPromptCache.fetch_nearest_cache.
- Presence/frequency penalties (upstream ml-explore#971).
- Better caching with CacheOrder and PromptTrie (upstream ml-explore#911, ml-explore#1019).
- Harden can_trim_prompt_cache against missing is_trimmable.
- Adapt behavioral tests from dev's refcounting model to upstream's
  deepcopy-on-fetch model. Safety contracts preserved: fail-closed on
  partial rewind, preflight deepcopy avoidance, exact entry preservation.

All 247 tests pass (+ 88 subtests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
