spec: ngram-mod, score-based pruning #19294
Draft
bfroemel wants to merge 5 commits into ggml-org:master from
Conversation
Author: @ggerganov @srogmann Please let me know if you see any merit in further pursuing this PR. Thanks!
related to #19164, PoC of #19164 (comment)
Track a capped score for each ngram in the pool, initially set to SCORE_INS on insert. If an ngram is used successfully in a draft, increment its score; if the draft is rejected, decrement it. On rejection streaks, remove all ngrams with a score below SCORE_THR.
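The scoring/pruning scheme above can be sketched roughly as follows. This is an illustrative outline, not the actual PR code: the constants mirror the names mentioned here (SCORE_INS, SCORE_THR, SCORE_MIN, SCORE_MAX) but their values, the hash-keyed map, and the `ngram_pool` type are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumed parameter values -- the PR leaves them tunable.
constexpr int8_t SCORE_MIN = -4;  // lower cap
constexpr int8_t SCORE_MAX =  4;  // upper cap
constexpr int8_t SCORE_INS =  0;  // initial score on insert
constexpr int8_t SCORE_THR =  0;  // pruning threshold on streaks

struct ngram_pool {
    // hypothetical: ngram hash -> capped score
    std::unordered_map<uint64_t, int8_t> scores;

    void insert(uint64_t hash) {
        scores.emplace(hash, SCORE_INS);  // new entries start at SCORE_INS
    }
    void on_accept(uint64_t hash) {       // draft from this ngram was accepted
        auto it = scores.find(hash);
        if (it != scores.end() && it->second < SCORE_MAX) it->second++;
    }
    void on_reject(uint64_t hash) {       // draft from this ngram was rejected
        auto it = scores.find(hash);
        if (it != scores.end() && it->second > SCORE_MIN) it->second--;
    }
    // On a rejection streak, drop only low-scored ngrams instead of
    // clearing the whole pool.
    void prune() {
        for (auto it = scores.begin(); it != scores.end(); ) {
            if (it->second < SCORE_THR) it = scores.erase(it);
            else ++it;
        }
    }
};
```

With SCORE_INS equal to SCORE_THR, a freshly inserted ngram survives a pruning pass until it accumulates at least one net rejection, which matches the intent of keeping "still useful" entries across streaks.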
I did superficial testing; the speedup is more consistent throughout processing of the whole request, with no more sudden drops of speedup after (early) low-acceptance streaks. Pruning currently walks all 4M cache pool entries, and the scoring has a minor but noticeable effect on performance; there is still optimization potential.
Also added some hash pool stats (scoring state + collisions) that might help further fine-tune the parameters (SCORE_MIN, SCORE_MAX, SCORE_INS, ...).
Here are logs where the prompt looked like [GIVEN_SOURCE_CODE|TASK] and the model was tasked to generate [A|GIVEN_SOURCE_CODE|B|GIVEN_SOURCE_CODE]. A and B are sampled stochastically; GIVEN_SOURCE_CODE is known to be in the hash pool. Before this change, when a streak was encountered (early), the entire hash pool was sometimes cleared and there was no speedup afterwards (see #19164 (comment)). With this change we only prune low-scored ngrams on streaks, and (still useful) ngrams scoring at or above SCORE_THR remain in the hash pool.
Log