scripts(dflash): switch default bench target to Q8_0 + `--target` flag #65
Merged
Per Markus 2026-06-04: DFlash quality measurement should use a Q8_0 target rather than Q4_K_M, since Q4_K_M introduces enough target-side quantization noise to confound DFlash's own accept-rate signal. Q8_0 fits in 38 GB total, well within the 80 GB of the titan A100.

* Default `TARGET` is now `gemma-4-31B-it-Q8_0.gguf`. Override it with `--target PATH` or the `DFLASH_BENCH_TARGET` env var.
* Also added a `DFLASH_BENCH_DRAFTER_DIR` env var for consistency.
* A comment block documents the VRAM math for Q4_K_M / Q8_0 / BF16 targets so future runs can pick the right card.
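A minimal sketch of how the target override could be resolved in bash. It assumes one reading of "env-first, then CLI flag, then default": the env var is read first, a later `--target PATH` on the command line overrides it, and the built-in default applies only when neither is set. The `resolve_target` function and `DEFAULT_TARGET` variable are hypothetical names for illustration, not code taken from `scripts/bench-dflash.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual bench-dflash.sh implementation.
# Assumed precedence: --target PATH > DFLASH_BENCH_TARGET > built-in default.

DEFAULT_TARGET="gemma-4-31B-it-Q8_0.gguf"

resolve_target() {
  # Step 1: seed from the environment, falling back to the default.
  local target="${DFLASH_BENCH_TARGET:-$DEFAULT_TARGET}"

  # Step 2: scan the arguments; an explicit --target PATH wins.
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --target)
        # Abort with a message if --target is the last argument.
        target="${2:?--target requires a PATH}"
        shift 2
        ;;
      *)
        # Other arguments are ignored in this sketch.
        shift
        ;;
    esac
  done

  printf '%s\n' "$target"
}
```

With this sketch, `resolve_target --target /models/x.gguf` prints `/models/x.gguf` even when `DFLASH_BENCH_TARGET` is set, and with neither set it prints the Q8_0 default.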
Why
Per Markus 2026-06-04: DFlash quality measurement should use a Q8_0 target rather than Q4_K_M. The Q4_K_M target introduces enough quantization noise that it confounds DFlash's own accept-rate signal — we want a higher-quality reference for the speculative-decoding evaluation.
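As a back-of-envelope check on why Q8_0 is affordable, the weight footprint of each target format can be estimated from its storage cost per weight. This sketch assumes the nominal 31B parameter count from the model filename and decimal gigabytes. Q8_0 stores each block of 32 weights as 32 int8 values plus one FP16 scale (34 bytes), and BF16 is 2 bytes per weight. The Q4_K_M rate of about 4.85 bits/weight is an approximation, since that format mixes Q4_K and Q6_K tensors and its true size varies by model. These are weight-only estimates, not the figures from the script's comment block.

```math
\begin{aligned}
\text{Q4\_K\_M:}\quad & 31\times10^{9}\ \text{weights} \times \frac{\approx 4.85\ \text{bits}}{8\ \text{bits/B}} \approx 18.8\ \text{GB} \\
\text{Q8\_0:}\quad & 31\times10^{9}\ \text{weights} \times \frac{34\ \text{B}}{32\ \text{weights}} \approx 32.9\ \text{GB} \\
\text{BF16:}\quad & 31\times10^{9}\ \text{weights} \times 2\ \text{B} = 62\ \text{GB}
\end{aligned}
```

Weights alone do not account for the full 38 GB quoted for the Q8_0 run; the remainder presumably covers the drafter, KV cache and runtime buffers. With roughly 62 GB of BF16 weights, only about 18 GB of an 80 GB card would remain for everything else.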
Changes
* `TARGET` changed from `gemma-4-31B-it-Q4_K_M.gguf` to `gemma-4-31B-it-Q8_0.gguf`.
* `--target PATH` flag for explicit per-run override.
* `DFLASH_BENCH_TARGET` and `DFLASH_BENCH_DRAFTER_DIR` env vars (env-first, then CLI flag, then default).

Verified
* `bash -n scripts/bench-dflash.sh`: syntax OK
* `--help` renders the updated docblock correctly
* … `Q4_K_M.gguf` across the tree)

Follow-up
Task ggml-org#110 has already been updated to reflect this. The next concrete step is the titan re-bake against `b0daec55b` (Task ggml-org#109); after that, this bench script can run with its new default.