
Commit c96dba4

doc change blog11 to blog12

Signed-off-by: Fred Wei <[email protected]>

1 parent a954d23 · commit c96dba4

6 files changed: 5 additions & 5 deletions
@@ -55,7 +55,7 @@ Provides sufficient concurrency to achieve good performance while ease of use. C
 
 This is the call sequence diagram of `Scaffolding`:
 <div align="center">
-<img src="../media/scaffolding_sequence.png" alt="Scaffolding Sequence" width="900px">
+<img src="../media/tech_blog12_scaffolding_sequence.png" alt="Scaffolding Sequence" width="900px">
 </div>
 <p align="center"><sub><em>Figure 1. Scaffolding Sequence</em></sub></p>

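Since the renamed figure is not visible in the diff itself, here is a minimal, self-contained sketch of the interaction the sequence diagram depicts: a controller yields tasks from a generator, and the scaffolding runtime executes each task on a worker before resuming the controller. All class and function names below are hypothetical illustrations, not the actual TRT-LLM `Scaffolding` API.

```python
from dataclasses import dataclass

@dataclass
class GenerationTask:
    """A unit of work the controller hands to the runtime (hypothetical)."""
    prompt: str
    max_tokens: int = 256
    output: str = ""          # filled in by the worker

class EchoWorker:
    """Stand-in for a real inference backend (e.g., a TRT-LLM engine)."""
    def run(self, task: GenerationTask) -> None:
        task.output = f"<completion for: {task.prompt[:30]}...>"

class Controller:
    """Yields tasks one at a time; resumed after each task is executed."""
    def generate(self, prompt: str):
        task = GenerationTask(prompt=prompt)
        yield task            # runtime executes the task...
        # ...then control returns here, with task.output populated
        print("controller received:", task.output)

def scaffolding_run(controller: Controller, worker: EchoWorker, prompt: str):
    # The runtime drains the controller's generator, executing each task.
    for task in controller.generate(prompt):
        worker.run(task)

scaffolding_run(Controller(), EchoWorker(), "Solve 1+1 step by step.")
```

The generator pattern is what the header above alludes to: the controller reads like sequential code (ease of use), while the runtime remains free to batch or parallelize the yielded tasks (concurrency).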
@@ -186,7 +186,7 @@ Let's make a summary of the overall implementation of `Scaffolding`. If users wa
 Dynasor-CoT is a certainty-based, training-free approach to accelerate Chain-of-Thought (CoT) inference. This chapter discusses how inference-time compute methods can be smoothly integrated into the TRT-LLM Scaffolding framework, using Dynasor-CoT as an example.
 
 <div align="center">
-<img src="../media/tech_blog11_dynasor_demo.gif" alt="Dynasor Demo" width="900px">
+<img src="../media/tech_blog12_dynasor_demo.gif" alt="Dynasor Demo" width="900px">
 </div>
 <p align="center"><sub><em>Figure 2. Demo of DeepSeek-R1-Distill-Qwen-7B achieving a 5.74x speedup compared to the baseline when using Dynasor-CoT on MATH500</em></sub></p>

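Conceptually, Dynasor-CoT drops into Scaffolding as just another controller: instead of issuing one long generation request, it generates in short chunks and interleaves a cheap certainty probe after each chunk. The following is a minimal sketch of that control loop under assumed parameters; `probe_certainty` and `is_certain` are stubs fleshed out in the sketches after Figures 3 and 4, and none of these names are TRT-LLM APIs.

```python
CHUNK_TOKENS = 64  # assumed probing interval: probe after every 64 new tokens

def dynasor_cot(generate, probe_certainty, is_certain, prompt, max_tokens=2048):
    """Chunked generation with interleaved certainty probes (illustrative sketch).

    `generate(text, n)` is any callable that continues `text` by up to n tokens;
    in Scaffolding it would correspond to a generation task run on a worker.
    """
    text = prompt
    probe_answers = []
    for _ in range(max_tokens // CHUNK_TOKENS):
        text += generate(text, CHUNK_TOKENS)           # one chunk of reasoning
        probe_answers.append(probe_certainty(generate, text))
        if is_certain(probe_answers):                  # decision rule, see below
            return probe_answers[-1]                   # early exit: answer is stable
    return text                                        # budget exhausted: full CoT
```

Because each probe is short and the main generation stream is never modified, the extra cost is small relative to the tokens saved by exiting early.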
@@ -197,7 +197,7 @@ LLM reasoning is highly token-inefficient, often requiring far more tokens to ac
 For instance, Figure 2 compares a traditional Qwen-7B model with a reasoning-focused, Deepseek-distilled Qwen-7B model on a simple question. While the traditional model reaches its answer in 180 tokens, the reasoning model expends 1,000 tokens on iterative verification, despite having already found the correct answer at token 340. This represents a significant waste of tokens for diminishing returns on accuracy.
 
 <div align="center">
-<img src="../media/tech_blog11_dynasor_hesitation.png" alt="Motivation" width="900px">
+<img src="../media/tech_blog12_dynasor_hesitation.png" alt="Motivation" width="900px">
 </div>
 <p align="center"><sub><em>Figure 2. An example answer from a reasoning model (Deepseek-distilled Qwen-2.5 7B) vs. a traditional model (Qwen-2.5 7B) on one of the problems in the MATH500 dataset.</em></sub></p>

@@ -208,7 +208,7 @@ More specifically, a probe is an extra generation request with an eliciting prom
 
 
 <div align="center">
-<img src="../media/tech_blog11_dynasor_pressure_testing.png" alt="Dynasor Demo" width="900px">
+<img src="../media/tech_blog12_dynasor_pressure_testing.png" alt="Dynasor Demo" width="900px">
 </div>
 <p align="center"><sub><em>Figure 3. DeepSeek-R1's performance on AMC23 and AIME24 at varying token budgets. (Left) Standard reasoning with late answer outputs. (Right) Early answer extraction using the Probe-In-The-Middle technique, demonstrating equivalent accuracy with a 50% token reduction. The greener regions in the right panels suggest the model knows the answers much earlier than it reveals in standard reasoning.</em></sub></p>

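The probe-in-the-middle itself is just an extra, short generation request: the partial reasoning is suffixed with an answer-eliciting prompt and the model is asked for a handful of tokens. Below is an illustrative sketch; the exact eliciting wording and the `extract_boxed` helper are assumptions adapted from the Dynasor-CoT idea, not code from the repository.

```python
import re

# Appended to the partial chain-of-thought to elicit an immediate answer
# (wording is an assumption in the spirit of Dynasor-CoT's probe prompt).
PROBE_SUFFIX = ("\n\n... Oh, I suddenly got the answer to the whole problem. "
                "Final Answer:\n\n\\boxed{")

def extract_boxed(text: str):
    """Pull the content preceding the first closing brace of \\boxed{...}."""
    match = re.search(r"([^{}]*)\}", text)
    return match.group(1).strip() if match else None

def probe_certainty(generate, partial_cot: str, probe_tokens: int = 20):
    """Issue the cheap side request; the main generation stream is untouched."""
    probe_output = generate(partial_cot + PROBE_SUFFIX, probe_tokens)
    return extract_boxed(probe_output)
```

Note that the probe's output is used only to judge certainty; it is discarded rather than appended to the ongoing chain of thought.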
@@ -224,7 +224,7 @@ Figure 4 provides an illustration:
 * **Case 3**: The model generates special tokens like "wait" or "hmm," which also indicate hesitation, so we continue the generation.
 
 <div align="center">
-<img src="../media/tech_blog11_dynasor_illustration.jpg" alt="Dynasor Illustration" width="900px">
+<img src="../media/tech_blog12_dynasor_illustration.jpg" alt="Dynasor Illustration" width="900px">
 </div>
 <p align="center"><sub><em>Figure 4. Illustration of Dynasor-CoT. Case 1: early exit due to consistent early-stage results. Case 2: continue generation due to inconsistent early-stage results. Case 3: responses containing hesitation words (e.g., wait) are discarded.</em></sub></p>

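Put together, the three cases in Figure 4 reduce to a small decision rule over the recent probe answers: discard hesitant probes, then exit early only if the last few answers agree. A sketch follows; the hesitation word list and window size are illustrative assumptions, not values from the source.

```python
HESITATION_WORDS = ("wait", "hmm")   # assumed markers of an uncertain probe
WINDOW = 3                           # assumed: require 3 consistent probes

def is_certain(probe_answers) -> bool:
    """Case 1: last WINDOW probes exist, agree, no hesitation -> early exit.
    Case 2: probe answers disagree -> keep generating.
    Case 3: a probe contains hesitation words -> treat as no answer, continue.
    """
    recent = probe_answers[-WINDOW:]
    if len(recent) < WINDOW:
        return False                                       # too few probes yet
    for ans in recent:
        if ans is None:                                    # probe gave no answer
            return False
        if any(w in ans.lower() for w in HESITATION_WORDS):
            return False                                   # Case 3: hesitation
    return len(set(recent)) == 1                           # Case 1 vs. Case 2
```

Requiring several consecutive matching probes trades a little extra generation for robustness against a single lucky or premature probe answer.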