[Model Runner V2] Spec decode rejection sampler logprobs support #37237

Merged
WoosukKwon merged 1 commit into vllm-project:main from
TheEpicDolphin:gdelfin/mrv2-spec-decode-rejection-sample-logprobs
Mar 19, 2026
Conversation

@TheEpicDolphin TheEpicDolphin (Collaborator) commented Mar 16, 2026

Purpose

Following up on #35461, this PR adds logprobs support.

To compute the top logprobs with compute_topk_logprobs, the sampled token ids must have shape [num_logits]. For strict rejection sampling this is straightforward because the Sampler already returns the ground-truth target token ids. For probabilistic rejection sampling, however, we do not sample target token ids (doing so would waste compute). So we first get the output sampled token ids, then flatten them from [num_reqs, num_speculative_steps + 1] to [num_logits] with a simple _flatten_sampled_kernel. This yields the shape needed to compute the top logprobs.
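Conceptually, the flatten is a masked gather: each request contributes its row of sampled token ids (accepted draft tokens plus the bonus/recovered token), and the valid entries are concatenated into one vector aligned with the logits. A minimal PyTorch sketch of the idea — the name `num_tokens` and the padding convention are illustrative, not the actual Triton kernel interface:

```python
import torch

def flatten_sampled(sampled: torch.Tensor, num_tokens: torch.Tensor) -> torch.Tensor:
    """Flatten [num_reqs, num_spec_steps + 1] sampled token ids into a
    [num_logits] vector, keeping only the tokens each request actually
    produced (padding slots past num_tokens[i] are dropped)."""
    num_reqs, max_tokens = sampled.shape
    # mask[i, j] is True for the first num_tokens[i] positions of row i.
    mask = torch.arange(max_tokens).unsqueeze(0) < num_tokens.unsqueeze(1)
    # Boolean indexing concatenates the surviving entries row by row.
    return sampled[mask]

# Example: 2 requests, up to 3 tokens each; req 0 kept 2 tokens, req 1 kept 3.
sampled = torch.tensor([[11, 12, -1], [21, 22, 23]])
num_tokens = torch.tensor([2, 3])
print(flatten_sampled(sampled, num_tokens).tolist())  # [11, 12, 21, 22, 23]
```

The real kernel does the same mapping on-device so the flattened ids line up one-to-one with the logits rows consumed by compute_topk_logprobs.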

Testing

Served 8 requests (temperature = 0) concurrently. Below are the prompts and responses including the top-3 logprobs:

0. Explain the theory of relativity in simple terms.
'The'              (-0.107) | top: ['The':-0.107, 'A':-2.482, 'Albert':-4.357]
' theory'          (-0.003) | top: [' theory':-0.003, ' Theory':-6.003, ' famous':-9.128]
' of'              ( 0.000) | top: [' of':0.000, '!':-17.875, '!\n\n':-20.500]
' rel'             (-0.000) | top: [' rel':-0.000, ' special':-13.375, ' relative':-14.625]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'avity':-15.750, 'atility':-16.375]
'!'                (-0.235) | top: ['!':-0.235, ',':-1.985, ' is':-2.860]
' It'              (-0.430) | top: [' It':-0.430, ' One':-1.555, ' Albert':-2.930]
"'s"               (-0.004) | top: ["'s":-0.004, ' can':-6.129, ' may':-6.629]
' a'               (-0.023) | top: [' a':-0.023, ' actually':-4.023, ' one':-6.148]
' mind'            (-0.481) | top: [' mind':-0.481, ' complex':-1.981, ' big':-1.981]
'-b'               (-0.001) | top: ['-b':-0.001, '-st':-7.501, '-bl':-8.501]
'ending'           (-0.253) | top: ['ending':-0.253, 'low':-1.503, 'ender':-7.003]
' concept'         (-0.026) | top: [' concept':-0.026, ' idea':-3.776, ' topic':-6.151]
','                (-0.577) | top: [',':-0.577, ' that':-0.827, ' developed':-6.702]
' but'             (-0.000) | top: [' but':-0.000, ' even':-11.000, ' isn':-11.750]
' I'               (-0.080) | top: [' I':-0.080, ' don':-2.580, ' fear':-7.205]
"'ll"              (-0.008) | top: ["'ll":-0.008, "'d":-5.383, "'m":-5.633]
' try'             (-0.010) | top: [' try':-0.010, ' do':-5.260, ' break':-5.385]
' to'              (-0.000) | top: [' to':-0.000, ' my':-13.250, ' break':-17.750]
' break'           (-0.203) | top: [' break':-0.203, ' simplify':-1.828, ' explain':-3.828]
' it'              ( 0.000) | top: [' it':0.000, 'it':-21.250, ' down':-21.625]
' down'            ( 0.000) | top: [' down':0.000, 'down':-19.750, '.down':-20.375]
' in'              (-0.015) | top: [' in':-0.015, ' simply':-4.640, ' into':-5.765]
' simple'          (-0.002) | top: [' simple':-0.002, ' a':-6.627, ' super':-8.377]
' terms'           (-0.000) | top: [' terms':-0.000, ' language':-9.250, ' words':-11.250]
'.\n\n'            (-0.038) | top: ['.\n\n':-0.038, '\n\n':-3.913, ':\n\n':-4.788]
'**'               (-0.262) | top: ['**':-0.262, 'The':-1.637, 'Albert':-4.012]
'What'             (-0.142) | top: ['What':-0.142, 'The':-2.392, 'Theory':-4.142]
' is'              (-0.013) | top: [' is':-0.013, "'s":-4.388, ' are':-8.638]
' the'             (-0.328) | top: [' the':-0.328, ' rel':-1.328, ' time':-5.203]
' theory'          (-0.018) | top: [' theory':-0.018, ' Theory':-4.018, 'theory':-12.018]
' of'              (-0.000) | top: [' of':-0.000, '?':-15.000, ' about':-20.125]
' rel'             (-0.000) | top: [' rel':-0.000, ' relativ':-14.750, ' Rel':-16.250]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'atility':-13.000, 'inity':-13.250]
'?'                (-0.000) | top: ['?':-0.000, '?"\n\n':-9.875, ' all':-11.625]
'**\n\n'           (-0.018) | top: ['**\n\n':-0.018, '**\n':-4.018, '**':-13.018]
'The'              (-0.064) | top: ['The':-0.064, 'Albert':-3.564, 'In':-3.564]
' theory'          (-0.000) | top: [' theory':-0.000, 'ory':-11.625, ' Theory':-11.750]
' of'              (-0.000) | top: [' of':-0.000, ' was':-10.125, ',':-10.750]
' rel'             (-0.000) | top: [' rel':-0.000, ' special':-14.000, ' relativ':-15.375]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'avity':-16.375, 'ality':-17.875]
','                (-0.295) | top: [',':-0.295, ' is':-1.545, ' was':-3.170]
' developed'       (-0.325) | top: [' developed':-0.325, ' proposed':-1.450, ' introduced':-3.325]
' by'              ( 0.000) | top: [' by':0.000, ' in':-18.750, 'by':-18.875]
' Albert'          (-0.000) | top: [' Albert':-0.000, 'Albert':-8.250, ' physicist':-9.750]
' Einstein'        ( 0.000) | top: [' Einstein':0.000, ' Ein':-18.375, 'E':-20.750]
','                (-0.043) | top: [',':-0.043, ' in':-3.168, ' (':-12.293]
' is'              (-0.095) | top: [' is':-0.095, ' explains':-3.845, ' says':-3.970]
' a'               (-0.042) | top: [' a':-0.042, ' an':-3.292, ' about':-6.167]
' way'             (-0.064) | top: [' way':-0.064, ' fundamental':-3.064, ' concept':-5.189]
' of'              (-0.313) | top: [' of':-0.313, ' to':-1.313, ' that':-14.563]
' understanding'   (-0.002) | top: [' understanding':-0.002, ' thinking':-6.627, ' explaining':-7.377]
' how'             (-0.011) | top: [' how':-0.011, ' space':-5.136, ' the':-5.511]
' the'             (-0.027) | top: [' the':-0.027, ' space':-3.777, ' time':-5.652]
' universe'        (-0.000) | top: [' universe':-0.000, ' Universe':-9.250, ' world':-9.250]
' works'           (-0.003) | top: [' works':-0.003, ' behaves':-6.128, ' and':-8.378]
'.'                (-0.026) | top: ['.':-0.026, ',':-4.401, ' and':-4.651]
' It'              (-0.010) | top: [' It':-0.010, ' He':-5.510, ' There':-5.760]
"'s"               (-0.221) | top: ["'s":-0.221, ' says':-1.846, ' shows':-4.971]
' based'           (-0.368) | top: [' based':-0.368, ' a':-2.368, ' about':-2.868]
' on'              (-0.000) | top: [' on':-0.000, ' around':-15.500, ' two':-17.500]
' two'             (-0.009) | top: [' two':-0.009, ' the':-4.759, ' a':-7.634]
' main'            (-0.001) | top: [' main':-0.001, ' simple':-7.876, ' key':-8.751]
' ideas'           (-0.012) | top: [' ideas':-0.012, ' principles':-4.887, ' concepts':-6.012]
1. What is the capital of France?
'The'              (-0.059) | top: ['The':-0.059, 'That':-2.934, 'Easy':-6.309]
' capital'         (-0.000) | top: [' capital':-0.000, ' answer':-11.125, 'capital':-14.125]
' of'              (-0.000) | top: [' of':-0.000, ' city':-14.750, ' and':-18.750]
' France'          (-0.000) | top: [' France':-0.000, 'France':-14.375, ' Franc':-18.250]
' is'              ( 0.000) | top: [' is':0.000, ' was':-18.375, ' adalah':-18.625]
' Paris'           (-0.000) | top: [' Paris':-0.000, 'Paris':-9.625, ' PAR':-12.125]
'.'                (-0.209) | top: ['.':-0.209, '!':-1.709, ' (':-5.084]
'<|eot_id|>'       (-0.000) | top: ['<|eot_id|>':-0.000, ' It':-11.125, ' Paris':-11.750]
2. Write a haiku about coding.
'Here'             (-0.174) | top: ['Here':-0.174, 'Lines':-2.674, 'Code':-3.174]
' is'              (-0.034) | top: [' is':-0.034, "'s":-3.409, ''s':-12.159]
' a'               (-0.000) | top: [' a':-0.000, ' ha':-12.000, ' one':-16.750]
' ha'              (-0.000) | top: [' ha':-0.000, ' short':-9.750, ' Ha':-12.375]
'iku'              ( 0.000) | top: ['iku':0.000, 'ika':-17.500, 'ibu':-18.125]
' about'           (-0.000) | top: [' about':-0.000, ':\n\n':-14.375, 'about':-17.875]
' coding'          (-0.000) | top: [' coding':-0.000, 'coding':-14.625, ' code':-15.000]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\n\n\n':-10.250, ':\n':-12.625]
'Lines'            (-0.481) | top: ['Lines':-0.481, 'Code':-1.231, ' Lines':-3.731]
' of'              (-0.001) | top: [' of':-0.001, ' dance':-7.876, ' and':-8.251]
' code'            (-0.001) | top: [' code':-0.001, ' logic':-8.501, ' ones':-9.126]
' unfold'          (-0.522) | top: [' unfold':-0.522, ' flow':-1.397, ' dance':-2.647]
'\n'               (-0.000) | top: ['\n':-0.000, ',\n':-11.000, '\r\n':-12.875]
'Logic'            (-1.358) | top: ['Logic':-1.358, 'Bug':-2.108, 'Mean':-2.233]
' flows'           (-1.228) | top: [' flows':-1.228, "'s":-1.353, ' and':-1.478]
' like'            (-0.424) | top: [' like':-0.424, ',':-1.174, ' from':-3.549]
' digital'         (-1.392) | top: [' digital':-1.392, ' rivers':-1.642, ' a':-2.017]
' streams'         (-1.095) | top: [' streams':-1.095, ' stream':-1.220, ' rain':-2.220]
'\n'               (-0.000) | top: ['\n':-0.000, '<|eot_id|>':-8.250, ',\n':-12.688]
'Beauty'           (-0.776) | top: ['Beauty':-0.776, 'Creation':-1.901, 'Art':-2.526]
' in'              (-0.001) | top: [' in':-0.001, ' is':-8.751, ' born':-9.001]
' the'             (-0.463) | top: [' the':-0.463, ' bits':-2.651, ' bytes':-2.776]
' bug'             (-0.360) | top: [' bug':-0.360, ' bugs':-1.485, ' code':-3.610]
'<|eot_id|>'       (-0.001) | top: ['<|eot_id|>':-0.001, '-free':-7.439, 'free':-9.751]
3. List three benefits of regular exercise.
'Here'             (-0.000) | top: ['Here':-0.000, 'Regular':-9.750, 'A':-10.500]
' are'             (-0.000) | top: [' are':-0.000, ' three':-13.250, ' Are':-15.625]
' three'           (-0.000) | top: [' three':-0.000, 'three':-15.250, ' the':-16.125]
' benefits'        (-0.000) | top: [' benefits':-0.000, ' significant':-11.375, ' Benefits':-13.750]
' of'              ( 0.000) | top: [' of':0.000, ' to':-24.000, ' or':-24.125]
' regular'         (-0.000) | top: [' regular':-0.000, 'regular':-16.375, ' regularly':-16.500]
' exercise'        (-0.000) | top: [' exercise':-0.000, ' Exercise':-14.375, ' exercises':-14.750]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\r\n\r\n':-14.250, ':\n\n\n':-14.625]
'1'                (-0.000) | top: ['1':-0.000, '**':-13.500, ' **':-19.875]
'.'                ( 0.000) | top: ['.':0.000, '.?':-18.438, '.\n\n':-18.438]
' **'              (-0.000) | top: [' **':-0.000, ' Impro':-12.125, ' Improved':-12.500]
'Impro'            (-0.410) | top: ['Impro':-0.410, 'Improved':-1.410, 'Weight':-2.410]
'ves'              (-0.001) | top: ['ves':-0.000, 'vements':-8.750, 'ving':-9.125]
' Physical'        (-0.462) | top: [' Physical':-0.462, ' Cardio':-1.337, ' Mental':-2.837]
' Health'          (-0.000) | top: [' Health':-0.000, ' and':-10.250, ' Heath':-10.625]
'**:'              (-0.000) | top: ['**:':-0.000, ':**':-11.750, ' and':-15.250]
' Regular'         (-0.000) | top: [' Regular':-0.000, ' Exercise':-7.875, 'Regular':-10.000]
' exercise'        (-0.000) | top: [' exercise':-0.000, ' physical':-12.250, ' Exercise':-12.875]
' can'             (-0.163) | top: [' can':-0.163, ' helps':-1.913, ' has':-6.163]
' help'            (-0.000) | top: [' help':-0.000, ' improve':-8.625, ' reduce':-9.750]
' to'              (-0.671) | top: [' to':-0.671, ' reduce':-1.671, ' you':-2.421]
' reduce'          (-0.834) | top: [' reduce':-0.834, ' maintain':-1.709, ' improve':-1.834]
' the'             (-0.023) | top: [' the':-0.023, ' your':-3.773, ' risk':-13.398]
' risk'            (-0.000) | top: [' risk':-0.000, ' risks':-11.375, 'risk':-12.750]
' of'              (-0.000) | top: [' of':-0.000, ' or':-14.750, ' factors':-18.500]
' chronic'         (-0.004) | top: [' chronic':-0.004, ' developing':-5.629, ' many':-8.004]
' diseases'        (-0.000) | top: [' diseases':-0.000, ' illnesses':-10.125, ' health':-12.375]
','                (-0.576) | top: [',':-0.576, ' such':-0.826, ' like':-8.451]
' such'            (-0.000) | top: [' such':-0.000, 'such':-14.875, ' including':-14.875]
' as'              ( 0.000) | top: [' as':0.000, ' heart':-19.625, ' obesity':-20.500]
' heart'           (-0.000) | top: [' heart':-0.000, ' diabetes':-10.000, ' cardiovascular':-10.375]
' disease'         (-0.000) | top: [' disease':-0.000, ' Disease':-14.500, ' diseases':-15.125]
','                (-0.000) | top: [',':-0.000, ' and':-14.250, ',':-22.000]
' diabetes'        (-0.363) | top: [' diabetes':-0.363, ' type':-1.238, ' stroke':-4.238]
','                (-0.000) | top: [',':-0.000, ' and':-13.750, ' mell':-16.000]
' and'             (-0.000) | top: [' and':-0.000, ' obesity':-13.375, ' some':-13.500]
' some'            (-0.069) | top: [' some':-0.069, ' certain':-2.819, ' obesity':-4.944]
' cancers'         (-0.281) | top: [' cancers':-0.281, ' types':-1.406, ' forms':-8.531]
'.'                (-0.026) | top: ['.':-0.026, ',':-3.651, ' by':-9.151]
' It'              (-0.694) | top: [' It':-0.694, ' Exercise':-0.694, ' Regular':-8.319]
' can'             (-0.002) | top: [' can':-0.002, ' also':-6.502, 'can':-14.627]
' also'            (-0.000) | top: [' also':-0.000, 'also':-13.875, ' lower':-13.875]
' help'            (-0.489) | top: [' help':-0.489, ' improve':-0.989, ' lower':-4.364]
' to'              (-0.001) | top: [' to':-0.001, ' with':-6.626, ' manage':-10.376]
' manage'          (-0.909) | top: [' manage':-0.909, ' improve':-1.284, ' lower':-1.909]
' weight'          (-0.138) | top: [' weight':-0.138, ' existing':-3.138, ' conditions':-3.638]
','                (-0.000) | top: [',':-0.000, ' loss':-9.625, ' and':-10.000]
' improve'         (-0.092) | top: [' improve':-0.092, ' boost':-3.217, ' lower':-3.842]
' sleep'           (-0.627) | top: [' sleep':-0.627, ' blood':-1.002, ' bone':-3.377]
' quality'         (-0.143) | top: [' quality':-0.143, ',':-2.018, ' patterns':-7.893]
','                (-0.000) | top: [',':-0.000, ' and':-13.000, ',':-23.375]
' and'             (-0.000) | top: [' and':-0.000, ' boost':-14.750, ' increase':-15.375]
' boost'           (-0.403) | top: [' boost':-0.403, ' increase':-1.278, ' reduce':-3.528]
' overall'         (-0.346) | top: [' overall':-0.346, ' immune':-1.971, ' the':-2.971]
' physical'        (-0.007) | top: [' physical':-0.007, ' energy':-5.257, ' fitness':-7.507]
' health'          (-1.039) | top: [' health':-1.039, ' fitness':-1.039, ' function':-1.414]
'.\n'              (-0.013) | top: ['.\n':-0.013, ' and':-4.388, '.\n\n':-8.263]
'2'                ( 0.000) | top: ['2':0.000, '3':-18.000, '۲':-21.125]
'.'                ( 0.000) | top: ['.':0.000, ',':-20.500, '.\n':-22.938]
' **'              ( 0.000) | top: [' **':0.000, '**':-19.250, ' **\n':-19.500]
'Enh'              (-0.671) | top: ['Enh':-0.671, 'Boost':-0.796, 'Red':-4.296]
'ances'            (-0.000) | top: ['ances':-0.000, 'anced':-13.375, 'ases':-14.000]
' Mental'          (-0.000) | top: [' Mental':-0.000, ' Cognitive':-10.000, ' Mood':-10.625]
' Well'            (-0.143) | top: [' Well':-0.143, ' Health':-2.018, ' Wellness':-8.518]
4. How does a refrigerator keep food cold?
'A'                (-0.229) | top: ['A':-0.229, 'Re':-1.604, 'The':-5.854]
' refrigerator'    (-0.001) | top: [' refrigerator':-0.001, ' refriger':-8.001, ' fridge':-9.001]
' keeps'           (-0.074) | top: [' keeps':-0.074, ' is':-3.449, ',':-3.699]
' food'            (-0.000) | top: [' food':-0.000, ' your':-10.125, ' our':-15.125]
' cold'            (-0.000) | top: [' cold':-0.000, ' cool':-11.500, ' and':-11.500]
' by'              (-0.441) | top: [' by':-0.441, ' through':-1.066, ' using':-4.441]
' using'           (-0.004) | top: [' using':-0.004, ' utilizing':-5.879, ' transferring':-7.379]
' a'               (-0.005) | top: [' a':-0.005, ' refriger':-5.755, ' the':-6.880]
' combination'     (-0.071) | top: [' combination':-0.071, ' refriger':-3.321, ' process':-3.821]
' of'              ( 0.000) | top: [' of':0.000, 'of':-19.125, ' or':-19.750]
' several'         (-1.368) | top: [' several':-1.368, ' technologies':-1.368, ' principles':-2.118]
' technologies'    (-0.639) | top: [' technologies':-0.639, ' components':-1.389, ' mechanisms':-2.764]
' and'             (-0.526) | top: [' and':-0.526, ' to':-0.901, ' that':-6.151]
' principles'      (-0.450) | top: [' principles':-0.450, ' mechanisms':-1.575, ' processes':-2.700]
' to'              (-0.288) | top: [' to':-0.288, '.':-1.413, ' of':-5.163]
' remove'          (-0.689) | top: [' remove':-0.689, ' transfer':-1.439, ' maintain':-1.939]
' heat'            (-0.000) | top: [' heat':-0.000, ' and':-8.750, ' warmth':-9.875]
' from'            (-0.005) | top: [' from':-0.005, ' and':-5.255, ' energy':-9.755]
' the'             (-0.002) | top: [' the':-0.002, ' its':-6.502, ' inside':-9.002]
' interior'        (-0.133) | top: [' interior':-0.133, ' inside':-2.383, ' compartment':-4.258]
' of'              (-0.083) | top: [' of':-0.083, ' and':-3.083, ' compartment':-3.458]
' the'             ( 0.000) | top: [' the':0.000, 'the':-19.125, ' a':-19.125]
' appliance'       (-0.372) | top: [' appliance':-0.372, ' fridge':-1.497, ' unit':-3.247]
' and'             (-0.049) | top: [' and':-0.049, '.':-3.049, '.\n\n':-10.299]
' transfer'        (-0.302) | top: [' transfer':-0.302, ' maintain':-1.427, ' replace':-5.427]
' it'              (-0.000) | top: [' it':-0.000, ' that':-8.875, ' the':-12.250]
' outside'         (-0.576) | top: [' outside':-0.576, ' to':-0.826, ' outdoors':-9.201]
'.'                (-0.011) | top: ['.':-0.011, ',':-5.011, ' to':-5.761]
' Here'            (-0.014) | top: [' Here':-0.014, ' The':-4.264, ' This':-9.264]
"'s"               (-0.005) | top: ["'s":-0.005, ' are':-5.380, ' is':-12.255]
' a'               (-0.003) | top: [' a':-0.003, ' how':-6.003, ' the':-9.003]
' simplified'      (-0.145) | top: [' simplified':-0.145, ' step':-2.395, ' breakdown':-3.270]
' explanation'     (-0.044) | top: [' explanation':-0.044, ' overview':-3.169, ' breakdown':-6.794]
' of'              (-0.252) | top: [' of':-0.252, ':\n\n':-1.502, ':\n':-14.377]
' the'             (-0.160) | top: [' the':-0.160, ' how':-1.910, ' some':-14.285]
' process'         (-0.304) | top: [' process':-0.304, ' main':-1.429, ' key':-4.304]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\n':-14.500, ':\r\n\r\n':-16.375]
'1'                (-0.021) | top: ['1':-0.021, '**':-3.896, ' **':-14.771]
'.'                (-0.000) | top: ['.':-0.000, '.?':-16.625, '️':-18.688]
' **'              (-0.007) | top: [' **':-0.007, ' Cooling':-6.007, ' Com':-6.757]
'Cool'             (-1.098) | top: ['Cool':-1.098, 'Compression':-1.848, 'Re':-1.848]
'ing'              (-0.349) | top: ['ing':-0.349, 'ant':-1.224, 'ants':-7.224]
' System'          (-0.914) | top: [' System':-0.914, ' Cycle':-1.914, ' system':-2.164]
'**:'              (-0.160) | top: ['**:':-0.160, ':**':-1.910, ':':-8.785]
' The'             (-0.089) | top: [' The':-0.089, ' A':-3.089, ' Most':-3.964]
' refrigerator'    (-0.039) | top: [' refrigerator':-0.039, ' refriger':-3.914, ' heart':-5.039]
' has'             (-0.289) | top: [' has':-0.289, ' contains':-1.664, ' uses':-3.039]
' a'               (-0.001) | top: [' a':-0.001, ' two':-7.501, ' an':-7.751]
' cooling'         (-0.170) | top: [' cooling':-0.170, ' refriger':-2.545, ' built':-3.045]
' system'          (-0.000) | top: [' system':-0.000, ' unit':-10.750, ' coil':-11.250]
' that'            (-0.062) | top: [' that':-0.062, ',':-3.312, ' consisting':-3.937]
' uses'            (-1.068) | top: [' uses':-1.068, ' consists':-1.068, ' circ':-1.693]
' a'               (-0.038) | top: [' a':-0.038, ' refriger':-3.288, ' the':-9.413]
' refriger'        (-0.049) | top: [' refriger':-0.049, ' liquid':-3.674, ' type':-4.424]
'ant'              (-0.000) | top: ['ant':-0.000, 'ation':-10.625, 'ated':-15.000]
','                (-0.173) | top: [',':-0.173, ' (':-2.048, ' to':-3.673]
' such'            (-0.663) | top: [' such':-0.663, ' a':-1.288, ' which':-1.913]
' as'              (-0.000) | top: [' as':-0.000, ' a':-12.375, ' us':-19.313]
' Fre'             (-0.326) | top: [' Fre':-0.326, ' fre':-1.450, ' refriger':-3.950]
'on'               (-0.000) | top: ['on':-0.000, 'ón':-14.375, 'ON':-15.750]
','                (-0.193) | top: [',':-0.193, ' or':-2.318, ' (':-2.568]
' to'              (-0.165) | top: [' to':-0.165, ' which':-1.915, ' that':-5.790]
' absorb'          (-0.048) | top: [' absorb':-0.048, ' transfer':-3.548, ' circ':-4.673]
' heat'            (-0.004) | top: [' heat':-0.004, ' and':-5.504, ' the':-10.879]
5. What is the difference between HTTP and HTTPS?
'HTTP'             (-0.022) | top: ['HTTP':-0.022, 'The':-3.897, 'HTTPS':-6.897]
' ('               (-0.001) | top: [' (':-0.001, ' and':-6.626, ' stands':-11.126]
'H'                (-0.014) | top: ['H':-0.014, 'Hyper':-4.264, 'Hy':-10.014]
'yp'               (-0.000) | top: ['yp':-0.000, 'ypo':-9.125, 'yper':-9.375]
'ertext'           (-0.000) | top: ['ertext':-0.000, 'ert':-16.875, 'ersonic':-17.500]
' Transfer'        (-0.000) | top: [' Transfer':-0.000, ' Transport':-10.125, 'Transfer':-11.000]
' Protocol'        (-0.000) | top: [' Protocol':-0.000, 'Protocol':-10.250, ' protocol':-13.250]
')'                (-0.000) | top: [')':-0.000, '),':-16.750, ' )':-18.250]
' and'             (-0.055) | top: [' and':-0.055, ' is':-2.930, 'and':-15.430]
' HTTPS'           (-0.000) | top: [' HTTPS':-0.000, 'HTTPS':-11.000, ' HTTP':-12.500]
' ('               (-0.000) | top: [' (':-0.000, ' are':-14.875, ' ()':-17.375]
'H'                (-0.000) | top: ['H':-0.000, 'Hyper':-10.875, 'Secure':-11.875]
'yp'               (-0.000) | top: ['yp':-0.000, 'ypo':-13.375, ' Hyp':-14.375]
'ertext'           (-0.000) | top: ['ertext':-0.000, 'ert':-13.250, 'ext':-16.250]
' Transfer'        (-0.000) | top: [' Transfer':-0.000, ' Transport':-9.250, 'Transfer':-10.375]
' Protocol'        (-0.000) | top: [' Protocol':-0.000, 'Protocol':-11.625, ' protocol':-12.375]
' Secure'          (-0.000) | top: [' Secure':-0.000, ' Sec':-8.625, 'Secure':-9.375]
')'                (-0.000) | top: [')':-0.000, '),':-14.750, '))':-15.875]
' are'             (-0.000) | top: [' are':-0.000, ' both':-7.875, ' differ':-11.125]
' both'            (-0.593) | top: [' both':-0.593, ' two':-0.843, ' the':-4.343]
' protocols'       (-0.504) | top: [' protocols':-0.504, ' used':-1.254, ' communication':-2.254]
' used'            (-0.001) | top: [' used':-0.001, ' for':-7.376, ' that':-7.626]
' for'             (-0.180) | top: [' for':-0.180, ' to':-1.805, ' by':-8.805]
' transferring'    (-0.073) | top: [' transferring':-0.073, ' transmitting':-2.823, ' exchanging':-5.198]
' data'            (-0.000) | top: [' data':-0.000, ' files':-11.000, ' and':-11.125]
' over'            (-0.011) | top: [' over':-0.011, ',':-5.136, ' between':-5.636]
' the'             (-0.000) | top: [' the':-0.000, ' a':-11.750, ' internet':-13.750]
' internet'        (-0.000) | top: [' internet':-0.000, ' web':-8.250, ' Internet':-9.125]
'.'                (-0.434) | top: ['.':-0.434, ',':-1.059, '.\n\n':-5.184]
' The'             (-0.001) | top: [' The':-0.001, ' However':-7.001, ' While':-9.251]
' main'            (-0.013) | top: [' main':-0.013, ' primary':-4.638, ' key':-6.263]
' difference'      (-0.000) | top: [' difference':-0.000, ' differences':-8.375, 'difference':-12.500]
' between'         (-0.000) | top: [' between':-0.000, ' is':-8.000, ' lies':-10.750]
' the'             (-0.694) | top: [' the':-0.694, ' them':-0.694, ' HTTP':-7.819]
' two'             (-0.000) | top: [' two':-0.000, 'two':-15.625, ' Two':-21.750]
' is'              (-0.004) | top: [' is':-0.004, ' lies':-5.629, ' protocols':-9.879]
' the'             (-0.117) | top: [' the':-0.117, ' that':-2.242, ' how':-5.742]
' level'           (-0.038) | top: [' level':-0.038, ' way':-3.413, ' security':-6.163]
' of'              ( 0.000) | top: [' of':0.000, ' security':-19.000, ' or':-19.375]
' security'        (-0.016) | top: [' security':-0.016, ' encryption':-4.141, 'security':-11.141]
' and'             (-0.006) | top: [' and':-0.006, ' they':-5.131, ' provided':-8.131]
' encryption'      (-0.005) | top: [' encryption':-0.005, ' authentication':-6.755, ' protection':-7.005]
' used'            (-0.093) | top: [' used':-0.093, ' they':-2.843, ' provided':-3.968]
' to'              (-0.020) | top: [' to':-0.020, '.\n\n':-4.145, ':\n\n':-6.145]
' protect'         (-0.054) | top: [' protect':-0.054, ' transmit':-4.054, ' secure':-4.179]
' the'             (-0.001) | top: [' the':-0.001, ' data':-6.876, ' user':-8.501]
' data'            (-0.009) | top: [' data':-0.009, ' communication':-4.884, ' transmission':-7.259]
' being'           (-0.333) | top: [' being':-0.333, '.\n\n':-1.833, ' in':-2.208]
' transmitted'     (-0.253) | top: [' transmitted':-0.253, ' transferred':-1.503, ' sent':-7.503]
'.\n\n'            (-0.000) | top: ['.\n\n':-0.000, ':\n\n':-7.875, ' between':-12.125]
'**'               (-0.316) | top: ['**':-0.316, 'HTTP':-1.316, 'Here':-6.316]
'HTTP'             (-0.000) | top: ['HTTP':-0.000, 'HTTPS':-9.250, 'What':-10.375]
':'                (-0.603) | top: [':':-0.603, ' (':-1.603, '**':-1.978]
'**\n\n'           (-0.001) | top: ['**\n\n':-0.001, '**\n':-7.376, ' (':-13.626]
'HTTP'             (-0.436) | top: ['HTTP':-0.436, '*':-1.061, '1':-5.186]
' is'              (-0.000) | top: [' is':-0.000, ',':-10.750, ' (':-11.125]
' the'             (-0.744) | top: [' the':-0.744, ' an':-0.994, ' a':-1.869]
' original'        (-0.547) | top: [' original':-0.547, ' standard':-1.547, ' older':-2.422]
' protocol'        (-0.012) | top: [' protocol':-0.012, ' and':-4.637, ',':-7.262]
' used'            (-0.038) | top: [' used':-0.038, ' for':-3.413, ' developed':-6.163]
' for'             (-0.008) | top: [' for':-0.008, ' to':-4.883, ' by':-10.508]
' transferring'    (-0.038) | top: [' transferring':-0.038, ' transmitting':-3.663, ' exchanging':-5.538]
' data'            (-0.010) | top: [' data':-0.010, ' web':-5.135, ' hyp':-5.760]
' over'            (-0.003) | top: [' over':-0.003, ' on':-6.378, ' between':-6.503]
6. Suggest a short book to read on a rainy day.
'A'                (-0.121) | top: ['A':-0.121, 'What':-2.371, 'Perfect':-4.746]
' rainy'           (-0.481) | top: [' rainy':-0.481, ' perfect':-1.106, ' cozy':-3.481]
' day'             (-0.000) | top: [' day':-0.000, ' days':-14.625, 'day':-15.125]
' is'              (-0.002) | top: [' is':-0.002, '!':-7.502, ' calls':-7.752]
' the'             (-0.001) | top: [' the':-0.001, ' a':-7.376, ' perfect':-8.751]
' perfect'         (-0.001) | top: [' perfect':-0.001, ' pur':-7.626, 'perfect':-10.251]
' excuse'          (-0.000) | top: [' excuse':-0.000, ' opportunity':-10.625, ' time':-10.875]
' to'              (-0.000) | top: [' to':-0.000, ' for':-13.125, ' stay':-19.750]
' curl'            (-0.428) | top: [' curl':-0.428, ' cozy':-1.303, ' sn':-3.303]
' up'              (-0.000) | top: [' up':-0.000, '-up':-16.125, ' Up':-19.875]
' with'            (-0.000) | top: [' with':-0.000, ' and':-14.875, 'with':-18.750]
' a'               ( 0.000) | top: [' a':0.000, ' an':-19.250, ' some':-20.375]
' good'            (-0.009) | top: [' good':-0.009, ' great':-4.759, ' book':-9.884]
' book'            (-0.000) | top: [' book':-0.000, 'book':-15.625, ' read':-15.625]
'!'                (-0.211) | top: ['!':-0.211, '!\n\n':-1.711, '.':-4.711]
' Here'            (-0.013) | top: [' Here':-0.013, ' I':-4.513, ' Considering':-7.388]
"'s"               (-0.313) | top: ["'s":-0.313, ' are':-1.313, ' is':-12.563]
' a'               ( 0.000) | top: [' a':0.000, ' my':-17.125, ' some':-17.750]
' suggestion'      (-0.170) | top: [' suggestion':-0.170, ' short':-1.920, ' recommendation':-5.045]
' for'             (-0.696) | top: [' for':-0.696, ':\n\n':-0.696, ' that':-6.071]
' a'               (-0.000) | top: [' a':-0.000, ' you':-12.625, ' an':-16.750]
' short'           (-0.009) | top: [' short':-0.009, ' delightful':-5.134, ' cozy':-6.634]
' and'             (-0.070) | top: [' and':-0.070, ' but':-3.820, ',':-3.945]
' cozy'            (-0.413) | top: [' cozy':-0.413, ' delightful':-1.288, ' engaging':-3.788]
' read'            (-0.049) | top: [' read':-0.049, ' book':-3.049, ' novel':-7.924]
':\n\n'            (-0.102) | top: [':\n\n':-0.102, ' that':-2.352, ' to':-6.352]
'"The'             (-0.535) | top: ['"The':-0.535, '**':-0.910, '"':-5.035]
' Little'          (-0.156) | top: [' Little':-0.156, ' Night':-3.031, ' Snow':-3.531]
' Paris'           (-0.067) | top: [' Paris':-0.067, ' Prince':-2.817, ' Book':-6.192]
' Book'            (-0.000) | top: [' Book':-0.000, 'Book':-10.750, ' Books':-12.375]
'shop'             (-0.000) | top: ['shop':-0.000, 'store':-12.500, 'Shop':-13.500]
'"'                (-0.000) | top: ['"':-0.000, ':':-11.125, ' by':-13.750]
' by'              (-0.000) | top: [' by':-0.000, ' (':-14.000, 'by':-14.625]
' Nina'            (-0.001) | top: [' Nina':-0.001, ' Natalie':-8.563, ' Nicholas':-9.376]
' George'          (-0.001) | top: [' George':-0.001, ' Georges':-7.501, 'George':-8.251]
' ('               (-0.264) | top: [' (':-0.264, '\n\n':-1.764, ':\n\n':-3.139]
'224'              (-1.301) | top: ['224':-1.301, '272':-1.426, '192':-1.426]
' pages'           (-0.000) | top: [' pages':-0.000, 'pages':-11.500, ' Pages':-12.000]
')\n\n'            (-0.013) | top: [')\n\n':-0.013, '):\n\n':-4.638, ',':-6.013]
'This'             (-0.001) | top: ['This':-0.001, 'Imagine':-8.501, 'Set':-8.501]
' charming'        (-0.180) | top: [' charming':-0.180, ' novel':-2.680, ' delightful':-3.430]
' novel'           (-0.012) | top: [' novel':-0.012, ' nov':-4.512, ' and':-7.637]
' tells'           (-0.096) | top: [' tells':-0.096, ' is':-2.971, ' follows':-3.221]
' the'             ( 0.000) | top: [' the':0.000, ' a':-20.000, ' story':-22.375]
' story'           (-0.000) | top: [' story':-0.000, ' tale':-11.875, ' Story':-12.375]
' of'              ( 0.000) | top: [' of':0.000, 'of':-18.625, ' Mons':-20.625]
' Jean'            (-0.035) | top: [' Jean':-0.035, ' Mons':-3.535, ' a':-6.660]
' Per'             (-0.007) | top: [' Per':-0.007, 'Per':-6.132, '-P':-6.882]
'du'               (-0.009) | top: ['du':-0.009, 'uvian':-6.572, 'rot':-6.572]
','                ( 0.000) | top: [',':0.000, "'s":-17.750, ' who':-18.625]
' a'               (-0.002) | top: [' a':-0.002, ' the':-6.252, ' who':-10.627]
' books'           (-0.734) | top: [' books':-0.734, ' gr':-1.734, ' melanch':-1.984]
'eller'            (-0.000) | top: ['eller':-0.000, 'ellers':-8.750, 'elling':-9.375]
' who'             (-0.022) | top: [' who':-0.022, ' in':-3.897, ' and':-7.272]
' has'             (-0.365) | top: [' has':-0.365, ' owns':-2.365, ' sets':-2.490]
' been'            (-0.081) | top: [' been':-0.081, ' lost':-3.581, ' spent':-3.831]
' stuck'           (-0.233) | top: [' stuck':-0.233, ' unable':-2.545, ' wandering':-3.795]
' in'              (-0.034) | top: [' in':-0.034, ' on':-3.409, ' at':-9.159]
' his'             (-0.769) | top: [' his':-0.769, ' a':-0.894, ' the':-2.269]
' grief'           (-1.097) | top: [' grief':-1.097, ' ways':-1.347, ' life':-1.847]
' for'             (-1.154) | top: [' for':-1.154, ' over':-1.404, ' after':-1.529]
' years'           (-0.447) | top: [' years':-0.447, ' ':-1.947, ' decades':-2.197]
' after'           (-0.523) | top: [' after':-0.523, ' since':-1.523, '.':-1.773]
' a'               (-1.077) | top: [' a':-1.077, ' the':-1.077, ' his':-1.202]
7. 2+2=?
'The'              (-0.340) | top: ['The':-0.340, '2':-1.965, '4':-2.090]
' answer'          (-0.002) | top: [' answer':-0.002, ' correct':-6.127, ' easy':-11.377]
' is'              (-0.252) | top: [' is':-0.252, ' to':-1.502, ',':-15.002]
' '                (-0.465) | top: [' ':-0.465, '...':-1.215, ':':-2.715]
'4'                (-0.000) | top: ['4':-0.000, '2':-15.250, '4':-16.250]
'!'                (-0.092) | top: ['!':-0.092, '.':-2.467, '<|eot_id|>':-5.842]
'<|eot_id|>'       (-0.000) | top: ['<|eot_id|>':-0.000, '':-9.125, '':-9.563]

Next Steps

  • Fuse draft logits gather and two softmaxes (draft and target probs).
  • Remove strict rejection sampling and make probabilistic rejection sampling the default.
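For context on the two softmaxes in the first bullet: probabilistic rejection sampling accepts each draft token t with probability min(1, p_target(t) / p_draft(t)), where both distributions come from softmaxing the draft and target logits — exactly the pair a fused kernel would combine with the gather. A hedged PyTorch sketch of the standard acceptance rule (illustrative names, not the vLLM kernels):

```python
import torch

def accept_draft_tokens(target_logits: torch.Tensor,
                        draft_logits: torch.Tensor,
                        draft_token_ids: torch.Tensor) -> torch.Tensor:
    """Standard speculative-decoding acceptance rule: accept draft token t
    with probability min(1, p_target(t) / p_draft(t))."""
    # The two softmaxes a fused kernel would combine with the logits gather.
    p_target = torch.softmax(target_logits, dim=-1)
    p_draft = torch.softmax(draft_logits, dim=-1)
    idx = draft_token_ids.unsqueeze(-1)
    ratio = p_target.gather(-1, idx) / p_draft.gather(-1, idx)
    accept_prob = ratio.clamp(max=1.0).squeeze(-1)
    return torch.rand_like(accept_prob) < accept_prob

# With identical draft and target distributions the ratio is exactly 1,
# so every draft token is accepted.
logits = torch.randn(4, 8)
tokens = torch.randint(0, 8, (4,))
print(accept_draft_tokens(logits, logits, tokens).all().item())  # True
```

Fusing the gather with these two softmaxes avoids materializing both full probability tensors, which is the motivation for the first next step.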

@mergify mergify bot added the v1 label Mar 16, 2026
@TheEpicDolphin TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch 2 times, most recently from b14c905 to d3671ed Compare March 16, 2026 23:20
@TheEpicDolphin TheEpicDolphin marked this pull request as ready for review March 16, 2026 23:20
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces logprobs support for speculative decoding with rejection sampling. The changes primarily involve a significant refactoring of the probabilistic rejection sampling implementation, including new and modified Triton kernels to support greedy sampling and efficiently gather draft logits. A new kernel is also added to flatten sampled tokens for logprob computation. While the overall approach is sound, I've identified a critical correctness issue where draft logits are not temperature-scaled, and a high-severity bug related to tensor initialization that could result in incorrect token IDs.

@TheEpicDolphin TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch 3 times, most recently from f862803 to 79a48b3 Compare March 17, 2026 00:03
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 18, 2026
@WoosukKwon WoosukKwon (Collaborator) left a comment

Thanks for the PR!

@WoosukKwon WoosukKwon (Collaborator) commented:

@TheEpicDolphin Can you please rebase? Sorry for the delay!

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
@TheEpicDolphin TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch from 79a48b3 to d9a477f Compare March 18, 2026 23:11
@WoosukKwon WoosukKwon enabled auto-merge (squash) March 18, 2026 23:55
@WoosukKwon WoosukKwon merged commit 053f3b6 into vllm-project:main Mar 19, 2026
60 checks passed
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
@TheEpicDolphin TheEpicDolphin deleted the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch March 19, 2026 17:40
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
Labels

ready ONLY add when PR is ready to merge/full CI is needed v1