[Model Runner V2] Spec decode rejection sampler logprobs support by TheEpicDolphin · Pull Request #37237 · vllm-project/vllm

TheEpicDolphin · 2026-03-16T23:12:57Z

Purpose

Follow-up to #35461, specifically adding logprobs support.

To get the top logprobs using compute_topk_logprobs, the sampled token ids must have shape [num_logits]. For strict rejection sampling this is easy, because the Sampler already returns the ground-truth target token ids. In probabilistic rejection sampling, however, we don't sample target token ids (it would be a waste of compute). So we first need to get the output sampled token ids and then flatten them from [num_reqs, num_speculative_steps + 1] to [num_logits]. I do this using a simple _flatten_sampled_kernel, which lets us get the top logprobs.
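
As a rough illustration of the flattening step described above, here is a minimal pure-Python sketch. It is not the Triton _flatten_sampled_kernel from this PR. It assumes that each request owns a contiguous run of rows in the flat logits layout and that slots without a sampled token are filled with a placeholder id. The names flatten_sampled, num_sampled, num_logits_per_req, and PLACEHOLDER are hypothetical.

```python
from itertools import accumulate

PLACEHOLDER = -1  # hypothetical fill value for slots with no sampled token


def flatten_sampled(sampled, num_sampled, num_logits_per_req):
    """Flatten [num_reqs, num_speculative_steps + 1] token ids to [num_logits].

    sampled[i][j]         : j-th output token id of request i (padded row)
    num_sampled[i]        : how many entries of row i are valid
    num_logits_per_req[i] : how many logits rows request i owns (assumed layout)
    """
    # An exclusive prefix sum gives each request's start offset in the flat array.
    starts = [0] + list(accumulate(num_logits_per_req))[:-1]
    flat = [PLACEHOLDER] * sum(num_logits_per_req)
    for i, row in enumerate(sampled):
        # Copy only the valid tokens that fit in this request's run of rows.
        for j in range(min(num_sampled[i], num_logits_per_req[i])):
            flat[starts[i] + j] = row[j]
    return flat


# Two requests: the first kept 3 tokens, the second kept 1 of its 2 slots.
example = flatten_sampled([[11, 12, 13], [21, 0, 0]], [3, 1], [3, 2])
```

With this layout, position k of the flat result lines up with row k of the logits, which is the shape a per-row top-k logprob routine would expect.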

Testing

Served 8 requests (temperature = 0) concurrently. Below are the prompts and responses including the top-3 logprobs:

0. Explain the theory of relativity in simple terms.

'The'              (-0.107) | top: ['The':-0.107, 'A':-2.482, 'Albert':-4.357]
' theory'          (-0.003) | top: [' theory':-0.003, ' Theory':-6.003, ' famous':-9.128]
' of'              ( 0.000) | top: [' of':0.000, '!':-17.875, '!\n\n':-20.500]
' rel'             (-0.000) | top: [' rel':-0.000, ' special':-13.375, ' relative':-14.625]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'avity':-15.750, 'atility':-16.375]
'!'                (-0.235) | top: ['!':-0.235, ',':-1.985, ' is':-2.860]
' It'              (-0.430) | top: [' It':-0.430, ' One':-1.555, ' Albert':-2.930]
"'s"               (-0.004) | top: ["'s":-0.004, ' can':-6.129, ' may':-6.629]
' a'               (-0.023) | top: [' a':-0.023, ' actually':-4.023, ' one':-6.148]
' mind'            (-0.481) | top: [' mind':-0.481, ' complex':-1.981, ' big':-1.981]
'-b'               (-0.001) | top: ['-b':-0.001, '-st':-7.501, '-bl':-8.501]
'ending'           (-0.253) | top: ['ending':-0.253, 'low':-1.503, 'ender':-7.003]
' concept'         (-0.026) | top: [' concept':-0.026, ' idea':-3.776, ' topic':-6.151]
','                (-0.577) | top: [',':-0.577, ' that':-0.827, ' developed':-6.702]
' but'             (-0.000) | top: [' but':-0.000, ' even':-11.000, ' isn':-11.750]
' I'               (-0.080) | top: [' I':-0.080, ' don':-2.580, ' fear':-7.205]
"'ll"              (-0.008) | top: ["'ll":-0.008, "'d":-5.383, "'m":-5.633]
' try'             (-0.010) | top: [' try':-0.010, ' do':-5.260, ' break':-5.385]
' to'              (-0.000) | top: [' to':-0.000, ' my':-13.250, ' break':-17.750]
' break'           (-0.203) | top: [' break':-0.203, ' simplify':-1.828, ' explain':-3.828]
' it'              ( 0.000) | top: [' it':0.000, 'it':-21.250, ' down':-21.625]
' down'            ( 0.000) | top: [' down':0.000, 'down':-19.750, '.down':-20.375]
' in'              (-0.015) | top: [' in':-0.015, ' simply':-4.640, ' into':-5.765]
' simple'          (-0.002) | top: [' simple':-0.002, ' a':-6.627, ' super':-8.377]
' terms'           (-0.000) | top: [' terms':-0.000, ' language':-9.250, ' words':-11.250]
'.\n\n'            (-0.038) | top: ['.\n\n':-0.038, '\n\n':-3.913, ':\n\n':-4.788]
'**'               (-0.262) | top: ['**':-0.262, 'The':-1.637, 'Albert':-4.012]
'What'             (-0.142) | top: ['What':-0.142, 'The':-2.392, 'Theory':-4.142]
' is'              (-0.013) | top: [' is':-0.013, "'s":-4.388, ' are':-8.638]
' the'             (-0.328) | top: [' the':-0.328, ' rel':-1.328, ' time':-5.203]
' theory'          (-0.018) | top: [' theory':-0.018, ' Theory':-4.018, 'theory':-12.018]
' of'              (-0.000) | top: [' of':-0.000, '?':-15.000, ' about':-20.125]
' rel'             (-0.000) | top: [' rel':-0.000, ' relativ':-14.750, ' Rel':-16.250]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'atility':-13.000, 'inity':-13.250]
'?'                (-0.000) | top: ['?':-0.000, '?"\n\n':-9.875, ' all':-11.625]
'**\n\n'           (-0.018) | top: ['**\n\n':-0.018, '**\n':-4.018, '**':-13.018]
'The'              (-0.064) | top: ['The':-0.064, 'Albert':-3.564, 'In':-3.564]
' theory'          (-0.000) | top: [' theory':-0.000, 'ory':-11.625, ' Theory':-11.750]
' of'              (-0.000) | top: [' of':-0.000, ' was':-10.125, ',':-10.750]
' rel'             (-0.000) | top: [' rel':-0.000, ' special':-14.000, ' relativ':-15.375]
'ativity'          (-0.000) | top: ['ativity':-0.000, 'avity':-16.375, 'ality':-17.875]
','                (-0.295) | top: [',':-0.295, ' is':-1.545, ' was':-3.170]
' developed'       (-0.325) | top: [' developed':-0.325, ' proposed':-1.450, ' introduced':-3.325]
' by'              ( 0.000) | top: [' by':0.000, ' in':-18.750, 'by':-18.875]
' Albert'          (-0.000) | top: [' Albert':-0.000, 'Albert':-8.250, ' physicist':-9.750]
' Einstein'        ( 0.000) | top: [' Einstein':0.000, ' Ein':-18.375, 'E':-20.750]
','                (-0.043) | top: [',':-0.043, ' in':-3.168, ' (':-12.293]
' is'              (-0.095) | top: [' is':-0.095, ' explains':-3.845, ' says':-3.970]
' a'               (-0.042) | top: [' a':-0.042, ' an':-3.292, ' about':-6.167]
' way'             (-0.064) | top: [' way':-0.064, ' fundamental':-3.064, ' concept':-5.189]
' of'              (-0.313) | top: [' of':-0.313, ' to':-1.313, ' that':-14.563]
' understanding'   (-0.002) | top: [' understanding':-0.002, ' thinking':-6.627, ' explaining':-7.377]
' how'             (-0.011) | top: [' how':-0.011, ' space':-5.136, ' the':-5.511]
' the'             (-0.027) | top: [' the':-0.027, ' space':-3.777, ' time':-5.652]
' universe'        (-0.000) | top: [' universe':-0.000, ' Universe':-9.250, ' world':-9.250]
' works'           (-0.003) | top: [' works':-0.003, ' behaves':-6.128, ' and':-8.378]
'.'                (-0.026) | top: ['.':-0.026, ',':-4.401, ' and':-4.651]
' It'              (-0.010) | top: [' It':-0.010, ' He':-5.510, ' There':-5.760]
"'s"               (-0.221) | top: ["'s":-0.221, ' says':-1.846, ' shows':-4.971]
' based'           (-0.368) | top: [' based':-0.368, ' a':-2.368, ' about':-2.868]
' on'              (-0.000) | top: [' on':-0.000, ' around':-15.500, ' two':-17.500]
' two'             (-0.009) | top: [' two':-0.009, ' the':-4.759, ' a':-7.634]
' main'            (-0.001) | top: [' main':-0.001, ' simple':-7.876, ' key':-8.751]
' ideas'           (-0.012) | top: [' ideas':-0.012, ' principles':-4.887, ' concepts':-6.012]

1. What is the capital of France?

'The'              (-0.059) | top: ['The':-0.059, 'That':-2.934, 'Easy':-6.309]
' capital'         (-0.000) | top: [' capital':-0.000, ' answer':-11.125, 'capital':-14.125]
' of'              (-0.000) | top: [' of':-0.000, ' city':-14.750, ' and':-18.750]
' France'          (-0.000) | top: [' France':-0.000, 'France':-14.375, ' Franc':-18.250]
' is'              ( 0.000) | top: [' is':0.000, ' was':-18.375, ' adalah':-18.625]
' Paris'           (-0.000) | top: [' Paris':-0.000, 'Paris':-9.625, ' PAR':-12.125]
'.'                (-0.209) | top: ['.':-0.209, '!':-1.709, ' (':-5.084]
'<|eot_id|>'       (-0.000) | top: ['<|eot_id|>':-0.000, ' It':-11.125, ' Paris':-11.750]

2. Write a haiku about coding.

'Here'             (-0.174) | top: ['Here':-0.174, 'Lines':-2.674, 'Code':-3.174]
' is'              (-0.034) | top: [' is':-0.034, "'s":-3.409, ''s':-12.159]
' a'               (-0.000) | top: [' a':-0.000, ' ha':-12.000, ' one':-16.750]
' ha'              (-0.000) | top: [' ha':-0.000, ' short':-9.750, ' Ha':-12.375]
'iku'              ( 0.000) | top: ['iku':0.000, 'ika':-17.500, 'ibu':-18.125]
' about'           (-0.000) | top: [' about':-0.000, ':\n\n':-14.375, 'about':-17.875]
' coding'          (-0.000) | top: [' coding':-0.000, 'coding':-14.625, ' code':-15.000]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\n\n\n':-10.250, ':\n':-12.625]
'Lines'            (-0.481) | top: ['Lines':-0.481, 'Code':-1.231, ' Lines':-3.731]
' of'              (-0.001) | top: [' of':-0.001, ' dance':-7.876, ' and':-8.251]
' code'            (-0.001) | top: [' code':-0.001, ' logic':-8.501, ' ones':-9.126]
' unfold'          (-0.522) | top: [' unfold':-0.522, ' flow':-1.397, ' dance':-2.647]
'\n'               (-0.000) | top: ['\n':-0.000, ',\n':-11.000, '\r\n':-12.875]
'Logic'            (-1.358) | top: ['Logic':-1.358, 'Bug':-2.108, 'Mean':-2.233]
' flows'           (-1.228) | top: [' flows':-1.228, "'s":-1.353, ' and':-1.478]
' like'            (-0.424) | top: [' like':-0.424, ',':-1.174, ' from':-3.549]
' digital'         (-1.392) | top: [' digital':-1.392, ' rivers':-1.642, ' a':-2.017]
' streams'         (-1.095) | top: [' streams':-1.095, ' stream':-1.220, ' rain':-2.220]
'\n'               (-0.000) | top: ['\n':-0.000, '<|eot_id|>':-8.250, ',\n':-12.688]
'Beauty'           (-0.776) | top: ['Beauty':-0.776, 'Creation':-1.901, 'Art':-2.526]
' in'              (-0.001) | top: [' in':-0.001, ' is':-8.751, ' born':-9.001]
' the'             (-0.463) | top: [' the':-0.463, ' bits':-2.651, ' bytes':-2.776]
' bug'             (-0.360) | top: [' bug':-0.360, ' bugs':-1.485, ' code':-3.610]
'<|eot_id|>'       (-0.001) | top: ['<|eot_id|>':-0.001, '-free':-7.439, 'free':-9.751]

3. List three benefits of regular exercise.

'Here'             (-0.000) | top: ['Here':-0.000, 'Regular':-9.750, 'A':-10.500]
' are'             (-0.000) | top: [' are':-0.000, ' three':-13.250, ' Are':-15.625]
' three'           (-0.000) | top: [' three':-0.000, 'three':-15.250, ' the':-16.125]
' benefits'        (-0.000) | top: [' benefits':-0.000, ' significant':-11.375, ' Benefits':-13.750]
' of'              ( 0.000) | top: [' of':0.000, ' to':-24.000, ' or':-24.125]
' regular'         (-0.000) | top: [' regular':-0.000, 'regular':-16.375, ' regularly':-16.500]
' exercise'        (-0.000) | top: [' exercise':-0.000, ' Exercise':-14.375, ' exercises':-14.750]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\r\n\r\n':-14.250, ':\n\n\n':-14.625]
'1'                (-0.000) | top: ['1':-0.000, '**':-13.500, ' **':-19.875]
'.'                ( 0.000) | top: ['.':0.000, '.?':-18.438, '.\n\n':-18.438]
' **'              (-0.000) | top: [' **':-0.000, ' Impro':-12.125, ' Improved':-12.500]
'Impro'            (-0.410) | top: ['Impro':-0.410, 'Improved':-1.410, 'Weight':-2.410]
'ves'              (-0.001) | top: ['ves':-0.000, 'vements':-8.750, 'ving':-9.125]
' Physical'        (-0.462) | top: [' Physical':-0.462, ' Cardio':-1.337, ' Mental':-2.837]
' Health'          (-0.000) | top: [' Health':-0.000, ' and':-10.250, ' Heath':-10.625]
'**:'              (-0.000) | top: ['**:':-0.000, ':**':-11.750, ' and':-15.250]
' Regular'         (-0.000) | top: [' Regular':-0.000, ' Exercise':-7.875, 'Regular':-10.000]
' exercise'        (-0.000) | top: [' exercise':-0.000, ' physical':-12.250, ' Exercise':-12.875]
' can'             (-0.163) | top: [' can':-0.163, ' helps':-1.913, ' has':-6.163]
' help'            (-0.000) | top: [' help':-0.000, ' improve':-8.625, ' reduce':-9.750]
' to'              (-0.671) | top: [' to':-0.671, ' reduce':-1.671, ' you':-2.421]
' reduce'          (-0.834) | top: [' reduce':-0.834, ' maintain':-1.709, ' improve':-1.834]
' the'             (-0.023) | top: [' the':-0.023, ' your':-3.773, ' risk':-13.398]
' risk'            (-0.000) | top: [' risk':-0.000, ' risks':-11.375, 'risk':-12.750]
' of'              (-0.000) | top: [' of':-0.000, ' or':-14.750, ' factors':-18.500]
' chronic'         (-0.004) | top: [' chronic':-0.004, ' developing':-5.629, ' many':-8.004]
' diseases'        (-0.000) | top: [' diseases':-0.000, ' illnesses':-10.125, ' health':-12.375]
','                (-0.576) | top: [',':-0.576, ' such':-0.826, ' like':-8.451]
' such'            (-0.000) | top: [' such':-0.000, 'such':-14.875, ' including':-14.875]
' as'              ( 0.000) | top: [' as':0.000, ' heart':-19.625, ' obesity':-20.500]
' heart'           (-0.000) | top: [' heart':-0.000, ' diabetes':-10.000, ' cardiovascular':-10.375]
' disease'         (-0.000) | top: [' disease':-0.000, ' Disease':-14.500, ' diseases':-15.125]
','                (-0.000) | top: [',':-0.000, ' and':-14.250, ',':-22.000]
' diabetes'        (-0.363) | top: [' diabetes':-0.363, ' type':-1.238, ' stroke':-4.238]
','                (-0.000) | top: [',':-0.000, ' and':-13.750, ' mell':-16.000]
' and'             (-0.000) | top: [' and':-0.000, ' obesity':-13.375, ' some':-13.500]
' some'            (-0.069) | top: [' some':-0.069, ' certain':-2.819, ' obesity':-4.944]
' cancers'         (-0.281) | top: [' cancers':-0.281, ' types':-1.406, ' forms':-8.531]
'.'                (-0.026) | top: ['.':-0.026, ',':-3.651, ' by':-9.151]
' It'              (-0.694) | top: [' It':-0.694, ' Exercise':-0.694, ' Regular':-8.319]
' can'             (-0.002) | top: [' can':-0.002, ' also':-6.502, 'can':-14.627]
' also'            (-0.000) | top: [' also':-0.000, 'also':-13.875, ' lower':-13.875]
' help'            (-0.489) | top: [' help':-0.489, ' improve':-0.989, ' lower':-4.364]
' to'              (-0.001) | top: [' to':-0.001, ' with':-6.626, ' manage':-10.376]
' manage'          (-0.909) | top: [' manage':-0.909, ' improve':-1.284, ' lower':-1.909]
' weight'          (-0.138) | top: [' weight':-0.138, ' existing':-3.138, ' conditions':-3.638]
','                (-0.000) | top: [',':-0.000, ' loss':-9.625, ' and':-10.000]
' improve'         (-0.092) | top: [' improve':-0.092, ' boost':-3.217, ' lower':-3.842]
' sleep'           (-0.627) | top: [' sleep':-0.627, ' blood':-1.002, ' bone':-3.377]
' quality'         (-0.143) | top: [' quality':-0.143, ',':-2.018, ' patterns':-7.893]
','                (-0.000) | top: [',':-0.000, ' and':-13.000, ',':-23.375]
' and'             (-0.000) | top: [' and':-0.000, ' boost':-14.750, ' increase':-15.375]
' boost'           (-0.403) | top: [' boost':-0.403, ' increase':-1.278, ' reduce':-3.528]
' overall'         (-0.346) | top: [' overall':-0.346, ' immune':-1.971, ' the':-2.971]
' physical'        (-0.007) | top: [' physical':-0.007, ' energy':-5.257, ' fitness':-7.507]
' health'          (-1.039) | top: [' health':-1.039, ' fitness':-1.039, ' function':-1.414]
'.\n'              (-0.013) | top: ['.\n':-0.013, ' and':-4.388, '.\n\n':-8.263]
'2'                ( 0.000) | top: ['2':0.000, '3':-18.000, '۲':-21.125]
'.'                ( 0.000) | top: ['.':0.000, ',':-20.500, '.\n':-22.938]
' **'              ( 0.000) | top: [' **':0.000, '**':-19.250, ' **\n':-19.500]
'Enh'              (-0.671) | top: ['Enh':-0.671, 'Boost':-0.796, 'Red':-4.296]
'ances'            (-0.000) | top: ['ances':-0.000, 'anced':-13.375, 'ases':-14.000]
' Mental'          (-0.000) | top: [' Mental':-0.000, ' Cognitive':-10.000, ' Mood':-10.625]
' Well'            (-0.143) | top: [' Well':-0.143, ' Health':-2.018, ' Wellness':-8.518]

4. How does a refrigerator keep food cold?

'A'                (-0.229) | top: ['A':-0.229, 'Re':-1.604, 'The':-5.854]
' refrigerator'    (-0.001) | top: [' refrigerator':-0.001, ' refriger':-8.001, ' fridge':-9.001]
' keeps'           (-0.074) | top: [' keeps':-0.074, ' is':-3.449, ',':-3.699]
' food'            (-0.000) | top: [' food':-0.000, ' your':-10.125, ' our':-15.125]
' cold'            (-0.000) | top: [' cold':-0.000, ' cool':-11.500, ' and':-11.500]
' by'              (-0.441) | top: [' by':-0.441, ' through':-1.066, ' using':-4.441]
' using'           (-0.004) | top: [' using':-0.004, ' utilizing':-5.879, ' transferring':-7.379]
' a'               (-0.005) | top: [' a':-0.005, ' refriger':-5.755, ' the':-6.880]
' combination'     (-0.071) | top: [' combination':-0.071, ' refriger':-3.321, ' process':-3.821]
' of'              ( 0.000) | top: [' of':0.000, 'of':-19.125, ' or':-19.750]
' several'         (-1.368) | top: [' several':-1.368, ' technologies':-1.368, ' principles':-2.118]
' technologies'    (-0.639) | top: [' technologies':-0.639, ' components':-1.389, ' mechanisms':-2.764]
' and'             (-0.526) | top: [' and':-0.526, ' to':-0.901, ' that':-6.151]
' principles'      (-0.450) | top: [' principles':-0.450, ' mechanisms':-1.575, ' processes':-2.700]
' to'              (-0.288) | top: [' to':-0.288, '.':-1.413, ' of':-5.163]
' remove'          (-0.689) | top: [' remove':-0.689, ' transfer':-1.439, ' maintain':-1.939]
' heat'            (-0.000) | top: [' heat':-0.000, ' and':-8.750, ' warmth':-9.875]
' from'            (-0.005) | top: [' from':-0.005, ' and':-5.255, ' energy':-9.755]
' the'             (-0.002) | top: [' the':-0.002, ' its':-6.502, ' inside':-9.002]
' interior'        (-0.133) | top: [' interior':-0.133, ' inside':-2.383, ' compartment':-4.258]
' of'              (-0.083) | top: [' of':-0.083, ' and':-3.083, ' compartment':-3.458]
' the'             ( 0.000) | top: [' the':0.000, 'the':-19.125, ' a':-19.125]
' appliance'       (-0.372) | top: [' appliance':-0.372, ' fridge':-1.497, ' unit':-3.247]
' and'             (-0.049) | top: [' and':-0.049, '.':-3.049, '.\n\n':-10.299]
' transfer'        (-0.302) | top: [' transfer':-0.302, ' maintain':-1.427, ' replace':-5.427]
' it'              (-0.000) | top: [' it':-0.000, ' that':-8.875, ' the':-12.250]
' outside'         (-0.576) | top: [' outside':-0.576, ' to':-0.826, ' outdoors':-9.201]
'.'                (-0.011) | top: ['.':-0.011, ',':-5.011, ' to':-5.761]
' Here'            (-0.014) | top: [' Here':-0.014, ' The':-4.264, ' This':-9.264]
"'s"               (-0.005) | top: ["'s":-0.005, ' are':-5.380, ' is':-12.255]
' a'               (-0.003) | top: [' a':-0.003, ' how':-6.003, ' the':-9.003]
' simplified'      (-0.145) | top: [' simplified':-0.145, ' step':-2.395, ' breakdown':-3.270]
' explanation'     (-0.044) | top: [' explanation':-0.044, ' overview':-3.169, ' breakdown':-6.794]
' of'              (-0.252) | top: [' of':-0.252, ':\n\n':-1.502, ':\n':-14.377]
' the'             (-0.160) | top: [' the':-0.160, ' how':-1.910, ' some':-14.285]
' process'         (-0.304) | top: [' process':-0.304, ' main':-1.429, ' key':-4.304]
':\n\n'            (-0.000) | top: [':\n\n':-0.000, ':\n':-14.500, ':\r\n\r\n':-16.375]
'1'                (-0.021) | top: ['1':-0.021, '**':-3.896, ' **':-14.771]
'.'                (-0.000) | top: ['.':-0.000, '.?':-16.625, '️':-18.688]
' **'              (-0.007) | top: [' **':-0.007, ' Cooling':-6.007, ' Com':-6.757]
'Cool'             (-1.098) | top: ['Cool':-1.098, 'Compression':-1.848, 'Re':-1.848]
'ing'              (-0.349) | top: ['ing':-0.349, 'ant':-1.224, 'ants':-7.224]
' System'          (-0.914) | top: [' System':-0.914, ' Cycle':-1.914, ' system':-2.164]
'**:'              (-0.160) | top: ['**:':-0.160, ':**':-1.910, ':':-8.785]
' The'             (-0.089) | top: [' The':-0.089, ' A':-3.089, ' Most':-3.964]
' refrigerator'    (-0.039) | top: [' refrigerator':-0.039, ' refriger':-3.914, ' heart':-5.039]
' has'             (-0.289) | top: [' has':-0.289, ' contains':-1.664, ' uses':-3.039]
' a'               (-0.001) | top: [' a':-0.001, ' two':-7.501, ' an':-7.751]
' cooling'         (-0.170) | top: [' cooling':-0.170, ' refriger':-2.545, ' built':-3.045]
' system'          (-0.000) | top: [' system':-0.000, ' unit':-10.750, ' coil':-11.250]
' that'            (-0.062) | top: [' that':-0.062, ',':-3.312, ' consisting':-3.937]
' uses'            (-1.068) | top: [' uses':-1.068, ' consists':-1.068, ' circ':-1.693]
' a'               (-0.038) | top: [' a':-0.038, ' refriger':-3.288, ' the':-9.413]
' refriger'        (-0.049) | top: [' refriger':-0.049, ' liquid':-3.674, ' type':-4.424]
'ant'              (-0.000) | top: ['ant':-0.000, 'ation':-10.625, 'ated':-15.000]
','                (-0.173) | top: [',':-0.173, ' (':-2.048, ' to':-3.673]
' such'            (-0.663) | top: [' such':-0.663, ' a':-1.288, ' which':-1.913]
' as'              (-0.000) | top: [' as':-0.000, ' a':-12.375, ' us':-19.313]
' Fre'             (-0.326) | top: [' Fre':-0.326, ' fre':-1.450, ' refriger':-3.950]
'on'               (-0.000) | top: ['on':-0.000, 'ón':-14.375, 'ON':-15.750]
','                (-0.193) | top: [',':-0.193, ' or':-2.318, ' (':-2.568]
' to'              (-0.165) | top: [' to':-0.165, ' which':-1.915, ' that':-5.790]
' absorb'          (-0.048) | top: [' absorb':-0.048, ' transfer':-3.548, ' circ':-4.673]
' heat'            (-0.004) | top: [' heat':-0.004, ' and':-5.504, ' the':-10.879]

5. What is the difference between HTTP and HTTPS?

'HTTP'             (-0.022) | top: ['HTTP':-0.022, 'The':-3.897, 'HTTPS':-6.897]
' ('               (-0.001) | top: [' (':-0.001, ' and':-6.626, ' stands':-11.126]
'H'                (-0.014) | top: ['H':-0.014, 'Hyper':-4.264, 'Hy':-10.014]
'yp'               (-0.000) | top: ['yp':-0.000, 'ypo':-9.125, 'yper':-9.375]
'ertext'           (-0.000) | top: ['ertext':-0.000, 'ert':-16.875, 'ersonic':-17.500]
' Transfer'        (-0.000) | top: [' Transfer':-0.000, ' Transport':-10.125, 'Transfer':-11.000]
' Protocol'        (-0.000) | top: [' Protocol':-0.000, 'Protocol':-10.250, ' protocol':-13.250]
')'                (-0.000) | top: [')':-0.000, '),':-16.750, ' )':-18.250]
' and'             (-0.055) | top: [' and':-0.055, ' is':-2.930, 'and':-15.430]
' HTTPS'           (-0.000) | top: [' HTTPS':-0.000, 'HTTPS':-11.000, ' HTTP':-12.500]
' ('               (-0.000) | top: [' (':-0.000, ' are':-14.875, ' ()':-17.375]
'H'                (-0.000) | top: ['H':-0.000, 'Hyper':-10.875, 'Secure':-11.875]
'yp'               (-0.000) | top: ['yp':-0.000, 'ypo':-13.375, ' Hyp':-14.375]
'ertext'           (-0.000) | top: ['ertext':-0.000, 'ert':-13.250, 'ext':-16.250]
' Transfer'        (-0.000) | top: [' Transfer':-0.000, ' Transport':-9.250, 'Transfer':-10.375]
' Protocol'        (-0.000) | top: [' Protocol':-0.000, 'Protocol':-11.625, ' protocol':-12.375]
' Secure'          (-0.000) | top: [' Secure':-0.000, ' Sec':-8.625, 'Secure':-9.375]
')'                (-0.000) | top: [')':-0.000, '),':-14.750, '))':-15.875]
' are'             (-0.000) | top: [' are':-0.000, ' both':-7.875, ' differ':-11.125]
' both'            (-0.593) | top: [' both':-0.593, ' two':-0.843, ' the':-4.343]
' protocols'       (-0.504) | top: [' protocols':-0.504, ' used':-1.254, ' communication':-2.254]
' used'            (-0.001) | top: [' used':-0.001, ' for':-7.376, ' that':-7.626]
' for'             (-0.180) | top: [' for':-0.180, ' to':-1.805, ' by':-8.805]
' transferring'    (-0.073) | top: [' transferring':-0.073, ' transmitting':-2.823, ' exchanging':-5.198]
' data'            (-0.000) | top: [' data':-0.000, ' files':-11.000, ' and':-11.125]
' over'            (-0.011) | top: [' over':-0.011, ',':-5.136, ' between':-5.636]
' the'             (-0.000) | top: [' the':-0.000, ' a':-11.750, ' internet':-13.750]
' internet'        (-0.000) | top: [' internet':-0.000, ' web':-8.250, ' Internet':-9.125]
'.'                (-0.434) | top: ['.':-0.434, ',':-1.059, '.\n\n':-5.184]
' The'             (-0.001) | top: [' The':-0.001, ' However':-7.001, ' While':-9.251]
' main'            (-0.013) | top: [' main':-0.013, ' primary':-4.638, ' key':-6.263]
' difference'      (-0.000) | top: [' difference':-0.000, ' differences':-8.375, 'difference':-12.500]
' between'         (-0.000) | top: [' between':-0.000, ' is':-8.000, ' lies':-10.750]
' the'             (-0.694) | top: [' the':-0.694, ' them':-0.694, ' HTTP':-7.819]
' two'             (-0.000) | top: [' two':-0.000, 'two':-15.625, ' Two':-21.750]
' is'              (-0.004) | top: [' is':-0.004, ' lies':-5.629, ' protocols':-9.879]
' the'             (-0.117) | top: [' the':-0.117, ' that':-2.242, ' how':-5.742]
' level'           (-0.038) | top: [' level':-0.038, ' way':-3.413, ' security':-6.163]
' of'              ( 0.000) | top: [' of':0.000, ' security':-19.000, ' or':-19.375]
' security'        (-0.016) | top: [' security':-0.016, ' encryption':-4.141, 'security':-11.141]
' and'             (-0.006) | top: [' and':-0.006, ' they':-5.131, ' provided':-8.131]
' encryption'      (-0.005) | top: [' encryption':-0.005, ' authentication':-6.755, ' protection':-7.005]
' used'            (-0.093) | top: [' used':-0.093, ' they':-2.843, ' provided':-3.968]
' to'              (-0.020) | top: [' to':-0.020, '.\n\n':-4.145, ':\n\n':-6.145]
' protect'         (-0.054) | top: [' protect':-0.054, ' transmit':-4.054, ' secure':-4.179]
' the'             (-0.001) | top: [' the':-0.001, ' data':-6.876, ' user':-8.501]
' data'            (-0.009) | top: [' data':-0.009, ' communication':-4.884, ' transmission':-7.259]
' being'           (-0.333) | top: [' being':-0.333, '.\n\n':-1.833, ' in':-2.208]
' transmitted'     (-0.253) | top: [' transmitted':-0.253, ' transferred':-1.503, ' sent':-7.503]
'.\n\n'            (-0.000) | top: ['.\n\n':-0.000, ':\n\n':-7.875, ' between':-12.125]
'**'               (-0.316) | top: ['**':-0.316, 'HTTP':-1.316, 'Here':-6.316]
'HTTP'             (-0.000) | top: ['HTTP':-0.000, 'HTTPS':-9.250, 'What':-10.375]
':'                (-0.603) | top: [':':-0.603, ' (':-1.603, '**':-1.978]
'**\n\n'           (-0.001) | top: ['**\n\n':-0.001, '**\n':-7.376, ' (':-13.626]
'HTTP'             (-0.436) | top: ['HTTP':-0.436, '*':-1.061, '1':-5.186]
' is'              (-0.000) | top: [' is':-0.000, ',':-10.750, ' (':-11.125]
' the'             (-0.744) | top: [' the':-0.744, ' an':-0.994, ' a':-1.869]
' original'        (-0.547) | top: [' original':-0.547, ' standard':-1.547, ' older':-2.422]
' protocol'        (-0.012) | top: [' protocol':-0.012, ' and':-4.637, ',':-7.262]
' used'            (-0.038) | top: [' used':-0.038, ' for':-3.413, ' developed':-6.163]
' for'             (-0.008) | top: [' for':-0.008, ' to':-4.883, ' by':-10.508]
' transferring'    (-0.038) | top: [' transferring':-0.038, ' transmitting':-3.663, ' exchanging':-5.538]
' data'            (-0.010) | top: [' data':-0.010, ' web':-5.135, ' hyp':-5.760]
' over'            (-0.003) | top: [' over':-0.003, ' on':-6.378, ' between':-6.503]

6. Suggest a short book to read on a rainy day.

'A'                (-0.121) | top: ['A':-0.121, 'What':-2.371, 'Perfect':-4.746]
' rainy'           (-0.481) | top: [' rainy':-0.481, ' perfect':-1.106, ' cozy':-3.481]
' day'             (-0.000) | top: [' day':-0.000, ' days':-14.625, 'day':-15.125]
' is'              (-0.002) | top: [' is':-0.002, '!':-7.502, ' calls':-7.752]
' the'             (-0.001) | top: [' the':-0.001, ' a':-7.376, ' perfect':-8.751]
' perfect'         (-0.001) | top: [' perfect':-0.001, ' pur':-7.626, 'perfect':-10.251]
' excuse'          (-0.000) | top: [' excuse':-0.000, ' opportunity':-10.625, ' time':-10.875]
' to'              (-0.000) | top: [' to':-0.000, ' for':-13.125, ' stay':-19.750]
' curl'            (-0.428) | top: [' curl':-0.428, ' cozy':-1.303, ' sn':-3.303]
' up'              (-0.000) | top: [' up':-0.000, '-up':-16.125, ' Up':-19.875]
' with'            (-0.000) | top: [' with':-0.000, ' and':-14.875, 'with':-18.750]
' a'               ( 0.000) | top: [' a':0.000, ' an':-19.250, ' some':-20.375]
' good'            (-0.009) | top: [' good':-0.009, ' great':-4.759, ' book':-9.884]
' book'            (-0.000) | top: [' book':-0.000, 'book':-15.625, ' read':-15.625]
'!'                (-0.211) | top: ['!':-0.211, '!\n\n':-1.711, '.':-4.711]
' Here'            (-0.013) | top: [' Here':-0.013, ' I':-4.513, ' Considering':-7.388]
"'s"               (-0.313) | top: ["'s":-0.313, ' are':-1.313, ' is':-12.563]
' a'               ( 0.000) | top: [' a':0.000, ' my':-17.125, ' some':-17.750]
' suggestion'      (-0.170) | top: [' suggestion':-0.170, ' short':-1.920, ' recommendation':-5.045]
' for'             (-0.696) | top: [' for':-0.696, ':\n\n':-0.696, ' that':-6.071]
' a'               (-0.000) | top: [' a':-0.000, ' you':-12.625, ' an':-16.750]
' short'           (-0.009) | top: [' short':-0.009, ' delightful':-5.134, ' cozy':-6.634]
' and'             (-0.070) | top: [' and':-0.070, ' but':-3.820, ',':-3.945]
' cozy'            (-0.413) | top: [' cozy':-0.413, ' delightful':-1.288, ' engaging':-3.788]
' read'            (-0.049) | top: [' read':-0.049, ' book':-3.049, ' novel':-7.924]
':\n\n'            (-0.102) | top: [':\n\n':-0.102, ' that':-2.352, ' to':-6.352]
'"The'             (-0.535) | top: ['"The':-0.535, '**':-0.910, '"':-5.035]
' Little'          (-0.156) | top: [' Little':-0.156, ' Night':-3.031, ' Snow':-3.531]
' Paris'           (-0.067) | top: [' Paris':-0.067, ' Prince':-2.817, ' Book':-6.192]
' Book'            (-0.000) | top: [' Book':-0.000, 'Book':-10.750, ' Books':-12.375]
'shop'             (-0.000) | top: ['shop':-0.000, 'store':-12.500, 'Shop':-13.500]
'"'                (-0.000) | top: ['"':-0.000, ':':-11.125, ' by':-13.750]
' by'              (-0.000) | top: [' by':-0.000, ' (':-14.000, 'by':-14.625]
' Nina'            (-0.001) | top: [' Nina':-0.001, ' Natalie':-8.563, ' Nicholas':-9.376]
' George'          (-0.001) | top: [' George':-0.001, ' Georges':-7.501, 'George':-8.251]
' ('               (-0.264) | top: [' (':-0.264, '\n\n':-1.764, ':\n\n':-3.139]
'224'              (-1.301) | top: ['224':-1.301, '272':-1.426, '192':-1.426]
' pages'           (-0.000) | top: [' pages':-0.000, 'pages':-11.500, ' Pages':-12.000]
')\n\n'            (-0.013) | top: [')\n\n':-0.013, '):\n\n':-4.638, ',':-6.013]
'This'             (-0.001) | top: ['This':-0.001, 'Imagine':-8.501, 'Set':-8.501]
' charming'        (-0.180) | top: [' charming':-0.180, ' novel':-2.680, ' delightful':-3.430]
' novel'           (-0.012) | top: [' novel':-0.012, ' nov':-4.512, ' and':-7.637]
' tells'           (-0.096) | top: [' tells':-0.096, ' is':-2.971, ' follows':-3.221]
' the'             ( 0.000) | top: [' the':0.000, ' a':-20.000, ' story':-22.375]
' story'           (-0.000) | top: [' story':-0.000, ' tale':-11.875, ' Story':-12.375]
' of'              ( 0.000) | top: [' of':0.000, 'of':-18.625, ' Mons':-20.625]
' Jean'            (-0.035) | top: [' Jean':-0.035, ' Mons':-3.535, ' a':-6.660]
' Per'             (-0.007) | top: [' Per':-0.007, 'Per':-6.132, '-P':-6.882]
'du'               (-0.009) | top: ['du':-0.009, 'uvian':-6.572, 'rot':-6.572]
','                ( 0.000) | top: [',':0.000, "'s":-17.750, ' who':-18.625]
' a'               (-0.002) | top: [' a':-0.002, ' the':-6.252, ' who':-10.627]
' books'           (-0.734) | top: [' books':-0.734, ' gr':-1.734, ' melanch':-1.984]
'eller'            (-0.000) | top: ['eller':-0.000, 'ellers':-8.750, 'elling':-9.375]
' who'             (-0.022) | top: [' who':-0.022, ' in':-3.897, ' and':-7.272]
' has'             (-0.365) | top: [' has':-0.365, ' owns':-2.365, ' sets':-2.490]
' been'            (-0.081) | top: [' been':-0.081, ' lost':-3.581, ' spent':-3.831]
' stuck'           (-0.233) | top: [' stuck':-0.233, ' unable':-2.545, ' wandering':-3.795]
' in'              (-0.034) | top: [' in':-0.034, ' on':-3.409, ' at':-9.159]
' his'             (-0.769) | top: [' his':-0.769, ' a':-0.894, ' the':-2.269]
' grief'           (-1.097) | top: [' grief':-1.097, ' ways':-1.347, ' life':-1.847]
' for'             (-1.154) | top: [' for':-1.154, ' over':-1.404, ' after':-1.529]
' years'           (-0.447) | top: [' years':-0.447, ' ':-1.947, ' decades':-2.197]
' after'           (-0.523) | top: [' after':-0.523, ' since':-1.523, '.':-1.773]
' a'               (-1.077) | top: [' a':-1.077, ' the':-1.077, ' his':-1.202]

7. 2+2=?

'The'              (-0.340) | top: ['The':-0.340, '2':-1.965, '4':-2.090]
' answer'          (-0.002) | top: [' answer':-0.002, ' correct':-6.127, ' easy':-11.377]
' is'              (-0.252) | top: [' is':-0.252, ' to':-1.502, ',':-15.002]
' '                (-0.465) | top: [' ':-0.465, '...':-1.215, ':':-2.715]
'4'                (-0.000) | top: ['4':-0.000, '2':-15.250, '４':-16.250]
'!'                (-0.092) | top: ['!':-0.092, '.':-2.467, '<|eot_id|>':-5.842]
'<|eot_id|>'       (-0.000) | top: ['<|eot_id|>':-0.000, '':-9.125, '':-9.563]
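
Each line above pairs the sampled token's logprob with the three most likely tokens at that position. The following is a minimal sketch of how such values can be derived from one row of logits. It is not vLLM's compute_topk_logprobs, and the function names and list-based inputs are assumptions for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def topk_logprobs(logits, sampled_id, k):
    """Return (logprob of the sampled token, top-k (token_id, logprob) pairs)."""
    logprobs = log_softmax(logits)
    # Sort token ids by descending logprob and keep the first k.
    top = sorted(range(len(logprobs)), key=lambda t: logprobs[t], reverse=True)[:k]
    return logprobs[sampled_id], [(t, logprobs[t]) for t in top]


sampled_lp, top = topk_logprobs([2.0, 1.0, 0.0], sampled_id=0, k=2)
```

With greedy decoding (temperature = 0), the sampled token is the argmax, so its logprob equals the first top-k entry, as in the output above.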

Next Steps

Fuse draft logits gather and two softmaxes (draft and target probs).
Remove strict rejection sampling and make probabilistic the default.
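
For context on the second item, here is a sketch of how the two acceptance rules are commonly formulated. This is the textbook speculative-sampling rule, not code from this PR, and every function name below is hypothetical. The handling of a zero draft probability is an assumed convention.

```python
import random


def strict_accept(draft_token, target_token):
    """Strict rule: keep the draft token only if it equals the target's token."""
    return draft_token == target_token


def probabilistic_accept(draft_token, target_probs, draft_probs, rng):
    """Accept with probability min(1, p_target / p_draft) for the draft token."""
    p = target_probs[draft_token]
    q = draft_probs[draft_token]
    if q <= 0.0:
        return False  # assumed convention: reject a token the draft gave zero mass
    return rng.random() < min(1.0, p / q)


def residual_distribution(target_probs, draft_probs):
    """Distribution to resample from after a rejection: normalized max(0, p - q)."""
    residual = [max(0.0, p - q) for p, q in zip(target_probs, draft_probs)]
    total = sum(residual)
    if total == 0.0:
        return list(target_probs)  # p == q everywhere; fall back to the target
    return [r / total for r in residual]


rng = random.Random(0)
target, draft = [0.6, 0.4], [0.2, 0.8]
always = all(probabilistic_accept(0, target, draft, rng) for _ in range(100))
```

The strict rule needs a sampled target token per position, while the probabilistic rule only needs the two probability vectors, which is why the flattening step is needed before logprobs can be computed.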

gemini-code-assist

Code Review

This pull request introduces logprobs support for speculative decoding with rejection sampling. The changes primarily involve a significant refactoring of the probabilistic rejection sampling implementation, including new and modified Triton kernels to support greedy sampling and efficiently gather draft logits. A new kernel is also added to flatten sampled tokens for logprob computation. While the overall approach is sound, I've identified a critical correctness issue where draft logits are not temperature-scaled, and a high-severity bug related to tensor initialization that could result in incorrect token IDs.
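
To illustrate why the temperature-scaling concern raised in the review could matter: probabilistic acceptance compares target and draft probabilities, so both would normally be derived at the same temperature. The sketch below only shows how temperature changes a softmax. It is an assumption-laden illustration with a hypothetical function name, not the PR's kernels.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
unscaled = softmax_with_temperature(logits, 1.0)
sharpened = softmax_with_temperature(logits, 0.5)
```

Here the same logits give a more peaked distribution at temperature 0.5 than at 1.0, so mixing a scaled target distribution with an unscaled draft distribution would change the acceptance ratio.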

vllm/v1/worker/gpu/spec_decode/rejection_sampler.py

WoosukKwon

Thanks for the PR!

WoosukKwon · 2026-03-18T21:12:57Z

@TheEpicDolphin Can you please rebase? Sorry for the delay!

mergify bot added the v1 label Mar 16, 2026

TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch 2 times, most recently from b14c905 to d3671ed March 16, 2026 23:20

TheEpicDolphin marked this pull request as ready for review March 16, 2026 23:20

TheEpicDolphin requested review from WoosukKwon and njhill as code owners March 16, 2026 23:20

TheEpicDolphin mentioned this pull request Mar 16, 2026

[Model Runner V2] Spec decode rejection sampler greedy + logprobs support #36930

Closed

gemini-code-assist bot reviewed Mar 16, 2026

vllm/v1/worker/gpu/spec_decode/rejection_sampler.py (resolved)

vllm/v1/worker/gpu/spec_decode/rejection_sampler.py (outdated, resolved)

TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch 3 times, most recently from f862803 to 79a48b3 March 17, 2026 00:03

WoosukKwon added the ready label Mar 18, 2026

WoosukKwon approved these changes Mar 18, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support

d9a477f

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

TheEpicDolphin force-pushed the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch from 79a48b3 to d9a477f March 18, 2026 23:11

WoosukKwon enabled auto-merge (squash) March 18, 2026 23:55

WoosukKwon merged commit 053f3b6 into vllm-project:main Mar 19, 2026
60 checks passed

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support (vll…

0da4303

…m-project#37237) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

TheEpicDolphin deleted the gdelfin/mrv2-spec-decode-rejection-sample-logprobs branch March 19, 2026 17:40

SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support (vll…

de5f045

…m-project#37237) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support (vll…

33f1588

…m-project#37237) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support (vll…

fcbc55e

…m-project#37237) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026

[Model Runner V2] Spec decode rejection sampler logprobs support (vll…

d291cfc

…m-project#37237) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
