Deleter-D
Collaborator

@Deleter-D Deleter-D commented Oct 16, 2025

For the API request method, see #4467.

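From the inference side, each step produces two messages, one from the Target model and one from the Draft model.

  • The Target model simply returns its output according to seq_lens_this_time.
  • The Draft model returns output only for the first Draft Step: the tokens inferred from the Target model's hidden states, together with their logprobs. Tokens produced by the other Draft Steps are deferred until after the next verification, at which point the Draft model's recomputation mechanism returns the correct tokens and logprobs.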
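
The reporting rule above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `StepMessage` and `build_step_messages` are hypothetical names, `seq_lens_this_time` is treated as a single integer count of valid Target tokens for one request, and each Draft Step is modeled as a plain `(tokens, logprobs)` pair.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical container for one message sent back in a step.
@dataclass
class StepMessage:
    source: str              # "target" or "draft"
    tokens: List[int]
    logprobs: List[float]

# Assumed shape of one forward pass result: (tokens, logprobs).
StepOutput = Tuple[List[int], List[float]]

def build_step_messages(
    target_output: StepOutput,
    seq_lens_this_time: int,
    draft_outputs: List[StepOutput],
) -> List[StepMessage]:
    """Build the two messages for one step (assumes at least one Draft Step)."""
    target_tokens, target_logprobs = target_output
    n = seq_lens_this_time  # assumption: number of valid Target tokens this step
    target_msg = StepMessage("target", target_tokens[:n], target_logprobs[:n])

    # Only the first Draft Step is reported now.
    first_tokens, first_logprobs = draft_outputs[0]
    draft_msg = StepMessage("draft", list(first_tokens), list(first_logprobs))

    # draft_outputs[1:] is deliberately not reported here: per the description,
    # those tokens are returned after the next verification, via recomputation.
    return [target_msg, draft_msg]

messages = build_step_messages(
    target_output=([11, 12, 0], [-0.1, -0.2, 0.0]),
    seq_lens_this_time=2,
    draft_outputs=[([21], [-0.3]), ([22], [-0.4])],
)
```

In this toy call, the padded third Target slot and the second Draft Step's token are both left out of the returned messages, which mirrors the deferral described above.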

paddle-bot bot commented Oct 16, 2025

Thanks for your contribution!

yuanlehome
yuanlehome previously approved these changes Oct 17, 2025
@yuanlehome
Collaborator

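Please add the necessary information to the PR description, such as the implementation approach, the reasoning behind it, and any caveats.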

@Deleter-D
Collaborator Author

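> Please add the necessary information to the PR description, such as the implementation approach, the reasoning behind it, and any caveats.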

Description added~
