[Feature]: Add speculative decoding with draft model pruning

### 🚀 The feature, motivation and pitch

I have implemented the [FR-Spec](https://arxiv.org/abs/2502.14856) approach at the logits-processor level, using `AllowedTokenIdsLogitsProcessor`. This implementation does not prune the draft model itself, but it allows evaluating acceptance rates under different draft pruning ratios. The code is available [here](https://github.com/jmamou/vllm/tree/frspec-acceptance).
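
To illustrate what restricting the draft at the logits level means, here is a minimal pure-Python sketch. It is not the code of vLLM's `AllowedTokenIdsLogitsProcessor` or of the linked branch; `mask_to_allowed` and `softmax` are hypothetical helpers, and the logits are made-up numbers. The idea is only that tokens outside the allowed set get a logit of negative infinity, so they receive zero probability when the draft proposes a token.

```python
import math


def mask_to_allowed(logits, allowed_token_ids):
    """Return a copy of `logits` where every token outside the allowed set is -inf.

    Hypothetical helper for illustration; index i of `logits` is token ID i.
    """
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax; assumes at least one finite logit."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy draft logits over a 4-token vocabulary (illustrative values only).
draft_logits = [2.0, 0.5, 1.0, -1.0]

# Keep only token IDs 0 and 2; tokens 1 and 3 are "pruned".
masked = mask_to_allowed(draft_logits, allowed_token_ids=[0, 2])
probs = softmax(masked)

# Pruned tokens end up with probability 0, so the draft can never propose them.
print(probs)
```

Because the mask is applied to the draft's output rather than to its weights, this measures how pruning affects acceptance without producing the compute savings of a smaller LM head.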

**MT-Bench results:**

| pruning ratio | vanilla | 0.1 | 0.25 | 0.5 | 0.75 | 0.9 | 0.99 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| draft acceptance rate (%) | 27.8 | 28.3 | 28.6 | 28.6 | 27.2 | 25.9 | 18.8 |

New speculative config parameters:
- `token_ids_by_frequency`: Path to a tensor file (`.pt`) containing token frequencies sorted by token ID, used to decide which tokens are pruned during speculative decoding.
- `pruning_ratio`: Fraction of tokens to prune from the draft's candidates during speculative decoding (for example, `0.1` prunes 10% of the tokens).
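
As a rough sketch of how these two parameters could interact, the snippet below turns a frequency vector and a pruning ratio into the set of token IDs the draft may still propose. It assumes the frequencies are indexed by token ID and that the least frequent tokens are the ones pruned; the exact file layout and rounding used in the linked branch are not confirmed here. `select_kept_token_ids` is a hypothetical name, and a plain list stands in for the tensor loaded from the `.pt` file.

```python
def select_kept_token_ids(frequencies, pruning_ratio):
    """Return the token IDs that survive pruning, in ascending ID order.

    Assumptions (not confirmed by the source): `frequencies[i]` is the count
    for token ID i, and the `pruning_ratio` fraction of least frequent tokens
    is dropped.
    """
    if not 0.0 <= pruning_ratio < 1.0:
        raise ValueError("pruning_ratio must be in [0, 1)")
    vocab_size = len(frequencies)
    num_pruned = int(vocab_size * pruning_ratio)
    num_kept = vocab_size - num_pruned
    # Rank token IDs from most to least frequent; break ties by smaller ID.
    ranked = sorted(range(vocab_size), key=lambda t: (-frequencies[t], t))
    return sorted(ranked[:num_kept])


# Toy frequency counts for an 8-token vocabulary (illustrative values only).
frequencies = [50, 3, 20, 0, 7, 1, 12, 9]

kept = select_kept_token_ids(frequencies, pruning_ratio=0.5)
print(kept)  # the 4 most frequent token IDs
```

The resulting list is what would be handed to an allowed-token-IDs logits processor for the draft model.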

**Example usage**
Start the server with the new speculative config parameters:
```
VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --speculative_config '{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B", "num_speculative_tokens": 4, "token_ids_by_frequency": "vllm/examples/target-dist-meta-llama-Llama-3.1-8B-Instruct-wikitext-wikitext-103-raw-v1-train.pt", "pruning_ratio": 0.1}'
```
Then run the MT-Bench benchmark against it:
```
vllm bench serve \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --endpoint-type openai-chat \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path philschmid/mt-bench \
  --num-prompts 80 \
  --max-concurrency 16 \
  --temperature 1 \
  --top-p 1.0
```
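By selectively pruning unlikely tokens in the draft model, this feature is expected to improve speculative decoding speedups while maintaining acceptance rates, enabling faster inference for models such as LLaMA-3.1-8B-Instruct. In the MT-Bench results above, the acceptance rate stays within about one percentage point of vanilla (27.8%) up to a pruning ratio of 0.75 and drops sharply only at 0.99 (18.8%).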

@keyboardAnt @eitanturok
FR-Spec implementation: https://github.com/vllm-project/vllm/pull/24343

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
