Merged (29 commits)
Commits (all by zyongye):

- `16e7e13` adding mxfp4 quant key (Feb 13, 2026)
- `0c6da53` runnable but not correct (Feb 13, 2026)
- `0ed72eb` remove unused variable (Feb 13, 2026)
- `8c7da24` bug fix (Feb 13, 2026)
- `eff5f48` convert bf16 (Feb 13, 2026)
- `51e8b0d` revert back scalar dtype (Feb 13, 2026)
- `6d80baa` fix trtllm moe (Feb 14, 2026)
- `7a088dc` add tune size to flashinfer experts (Feb 15, 2026)
- `d10d307` move kernel setup to process_weight (Feb 15, 2026)
- `2935058` only cast when act is fp8 (Feb 15, 2026)
- `25aac05` add topk_ids contiguous assertion (Feb 15, 2026)
- `2a3aa22` add testing infrastructure (Feb 16, 2026)
- `8a5885b` fix pre-commit (Feb 16, 2026)
- `ae3105e` change parameter inside the kernels (Feb 17, 2026)
- `8cd30a3` change ci to h100 (Feb 17, 2026)
- `094fc4c` add back quant function parameters (Feb 17, 2026)
- `28ae123` add back dep interface (Feb 17, 2026)
- `a258129` add back dep interface (Feb 17, 2026)
- `b353697` fixing trtllm moe and pre commit (Feb 17, 2026)
- `cd93ad9` assert not using dep (Feb 17, 2026)
- `70b025f` bring back dep (Feb 17, 2026)
- `03e0a58` pre-commit (Feb 17, 2026)
- `8ec4e1d` update ci tests (Feb 18, 2026)
- `6de17ac` update device to use in moe config (Feb 18, 2026)
- `275db17` move fake scale into init (Feb 18, 2026)
- `91f8c70` add dtype into scales (Feb 18, 2026)
- `a501137` unifing moe_mk interface (Feb 19, 2026)
- `c17109a` adding activation type to experts (Feb 23, 2026)
- `2618960` fix typos and update tests (Feb 23, 2026)
26 changes: 26 additions & 0 deletions .buildkite/test_areas/lm_eval.yaml
@@ -73,3 +73,29 @@ steps:
num_devices: 2
commands:
- pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=evals/gsm8k/configs/moe-refactor-dp-ep/config-b200.txt

- label: GPQA Eval (GPT-OSS) (H100)
timeout_in_minutes: 120
device: h100
optional: true
num_devices: 2
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
- tests/evals/gpt_oss/
commands:
- uv pip install --system 'gpt-oss[eval]==0.0.5'
- pytest -s -v evals/gpt_oss/test_gpqa_correctness.py --config-list-file=configs/models-h100.txt

- label: GPQA Eval (GPT-OSS) (B200)
timeout_in_minutes: 120
device: b200
optional: true
num_devices: 2
source_file_dependencies:
- csrc/
- vllm/model_executor/layers/quantization
- tests/evals/gpt_oss/
commands:
- uv pip install --system 'gpt-oss[eval]==0.0.5'
- pytest -s -v evals/gpt_oss/test_gpqa_correctness.py --config-list-file=configs/models-b200.txt
Comment on lines +76 to +101 (Member): If you are going to add these here, please remove the duplicated ones in misc.yaml

27 changes: 0 additions & 27 deletions .buildkite/test_areas/misc.yaml
@@ -153,33 +153,6 @@ steps:
- pytest -v -s transformers_utils
- pytest -v -s config

- label: GPT-OSS Eval (H100)
timeout_in_minutes: 60
working_dir: "/vllm-workspace/"
device: h100
optional: true
source_file_dependencies:
- tests/evals/gpt_oss
- vllm/model_executor/models/gpt_oss.py
- vllm/model_executor/layers/quantization/mxfp4.py
commands:
- uv pip install --system 'gpt-oss[eval]==0.0.5'
- pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py --model openai/gpt-oss-20b --metric 0.58

- label: GPT-OSS Eval (B200)
timeout_in_minutes: 60
working_dir: "/vllm-workspace/"
device: b200
optional: true
source_file_dependencies:
- tests/evals/gpt_oss
- vllm/model_executor/models/gpt_oss.py
- vllm/model_executor/layers/quantization/mxfp4.py
- vllm/v1/attention/backends/flashinfer.py
commands:
- uv pip install --system 'gpt-oss[eval]==0.0.5'
- pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py --model openai/gpt-oss-20b --metric 0.58

- label: Batch Invariance (H100)
timeout_in_minutes: 25
device: h100
49 changes: 49 additions & 0 deletions tests/evals/gpt_oss/README.md
@@ -0,0 +1,49 @@
# GPQA Evaluation using GPT-OSS

This directory contains GPQA evaluation tests using the GPT-OSS evaluation package and vLLM server.

## Usage

### Run tests with pytest (like buildkite)

```bash
# H100
pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py \
    --config-list-file=configs/models-h100.txt

# B200
pytest -s -v tests/evals/gpt_oss/test_gpqa_correctness.py \
--config-list-file=configs/models-b200.txt
```

## Configuration Format

Model configs in `configs/` directory use this YAML format:

```yaml
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568 # Minimum expected accuracy
reasoning_effort: "low" # Reasoning effort level (default: "low")
server_args: "--tensor-parallel-size 2" # Server arguments
startup_max_wait_seconds: 1800 # Max wait for server startup (default: 1800)
env: # Environment variables (optional)
SOME_VAR: "value"
```

The `server_args` field accepts any arguments that can be passed to `vllm serve`.

The `env` field accepts a dictionary of environment variables to set for the server process.

## Adding New Models

1. Create a new YAML config file in the `configs/` directory
2. Add the filename to the appropriate `models-*.txt` file

## Tiktoken Encoding Files

The tiktoken encoding files required by the vLLM server are automatically downloaded from OpenAI's public blob storage on first run:

- `cl100k_base.tiktoken`
- `o200k_base.tiktoken`

Files are cached in the `data/` directory. The `TIKTOKEN_ENCODINGS_BASE` environment variable is automatically set to point to this directory when running evaluations.
6 changes: 6 additions & 0 deletions tests/evals/gpt_oss/configs/gpt-oss-20b-baseline.yaml
@@ -0,0 +1,6 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568
reasoning_effort: "low"
server_args: "--tensor-parallel-size 2"
8 changes: 8 additions & 0 deletions tests/evals/gpt_oss/configs/gpt-oss-20b-flashinfer-mxfp4-bf16.yaml
@@ -0,0 +1,8 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568
reasoning_effort: "low"
server_args: "--tensor-parallel-size 2"
env:
VLLM_USE_FLASHINFER_MOE_MXFP4_BF16: "1"
8 changes: 8 additions & 0 deletions tests/evals/gpt_oss/configs/gpt-oss-20b-flashinfer-mxfp4-mxfp8-cutlass.yaml
@@ -0,0 +1,8 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568
reasoning_effort: "low"
server_args: "--tensor-parallel-size 2"
env:
VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8_CUTLASS: "1"
8 changes: 8 additions & 0 deletions tests/evals/gpt_oss/configs/gpt-oss-20b-marlin.yaml
@@ -0,0 +1,8 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568
reasoning_effort: "low"
server_args: "--tensor-parallel-size 2"
env:
VLLM_MXFP4_USE_MARLIN: "1"
8 changes: 8 additions & 0 deletions tests/evals/gpt_oss/configs/gpt-oss-20b-sm100-fi-mxfp4-mxfp8-trtllm.yaml
@@ -0,0 +1,8 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
model_name: "openai/gpt-oss-20b"
metric_threshold: 0.568
reasoning_effort: "low"
server_args: "--tensor-parallel-size 2"
env:
VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8: "1"
5 changes: 5 additions & 0 deletions tests/evals/gpt_oss/configs/models-b200.txt
@@ -0,0 +1,5 @@
# B200 model configurations for GPQA evaluation
# Tests different environment variable combinations
gpt-oss-20b-flashinfer-mxfp4-bf16.yaml
gpt-oss-20b-flashinfer-mxfp4-mxfp8-cutlass.yaml
gpt-oss-20b-sm100-fi-mxfp4-mxfp8-trtllm.yaml
5 changes: 5 additions & 0 deletions tests/evals/gpt_oss/configs/models-h100.txt
@@ -0,0 +1,5 @@
# H100 model configurations for GPQA evaluation
# Tests different environment variable combinations
gpt-oss-20b-baseline.yaml
gpt-oss-20b-flashinfer-mxfp4-bf16.yaml
gpt-oss-20b-marlin.yaml
60 changes: 54 additions & 6 deletions tests/evals/gpt_oss/conftest.py
@@ -4,13 +4,61 @@
Pytest configuration for GPT-OSS evaluation tests.
"""

from pathlib import Path


 def pytest_addoption(parser):
-    """Add command line options for pytest."""
-    parser.addoption("--model", action="store", help="Model name to evaluate")
-    parser.addoption(
-        "--metric", action="store", type=float, help="Expected metric threshold"
-    )
+    """Add custom command line options."""
     parser.addoption(
-        "--server-args", action="store", default="", help="Additional server arguments"
+        "--config-list-file",
+        required=True,
+        help="File containing list of config files to test",
     )


def pytest_generate_tests(metafunc):
"""Generate test parameters from config files."""
if "config_filename" in metafunc.fixturenames:
config_list_file = metafunc.config.getoption("--config-list-file")

# Handle both relative and absolute paths
config_list_path = Path(config_list_file)
if not config_list_path.is_absolute():
# If relative, try relative to test directory first
test_dir_path = Path(__file__).parent / config_list_file
if test_dir_path.exists():
config_list_path = test_dir_path
else:
# Try relative to current working directory
config_list_path = Path.cwd() / config_list_file

print(f"Looking for config list at: {config_list_path}")

config_files = []
if config_list_path.exists():
# Determine config directory (same directory as the list file)
config_dir = config_list_path.parent

with open(config_list_path) as f:
for line in f:
line = line.strip()
if line and not line.startswith("#"):
config_path = config_dir / line
print(f"Checking config file: {config_path}")
if config_path.exists():
config_files.append(config_path)
print(f" Found: {config_path}")
else:
print(f" Missing: {config_path}")
else:
print(f"Config list file not found: {config_list_path}")

# Generate test parameters
if config_files:
metafunc.parametrize(
"config_filename",
config_files,
ids=[config_file.stem for config_file in config_files],
)
else:
print("No config files found, test will be skipped")