1 change: 1 addition & 0 deletions README.md
@@ -122,6 +122,7 @@ Best peak throughput per model on NVIDIA RTX PRO 6000 Blackwell (vLLM v0.15.1, 3
| 3 | `PaddlePaddle/PaddleOCR-VL` | 2,341.9 | 64 | 6,385 ms | 49.0 ms |
| 4 | `deepseek-ai/DeepSeek-OCR` | 1,195.8 | 32 | 3,571 ms | 15.9 ms |
| 5 | `Qwen/Qwen3-VL-8B-Instruct` | 953.8 | 64 | 448 ms | 25.7 ms |
| 6 | `Qwen/Qwen3.6-35B-A3B-FP8` (DFlash spec) | 523.1 | 16 | 18,399 ms | 0.3 ms |
🟡 PR does not bump version in vlmbench/version.py as required by CLAUDE.md

CLAUDE.md mandates: "Every PR must bump the version in vlmbench/version.py (__version__ = "X.Y.Z"). If the version is already bumped, do not bump it again." This PR only modifies README.md and does not include a version bump. The version remains at 0.5.5 (unchanged from the prior commit vlmbench/version.py:1).

Prompt for agents
The CLAUDE.md rule requires every PR to bump the version in vlmbench/version.py. The current version is 0.5.5 (defined in vlmbench/version.py). Since this is a docs-only change, a patch bump to 0.5.6 would be appropriate. Edit vlmbench/version.py and change __version__ = "0.5.5" to __version__ = "0.5.6".
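The bump described in the prompt can be sketched as a small script. This is a sketch only; the path `vlmbench/version.py` and the `__version__ = "X.Y.Z"` format are taken from the review above, and the regex is an assumption about how the file is laid out:

```python
# Sketch of the patch bump requested above: rewrite the __version__
# assignment, incrementing the patch component of the version string.
import re

def bump_patch(text: str) -> str:
    """Turn __version__ = "X.Y.Z" into __version__ = "X.Y.(Z+1)"."""
    def repl(m: re.Match) -> str:
        major, minor, patch = m.group(1).split(".")
        return f'__version__ = "{major}.{minor}.{int(patch) + 1}"'
    return re.sub(r'__version__\s*=\s*"(\d+\.\d+\.\d+)"', repl, text, count=1)

if __name__ == "__main__":
    # In the repo this would read and rewrite vlmbench/version.py in place.
    print(bump_patch('__version__ = "0.5.5"'))  # __version__ = "0.5.6"
```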

medium

This entry introduces several inconsistencies with the leaderboard's established criteria (defined on line 116):

  1. Environment: The header specifies local hardware (RTX 6000), while this result was obtained via Cloud Run with HTTPS overhead.
  2. Software: The header specifies vLLM v0.15.1, but this used a nightly build (v0.19.2rc1...).
  3. Metrics: The 0.3 ms TPOT is, as noted in the PR description, "unrealistically low" due to speculative decoding and not directly comparable to the other models' TPOT.

To avoid misleading users, these caveats should be explicitly mentioned in the table row to clarify why the results (especially TTFT and TPOT) differ so significantly from the other entries.

Suggested change
| 6 | `Qwen/Qwen3.6-35B-A3B-FP8` (DFlash spec) | 523.1 | 16 | 18,399 ms | 0.3 ms |
| 6 | `Qwen/Qwen3.6-35B-A3B-FP8` (DFlash spec, Cloud Run, vLLM nightly) | 523.1 | 16 | 18,399 ms | 0.3 ms |


Compare your own results:
