diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
new file mode 100644
index 000000000..e9cac815f
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md
@@ -0,0 +1,21 @@
+# Qwen/Qwen2.5-VL-7B-Instruct
+
+**vLLM Version**: vLLM: 0.10.0 ([6d8d0a2](https://github.com/vllm-project/vllm/commit/6d8d0a2)),
+**vLLM Ascend Version**: v0.10.0rc1 ([4604882](https://github.com/vllm-project/vllm-ascend/commit/4604882))
+**Software Environment**: CANN: 8.2.RC1, PyTorch: 2.7.1, torch-npu: 2.7.1.dev20250724
+**Hardware Environment**: Atlas A2 Series
+**Datasets**: mmmu_val
+**Parallel Mode**: TP
+**Execution Mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen2.5-VL-7B-Instruct,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=8192'
+lm_eval --model vllm-vlm --model_args $MODEL_ARGS --tasks mmmu_val \
+--apply_chat_template True --fewshot_as_multiturn True \
+--limit None --batch_size auto
+```
+| Task | Metric | Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| mmmu_val | acc,none |✅0.5211111111111111 | ± 0.0162 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md
new file mode 100644
index 000000000..f419b66d0
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md
@@ -0,0 +1,22 @@
+# Qwen/Qwen3-30B-A3B
+
+**vLLM Version**: vLLM: 0.10.0 ([6d8d0a2](https://github.com/vllm-project/vllm/commit/6d8d0a2)),
+**vLLM Ascend Version**: v0.10.0rc1 ([4604882](https://github.com/vllm-project/vllm-ascend/commit/4604882))
+**Software Environment**: CANN: 8.2.RC1, PyTorch: 2.7.1, torch-npu: 2.7.1.dev20250724
+**Hardware Environment**: Atlas A2 Series
+**Datasets**: gsm8k
+**Parallel Mode**: TP
+**Execution Mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen3-30B-A3B,tensor_parallel_size=2,dtype=auto,trust_remote_code=False,max_model_len=4096,gpu_memory_utilization=0.6,enable_expert_parallel=True'
+lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k \
+--apply_chat_template False --fewshot_as_multiturn False --num_fewshot 5 \
+--limit None --batch_size auto
+```
+| Task | Metric | Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| gsm8k | exact_match,strict-match |✅0.8938589840788476 | ± 0.0085 |
+| gsm8k | exact_match,flexible-extract |✅0.8476118271417741 | ± 0.0099 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md
new file mode 100644
index 000000000..5d23c0cca
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md
@@ -0,0 +1,22 @@
+# Qwen/Qwen3-8B-Base
+
+**vLLM Version**: vLLM: 0.10.0 ([6d8d0a2](https://github.com/vllm-project/vllm/commit/6d8d0a2)),
+**vLLM Ascend Version**: v0.10.0rc1 ([4604882](https://github.com/vllm-project/vllm-ascend/commit/4604882))
+**Software Environment**: CANN: 8.2.RC1, PyTorch: 2.7.1, torch-npu: 2.7.1.dev20250724
+**Hardware Environment**: Atlas A2 Series
+**Datasets**: gsm8k
+**Parallel Mode**: TP
+**Execution Mode**: ACLGraph
+
+**Command**:
+
+```bash
+export MODEL_ARGS='pretrained=Qwen/Qwen3-8B-Base,tensor_parallel_size=1,dtype=auto,trust_remote_code=False,max_model_len=4096'
+lm_eval --model vllm --model_args $MODEL_ARGS --tasks gsm8k \
+--apply_chat_template True --fewshot_as_multiturn True --num_fewshot 5 \
+--limit None --batch_size auto
+```
+| Task | Metric | Value | Stderr |
+|-----------------------|-------------|----------:|-------:|
+| gsm8k | exact_match,strict-match |✅0.8278999241849886 | ± 0.0104 |
+| gsm8k | exact_match,flexible-extract |✅0.8294162244124337 | ± 0.0104 |
diff --git a/docs/source/developer_guide/evaluation/accuracy_report/index.md b/docs/source/developer_guide/evaluation/accuracy_report/index.md
new file mode 100644
index 000000000..079215313
--- /dev/null
+++ b/docs/source/developer_guide/evaluation/accuracy_report/index.md
@@ -0,0 +1,9 @@
+# Accuracy Report
+
+:::{toctree}
+:caption: Accuracy Report
+:maxdepth: 1
+Qwen2.5-VL-7B-Instruct
+Qwen3-30B-A3B
+Qwen3-8B-Base
+:::