Commit 4223a9a

[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371)
Signed-off-by: Wanli Jiang <[email protected]>
1 parent 572551b

File tree: 6 files changed, +34 −0 lines

docs/source/reference/support-matrix.md

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA
 | `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` | L |
 | `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` | L |
 | `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `Phi3ForCausalLM` | Phi-4 | `microsoft/Phi-4` | L |
 | `Phi4MMForCausalLM` | Phi-4-multimodal | `microsoft/Phi-4-multimodal-instruct` | L + I + A |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` | L |
 | `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` | L |

tests/integration/defs/accuracy/references/gsm8k.yaml

Lines changed: 5 additions & 0 deletions
@@ -189,6 +189,11 @@ microsoft/Phi-4-multimodal-instruct-long-rope:
   - accuracy: 75.85
 microsoft/Phi-4-mini-instruct:
   - accuracy: 82.30
+microsoft/phi-4:
+  - accuracy: 90.30
+  - quant_algo: FP8
+    kv_cache_quant_algo: FP8
+    accuracy: 90.64
 mistralai/Codestral-22B-v0.1:
   - accuracy: 67.10
 GPT-OSS/BF16:
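Each reference entry above maps a model key to a list of accuracy records, where quantized variants additionally carry `quant_algo`/`kv_cache_quant_algo` fields. A minimal sketch of how such entries can be consumed, using a hypothetical `lookup_accuracy` helper (not part of the repo) and the phi-4 GSM8K numbers from this diff:

```python
# Reference data mirroring the gsm8k.yaml entries added in this commit.
REFERENCES = {
    "microsoft/phi-4": [
        {"accuracy": 90.30},  # unquantized baseline
        {"quant_algo": "FP8", "kv_cache_quant_algo": "FP8", "accuracy": 90.64},
    ],
}


def lookup_accuracy(refs, model, quant_algo=None):
    """Return the reference accuracy matching the requested quantization.

    Hypothetical helper for illustration; the real lookup logic lives in
    the accuracy test harness under tests/integration/defs/accuracy.
    """
    for entry in refs[model]:
        if entry.get("quant_algo") == quant_algo:
            return entry["accuracy"]
    raise KeyError(f"no reference for {model} with quant_algo={quant_algo}")


print(lookup_accuracy(REFERENCES, "microsoft/phi-4"))         # 90.3
print(lookup_accuracy(REFERENCES, "microsoft/phi-4", "FP8"))  # 90.64
```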

tests/integration/defs/accuracy/references/mmlu.yaml

Lines changed: 5 additions & 0 deletions
@@ -293,6 +293,11 @@ microsoft/Phi-4-multimodal-instruct:
   - accuracy: 69.69
 microsoft/Phi-4-multimodal-instruct-long-rope:
   - accuracy: 65.98
+microsoft/phi-4:
+  - accuracy: 79.73
+  - quant_algo: FP8
+    kv_cache_quant_algo: FP8
+    accuracy: 79.36
 LGAI-EXAONE/EXAONE-4.0-32B:
   - accuracy: 78.52
 GPT-OSS/BF16:

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 19 additions & 0 deletions
@@ -2791,6 +2791,25 @@ def test_fp8(self):
         task.evaluate(llm)


+class TestPhi4(LlmapiAccuracyTestHarness):
+    MODEL_NAME = "microsoft/phi-4"
+
+    def test_auto_dtype(self):
+        with LLM(f"{llm_models_root()}/Phi-4") as llm:
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+
+    @skip_pre_hopper
+    def test_fp8(self):
+        with LLM(f"{llm_models_root()}/Phi-4-FP8") as llm:
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+
+
 class TestPhi4MM(LlmapiAccuracyTestHarness):
     # phi4-mm can also support text input.
     MODEL_NAME = "microsoft/Phi-4-multimodal-instruct"
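The new tests open the model with `with LLM(...)` so engine and GPU resources are released even if an evaluation raises. A minimal sketch of that context-manager pattern, using a hypothetical `FakeLLM` stand-in rather than the real `tensorrt_llm.LLM`:

```python
class FakeLLM:
    """Stand-in for tensorrt_llm.LLM, illustrating the with-statement pattern
    used by TestPhi4; the real class loads an engine from model_dir."""

    def __init__(self, model_dir):
        self.model_dir = model_dir
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup runs whether or not task.evaluate(llm) raised.
        self.closed = True
        return False  # do not swallow exceptions


llm_ref = None
with FakeLLM("/models/Phi-4") as llm:
    llm_ref = llm
    # task.evaluate(llm) would run here in the real test
print(llm_ref.closed)  # True: resources released on exit
```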

tests/integration/test_lists/qa/llm_function_full.txt

Lines changed: 2 additions & 0 deletions
@@ -603,6 +603,8 @@ accuracy/test_llm_api_pytorch.py::TestMinistral8BInstruct::test_fp8
 accuracy/test_llm_api_pytorch.py::TestPhi4MM::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestPhi4MM::test_auto_dtype_long_rope
 accuracy/test_llm_api_pytorch.py::TestPhi4MiniInstruct::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestPhi4::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestPhi4::test_fp8
 accuracy/test_llm_api_pytorch.py::TestEXAONE4::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestQwen2_VL_7B::test_auto_dtype
 accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend

tests/integration/test_lists/qa/llm_function_sanity.txt

Lines changed: 2 additions & 0 deletions
@@ -133,6 +133,8 @@ accuracy/test_llm_api_pytorch.py::TestMistralSmall24B::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestMixtral8x7B::test_fp8_tp2
 accuracy/test_llm_api_pytorch.py::TestMixtral8x7B::test_nvfp4_tp2
 accuracy/test_llm_api_pytorch.py::TestNemotronNas::test_auto_dtype_tp8
+accuracy/test_llm_api_pytorch.py::TestPhi4::test_auto_dtype
+accuracy/test_llm_api_pytorch.py::TestPhi4::test_fp8
 accuracy/test_llm_api_pytorch.py::TestPhi4MiniInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestQwen2_7BInstruct::test_auto_dtype
 accuracy/test_llm_api_pytorch.py::TestQwen3_235B_A22B::test_fp8[latency]
