6 files changed: +9 −3 lines
Changed paths: tensorrt_llm/_torch/auto_deploy/custom_ops (one file; name truncated in the source) and tests/integration/test_lists/test-db (five test lists)

tensorrt_llm/_torch/auto_deploy/custom_ops (file name truncated in the source)
@@ -476,9 +476,6 @@ def update_input_ids_with_new_tokens(
         idx = self.previous_batch_indices_cuda[: len(previous_batch_indices)]
         idx.copy_(host_idx, non_blocking=True)

-        # sort them so that masked_scatter_ lines up correctly
-        idx, _ = idx.sort()
-
         # gather the exact values you want to write
         src = new_tokens[0, idx, 0]

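Why the removed sort was ever needed, and why it can be dropped, is easiest to see in isolation. The sketch below uses toy tensors (not the PR's), and assumes the surrounding code writes src back through the same idx it gathers with: masked_scatter_ consumes its source in ascending mask-position order, so it only lines up with an index-gathered source when the index is sorted first, while a plain indexed write pairs each value with its own position regardless of order.

import torch

# Toy tensors for illustration only; the real code uses
# previous_batch_indices_cuda and new_tokens[0, idx, 0].
new_tokens = torch.arange(10, 60, 10)   # one candidate value per slot
idx = torch.tensor([3, 0, 2])           # unsorted batch indices

# Direct indexed write: the value for slot i lands at position i,
# so no sort is needed.
out_indexed = torch.zeros(5, dtype=new_tokens.dtype)
out_indexed[idx] = new_tokens[idx]

# masked_scatter_ fills True positions in ascending order, so the
# source must be gathered with the *sorted* indices to line up.
mask = torch.zeros(5, dtype=torch.bool)
mask[idx] = True
sorted_idx, _ = idx.sort()
out_masked = torch.zeros(5, dtype=new_tokens.dtype)
out_masked.masked_scatter_(mask, new_tokens[sorted_idx])

assert torch.equal(out_indexed, out_masked)

Since src is gathered as new_tokens[0, idx, 0] and (presumably) written back through the same idx, both sides stay aligned for any index order, which is why the sort and its masked_scatter_ comment can go.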
tests/integration/test_lists/test-db/l0_b200.yml
@@ -69,6 +69,8 @@ l0_b200:
   - unittest/_torch/modeling -k "modeling_deepseek"
   - unittest/_torch/modeling -k "modeling_gpt_oss"
   - unittest/_torch/auto_deploy/unit/singlegpu -k "not test_trtllm_bench_backend_comparison"
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
tests/integration/test_lists/test-db/l0_dgx_b200.yml
@@ -103,3 +103,4 @@ l0_dgx_b200:
   - accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[dp4-TRITON]
   - disaggregated/test_disaggregated.py::test_disaggregated_benchmark_on_diff_backends[llama-v3-8b-hf]
   - disaggregated/test_disaggregated.py::test_disaggregated_benchmark_on_diff_backends[DeepSeek-V3-Lite-fp8]
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
tests/integration/test_lists/test-db/l0_dgx_h100.yml
@@ -61,6 +61,8 @@ l0_dgx_h100:
   - test_e2e.py::test_ptp_quickstart_advanced_bs1
   - test_e2e.py::test_ptp_quickstart_advanced_deepseek_v3_lite_4gpus_adp_balance[DeepSeek-V3-Lite-FP8-DeepSeek-V3-Lite/fp8]
   - unittest/_torch/modeling/test_modeling_pixtral.py::test_tensor_parallelism
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
tests/integration/test_lists/test-db/l0_dgx_h200.yml
@@ -34,6 +34,8 @@ l0_dgx_h200:
   - unittest/_torch/multi_gpu_modeling/test_llama4.py::test_llama4[pp1-ep1-disable_adp-enable_graph-tp8-trtllm-scout]
   - unittest/_torch/multi_gpu_modeling/test_llama4.py::test_llama4[pp1-ep4-enable_adp-enable_graph-tp8-trtllm-scout]
   - unittest/llmapi/test_llm_pytorch.py::test_nemotron_nas_lora
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
tests/integration/test_lists/test-db/l0_h100.yml
@@ -102,6 +102,8 @@ l0_h100:
   - test_e2e.py::test_trtllm_bench_request_rate_and_concurrency[enable_concurrency-enable_request_rate]  # negative test
   - test_e2e.py::test_trtllm_bench_help_sanity[meta-llama/Llama-3.1-8B]
   - test_e2e.py::test_ptp_quickstart_multimodal[gemma-3-27b-it-gemma/gemma-3-27b-it-image-True]
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
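For readers unfamiliar with the test-db format, each hunk's trailing condition: / ranges: / system_gpu_count: lines open a block that gates a tests: list by machine shape. The loader below is a hypothetical standalone sketch, not the repo's actual harness; the gte/lte bounds and the select_tests helper are assumptions made for illustration.

import yaml

# Minimal stand-in for one test-db list, mirroring the structure visible
# in the hunks above (the gte/lte bounds are assumed).
DOC = """
l0_h100:
- condition:
    ranges:
      system_gpu_count:
        gte: 1
        lte: 1
  tests:
  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
"""

def select_tests(doc: str, key: str, gpu_count: int) -> list[str]:
    """Return the test IDs whose gpu-count range matches this machine."""
    selected = []
    for block in yaml.safe_load(doc)[key]:
        rng = (block.get("condition", {})
                    .get("ranges", {})
                    .get("system_gpu_count", {}))
        if rng.get("gte", 0) <= gpu_count <= rng.get("lte", gpu_count):
            selected.extend(block.get("tests", []))
    return selected

print(select_tests(DOC, "l0_h100", gpu_count=1))
# -> ['accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype']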