Encountered an error in forward function: slice 712 exceeds buffer size 471 #1480

Closed
1 of 4 tasks
sleepwalker2017 opened this issue Apr 22, 2024 · 5 comments

@sleepwalker2017

System Info

GPU: 2 × A30

TensorRT-LLM version: v0.9.0

Model: Vicuna 13B

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Build the engine
python convert_checkpoint.py --model_dir /data/weilong.yu/vicuna-13b/vicuna-13b-v1.5/ \
                              --output_dir ./tllm_checkpoint_2gpu_fp16 \
                              --dtype float16 --tp_size 2

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_fp16 \
            --output_dir ./tmp/llama/13B/trt_engines/fp16/2-gpu \
            --gemm_plugin float16 \
            --use_fused_mlp \
            --max_batch_size $1 \
            --max_input_len 2048 \
            --max_output_len 256 \
            --context_fmha enable \
            --paged_kv_cache enable \
            --use_paged_context_fmha enable \
            --remove_input_padding enable  --workers 2 \
            --use_fused_mlp
  2. Run the benchmark
mpirun -n 2 --allow-run-as-root ./gptManagerBenchmark --engine_dir ../../../examples/llama/tmp/llama/13B/trt_engines/fp16/2-gpu/ --dataset ../../../benchmarks/cpp/token-norm-dist.json --kv_cache_free_gpu_mem_fraction 0.85 --enable_kv_cache_reuse -enable_chunked_context
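
In the trtllm-build command above, $1 is a positional shell argument supplying --max_batch_size; a later comment in this thread states that 24 was used. As a minimal sketch, assuming the two build commands above are wrapped in a hypothetical build.sh that takes the batch size as its first argument:

# Hypothetical invocation; the actual wrapper script was not posted in the issue.
# The first argument is substituted for $1, i.e. --max_batch_size 24.
sh build.sh 24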

Expected behavior

No error message.

Actual behavior

sh run.sh
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 712 exceeds buffer size 471
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 712 exceeds buffer size 471
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1553 exceeds buffer size 927
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1553 exceeds buffer size 927
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 884 exceeds buffer size 642
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 884 exceeds buffer size 642
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1192 exceeds buffer size 951
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1192 exceeds buffer size 951
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1253 exceeds buffer size 1012
[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 1253 exceeds buffer size 1012
[TensorRT-LLM][WARNING] Step function failed, continuing.
[TensorRT-LLM][WARNING] Step function failed, continuing.
[BENCHMARK] num_samples 200
[BENCHMARK] total_latency(ms) 71149.43
[BENCHMARK] seq_throughput(seq/sec) 2.81
[BENCHMARK] token_throughput(token/sec) 531.37
[BENCHMARK] avg_sequence_latency(ms) 22587.76
[BENCHMARK] p99_sequence_latency(ms) 50983.86
[BENCHMARK] p90_sequence_latency(ms) 45602.29
[BENCHMARK] p50_sequence_latency(ms) 14514.95
[TensorRT-LLM][INFO] Terminate signal received, worker thread exiting.
[TensorRT-LLM][INFO] Terminate signal received, worker thread exiting.

Additional notes

None.

sleepwalker2017 added the bug (Something isn't working) label on Apr 22, 2024
@Tushar-ml
Contributor

I am getting the same issue when trying speculative decoding (Medusa) with Vicuna: after some inference, it fails with a slice exceeding buffer size 2560.

@skyCreateXian

skyCreateXian commented May 6, 2024

I encountered this issue while using speculative decoding: '[TensorRT-LLM][ERROR] Encountered an error in forward function: slice 501760 exceeds buffer size 250880'. Version 0.9.0 dev20240222000 works normally.

@pcastonguay
Collaborator

Hi, thanks for reporting this issue. I haven't been able to reproduce on latest main on 2xA100. What --max_batch_size value did you use (it's not specified in the build cmd you shared)? Thanks.

@pcastonguay
Collaborator

I also just tested on 2xA30 and cannot reproduce using latest main following the instructions shared above.

mpirun -n 2 --allow-run-as-root ./gptManagerBenchmark --engine_dir ../../../examples/llama/tmp/llama/13B/trt_engines/fp16/2-gpu/ --dataset ../../../benchmarks/cpp/token-norm-dist.json --kv_cache_free_gpu_mem_fraction 0.85 --enable_kv_cache_reuse
[BENCHMARK] num_samples 100
[BENCHMARK] num_error_samples 0

[BENCHMARK] num_samples 100
[BENCHMARK] total_latency(ms) 1506.20
[BENCHMARK] seq_throughput(seq/sec) 66.39
[BENCHMARK] token_throughput(token/sec) 995.88

[BENCHMARK] avg_sequence_latency(ms) 1116.72
[BENCHMARK] max_sequence_latency(ms) 1501.60
[BENCHMARK] min_sequence_latency(ms) 872.77
[BENCHMARK] p99_sequence_latency(ms) 1501.60
[BENCHMARK] p90_sequence_latency(ms) 1501.58
[BENCHMARK] p50_sequence_latency(ms) 900.98

@sleepwalker2017
Author

mpirun -n 2 --allow-run-as-root ./gptManagerBenchmark --engine_dir ../../../examples/llama/tmp/llama/13B/trt_engines/fp16/2-gpu/ --dataset ../../../benchmarks/cpp/token-norm-dist.json --kv_cache_free_gpu_mem_fraction 0.85 --enable_kv_cache_reuse -enable_chunked_context

Hi, this issue is reproduced when --enable_kv_cache_reuse and -enable_chunked_context are used together; the command you ran above does not include -enable_chunked_context.

I built the engine with --max_batch_size 24.
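
For reference, a sketch of the resolved build command, assuming $1 in the Reproduction section expanded to 24 (all other flags copied as originally posted):

trtllm-build --checkpoint_dir ./tllm_checkpoint_2gpu_fp16 \
            --output_dir ./tmp/llama/13B/trt_engines/fp16/2-gpu \
            --gemm_plugin float16 \
            --use_fused_mlp \
            --max_batch_size 24 \
            --max_input_len 2048 \
            --max_output_len 256 \
            --context_fmha enable \
            --paged_kv_cache enable \
            --use_paged_context_fmha enable \
            --remove_input_padding enable --workers 2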
