Update TensorRT-LLM (#1530)
kaiyux authored Apr 30, 2024
1 parent 66ef1df commit 06c0e9b
Showing 200 changed files with 8,814 additions and 2,695 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/auto_close_inactive_issues.yml
@@ -0,0 +1,25 @@
# Ref: https://docs.github.com/en/actions/managing-issues-and-pull-requests/closing-inactive-issues
name: Close inactive issues
on:
  schedule:
    - cron: "30 1 * * *"  # runs daily at 01:30 UTC

jobs:
  stale:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/stale@v9
        with:
          days-before-issue-stale: 30
          days-before-issue-close: 15
          stale-issue-label: "stale"
          exempt-issue-labels: ""
          stale-issue-message: "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days."
          close-issue-message: "This issue was closed because it has been stalled for 15 days with no activity."
          days-before-pr-stale: -1  # never mark pull requests stale
          days-before-pr-close: -1  # never close pull requests
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          debug-only: true  # dry-run mode: logs what would happen without closing issues
8 changes: 3 additions & 5 deletions benchmarks/cpp/README.md
@@ -67,7 +67,7 @@ If you want to get the logits, you could run gptSessionBenchmark with `--print_a

#### Prepare dataset

-Run a preprocessing script to prepare/generate dataset into a json that gptManagerBenchmark can consume later. The processed output json has *input token ids, output tokens length and time delays* to control request rate by gptManagerBenchmark.
+Run a preprocessing script to prepare/generate dataset into a json that gptManagerBenchmark can consume later. The processed output json has *input tokens length, input token ids and output tokens length*.
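For orientation, here is a minimal sketch of what one entry in that json could look like. The field names (`input_ids`, `input_len`, `output_len`) are illustrative assumptions mirroring the three properties listed above; prepare_dataset.py defines the authoritative schema.

```python
# Hypothetical shape of a single processed-dataset entry; field names are
# assumed for illustration and may differ from prepare_dataset.py's actual
# output schema.
import json

sample = {
    "input_ids": [2061, 318, 262, 3139, 286, 4881, 30],  # input token ids
    "input_len": 7,                                      # input tokens length
    "output_len": 16,                                    # output tokens length
}
print(json.dumps({"samples": [sample]}, indent=2))
```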

This tool can be used in 2 different modes of traffic generation.

@@ -79,8 +79,6 @@ The tool will tokenize the words and instruct the model to generate a specified
python3 prepare_dataset.py \
    --tokenizer <path/to/tokenizer> \
    --output preprocessed_dataset.json \
-   [--request-rate 10] \
-   [--time-delay-dist exponential_dist] \
    dataset \
    --dataset-name <name of the dataset> \
    --dataset-split <split of the dataset to use> \
@@ -118,8 +116,6 @@ For example, setting mean=100 and std dev=10 would generate requests where 95.4% of the token lengths fall within 20 tokens (two standard deviations) of the mean.
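That coverage figure checks out: ±20 tokens is two standard deviations, and a normal distribution places about 95.45% of its mass within ±2σ. A quick standalone sketch (independent of the benchmark tooling) confirms it empirically; the prepare_dataset invocation follows below.

```python
# Sanity check: ~95.4% of N(mean=100, std=10) samples fall within
# 20 tokens (two standard deviations) of the mean.
import numpy as np

rng = np.random.default_rng(0)
lengths = rng.normal(loc=100, scale=10, size=100_000)
coverage = np.mean(np.abs(lengths - 100) <= 20)
print(f"coverage within +/-20 tokens: {coverage:.3f}")  # ~0.954
```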
```
python prepare_dataset.py \
    --output token-norm-dist.json \
-   --request-rate 10 \
-   --time-delay-dist constant \
    --tokenizer <path/to/tokenizer> \
    token-norm-dist \
    --num-requests 100 \
@@ -148,6 +144,7 @@ Take GPT-350M as an example for single GPU V1 batching
./benchmarks/gptManagerBenchmark \
    --engine_dir ../../examples/gpt/trt_engine/gpt2/fp16/1-gpu/ \
    --type V1 \
+   --request_rate 10 \
    --dataset ../../benchmarks/cpp/preprocessed_dataset.json \
    --max_num_samples 500
```
@@ -157,6 +154,7 @@ Take GPT-350M as an example for 2-GPU inflight batching
mpirun -n 2 ./benchmarks/gptManagerBenchmark \
    --engine_dir ../../examples/gpt/trt_engine/gpt2-ib/fp16/2-gpu/ \
    --type IFB \
+   --request_rate 10 \
    --dataset ../../benchmarks/cpp/preprocessed_dataset.json \
    --max_num_samples 500
```
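Note that `--request_rate` takes over the role of the `--request-rate`/`--time-delay-dist` options this commit removes from prepare_dataset.py: request timing is now controlled by gptManagerBenchmark itself. As a rough illustration of what a fixed rate implies (assuming exponentially distributed inter-arrival gaps, a common Poisson-process model; the benchmark's actual scheduler may differ), 500 samples at 10 requests/s are submitted over roughly 50 seconds:

```python
# Illustration only: model request submission at an average rate of
# 10 req/s using exponential inter-arrival gaps. This is an assumed
# model, not gptManagerBenchmark's actual scheduling code.
import numpy as np

rng = np.random.default_rng(0)
gaps = rng.exponential(scale=1 / 10, size=500)  # mean gap = 0.1 s
print(f"total submission window: ~{gaps.sum():.0f} s")  # ~50 s
```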