
Commit f353de0

nv-guomingz authored and dominicshanshan committed
[None][doc] Use hash id for external link (NVIDIA#7641)
Signed-off-by: nv-guomingz <[email protected]>
Signed-off-by: Wangshanshan <[email protected]>
1 parent c2aad48 commit f353de0

File tree

docs/source/conf.py
docs/source/features/paged-attention-ifb-scheduler.md
docs/source/features/sampling.md
docs/source/installation/build-from-source-linux.md

4 files changed: +10 −10 lines changed

docs/source/conf.py

Lines changed: 2 additions & 3 deletions

@@ -16,10 +16,9 @@
 
 sys.path.insert(0, os.path.abspath('.'))
 
-project = 'TensorRT-LLM'
+project = 'TensorRT LLM'
 copyright = '2025, NVidia'
 author = 'NVidia'
-branch_name = pygit2.Repository('.').head.shorthand
 html_show_sphinx = False
 
 # Get the git commit hash
@@ -78,7 +77,7 @@
     "https":
         None,
     "source":
-        "https://github.com/NVIDIA/TensorRT-LLM/tree/" + branch_name + "/{{path}}",
+        "https://github.com/NVIDIA/TensorRT-LLM/tree/" + commit_hash + "/{{path}}",
 }
 
 myst_heading_anchors = 4
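Note: the hunks above drop `branch_name` and point the `source:` link scheme at `commit_hash` instead, which is why the documentation files below switch their links to the `source:` prefix. The diff does not show how `commit_hash` is obtained; a minimal sketch of one way to derive it with pygit2, consistent with the `# Get the git commit hash` comment but an assumption rather than the repository's actual code, is:

```python
# Sketch only: the real conf.py logic is not included in this diff.
import pygit2

repo = pygit2.Repository('.')
commit_hash = str(repo.head.target)  # full hex SHA of the current HEAD commit
```

Pinning links to a commit hash instead of a branch name keeps published docs pointing at the exact sources they were built from, even after the branch moves.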

docs/source/features/paged-attention-ifb-scheduler.md

Lines changed: 1 addition & 1 deletion

@@ -135,7 +135,7 @@ Overall, the max batch size and max num tokens limits play a key role in determi
 
 ## Revisiting Paged Context Attention and Context Chunking
 
-[Previously](./useful-build-time-flags.md#paged-context-attention) we recommended enabling paged context attention even though in our case study it didn't affect performance significantly. Now that we understand the TensorRT LLM scheduler, we can explain why this is beneficial. In short, we recommend enabling it because it enables context chunking, which allows the context phase of a request to be broken up into pieces and processed over several execution iterations, allowing the engine to provide a more stable balance of context and generation phase execution.
+Previously we recommended enabling paged context attention even though in our case study it didn't affect performance significantly. Now that we understand the TensorRT LLM scheduler, we can explain why this is beneficial. In short, we recommend enabling it because it enables context chunking, which allows the context phase of a request to be broken up into pieces and processed over several execution iterations, allowing the engine to provide a more stable balance of context and generation phase execution.
 
 The [visualization](#the-schedulers) of the TensorRT LLM scheduler showed that initially Request 3 couldn't be scheduled because it would put the scheduler over the max-num tokens limit. However, with context chunking, this is no longer the case, and the first chunk of Request 3 can be scheduled.
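Note: the paragraph above describes context chunking conceptually but does not show how to turn it on. A hedged sketch using the LLM API follows; the flag name `enable_chunked_prefill` and the model name are assumptions, not taken from this diff.

```python
from tensorrt_llm import LLM

# Hypothetical option name; consult the LLM-API reference for the
# authoritative flag that controls context chunking.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
          enable_chunked_prefill=True)
```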

docs/source/features/sampling.md

Lines changed: 5 additions & 5 deletions

@@ -6,7 +6,7 @@ The PyTorch backend supports most of the sampling features that are supported on
 To use the feature:
 
 1. Enable the `enable_trtllm_sampler` option in the `LLM` class
-2. Pass a [`SamplingParams`](../../../../tensorrt_llm/sampling_params.py#L125) object with the desired options to the `generate()` function
+2. Pass a [`SamplingParams`](source:tensorrt_llm/sampling_params.py#L125) object with the desired options to the `generate()` function
 
 The following example prepares two identical prompts which will give different results due to the sampling parameters chosen:
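Note: the example mentioned in that last context line is not part of this diff. A minimal sketch of the two steps, with a placeholder model name and illustrative parameter values, might look like:

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
          enable_trtllm_sampler=True)                   # step 1

sampling_params = SamplingParams(temperature=0.8,       # step 2
                                 top_p=0.95,
                                 max_tokens=32)

# Two identical prompts can diverge because sampling is stochastic.
outputs = llm.generate(["Hello, my name is", "Hello, my name is"],
                       sampling_params)
```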

@@ -74,7 +74,7 @@ The PyTorch backend supports guided decoding with the XGrammar and Low-level Gui
 To enable guided decoding, you must:
 
 1. Set the `guided_decoding_backend` parameter to `'xgrammar'` or `'llguidance'` in the `LLM` class
-2. Create a [`GuidedDecodingParams`](../../../../tensorrt_llm/sampling_params.py#L14) object with the desired format specification
+2. Create a [`GuidedDecodingParams`](source:tensorrt_llm/sampling_params.py#L14) object with the desired format specification
    * Note: Depending on the type of format, a different parameter needs to be chosen to construct the object (`json`, `regex`, `grammar`, `structural_tag`).
 3. Pass the `GuidedDecodingParams` object to the `guided_decoding` parameter of the `SamplingParams` object
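Note: pulling those three steps together, a hedged sketch (the JSON schema and model name are made up for illustration) could read:

```python
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.sampling_params import GuidedDecodingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",    # placeholder model
          guided_decoding_backend="xgrammar")             # step 1

guided = GuidedDecodingParams(                            # step 2: the `json` variant
    json={"type": "object",
          "properties": {"answer": {"type": "string"}}})

sampling_params = SamplingParams(guided_decoding=guided)  # step 3
llm.generate("Generate a JSON response", sampling_params)
```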

@@ -94,15 +94,15 @@ sampling_params = SamplingParams(
 llm.generate("Generate a JSON response", sampling_params)
 ```
 
-You can find a more detailed example on guided decoding [here](../../../../examples/llm-api/llm_guided_decoding.py).
+You can find a more detailed example on guided decoding [here](source:examples/llm-api/llm_guided_decoding.py).
 
 ## Logits processor
 
 Logits processors allow you to modify the logits produced by the network before sampling, enabling custom generation behavior and constraints.
 
 To use a custom logits processor:
 
-1. Create a custom class that inherits from [`LogitsProcessor`](../../../../tensorrt_llm/sampling_params.py#L48) and implements the `__call__` method
+1. Create a custom class that inherits from [`LogitsProcessor`](source:tensorrt_llm/sampling_params.py#L48) and implements the `__call__` method
 2. Pass an instance of this class to the `logits_processor` parameter of `SamplingParams`
 
 The following example demonstrates logits processing:
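Note: the example referenced in that last context line is not shown in this diff. A rough stand-in follows; the `__call__` argument names are illustrative rather than copied from `sampling_params.py`, and the model name is a placeholder.

```python
from typing import List

from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.sampling_params import LogitsProcessor


class BanTokensLogitsProcessor(LogitsProcessor):
    """Masks a fixed set of token ids so they can never be sampled."""

    def __init__(self, banned_token_ids: List[int]):
        self.banned_token_ids = banned_token_ids

    def __call__(self, req_id, logits, token_ids, stream_ptr=None, client_id=None):
        # Argument names here are placeholders; see sampling_params.py#L48 for
        # the authoritative interface. The core idea: edit the logits tensor
        # in place before the sampler runs.
        logits[..., self.banned_token_ids] = float("-inf")


llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")        # placeholder model
sampling_params = SamplingParams(
    logits_processor=BanTokensLogitsProcessor([42]))          # step 2
llm.generate(["Hello, my name is"], sampling_params)
```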
@@ -132,4 +132,4 @@ sampling_params = SamplingParams(
 llm.generate(["Hello, my name is"], sampling_params)
 ```
 
-You can find a more detailed example on logits processors [here](../../../../examples/llm-api/llm_logits_processor.py).
+You can find a more detailed example on logits processors [here](source:examples/llm-api/llm_logits_processor.py).

docs/source/installation/build-from-source-linux.md

Lines changed: 2 additions & 1 deletion

@@ -185,7 +185,7 @@ example:
 python3 ./scripts/build_wheel.py --cuda_architectures "80-real;86-real"
 ```
 
-To use the C++ benchmark scripts under [benchmark/cpp](/benchmarks/cpp/), for example `gptManagerBenchmark.cpp`, add the `--benchmarks` option:
+To use the C++ benchmark scripts under [benchmark/cpp](source:benchmarks/cpp/), for example `gptManagerBenchmark.cpp`, add the `--benchmarks` option:
 
 ```bash
 python3 ./scripts/build_wheel.py --benchmarks
@@ -207,6 +207,7 @@ relevant classes. The associated unit tests should also be consulted for underst
 
 This feature will not be enabled when [`building only the C++ runtime`](#link-with-the-tensorrt-llm-c++-runtime).
 
+(link-with-the-tensorrt-llm-c++-runtime)=
 #### Linking with the TensorRT LLM C++ Runtime
 
 The `build_wheel.py` script will also compile the library containing the C++ runtime of TensorRT LLM. If Python support and `torch` modules are not required, the script provides the option `--cpp_only` which restricts the build to the C++ runtime only.

0 commit comments
