
Conversation

@amitz-nv amitz-nv commented Jul 17, 2025

Description

  1. Adds support for configuring LoRA cache sizes in the PyTorch flow.
  2. Changes LoraConfig.max_loras and LoraConfig.max_cpu_loras to be optional. When they are not set, the cache sizes are determined by PeftCacheConfig, whose existing defaults are a LoRA GPU cache sized at 2% of free GPU memory and a LoRA CPU cache of 1 GiB (see the sketch after this list).
  3. Removes the deprecated LoRA LLM args max_lora_rank, max_loras and max_cpu_loras, as they are already specified inside the lora_config: LoraConfig LLM arg.
  4. Adds tests that verify the LoRA cache size LLM args take effect in the expected order.
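To make the interaction between these knobs concrete, here is a minimal sketch of the intended usage. It is not code from this PR; the import paths, keyword names, and the adapter/model paths are assumptions for illustration based on the description above and may differ from the final API.

```python
from tensorrt_llm import LLM                 # assumed import location
from tensorrt_llm.llmapi import LoraConfig   # assumed import location

# Explicit cache sizing: max_loras bounds the GPU LoRA cache and
# max_cpu_loras bounds the CPU LoRA cache (both counted in adapters).
lora_config = LoraConfig(
    lora_dir=["/path/to/adapter"],  # hypothetical adapter path
    max_lora_rank=64,
    max_loras=4,
    max_cpu_loras=8,
)

# If max_loras / max_cpu_loras are left unset (None), the cache sizes instead
# fall back to the PeftCacheConfig defaults described above:
# GPU cache ~2% of free GPU memory, CPU cache 1 GiB.
llm = LLM(model="/path/to/model", lora_config=lora_config)
```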

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed. (An example invocation follows the option list below.)

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
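For example, combining a few of the options above, an illustrative invocation (not taken from this PR) would be:

/bot run --stage-list "A10-1" --disable-fail-fast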

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.
