
Conversation

@amitz-nv amitz-nv commented Jul 17, 2025

Description

  1. Adds support for configuring LoRA cache sizes in the PyTorch flow.
  2. Changes LoraConfig.max_loras and LoraConfig.max_cpu_loras to be optional. When they are not set, the cache sizes are determined by PeftCacheConfig, whose existing defaults are a LoRA GPU cache sized at 2% of free GPU memory and a LoRA CPU cache of 1 GiB (see the sketch after this list).
  3. Removes the deprecated LoRA LLM args max_lora_rank, max_loras and max_cpu_loras, as they are already specified inside the lora_config: LoraConfig LLM arg.
  4. Adds tests that verify the LoRA cache size LLM args take effect in the expected order.
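To make the interaction between these knobs concrete, here is a minimal sketch of the intended usage. It is not code from this PR; the import paths, keyword names, and the adapter/model paths are assumptions for illustration based on the description above and may differ from the final API.

```python
from tensorrt_llm import LLM                 # assumed import location
from tensorrt_llm.llmapi import LoraConfig   # assumed import location

# Explicit cache sizing: max_loras bounds the GPU LoRA cache and
# max_cpu_loras bounds the CPU LoRA cache (both counted in adapters).
lora_config = LoraConfig(
    lora_dir=["/path/to/adapter"],  # hypothetical adapter path
    max_lora_rank=64,
    max_loras=4,
    max_cpu_loras=8,
)

# If max_loras / max_cpu_loras are left unset (None), the cache sizes instead
# fall back to the PeftCacheConfig defaults described above:
# GPU cache ~2% of free GPU memory, CPU cache 1 GiB.
llm = LLM(model="/path/to/model", lora_config=lora_config)
```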

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed. (An example invocation follows the option list below.)

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
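For example, combining a few of the options above, an illustrative invocation (not taken from this PR) would be:

/bot run --stage-list "A10-1" --disable-fail-fast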

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.
