[CI] Add persistent cache mounts and fix test download paths by AndreasKaratzas · Pull Request #36951 · vllm-project/vllm

AndreasKaratzas · 2026-03-13T04:56:43Z

Adds persistent cache volume mounts for CI test containers: MODELSCOPE_CACHE, VLLM_TEST_CACHE, VLLM_CACHE_ROOT, and VLLM_MEDIA_CACHE.
Routes all test data downloads (dummy models, GSM8K datasets, tiktoken data, Prithvi GeoTIFFs) through VLLM_TEST_CACHE instead of /tmp or scattered locations.
Fixes hardcoded HF cache path in test_extraction.py to use HF_HOME env var.
Fixes snapshot_download bypass in test_token_in_token_out.py by removing explicit cache_dir that skipped HF cache.
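
As an illustration of the `HF_HOME` fix, here is a minimal sketch of resolving the Hugging Face hub cache directory from the environment instead of hardcoding it. The helper name `resolve_hf_hub_cache` is hypothetical and is not taken from the PR; the lookup order (`HF_HUB_CACHE`, then `$HF_HOME/hub`, then the `~/.cache/huggingface/hub` default) follows the documented `huggingface_hub` environment variables. Test code that already imports `huggingface_hub` could probably read `huggingface_hub.constants.HF_HUB_CACHE` instead.

```python
import os
from pathlib import Path


def resolve_hf_hub_cache(env=None) -> Path:
    """Return the Hugging Face hub cache directory implied by the environment.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env

    # HF_HUB_CACHE, when set, points directly at the hub cache.
    explicit = env.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()

    # Otherwise the hub cache lives under $HF_HOME/hub.
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser() / "hub"

    # HF_HOME itself defaults to $XDG_CACHE_HOME/huggingface (~/.cache if unset).
    cache_root = Path(env.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    return cache_root / "huggingface" / "hub"
```

A test can then build model snapshot paths from `resolve_hf_hub_cache()` so that the CI-mounted cache is honored.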

Changes

| File | Change |
| --- | --- |
| `.buildkite/scripts/hardware_ci/run-amd-test.sh` | Add cache volume mounts + env vars for CI containers |
| `tests/conftest.py` | Route dummy model creation through `VLLM_TEST_CACHE` |
| `tests/evals/gsm8k/gsm8k_eval.py` | Cache GSM8K downloads to `VLLM_TEST_CACHE/gsm8k/` |
| `tests/evals/gpt_oss/test_gpqa_correctness.py` | Cache tiktoken data to `VLLM_TEST_CACHE/tiktoken/` |
| `tests/plugins/.../prithvi_processor.py` | Cache URL-fetched GeoTIFFs to `VLLM_TEST_CACHE/prithvi/` |
| `tests/entrypoints/openai/test_token_in_token_out.py` | Use default HF cache instead of `/tmp` |
| `tests/v1/kv_connector/.../test_extraction.py` | Use `HF_HOME` env var instead of hardcoded path |
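
The routing described in the table can be sketched as one small helper that maps a logical sub-directory (`gsm8k`, `tiktoken`, `prithvi`) onto `VLLM_TEST_CACHE`. This is a sketch under stated assumptions, not the PR's code: the function name `get_test_cache_dir` and the fallback to a `vllm_test_cache` folder under the system temp directory are hypothetical, since the conversation does not show what the tests do when the variable is unset.

```python
import os
import tempfile
from pathlib import Path


def get_test_cache_dir(subdir: str, env=None) -> Path:
    """Return (and create) the cache directory for one kind of test data.

    Hypothetical helper; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    root = env.get("VLLM_TEST_CACHE")
    if not root:
        # Assumed fallback: a folder under the system temp directory.
        root = os.path.join(tempfile.gettempdir(), "vllm_test_cache")
    cache_dir = Path(root) / subdir
    # exist_ok avoids a failure when parallel tests create the same directory.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir
```

With a persistent volume mounted at the `VLLM_TEST_CACHE` location, repeated CI runs would find earlier downloads under, for example, `get_test_cache_dir("gsm8k")`.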

Note: The VLLM_MEDIA_CACHE feature code (env registration + MediaConnector caching) is in a separate PR: #37123

cc @kenroche

gemini-code-assist

Code Review

This pull request introduces persistent caching for various assets in CI to improve performance, which is a great enhancement. The changes correctly utilize environment variables to configure cache paths. My review focuses on potential race conditions in the new caching logic. I've identified two instances where concurrent writes to the cache could lead to corrupted files, one in production code which is critical, and a similar one in test code. I've provided suggestions to make the file writing atomic and prevent these race conditions.

gemini-code-assist · 2026-03-13T04:59:36Z

vllm/multimodal/media/connector.py

+    def _put_cached_bytes(self, url: str, data: bytes) -> None:
+        """Store downloaded bytes in the cache."""
+        if not self._media_cache_dir:
+            return
+        cache_path = self._media_cache_path(url)
+        cache_path.write_bytes(data)


The current implementation of _put_cached_bytes has a race condition. If multiple processes or threads attempt to download and cache the same URL concurrently, they could write to the same cache file simultaneously, leading to a corrupted file. This can cause subsequent requests to fail when reading the corrupted cache entry.

To fix this, you should write the downloaded data to a temporary file within the cache directory and then perform an atomic rename to the final cache path. This ensures that readers will only ever see a complete file.

Note: The suggested code requires importing the tempfile module at the top of the file.

    def _put_cached_bytes(self, url: str, data: bytes) -> None:
        """Store downloaded bytes in the cache."""
        if not self._media_cache_dir:
            return
        cache_path = self._media_cache_path(url)
        # To prevent race conditions, write to a temporary file and then atomically rename.
        with tempfile.NamedTemporaryFile(mode="wb", dir=self._media_cache_dir, delete=False) as tmp_file:
            tmp_file.write(data)
            tmp_path = tmp_file.name
        try:
            os.rename(tmp_path, cache_path)
        except OSError:
            # Another process might have already written the file.
            os.remove(tmp_path)
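
For reference, the write-to-temp-then-rename pattern can be sketched as a standalone helper. This is an illustrative variant, not the reviewer's exact suggestion: it uses `os.replace` rather than `os.rename`, because `os.replace` overwrites an existing destination on both POSIX and Windows, whereas `os.rename` raises on Windows when the destination exists. The helper name `atomic_write_bytes` is hypothetical, and the rename is only atomic when the temporary file sits on the same filesystem as the target, which is why it is created in the target's directory.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write `data` to `path` so readers never observe a partial file.

    Hypothetical helper for illustration.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        # Concurrent writers each rename a complete file; the last one wins.
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave stray temp files behind if writing or renaming fails.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

If two processes cache the same URL at once, both write their own temporary file and the final path always holds one complete copy.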

gemini-code-assist · 2026-03-13T04:59:36Z

tests/plugins/prithvi_io_processor_plugin/prithvi_io_processor/prithvi_processor.py

+                resp = urllib.request.urlopen(file_path)
+                with open(cached_path, "wb") as f:
+                    f.write(resp.read())
+                path = cached_path


There's a potential race condition here. If multiple tests running in parallel attempt to download and cache the same URL, they could write to cached_path simultaneously, resulting in a corrupted file. To ensure atomicity, it's safer to write the downloaded content to a temporary file and then atomically rename it to the final destination.

Suggested change

-                resp = urllib.request.urlopen(file_path)
-                with open(cached_path, "wb") as f:
-                    f.write(resp.read())
-                path = cached_path
+                resp = urllib.request.urlopen(file_path)
+                # To prevent race conditions, write to a temporary file and then atomically rename.
+                with tempfile.NamedTemporaryFile(mode="wb", dir=cache_dir, delete=False) as tmp_file:
+                    tmp_file.write(resp.read())
+                    tmp_path = tmp_file.name
+                try:
+                    os.rename(tmp_path, cached_path)
+                except OSError:
+                    # Another process might have already written the file.
+                    os.remove(tmp_path)
+                path = cached_path

DarkLight1337

I think we should avoid modifying the main vLLM code to add a media cache. We already have fixtures such as image_urls that pre-download media files.

AndreasKaratzas · 2026-03-13T17:08:35Z

@DarkLight1337 You're right that image_urls / LocalAssetServer already handles a lot of the test media through pre-downloaded assets. That said, I'd like to keep the VLLM_MEDIA_CACHE piece as a separate discussion because I think it still has value. Right now, if a user sends the same image URL in 10 different requests, MediaConnector downloads it 10 times. URL-level caching is a natural optimization: it's opt-in (disabled by default, zero behavior change) and the implementation is minimal (~20 lines). It mirrors what get_vllm_public_assets already does for S3 assets, just generalized to arbitrary URLs.

In other words, this feature can only be useful if you define the env var before tests.
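
To make the opt-in behavior concrete, here is a rough sketch of URL-level caching keyed on an environment variable. It is not the code from #37123: the function names, the SHA-256 file naming, and the injected `download` callable are all assumptions made for illustration. When `VLLM_MEDIA_CACHE` is unset, the sketch downloads every time, matching the "zero behavior change" default described above.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def media_cache_path(url: str, env=None) -> Optional[Path]:
    """Map a URL to a cache file, or return None when caching is disabled."""
    env = os.environ if env is None else env
    root = env.get("VLLM_MEDIA_CACHE")
    if not root:
        return None
    # Assumed naming scheme: hash the URL to get a filesystem-safe name.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(root) / digest


def fetch_with_cache(url: str, download: Callable[[str], bytes], env=None) -> bytes:
    """Return the bytes for `url`, downloading only on a cache miss."""
    path = media_cache_path(url, env)
    if path is not None and path.is_file():
        return path.read_bytes()
    data = download(url)
    if path is not None:
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file, then rename, so readers see complete files only.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent)
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        os.replace(tmp_name, path)
    return data
```

In this sketch, ten requests for the same URL would trigger one call to `download` when the variable is set and ten calls when it is not.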

DarkLight1337 · 2026-03-14T04:18:59Z

I see your point. But it should be done as a separate RFC / feature request

AndreasKaratzas · 2026-03-14T07:34:16Z

> I see your point. But it should be done as a separate RFC / feature request

@DarkLight1337 I can make it ROCm-specific. I just thought that upstream would also benefit from this. Essentially, it is a completely optional data path that is only active if you set that env var; otherwise execution is exactly as it is right now. It really helps when a machine has network issues, because everything lives under a specific cache path that can be stored on NFS. If there is any other recommendation for that path, let me know. I'm certainly open to refactoring this.

DarkLight1337 · 2026-03-14T14:32:32Z

I think this requires a broader discussion, so please open an RFC for the media download cache specifically, and I'll tag relevant people.

AndreasKaratzas · 2026-03-14T21:41:56Z

> I think this requires a broader discussion, so please open an RFC for the media download cache specifically, and I'll tag relevant people.

Certainly :) #37075

mergify · 2026-03-17T03:44:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AndreasKaratzas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

[CI] Add persistent cache mounts for all CI test downloads and media …

f5d8dd1

…URLs Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested review from ApostaC, DarkLight1337, NickLucche, aarnphm, mgoin, orozery, robertgshaw2-redhat, tjtanaa and ywang96 as code owners March 13, 2026 04:56

mergify bot added the ci/build, multi-modality, gpt-oss, and v1 labels Mar 13, 2026

github-project-automation bot added this to gpt-oss Issues & Enhancements Mar 13, 2026

mergify bot added the kv-connector label Mar 13, 2026

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Mar 13, 2026

gemini-code-assist bot reviewed Mar 13, 2026

AndreasKaratzas added the ready label Mar 13, 2026

[CI] Add persistent cache mounts for all CI test downloads and media …

e4d3e6c

…URLs Signed-off-by: Andreas Karatzas <akaratza@amd.com>

DarkLight1337 reviewed Mar 13, 2026

AndreasKaratzas mentioned this pull request Mar 14, 2026

[RFC]: Opt-in Media URL Cache for MediaConnector #37075

Open

1 task

AndreasKaratzas added 3 commits March 15, 2026 15:09

Merge remote-tracking branch 'origin/main' into akaratza_revamp_test_…

cdbdba8

…cache

[ROCm][CI] Split out media cache changes into separate PR

a54c8ed

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[ROCm][CI] Split out media cache changes into separate PR

8e2c715

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas changed the title ~~[CI] Add persistent cache mounts for all CI test downloads and media URLs~~ [CI] Add persistent cache mounts and fix test download paths Mar 15, 2026

AndreasKaratzas mentioned this pull request Mar 15, 2026

[Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE #37123

Open

AndreasKaratzas added the rocm label Mar 15, 2026

github-project-automation bot added this to AMD Mar 15, 2026

github-project-automation bot moved this to Todo in AMD Mar 15, 2026

AndreasKaratzas mentioned this pull request Mar 15, 2026

[ROCm][CI] Revamping AMD mirrors #35897

Open

mergify bot added the needs-rebase label Mar 17, 2026

AndreasKaratzas wants to merge 5 commits into vllm-project:main from ROCm:akaratza_revamp_test_cache


