Add: File Based Caching for lm_eval tests
#1900
Draft
Implements a file-based caching mechanism to reduce lm-eval test times by avoiding redundant base model evaluations when multiple tests use the same base model configuration.
Motivation
Our lm-eval tests are run individually as separate processes via run_tests.sh, with each config file spawning a new pytest process. This means base model results are never shared between tests. For example, running 8 tests with the same base model (meta-llama/Meta-Llama-3-8B-Instruct) results in 8 redundant base model evaluations, roughly doubling the total test time.
Solution
Implemented a file-based cache that persists base model evaluation results across separate pytest processes:
- Cache key: (model, task, num_fewshot, limit, batch_size, model_args)
- Results are stored in an lmeval_cache/ directory (configurable via LMEVAL_CACHE_DIR)
- The cache is shared across the separate pytest processes spawned by run_tests.sh
- Caching can be disabled with the DISABLE_LMEVAL_CACHE=1 environment variable
Implementation Details
Core components:
- LMEvalCacheKey: frozen dataclass that handles cache key generation, file I/O, and cache lookups
- cached_lm_eval_run: decorator that transparently adds caching to base model evaluation methods
Design decisions:
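A minimal sketch of how these two components might fit together, assuming JSON result files keyed by a hash of the evaluation parameters; apart from the names LMEvalCacheKey, cached_lm_eval_run, LMEVAL_CACHE_DIR, and DISABLE_LMEVAL_CACHE, which appear in this PR, all attribute names and file-layout details below are illustrative:

```python
import functools
import hashlib
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class LMEvalCacheKey:
    """Frozen dataclass: cache key generation, file I/O, and cache lookups."""
    model: str
    task: str
    num_fewshot: int
    limit: int
    batch_size: int
    model_args: str

    def cache_path(self) -> Path:
        # One JSON file per unique parameter combination, hashed for a stable name.
        cache_dir = Path(os.environ.get("LMEVAL_CACHE_DIR", "lmeval_cache"))
        digest = hashlib.sha256(
            json.dumps(asdict(self), sort_keys=True).encode()
        ).hexdigest()
        return cache_dir / f"{digest}.json"

    def load(self):
        # Return cached results, or None on a cache miss.
        path = self.cache_path()
        if path.exists():
            return json.loads(path.read_text())
        return None

    def store(self, results) -> None:
        path = self.cache_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(results))


def cached_lm_eval_run(fn):
    """Decorator that transparently caches base model evaluation results."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        if os.environ.get("DISABLE_LMEVAL_CACHE") == "1":
            return fn(self, *args, **kwargs)
        # Assumes the test object exposes these attributes; the real names
        # in tests/testing_utils.py may differ.
        key = LMEvalCacheKey(
            model=self.model,
            task=self.task,
            num_fewshot=self.num_fewshot,
            limit=self.limit,
            batch_size=self.batch_size,
            model_args=self.model_args,
        )
        cached = key.load()
        if cached is not None:
            return cached
        results = fn(self, *args, **kwargs)
        key.store(results)
        return results
    return wrapper
```

Because the cache lives on disk rather than in process memory, separate pytest processes launched by run_tests.sh all see the same cached results.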
Performance Impact
Expected speedup: ~30% for test suites with multiple configs sharing the same base model.
Usage
Enable caching (default):
Disable caching:
Custom cache location:
Clear cache:
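The four modes above can be sketched as shell commands, assuming the environment variables named in this PR and a default lmeval_cache/ directory in the repo root:

```shell
# Enable caching (default): nothing to do; results accumulate in ./lmeval_cache/

# Disable caching for a run:
export DISABLE_LMEVAL_CACHE=1

# Custom cache location:
export LMEVAL_CACHE_DIR=/tmp/lmeval_cache

# Clear the cache (whichever location is in effect):
rm -rf "${LMEVAL_CACHE_DIR:-lmeval_cache}"
```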
Testing
Files Changed
- tests/testing_utils.py: added caching implementation (~50 lines)
- tests/lmeval/test_lmeval.py: applied @cached_lm_eval_run decorator to _eval_base_model()