Add: File Based Caching for lm_eval tests
#1900
Draft
Implements a file-based caching mechanism to reduce lm-eval test times by avoiding redundant base model evaluations when multiple tests use the same base model configuration.
Motivation
Our lm-eval tests are run individually as separate processes via run_tests.sh, with each config file spawning a new pytest process. This means base model results are never shared between tests. For example, running 8 tests with the same base model (meta-llama/Meta-Llama-3-8B-Instruct) results in 8 redundant base model evaluations, roughly doubling the total test time.
Solution
Implemented a file-based cache that persists base model evaluation results across separate pytest processes:
- Cache key: (model, task, num_fewshot, limit, batch_size, model_args)
- Results are stored in an lmeval_cache/ directory (configurable via LMEVAL_CACHE_DIR)
- The cache is shared across the separate pytest processes spawned by run_tests.sh
- Caching can be disabled with the DISABLE_LMEVAL_CACHE=1 environment variable
Implementation Details
Core components:
- LMEvalCacheKey: frozen dataclass that handles cache key generation, file I/O, and cache lookups
- cached_lm_eval_run: decorator that transparently adds caching to base model evaluation methods
Design decisions:
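A minimal sketch of how these two components might fit together, assuming JSON result files keyed by a hash of the evaluation parameters; apart from the names LMEvalCacheKey, cached_lm_eval_run, LMEVAL_CACHE_DIR, and DISABLE_LMEVAL_CACHE, which appear in this PR, all attribute names and file-layout details below are illustrative:

```python
import functools
import hashlib
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class LMEvalCacheKey:
    """Frozen dataclass: cache key generation, file I/O, and cache lookups."""
    model: str
    task: str
    num_fewshot: int
    limit: int
    batch_size: int
    model_args: str

    def cache_path(self) -> Path:
        # One JSON file per unique parameter combination, hashed for a stable name.
        cache_dir = Path(os.environ.get("LMEVAL_CACHE_DIR", "lmeval_cache"))
        digest = hashlib.sha256(
            json.dumps(asdict(self), sort_keys=True).encode()
        ).hexdigest()
        return cache_dir / f"{digest}.json"

    def load(self):
        # Return cached results, or None on a cache miss.
        path = self.cache_path()
        if path.exists():
            return json.loads(path.read_text())
        return None

    def store(self, results) -> None:
        path = self.cache_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(results))


def cached_lm_eval_run(fn):
    """Decorator that transparently caches base model evaluation results."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        if os.environ.get("DISABLE_LMEVAL_CACHE") == "1":
            return fn(self, *args, **kwargs)
        # Assumes the test object exposes these attributes; the real names
        # in tests/testing_utils.py may differ.
        key = LMEvalCacheKey(
            model=self.model,
            task=self.task,
            num_fewshot=self.num_fewshot,
            limit=self.limit,
            batch_size=self.batch_size,
            model_args=self.model_args,
        )
        cached = key.load()
        if cached is not None:
            return cached
        results = fn(self, *args, **kwargs)
        key.store(results)
        return results
    return wrapper
```

Because the cache lives on disk rather than in process memory, separate pytest processes launched by run_tests.sh all see the same cached results.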
Performance Impact
Expected speedup: ~30% for test suites with multiple configs sharing the same base model.
Usage
Enable caching (default):
Disable caching:
Custom cache location:
Clear cache:
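The four modes above can be sketched as shell commands, assuming the environment variables named in this PR and a default lmeval_cache/ directory in the repo root:

```shell
# Enable caching (default): nothing to do; results accumulate in ./lmeval_cache/

# Disable caching for a run:
export DISABLE_LMEVAL_CACHE=1

# Custom cache location:
export LMEVAL_CACHE_DIR=/tmp/lmeval_cache

# Clear the cache (whichever location is in effect):
rm -rf "${LMEVAL_CACHE_DIR:-lmeval_cache}"
```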
Testing
Files Changed
- tests/testing_utils.py: added caching implementation (~50 lines)
- tests/lmeval/test_lmeval.py: applied @cached_lm_eval_run decorator to _eval_base_model()