
fix: speedup retrieval computation #3454

Merged
Samoed merged 4 commits into main from speed_up_retrieval
Oct 20, 2025


@Samoed
Member

@Samoed Samoed commented Oct 20, 2025

Close #3431

I wrote this script to test the change:

import time
import tracemalloc

import mteb


def format_memory(bytes_value: float) -> str:
    """Format bytes into human-readable format."""
    for unit in ["B", "KB", "MB", "GB"]:
        if bytes_value < 1024.0:
            return f"{bytes_value:.2f} {unit}"
        bytes_value /= 1024.0
    return f"{bytes_value:.2f} TB"


task = mteb.get_task("SWEbenchVerifiedRR")
tracemalloc.start()

# Record baseline memory
baseline_memory = tracemalloc.get_traced_memory()[0]

# Run the method
start_time = time.perf_counter()
result = task.load_data()
end_time = time.perf_counter()

# Get peak memory usage
current_memory, peak_memory = tracemalloc.get_traced_memory()
tracemalloc.stop()
elapsed = end_time - start_time
peak_memory_used = peak_memory - baseline_memory

print(f"✓ Completed in {elapsed:.4f} seconds")
print(f"  Peak memory: {format_memory(peak_memory_used)}")

And got:

Before:
✓ Completed in 382.7764 seconds
  Peak memory: 624.30 MB
After:
✓ Completed in 53.4648 seconds
  Peak memory: 708.77 MB
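
For reference, the ratio between the two runs can be worked out directly from the numbers above. This is only a quick sketch: the variable names are chosen for illustration, and the values are copied from the Before/After output of a single run each, so treat the result as approximate.

```python
# Timings (seconds) and peak memory (MB) copied from the Before/After output above.
before_s, after_s = 382.7764, 53.4648
before_mb, after_mb = 624.30, 708.77

# How many times faster load_data() is after the change.
speedup = before_s / after_s

# Additional peak memory, absolute and relative to the old peak.
extra_mb = after_mb - before_mb
extra_pct = 100 * extra_mb / before_mb

print(f"Speedup: {speedup:.2f}x")
print(f"Extra peak memory: {extra_mb:.2f} MB ({extra_pct:.1f}%)")
```

So load_data() for SWEbenchVerifiedRR is roughly 7x faster, at the cost of about 84 MB more peak traced memory. Note that tracemalloc only counts allocations made through Python's allocator and adds overhead of its own; both runs were measured the same way, so the comparison is relative.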

@KennethEnevoldsen KennethEnevoldsen changed the title from "speedup retrieval computation" to "fix: speedup retrieval computation" Oct 20, 2025
@KennethEnevoldsen
Contributor

All good to merge. Let's wait for CI to confirm that everything works as intended.

@Samoed Samoed enabled auto-merge (squash) October 20, 2025 18:33
@Samoed Samoed merged commit 01f3a19 into main Oct 20, 2025
8 checks passed
@Samoed Samoed deleted the speed_up_retrieval branch October 20, 2025 18:51

Development

Successfully merging this pull request may close these issues.

Speedup qrels generation in retrieval tasks
