Merged
Commits
50 commits
92fa705
feat: Added `get_benchmark_result()` to BenchmarkResults to obtain a …
ayush1298 Dec 23, 2025
9647701
2.5.0
Dec 23, 2025
eee248a
fix: legacy clustering processing (#3791)
Samoed Dec 23, 2025
f0179e4
2.5.1
Dec 23, 2025
44555a7
better clustering fix (#3793)
Samoed Dec 23, 2025
fb53f57
docs: update MIEB contributing guide for MTEB v2 AbsTask structure (#…
isaac-chung Dec 23, 2025
a2631dd
model: add octen_models (#3789)
bflhc Dec 23, 2025
57a4b0c
Add leaderboard timing logs and join_revisions() speedups (#3790)
isaac-chung Dec 23, 2025
d3bc4cc
Optimize validate filter scores only (#3792)
isaac-chung Dec 24, 2025
522eecc
fix: Add model_type in model_meta for all models (#3751)
ayush1298 Dec 25, 2025
d94da97
2.5.2
Dec 25, 2025
10e6bc5
fix: Added warnings.warn when logging warnings (#3753)
ayush1298 Dec 26, 2025
68d8366
2.5.3
Dec 26, 2025
3ec1f63
save kwargs passed to get_model in model_meta (#3785)
ayush1298 Dec 26, 2025
a99557d
fix: add typecheck (#3550)
Samoed Dec 27, 2025
42dea01
2.5.4
Dec 27, 2025
480f1b9
Add benchmark aliases (#3767)
Samoed Dec 28, 2025
48f137e
Add function for creating mock images (#3803)
Samoed Dec 29, 2025
b1aae79
docs: add benchmark filtering examples (#3805)
isaac-chung Dec 29, 2025
9e867f5
update generate_model_card with get_benchmark_result() (#3796)
ayush1298 Dec 30, 2025
ab2d494
Update the API of Bytedance/Seed1.6-embedding-1215 (#3814)
QuanYuhan Dec 30, 2025
6ebc5fd
fix: repo exists check (#3813)
Samoed Dec 30, 2025
a49af6c
2.5.5
Dec 30, 2025
1a64ed6
feat: Add leaderboard CLI command (#3802)
isaac-chung Dec 30, 2025
1c873b4
2.6.0
Dec 30, 2025
6f4627e
Add filter for model type (#3799)
ayush1298 Dec 30, 2025
6dcbf9f
add model: bflhc/Octen-Embedding-4B (#3816)
bflhc Dec 30, 2025
3303087
fix: Download cached results zip from cached-data branch (#3795)
isaac-chung Dec 30, 2025
b895af5
2.6.1
Dec 30, 2025
75743f1
ci: Switch CI to use `uv` (#3702)
isaac-chung Dec 30, 2025
043ea38
fix: handle git lfs content for cached zip file (#3827)
isaac-chung Jan 2, 2026
9bcf921
2.6.2
Jan 2, 2026
17ef363
fix: Allow passing device to model (#3812)
ayush1298 Jan 3, 2026
738fbf8
2.6.3
Jan 3, 2026
c7c04e5
fix: Add leaderboard docker workflow (#3828)
isaac-chung Jan 3, 2026
a2341a5
2.6.4
Jan 3, 2026
3723d27
docs: Fix docs build strict mode errors (#3809)
isaac-chung Jan 3, 2026
44e9b20
model: Add SauerkrautLM-ColPali visual document retrieval models (#3804)
dgolchin Jan 4, 2026
bf2627a
fix dataset generation tags (#3835)
Samoed Jan 4, 2026
d033c24
fix: Extend framework annotations for `ModelMeta` (#3819)
ayush1298 Jan 5, 2026
acf853a
2.6.5
Jan 5, 2026
adb5b42
dataset: Vietnamese VN-MTEB TVPLRetrieval, NanoClimateFEVER-VN, NanoF…
BaoLocPham Jan 5, 2026
b905e27
test: Add HF Space Dockerfile using pre-built leaderboard image (#3838)
isaac-chung Jan 5, 2026
3d33825
Merge main into maeb branch - incorporate uv and type annotations whi…
Jan 6, 2026
b36b7fa
Update uv.lock
isaac-chung Jan 6, 2026
8bbdd70
Fix lint and type errors
isaac-chung Jan 6, 2026
7e636ec
Fix duplicate modalities kwarg in random_baseline ModelMeta
isaac-chung Jan 6, 2026
ed9203e
fix baselines
Samoed Jan 6, 2026
2631fc8
invalidate hf cache for maeb
isaac-chung Jan 6, 2026
fb6a0a9
temporarily skip 3.14 in tests
isaac-chung Jan 7, 2026
8 changes: 6 additions & 2 deletions .github/workflows/dataset_loading.yml
@@ -33,15 +33,19 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.11'
cache: 'pip'

- name: Install dependencies
run: |
make install-for-tests
make install

- name: Run dataset loading tests
env:
11 changes: 5 additions & 6 deletions .github/workflows/documentation.yml
@@ -33,20 +33,19 @@ jobs:
docker system prune -af

- uses: actions/checkout@v6
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true
- uses: actions/setup-python@v6
with:
python-version: "3.10"

- name: Dependencies
run: |
python -m pip install --upgrade pip
pip install -e . --group docs

- name: Build and Deploy
if: github.event_name == 'push'
run: |
make build-docs-overview
mkdocs gh-deploy --force
uv run --group docs mkdocs gh-deploy --force

- name: Build
if: github.event_name == 'pull_request'
73 changes: 73 additions & 0 deletions .github/workflows/hf_space_docker.yml
@@ -0,0 +1,73 @@
name: Build HF Space Docker Image

on:
push:
branches: [ main ]
paths:
- 'Dockerfile.hf-space'
- '.github/workflows/hf_space_docker.yml'
pull_request:
branches: [ main ]
paths:
- 'Dockerfile.hf-space'
- '.github/workflows/hf_space_docker.yml'
workflow_dispatch:

jobs:
build-hf-space:
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Build HF Space Docker image
run: |
docker build -f Dockerfile.hf-space -t mteb-hf-space:test .
echo "✅ Docker image built successfully"

- name: Test Docker image
run: |
# Test that the image can be created and the container starts
docker run -d --name mteb-test -p 7860:7860 mteb-hf-space:test

# Give the container a moment to start
sleep 5

# Check if container is running
if docker ps | grep -q mteb-test; then
echo "✅ Container is running"
docker logs mteb-test
else
echo "❌ Container failed to start"
docker logs mteb-test
exit 1
fi

# Clean up
docker stop mteb-test
docker rm mteb-test

- name: Validate HF Space configuration
run: |
# Check that required environment variables and ports are set
docker run --rm mteb-hf-space:test sh -c '
if [ "$GRADIO_SERVER_NAME" = "0.0.0.0" ]; then
echo "✅ GRADIO_SERVER_NAME is correctly set"
else
echo "❌ GRADIO_SERVER_NAME is not set to 0.0.0.0"
exit 1
fi
'

# Check exposed port
EXPOSED_PORT=$(docker inspect mteb-hf-space:test --format='{{range $key, $value := .Config.ExposedPorts}}{{$key}}{{end}}' | grep -o '7860')
if [ "$EXPOSED_PORT" = "7860" ]; then
echo "✅ Port 7860 is exposed"
else
echo "❌ Port 7860 is not exposed"
exit 1
fi
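
Note on the exposed-port check above: `grep -o '7860'` also matches inside a longer port number such as 17860. A stricter variant could look like the sketch below. `port_is_exposed` is a hypothetical helper that is not part of this PR, and it assumes its first argument is the concatenated key string produced by the `docker inspect` template above (for example `7860/tcp`).

```shell
# Hypothetical helper (not part of the PR): succeed only if the port in $2
# appears as a whole port number in the concatenated ExposedPorts keys in $1.
port_is_exposed() {
  # Require start-of-string or a non-digit before the port and "/" after it,
  # so that "17860/tcp" does not count as 7860.
  printf '%s\n' "$1" | grep -Eq "(^|[^0-9])$2/"
}

# Example usage with a literal key string instead of real docker output.
if port_is_exposed "7860/tcp" 7860; then
  echo "Port 7860 is exposed"
fi
```

The anchoring on a non-digit prefix also works when several keys are concatenated without a separator, as the template above emits them.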
10 changes: 5 additions & 5 deletions .github/workflows/leaderboard_build.yml
@@ -33,15 +33,15 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.10'
cache: 'pip'

- name: Install dependencies (incl. leaderboard extra)
run: |
pip install ".[leaderboard]" --group dev

- name: Run leaderboard build test
run: |
166 changes: 166 additions & 0 deletions .github/workflows/leaderboard_docker.yml
@@ -0,0 +1,166 @@
name: Test Leaderboard Dockerfile

on:
push:
branches: [ main ]
paths:
- 'Dockerfile'
- '.github/workflows/leaderboard_docker.yml'
- 'pyproject.toml'
- 'uv.lock'
pull_request:
branches: [ main ]
paths:
- 'Dockerfile'
- '.github/workflows/leaderboard_docker.yml'
- 'pyproject.toml'
- 'uv.lock'
workflow_dispatch:

jobs:
test-dockerfile:
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Free disk space
run: |
sudo rm -rf \
"$AGENT_TOOLSDIRECTORY" \
/opt/ghc \
/opt/google/chrome \
/opt/microsoft/msedge \
/opt/microsoft/powershell \
/opt/pipx \
/usr/lib/mono \
/usr/local/julia* \
/usr/local/lib/android \
/usr/local/lib/node_modules \
/usr/local/share/chromium \
/usr/local/share/powershell \
/usr/share/dotnet \
/usr/share/swift
docker system prune -af

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Build and push Docker image
uses: docker/build-push-action@v5
with:
context: .
file: ./Dockerfile
push: ${{ github.ref == 'refs/heads/main' }}
tags: |
ghcr.io/${{ github.repository }}/leaderboard:${{ github.sha }}
ghcr.io/${{ github.repository }}/leaderboard:latest
cache-from: type=gha
cache-to: type=gha,mode=max
load: true

- name: Prepare image for local testing
run: |
docker tag ghcr.io/${{ github.repository }}/leaderboard:${{ github.sha }} mteb-leaderboard:test

- name: Test container can start and run
timeout-minutes: 6
run: |
# Start the container in background
docker run -d --name mteb-test -p 7860:7860 mteb-leaderboard:test

# Monitor container with smart exit conditions
echo "Starting leaderboard container..."
START_TIME=$(date +%s)
STARTUP_DETECTED=false
PROGRESS_DETECTED=false
INIT_COMPLETE_DETECTED=false

while true; do
CURRENT_TIME=$(date +%s)
ELAPSED=$((CURRENT_TIME - START_TIME))

# Maximum timeout: 5.5 minutes - always fail at timeout
if [ $ELAPSED -gt 330 ]; then
echo "❌ Timeout reached after ${ELAPSED}s - container failed to complete initialization"
docker logs --tail 20 mteb-test
docker stop mteb-test
docker rm mteb-test
exit 1
fi

# Check if container is still running
if ! docker ps | grep -q mteb-test; then
# Container exited, check exit code and logs
EXIT_CODE=$(docker inspect mteb-test --format='{{.State.ExitCode}}')
echo "Container exited with code: $EXIT_CODE after ${ELAPSED}s"

# Check if we made good progress before exit
LOGS=$(docker logs mteb-test 2>&1)
if echo "$LOGS" | grep -q "=== Leaderboard app initialization complete" || \
echo "$LOGS" | grep -q "Running on.*http"; then
echo "✅ Container completed significant startup progress successfully"
docker rm mteb-test
exit 0
else
echo "❌ Container failed before completing startup initialization"
echo "Last logs:"
echo "$LOGS" | tail -10
docker rm mteb-test
exit 1
fi
fi

# Get current logs and check for progress indicators
LOGS=$(docker logs mteb-test 2>&1)

# Check for startup detection
if [ "$STARTUP_DETECTED" = "false" ] && echo "$LOGS" | grep -q "Starting leaderboard app in process"; then
echo "✅ Detected leaderboard app startup"
STARTUP_DETECTED=true
fi

# Check for full initialization complete
if [ "$INIT_COMPLETE_DETECTED" = "false" ] && echo "$LOGS" | grep -q "=== Leaderboard app initialization complete"; then
echo "✅ Detected full app initialization complete!"
INIT_COMPLETE_DETECTED=true
echo "🔄 Testing if server is responding..."
# Give it a moment for the server to fully start
sleep 3
# Try to ping the server
if curl -s --max-time 5 http://localhost:7860/ > /dev/null; then
echo "✅ Server is responding - container fully operational!"
docker stop mteb-test
docker rm mteb-test
exit 0
else
echo "⏳ Server not yet responding, continuing to wait..."
PROGRESS_DETECTED=true
fi
fi

# Check for Gradio server ready
if echo "$LOGS" | grep -q "Running on.*http"; then
echo "✅ Detected Gradio server ready - container fully operational!"
docker stop mteb-test
docker rm mteb-test
exit 0
fi

# Show progress every 30 seconds
if [ $((ELAPSED % 30)) -eq 0 ] && [ $ELAPSED -gt 0 ]; then
echo "Container still running after ${ELAPSED}s... (startup: $STARTUP_DETECTED, init_complete: $INIT_COMPLETE_DETECTED)"
# Show last few lines of logs for progress
echo "$LOGS" | grep -E "(Step [0-9]/7|Starting leaderboard|initialization complete|Running on)" | tail -2
fi

sleep 5
done
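
The polling loop above decides readiness by searching the container logs for three marker strings. That decision can be isolated and exercised without Docker, as in the minimal sketch below. `classify_logs` is a hypothetical helper that is not part of this PR; the marker strings are copied from the workflow, and the state names it prints are assumptions made for illustration.

```shell
# Hypothetical helper (not part of the PR): map log text to a startup state
# using the same grep patterns as the workflow's polling loop.
classify_logs() {
  if printf '%s\n' "$1" | grep -q "Running on.*http"; then
    echo "ready"          # Gradio reported its URL
  elif printf '%s\n' "$1" | grep -q "=== Leaderboard app initialization complete"; then
    echo "initialized"    # app initialized; the server may not respond yet
  elif printf '%s\n' "$1" | grep -q "Starting leaderboard app in process"; then
    echo "starting"
  else
    echo "unknown"
  fi
}

# Example usage with literal text standing in for `docker logs` output.
classify_logs "Starting leaderboard app in process 1"
```

Checking the most advanced state first means logs that contain several markers are classified by the furthest stage reached.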
9 changes: 5 additions & 4 deletions .github/workflows/lint.yml
@@ -33,13 +33,14 @@ jobs:

- uses: actions/checkout@v6

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true

- uses: actions/setup-python@v6
with:
python-version: "3.10"
cache: "pip"

- name: Install dependencies
run: make install

- name: Lint
id: lint
6 changes: 5 additions & 1 deletion .github/workflows/model_loading.yml
@@ -33,11 +33,15 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6

- name: Install uv
uses: astral-sh/setup-uv@v7
with:
enable-cache: true

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.10"
cache: "pip"

- name: Install dependencies and run tests
run: |
9 changes: 4 additions & 5 deletions .github/workflows/test.yml
@@ -19,7 +19,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-latest] #, macos-latest, windows-latest]
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
python-version: ["3.10", "3.11", "3.12", "3.13"] #, "3.14"]
include:
# Add Windows with Python 3.10 only to avoid tests taking too long
- os: windows-latest
@@ -54,13 +54,12 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.cache/huggingface
key: ${{ runner.os }}-hf
key: ${{ runner.os }}-hf-maeb

- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v6
- name: Install uv and set the Python version
uses: astral-sh/setup-uv@v7
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Install dependencies
shell: bash