Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -208,3 +208,4 @@ cython_debug/
 marimo/_static/
 marimo/_lsp/
 __marimo__/
+.subtask/
4 changes: 2 additions & 2 deletions README.md
@@ -255,7 +255,7 @@ See [MODELS.md](.claude/skills/vlmbench/MODELS.md) for tested models and their r
 | Type | Extensions | Processing |
 |---|---|---|
 | Image | `.png`, `.jpg`, `.jpeg`, `.webp`, `.tiff`, `.bmp` | Base64 encode |
-| PDF | `.pdf` | `pdf2image` per-page -> base64 |
+| PDF | `.pdf` | `pypdfium2` per-page -> base64 |
 | Video | `.mp4`, `.mov`, `.avi`, `.mkv`, `.webm` | `ffmpeg` 1fps -> frames -> base64 |

 Directories processed recursively, sorted alphabetically.
@@ -272,4 +272,4 @@ Results saved as JSON to `./results/{model-slug}-{timestamp}.json` with model me
 - vLLM (`uv pip install vllm`) for native `--backend vllm`
 - tmux (for server management and monitoring)
 - macmon (`brew install macmon`) or nvitop (GPU monitoring)
-- ffmpeg (video input), poppler (PDF input) — optional
+- ffmpeg (video input) — optional
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -31,7 +31,7 @@ dependencies = [
     "openai>=1.0",
     "tenacity>=8",
     "Pillow>=10",
-    "pdf2image>=1.16",
+    "pypdfium2>=4",
 ]

 [project.optional-dependencies]
45 changes: 31 additions & 14 deletions uv.lock

Some generated files are not rendered by default.

16 changes: 10 additions & 6 deletions vlmbench/cli.py
@@ -6,7 +6,7 @@
 # "openai>=1.0",
 # "tenacity>=8",
 # "Pillow>=10",
-# "pdf2image>=1.16",
+# "pypdfium2>=4",
 # ]
 # ///
 """
@@ -972,17 +972,21 @@ def image_to_base64(path: Path) -> str:
     return f"data:{mime};base64,{b64}"


-def pdf_to_base64_images(path: Path) -> list[str]:
-    """Convert PDF pages to base64 data URIs using pdf2image."""
-    from pdf2image import convert_from_path
+def pdf_to_base64_images(path: Path, dpi: int = 150) -> list[str]:
critical

The PR description mentions adding a `--dpi` CLI flag and threading it through to this function, but the implementation appears incomplete. The `dpi` parameter is added here with a default value, yet there is no `--dpi` flag on the `run` command, and the value is not passed down from `load_inputs`. As a result, the PDF rendering resolution cannot be controlled from the CLI as intended. Please add the `--dpi` flag and pass its value through to this function.

+    """Convert PDF pages to base64 data URIs using pypdfium2."""
+    import pypdfium2 as pdfium

-    images = convert_from_path(str(path))
+    doc = pdfium.PdfDocument(str(path))
     results = []
-    for img in images:
+    for idx in range(len(doc)):
+        page = doc[idx]
+        bitmap = page.render(scale=dpi / 72)
+        img = bitmap.to_pil()
         buf = io.BytesIO()
         img.save(buf, format="PNG")
         b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
         results.append(f"data:image/png;base64,{b64}")
+    doc.close()
     return results
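
For reference, the threading that the critical comment asks for could look roughly like the sketch below. It is only an illustration under stated assumptions: the PR does not show the `run` command or `load_inputs`, so `build_parser` (using `argparse`) and `plan_pdf_renders` are hypothetical stand-ins, and only the `dpi / 72` scale and the default of 150 come from the diff.

```python
import argparse

POINTS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def dpi_to_scale(dpi: int) -> float:
    """Return the render scale for a target DPI, mirroring `scale=dpi / 72` in the diff."""
    if dpi <= 0:
        raise ValueError(f"dpi must be positive, got {dpi}")
    return dpi / POINTS_PER_INCH


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the real `run` command and its CLI framework are not shown in this PR.
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("inputs", nargs="*", help="input files")
    parser.add_argument("--dpi", type=int, default=150, help="PDF rendering resolution")
    return parser


def plan_pdf_renders(paths: list[str], dpi: int) -> list[tuple[str, float]]:
    """Hypothetical stand-in for `load_inputs`: pair each PDF with its render scale."""
    scale = dpi_to_scale(dpi)
    return [(p, scale) for p in paths if p.lower().endswith(".pdf")]


# The flag value flows from the parser into the per-file loading step.
args = build_parser().parse_args(["report.pdf", "photo.png", "--dpi", "144"])
plan = plan_pdf_renders(args.inputs, args.dpi)
```

In the real code, the equivalent of `plan_pdf_renders` would call `pdf_to_base64_images(path, dpi=dpi)` rather than just recording the scale.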
Comment on lines +979 to 990
high

`pypdfium2.PdfDocument` should be used as a context manager so that resources are released even if an error occurs during PDF processing. This prevents potential resource leaks. You can also iterate directly over the `doc` object to get pages, which is more idiomatic.

Suggested change
-    doc = pdfium.PdfDocument(str(path))
-    results = []
-    for idx in range(len(doc)):
-        page = doc[idx]
-        bitmap = page.render(scale=dpi / 72)
-        img = bitmap.to_pil()
-        buf = io.BytesIO()
-        img.save(buf, format="PNG")
-        b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
-        results.append(f"data:image/png;base64,{b64}")
-    doc.close()
-    return results
+    with pdfium.PdfDocument(str(path)) as doc:
+        results = []
+        for page in doc:
+            bitmap = page.render(scale=dpi / 72)
+            img = bitmap.to_pil()
+            buf = io.BytesIO()
+            img.save(buf, format="PNG")
+            b64 = base64.b64encode(buf.getvalue()).decode("utf-8")
+            results.append(f"data:image/png;base64,{b64}")
+    return results
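
If the `with` form turns out not to be supported by the pinned pypdfium2 version (not verified here), `contextlib.closing` from the standard library gives the same cleanup guarantee for any object that exposes `close()`, which the PR's `doc.close()` call suggests is the case. The sketch below shows the pattern with a hypothetical `FakeDocument` stand-in so it runs without pypdfium2. `pages_to_data_uris` and `encode_page` are illustrative names, not part of the PR.

```python
import base64
import contextlib


class FakeDocument:
    """Hypothetical stand-in for a PDF document: iterable pages plus close()."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.closed = False

    def __iter__(self):
        return iter(self.pages)

    def close(self):
        self.closed = True


def pages_to_data_uris(doc, encode_page) -> list[str]:
    """Encode each page as a PNG data URI, closing `doc` even if encoding fails."""
    results = []
    # closing() calls doc.close() on exit, whether or not an exception was raised.
    with contextlib.closing(doc):
        for page in doc:
            png_bytes = encode_page(page)  # in the PR: render, to_pil, save to BytesIO
            b64 = base64.b64encode(png_bytes).decode("utf-8")
            results.append(f"data:image/png;base64,{b64}")
    return results


# Usage with the stand-in: pages are already bytes, so encoding is the identity.
doc = FakeDocument([b"a", b"b"])
uris = pages_to_data_uris(doc, lambda page: page)
```

Either way, the goal is the same as the suggestion above: the document handle is released even if rendering a page raises.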


