Multimodal PDF support #1047

jamesbraza · 2025-08-06T05:58:39Z

This PR completes MVP multimodal support for PaperQA:

- Moves PDF readers to support images and tables
  - With several routes, such as a full-page screenshot vs. per-image screenshots
- Creates an opt-out setting for multimodal parsing
- Adds a test PDF to check that we can parse table data

Copilot

Pull Request Overview

This PR implements MVP multimodal support for PaperQA, allowing the system to parse and utilize images and tables from PDFs alongside text content. The implementation introduces an opt-out setting for multimodal parsing and adds comprehensive support across both PyPDF and PyMuPDF parsers.

- Adds multimodal parsing capability to PDF readers supporting images, tables, and full-page screenshots
- Introduces a `multimodal` configuration setting (defaults to True) to control image/table parsing
- Creates comprehensive test coverage for table querying and multimodal functionality

Reviewed Changes

Copilot reviewed 11 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `src/paperqa/settings.py` | Adds `multimodal` boolean field to ParsingSettings for controlling image/table parsing |
| `src/paperqa/docs.py` | Integrates multimodal setting into document reading pipeline via `parse_images` parameter |
| `packages/paper-qa-pypdf/src/paperqa_pypdf/reader.py` | Implements multimodal support with pypdfium2 for full-page screenshots and media parsing |
| `packages/paper-qa-pymupdf/src/paperqa_pymupdf/reader.py` | Extends PyMuPDF parser to support drawings, tables, and full-page screenshots with clustering |
| `tests/test_paperqa.py` | Updates existing tests for multimodal behavior and adds new table querying test |
| `tests/test_agents.py` | Updates file count expectations to include new influence.pdf test file |
| Various test files | Comprehensive test coverage for multimodal parsing functionality across both PDF parsers |
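
To make the opt-out flow in the summary concrete, here is a minimal sketch of a boolean `multimodal` setting that defaults to True and is forwarded to a reader as `parse_images`. Everything below is a stand-in written for illustration: the dataclass only mimics the one field named above, and `fake_read_pdf` and `read_doc` are hypothetical names, not PaperQA functions.

```python
from dataclasses import dataclass


@dataclass
class ParsingSettings:
    """Stand-in for PaperQA's ParsingSettings (assumption: only one field shown)."""

    multimodal: bool = True  # opt-out: media parsing stays on unless disabled


def fake_read_pdf(path: str, parse_images: bool) -> dict:
    """Hypothetical reader that returns text, plus media only when requested."""
    parsed = {"path": path, "text": "page text", "media": []}
    if parse_images:
        parsed["media"].append("page-1-image")  # placeholder for parsed media
    return parsed


def read_doc(path: str, settings: ParsingSettings) -> dict:
    """Hypothetical pipeline step: forward the setting to the reader."""
    return fake_read_pdf(path, parse_images=settings.multimodal)
```

With the default settings the reader is asked for media; passing `ParsingSettings(multimodal=False)` opts out and yields text only.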

mskarlin · 2025-08-06T22:49:19Z

packages/paper-qa-pymupdf/src/paperqa_pymupdf/reader.py

+            media: list[ParsedMedia] = []
+            if parse_media:
+                if full_page:  # Capture the entire page as one image
+                    pix = page.get_pixmap(dpi=image_dpi)


Could we add some error handling here in case a bad image is hit, returning an `ImpossibleParsingError`?

jamesbraza · Aug 6, 2025

We already protect `page = file.load_page(...)` with `ImpossibleParsingError`, so I think once a `Page` is loaded and constructed, we should be good.

I haven't seen a `get_pixmap` crash so far, so I'd like to hold off on this for the scope of this PR.
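
For reference, the guard discussed in this thread could look roughly like the sketch below. It is not part of this PR (the handling was deferred). The `ImpossibleParsingError` class here is a local stand-in for PaperQA's exception of the same name, `render_full_page` is a hypothetical helper, and the default DPI is an arbitrary assumption. `page` is expected to behave like a PyMuPDF `Page`.

```python
class ImpossibleParsingError(Exception):
    """Local stand-in for PaperQA's exception of the same name (assumption)."""


def render_full_page(page, image_dpi: int = 150) -> bytes:
    """Hypothetical helper: render a whole page to PNG bytes, or fail clearly.

    `page` is assumed to expose PyMuPDF's `Page.get_pixmap(dpi=...)`, whose
    result offers `Pixmap.tobytes("png")`.
    """
    try:
        pix = page.get_pixmap(dpi=image_dpi)
        return pix.tobytes("png")
    except Exception as exc:
        # Convert any rendering failure into the parsing error callers expect.
        raise ImpossibleParsingError(
            f"Failed to render page as an image at {image_dpi} DPI."
        ) from exc
```

Catching a broad `Exception` is a deliberate simplification; a real change would likely narrow it to the errors the rendering call is known to raise.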

mskarlin

lgtm -- some minor comments

jamesbraza requested review from whitead, mskarlin, maykcaldas, SamCox822, nadolskit and Copilot August 6, 2025 05:58

jamesbraza self-assigned this Aug 6, 2025

jamesbraza added the `enhancement` label (New feature or request) Aug 6, 2025

dosubot bot added the `size:XL` label (This PR changes 500-999 lines, ignoring generated files) Aug 6, 2025

Copilot AI reviewed Aug 6, 2025

jamesbraza force-pushed the multimodal-pdfs branch 2 times, most recently from 0bdcedc to 5bd9b72 Compare August 6, 2025 20:31

mskarlin reviewed Aug 6, 2025

packages/paper-qa-pymupdf/src/paperqa_pymupdf/reader.py (resolved)

mskarlin reviewed Aug 6, 2025

packages/paper-qa-pymupdf/tests/test_paperqa_pymupdf.py (resolved)

mskarlin reviewed Aug 6, 2025

mskarlin approved these changes Aug 6, 2025

dosubot bot added the `lgtm` label (This PR has been approved by a maintainer) Aug 6, 2025

jamesbraza force-pushed the multimodal-pdfs branch from 5bd9b72 to addd2f1 Compare August 6, 2025 23:09

jamesbraza added 4 commits August 6, 2025 16:12

Added reader support for multimodal

65b7a07

Added a 'ParsingSettings.multimodal' setting

fa4cb94

Added a PDF that enables us to test our table integration

3dbc070

Added docs on multimodal support to the README

4b0d848

jamesbraza force-pushed the multimodal-pdfs branch from addd2f1 to 4b0d848 Compare August 6, 2025 23:12

jamesbraza merged commit f71d023 into main Aug 7, 2025
7 checks passed

jamesbraza deleted the multimodal-pdfs branch August 7, 2025 05:12

jamesbraza added a commit that referenced this pull request Aug 25, 2025

Multimodal PDF support (#1047)

c6a4b99

jamesbraza added a commit that referenced this pull request Aug 26, 2025

Multimodal PDF support (#1047)

c9fd25f
