
Commit 2994684

Created README docs on the enrichment
1 parent 40dbabe commit 2994684


README.md

Lines changed: 18 additions & 0 deletions
@@ -747,6 +747,24 @@ Depending on the source document, the same image can appear multiple times
Thus, clients should consider media databases
to have a many-to-many relationship with chunks.

Since PaperQA's evidence gathering process centers on text-based retrieval,
it's possible that relevant images or tables aren't retrieved
because their associated text is irrelevant to the query.
For a concrete example, imagine a figure in a paper has a terse caption
and is placed one page after the relevant main-text discussion.
To solve this problem, PaperQA supports media enrichment at document read-time.
Basically, after reading in the PDF,
the `parsing.enrichment_llm` is given the `parsing.enrichment_prompt`
and co-located text to generate a synthetic caption for every image/table.
The synthetic captions are used to shift the embeddings of each text chunk,
but are kept separate from the actual source text.
This way, evidence gathering can fetch relevant images/tables
without the risk of polluting contextual summaries with LLM-generated captions.
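To make this concrete, here is a minimal sketch of pointing enrichment at a specific model and prompt. The `parsing.enrichment_llm` and `parsing.enrichment_prompt` names come from the description above, while the model name, prompt wording, and the `ask` call are illustrative assumptions rather than library defaults.

```python
from paperqa import Settings, ask

settings = Settings()
# Model used only at read-time to write synthetic captions (illustrative choice)
settings.parsing.enrichment_llm = "gpt-4o-mini"
# Prompt handed to the enrichment LLM alongside each image/table and its
# co-located text (illustrative wording, not the library default)
settings.parsing.enrichment_prompt = (
    "Using the surrounding text as context, write a detailed caption for this"
    " image or table so it can be retrieved by future queries."
)

answer = ask("What trend does the main figure show?", settings=settings)
```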

If you want multimodal PDF reading but do not want enrichment
(since it adds one LLM prompt per image/table at read-time),
enrichment can be disabled by setting `parsing.multimodal` to `ON_WITHOUT_ENRICHMENT`.
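For instance, a minimal sketch of keeping multimodal reading while skipping enrichment; whether `parsing.multimodal` accepts the plain string shown here or requires an enum import is an assumption.

```python
from paperqa import Settings

settings = Settings()
# Keep multimodal PDF reading but skip the per-image/table enrichment prompt.
# The value name comes from the docs above; passing it as a plain string
# (rather than an enum member) is an assumption.
settings.parsing.multimodal = "ON_WITHOUT_ENRICHMENT"
```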

When creating contextual summaries on a given chunk (a `Text`),
the summary LLM is passed both the chunk's text and the chunk's associated media,
but the output contextual summary itself remains text-only.
