#3 Using PdfReader causes a crash #2817

Avgor46 · 2024-08-27T14:55:38Z

Hi!

I've found another broken PDF that crashes PdfReader. The information needed to reproduce it is provided below.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.15.0-56-generic-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('cryptography', '3.1'), PIL=none

Update: commit b7b3c8c

Code + PDF

This is a minimal, complete example that shows the issue:

#! /usr/bin/env python3

import pypdf
from pypdf.errors import EmptyFileError, PdfReadError, PdfStreamError
import sys

def TestOneInput(fname):
    try:
        pdf_reader = pypdf.PdfReader(fname)
        for page_number, page in enumerate(pdf_reader.pages):
            page.extract_text()
    except (EmptyFileError, PdfReadError, PdfStreamError):
        pass

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit(1)
    TestOneInput(sys.argv[1])

PoC

crash-7e1356f1179b4198337f282304cb611aea26a199.pdf

Traceback

This is the complete stderr I see:

Invalid parent xref., rebuild xref
Object 9 0 not defined.
Traceback (most recent call last):
  File "/fuzz/./poc.py", line 18, in <module>
    TestOneInput(sys.argv[1])
  File "/fuzz/./poc.py", line 10, in TestOneInput
    for page_number, page in enumerate(pdf_reader.pages):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2425, in __iter__
    for i in range(len(self)):
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_page.py", line 2356, in __len__
    return self.length_function()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 352, in get_num_pages
    self._flatten()
  File "/usr/local/lib/python3.9/dist-packages/pypdf/_doc_common.py", line 1101, in _flatten
    pages = catalog["/Pages"].get_object()  # type: ignore
AttributeError: 'NoneType' object has no attribute 'get_object'

pubpub-zz · 2024-08-27T16:54:53Z

Your PDF has an invalid xref. The xref was rebuilt successfully, but the rebuild was not extracting the objects stored in object streams.
I've added the required code.

closes py-pdf#2817
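
For readers unfamiliar with the failure mode, here is a rough sketch of why a rebuilt xref can miss objects. It is not pypdf's implementation: the function names, the regex, and the sample bytes are all made up for illustration, and the object numbers are only chosen to echo the "Object 9 0 not defined" message. The idea is that a rebuild which only scans the raw file for "N G obj" headers cannot see objects that live inside a compressed object stream (/Type /ObjStm). Those only show up after the stream is decoded and its header of "object number, offset" pairs is read using /N and /First.

```python
import re
import zlib

# Matches the "<num> <gen> obj" header of a top-level indirect object.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def scan_top_level_objects(data):
    """Map object number -> byte offset by scanning for 'N G obj' headers."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(data)}


def parse_object_stream(payload, n, first):
    """Split a decoded object stream into {object number: raw bytes}.

    payload is the decoded stream data, n is /N, first is /First.
    The header holds n pairs "objnum offset"; offsets are relative to first.
    """
    header = payload[:first].split()
    numbers = [int(tok) for tok in header[0:2 * n:2]]
    offsets = [int(tok) for tok in header[1:2 * n:2]]
    body = payload[first:]
    ends = offsets[1:] + [len(body)]
    return {
        num: body[start:end].strip()
        for num, start, end in zip(numbers, offsets, ends)
    }


def build_object_stream(members):
    """Hypothetical helper: pack objects into an object stream payload."""
    body = b""
    pairs = []
    for num, raw in members.items():
        pairs.append(b"%d %d" % (num, len(body)))
        body += raw + b"\n"
    header = b" ".join(pairs) + b"\n"
    return header + body, len(members), len(header)


# Made-up sample: the catalog points at object 9, which exists only
# inside the compressed object stream stored as object 5.
members = {
    9: b"<< /Type /Pages /Kids [] /Count 0 >>",
    10: b"<< /Producer (demo) >>",
}
payload, n, first = build_object_stream(members)
compressed = zlib.compress(payload)
pdf_bytes = (
    b"1 0 obj\n<< /Type /Catalog /Pages 9 0 R >>\nendobj\n"
    + b"5 0 obj\n<< /Type /ObjStm /N %d /First %d /Filter /FlateDecode /Length %d >>\nstream\n"
    % (n, first, len(compressed))
    + compressed
    + b"\nendstream\nendobj\n"
)

# A plain header scan finds objects 1 and 5 but never object 9.
top_level = scan_top_level_objects(pdf_bytes)

# Decoding the stream and reading its header recovers objects 9 and 10.
start = pdf_bytes.index(b"stream\n") + len(b"stream\n")
end = pdf_bytes.rindex(b"\nendstream")
recovered = parse_object_stream(zlib.decompress(pdf_bytes[start:end]), n, first)
```

In this sketch `top_level` has no entry for object 9, so resolving `/Pages 9 0 R` would yield nothing, while `recovered` does contain it. That is the same shape as the traceback above, where `catalog["/Pages"]` came back as None.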

py-pdf deleted a comment Aug 27, 2024

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Aug 27, 2024

ENH: Robustify parsing for Object streams in XRef rebuilding

640df54

closes py-pdf#2817

pubpub-zz mentioned this issue Aug 27, 2024

ENH: Robustify parsing for Object streams in XRef rebuilding #2818

Merged

stefan6419846 closed this as completed in #2818 Sep 13, 2024

stefan6419846 closed this as completed in 9d54f63 Sep 13, 2024
