ROB: page._get_fonts() gives KeyError: '/BaseFont' #2289

Closed
MartinThoma opened this issue Nov 9, 2023 · 2 comments · Fixed by #2469
Labels
is-robustness-issue From a users perspective, this is about robustness

Comments


MartinThoma commented Nov 9, 2023

I wanted to get the fonts of a page and got a KeyError: '/BaseFont'

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-166-generic-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.0, crypt_provider=('cryptography', '41.0.4'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader('missing-base-font.pdf')
page = reader.pages[0]
page._get_fonts()

The PDF:
missing-base-font.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "/home/moose/Documents/foo.py", line 5, in <module>
    page._get_fonts()
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 2349, in _get_fonts
    fonts, embedded = _get_fonts_walk(obj, fonts, embedded)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 2645, in _get_fonts_walk
    process_font(f)
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 2635, in process_font
    emb.add(cast(str, f["/BaseFont"]))
                      ~^^^^^^^^^^^^^
  File "/home/moose/Github/py-pdf/pypdf/pypdf/generic/_data_structures.py", line 333, in __getitem__
    return dict.__getitem__(self, key).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/BaseFont'
@MartinThoma MartinThoma changed the title page._get_fonts() gives KeyError: '/BaseFont' BUG: page._get_fonts() gives KeyError: '/BaseFont' Nov 9, 2023
@MartinThoma MartinThoma changed the title BUG: page._get_fonts() gives KeyError: '/BaseFont' ROB: page._get_fonts() gives KeyError: '/BaseFont' Nov 9, 2023
@MartinThoma MartinThoma added the is-robustness-issue From a users perspective, this is about robustness label Nov 9, 2023
MartinThoma (Member, Author) commented

@pubpub-zz I would replace

            # the list comprehension ensures there is FontFile
            emb.add(cast(str, f["/BaseFont"]))

in _page.py by

            if (...) and "/BaseFont" in f:
                emb.add(cast(str, f["/BaseFont"]))

That would at least make the exception go away. But I don't know why we have that big if-block. I guess there is something else wrong?
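To illustrate the effect of the guard proposed above, here is a minimal sketch that uses plain Python dicts in place of pypdf's font dictionaries. The function name `collect_embedded_names` and the simplified `looks_embedded` check are hypothetical stand-ins for this illustration, not the actual code or condition in _page.py.

```python
def collect_embedded_names(fonts):
    """Collect /BaseFont names of embedded-looking fonts, skipping fonts without the key."""
    emb = set()
    for f in fonts:
        # Hypothetical, simplified stand-in for the real embedding check in _page.py.
        looks_embedded = "/CharProcs" in f or "/FontDescriptor" in f
        # The extra membership test avoids the KeyError from f["/BaseFont"].
        if looks_embedded and "/BaseFont" in f:
            emb.add(str(f["/BaseFont"]))
    return emb


# Plain dicts standing in for font dictionaries (assumed sample data).
fonts = [
    {"/Subtype": "/Type3", "/CharProcs": {}},  # no /BaseFont entry
    {"/Subtype": "/TrueType", "/BaseFont": "/ABCDEF+Arial", "/FontDescriptor": {}},
]
result = collect_embedded_names(fonts)
```

One consequence of this approach is that a font without /BaseFont is silently left out of the embedded set, so the exception goes away but the font is not reported at all.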

pubpub-zz (Collaborator) commented

The problem here is that the font is a Type 3 font, which does not contain a /BaseFont property.
I would propose this modification:

emb.add(cast(str, f.get("/BaseFont", "(" + f["/Subtype"] + ")")))

About the big if block: as far as I remember, I wrote it as a list of ifs, but ruff/black optimized it.
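The fallback idea can be sketched in isolation as follows. This is only an illustration with plain Python dicts standing in for pypdf's font dictionaries; the helper name `embedded_font_label` is hypothetical, and the key is written `/Subtype`, as the PDF specification spells it.

```python
def embedded_font_label(f):
    """Return the /BaseFont name, or the subtype in parentheses if /BaseFont is missing."""
    try:
        return str(f["/BaseFont"])
    except KeyError:
        # Type 3 fonts define glyphs via /CharProcs and carry no /BaseFont,
        # so fall back to a placeholder built from the font's /Subtype.
        return "(" + str(f["/Subtype"]) + ")"


# Plain dicts standing in for font dictionaries (assumed sample data).
type3_label = embedded_font_label({"/Subtype": "/Type3", "/CharProcs": {}})
named_label = embedded_font_label({"/Subtype": "/TrueType", "/BaseFont": "/ABCDEF+Arial"})
```

The sketch uses try/except rather than `f.get("/BaseFont", default)` because the default argument of `dict.get` is evaluated eagerly: `f["/Subtype"]` would be looked up even when /BaseFont is present, and would raise if a malformed font lacked /Subtype.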
