merge_page with passed page having markup annotation fails #3467

@stefan6419846

Description

PageObject.merge_page fails when the page passed to it contains a markup annotation. The reason is that these annotation classes derive from DictionaryObject, and DictionaryObject.clone assumes that a new instance of the corresponding class can be created without parameters, which is not the case here.
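
The failure mode can be reproduced without pypdf. The following is a minimal sketch using hypothetical stand-in classes (`Base` for DictionaryObject, `Shape` for a markup annotation), not the actual pypdf code: a base-class `clone` that calls `self.__class__()` breaks as soon as a subclass requires constructor arguments.

```python
class Base(dict):
    """Hypothetical stand-in for DictionaryObject."""

    def clone(self):
        # Same assumption as in the traceback below: the class can be
        # instantiated without any arguments.
        copy = self.__class__()
        copy.update(self)
        return copy


class Shape(Base):
    """Hypothetical stand-in for a markup annotation such as Polygon."""

    def __init__(self, vertices):
        super().__init__()
        self["/Vertices"] = list(vertices)


def try_clone(obj):
    """Return the clone, or the TypeError message if cloning fails."""
    try:
        return obj.clone()
    except TypeError as exc:
        return str(exc)


# Cloning works for the base class, which needs no constructor arguments.
result_base = try_clone(Base({"/Type": "/Annot"}))
# Cloning fails for the subclass, whose __init__ requires 'vertices'.
result_shape = try_clone(Shape([(0, 0), (1, 1)]))
print(result_shape)
```
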

Along the same lines, it is currently not clear to me how the corresponding attributes are actually passed on at the moment.

To mitigate this, we would ideally have a generic solution, possibly by having a mapping in each class derived from DictionaryObject which maps keys of self to parameters of __init__ to avoid code duplication.
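
One possible shape of such a mapping, as a rough sketch only: the class attribute `_init_params` and the helper `_init_kwargs` are hypothetical names invented for illustration, and `Base` and `Shape` are stand-ins rather than pypdf classes.

```python
class Base(dict):
    """Hypothetical stand-in for DictionaryObject."""

    # Hypothetical: maps keys of self to parameter names of __init__.
    _init_params: dict[str, str] = {}

    def _init_kwargs(self):
        # Collect the constructor arguments from the stored entries.
        return {
            param: self[key]
            for key, param in self._init_params.items()
            if key in self
        }

    def clone(self):
        # Instantiate the subclass with the arguments it requires,
        # then copy over all remaining entries.
        copy = self.__class__(**self._init_kwargs())
        copy.update(self)
        return copy


class Shape(Base):
    """Hypothetical stand-in for a markup annotation."""

    _init_params = {"/Vertices": "vertices"}

    def __init__(self, vertices):
        super().__init__()
        self["/Vertices"] = list(vertices)


original = Shape([1, 2, 3, 4])
original["/Contents"] = "note"
copied = original.clone()
```

This sketch stores the constructor argument unchanged. A real mapping would probably also need a conversion step per key, since for example Polygon takes `vertices` as a list of tuples while the `/Vertices` entry in the traceback below holds a flat list.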

Initially discovered in #3291 (comment).

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.4.0-150600.23.65-default-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.0.0, crypt_provider=('cryptography', '44.0.0'), PIL=11.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfWriter
from pypdf.annotations import Polygon


writer = PdfWriter()
writer2 = PdfWriter()
writer.add_blank_page(100, 100)
writer2.add_blank_page(100, 100)

annotation = Polygon(
    vertices=[(50, 550), (200, 650), (70, 750), (50, 700)],
)
writer.add_annotation(0, annotation)

page1 = writer.pages[0]
page2 = writer2.pages[0]
page2.merge_page(page1)

No PDF file is required, as it will be created on the fly by the above code.

Traceback

This is the complete traceback I see:

tests/test_page.py:1508 (test_merge_page_with_annotations)
def test_merge_page_with_annotations():
        writer = PdfWriter()
        writer2 = PdfWriter()
        writer.add_blank_page(100, 100)
        writer2.add_blank_page(100, 100)
    
        from pypdf.annotations import Polygon
        annotation = Polygon(
            vertices=[(50, 550), (200, 650), (70, 750), (50, 700)],
        )
        writer.add_annotation(0, annotation)
    
        page_one = writer.pages[0]
        page_two = writer2.pages[0]
>       page_two.merge_page(page_one)

tests/test_page.py:1523: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pypdf/_page.py:1062: in merge_page
    self._merge_page(page2, over=over, expand=expand)
pypdf/_page.py:1080: in _merge_page
    return self._merge_page_writer(
pypdf/_page.py:1235: in _merge_page_writer
    aa = a.clone(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = {'/Type': '/Annot', '/Subtype': '/Polygon', '/Vertices': [50, 550, 200, 650, 70, 750, 50, 700], '/IT': '/PolygonCloud', '/Rect': RectangleObject([50, 550, 200, 750]), '/P': IndirectObject(4, 0, 140694769602768)}
pdf_dest = <pypdf._writer.PdfWriter object at 0x7ff60dc3a310>
force_duplicate = True, ignore_fields = ('/P', '/StructParent', '/Parent')

    def clone(
        self,
        pdf_dest: PdfWriterProtocol,
        force_duplicate: bool = False,
        ignore_fields: Optional[Sequence[Union[str, int]]] = (),
    ) -> "DictionaryObject":
        """Clone object into pdf_dest."""
        try:
            if self.indirect_reference.pdf == pdf_dest and not force_duplicate:  # type: ignore
                return self
        except Exception:
            pass
    
        visited: set[tuple[int, int]] = set()  # (idnum, generation)
        print(type(self), self)
        d__ = cast(
            "DictionaryObject",
>           self._reference_clone(self.__class__(), pdf_dest, force_duplicate),
        )
E       TypeError: Polygon.__init__() missing 1 required positional argument: 'vertices'

pypdf/generic/_data_structures.py:297: TypeError

Metadata

Labels

generic: The generic submodule is affected
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-merge: From a user's perspective, merging is the affected feature/workflow
