Description
PageObject.merge_page does not work when the passed page contains a markup annotation. The reason is that these annotations inherit clone from their base class DictionaryObject, and DictionaryObject.clone assumes that new instances of the corresponding classes can be created without parameters, which is not the case here.
Related to this, it is currently not clear to me how passing the corresponding attributes actually works.
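The failure mode described above can be reproduced without pypdf. The following minimal sketch uses hypothetical stand-in classes (Base, NeedsArgs) rather than the actual pypdf implementation; it only illustrates why calling self.__class__() breaks as soon as a subclass requires constructor arguments:

```python
class Base(dict):
    """Hypothetical stand-in for DictionaryObject (not the pypdf class)."""

    def clone(self) -> "Base":
        # Assumes every subclass can be constructed without arguments.
        new = self.__class__()
        new.update(self)
        return new


class NeedsArgs(Base):
    """Hypothetical stand-in for an annotation with a required parameter."""

    def __init__(self, vertices: list) -> None:
        super().__init__()
        self["/Vertices"] = vertices


error = None
try:
    NeedsArgs([(0, 0), (1, 1)]).clone()
except TypeError as exc:
    # The no-argument construction inside clone() fails here.
    error = str(exc)

print(error)
```

Cloning a plain Base instance works, while cloning NeedsArgs raises a TypeError about the missing vertices argument, mirroring the traceback in this report.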
To mitigate this, we would ideally have a generic solution, possibly by having a mapping in each class derived from DictionaryObject which maps keys of self to parameters of __init__ to avoid code duplication.
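One possible shape for such a mapping is sketched below. This is only an illustration under assumptions: BaseDict, PolygonLike, _init_key_map and _new_like_self are hypothetical names and do not exist in pypdf. Each mapping entry also carries a converter, because a stored value (a flat /Vertices list) may differ in shape from what __init__ expects (a list of coordinate pairs).

```python
from typing import Any, Callable, ClassVar


class BaseDict(dict):
    """Hypothetical stand-in for DictionaryObject (not the pypdf class)."""

    # Hypothetical: dictionary key -> (__init__ parameter name, converter).
    _init_key_map: ClassVar[dict[str, tuple[str, Callable[[Any], Any]]]] = {}

    def _new_like_self(self) -> "BaseDict":
        # Rebuild the constructor arguments from the keys stored in self.
        kwargs = {
            param: convert(self[key])
            for key, (param, convert) in self._init_key_map.items()
            if key in self
        }
        return self.__class__(**kwargs)

    def clone(self, ignore_fields: tuple[str, ...] = ()) -> "BaseDict":
        copy = self._new_like_self()
        for key, value in self.items():
            if key not in ignore_fields:
                copy[key] = value
        return copy


def _to_pairs(flat: list) -> list:
    """Turn [x0, y0, x1, y1, ...] back into [(x0, y0), (x1, y1), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


class PolygonLike(BaseDict):
    _init_key_map = {"/Vertices": ("vertices", _to_pairs)}

    def __init__(self, vertices: list) -> None:
        super().__init__()
        self["/Subtype"] = "/Polygon"
        # Stored flat, as in the dictionary shown in the traceback.
        self["/Vertices"] = [coord for vertex in vertices for coord in vertex]


polygon = PolygonLike(vertices=[(50, 550), (200, 650)])
polygon["/P"] = "parent-ref"
cloned = polygon.clone(ignore_fields=("/P",))
```

With this approach, the base class keeps a single clone implementation, and each subclass only declares how its stored keys map back to constructor parameters.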
Initially discovered in #3291 (comment).
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-6.4.0-150600.23.65-default-x86_64-with-glibc2.38
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.0.0, crypt_provider=('cryptography', '44.0.0'), PIL=11.1.0
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfWriter
from pypdf.annotations import Polygon
writer = PdfWriter()
writer2 = PdfWriter()
writer.add_blank_page(100, 100)
writer2.add_blank_page(100, 100)
annotation = Polygon(
    vertices=[(50, 550), (200, 650), (70, 750), (50, 700)],
)
writer.add_annotation(0, annotation)
page1 = writer.pages[0]
page2 = writer2.pages[0]
page2.merge_page(page1)
No PDF file is required, as it will be created on the fly by the above code.
Traceback
This is the complete traceback I see:
tests/test_page.py:1508 (test_merge_page_with_annotations)
    def test_merge_page_with_annotations():
        writer = PdfWriter()
        writer2 = PdfWriter()
        writer.add_blank_page(100, 100)
        writer2.add_blank_page(100, 100)
        from pypdf.annotations import Polygon
        annotation = Polygon(
            vertices=[(50, 550), (200, 650), (70, 750), (50, 700)],
        )
        writer.add_annotation(0, annotation)
        page_one = writer.pages[0]
        page_two = writer2.pages[0]
>       page_two.merge_page(page_one)
tests/test_page.py:1523:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pypdf/_page.py:1062: in merge_page
    self._merge_page(page2, over=over, expand=expand)
pypdf/_page.py:1080: in _merge_page
    return self._merge_page_writer(
pypdf/_page.py:1235: in _merge_page_writer
    aa = a.clone(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = {'/Type': '/Annot', '/Subtype': '/Polygon', '/Vertices': [50, 550, 200, 650, 70, 750, 50, 700], '/IT': '/PolygonCloud', '/Rect': RectangleObject([50, 550, 200, 750]), '/P': IndirectObject(4, 0, 140694769602768)}
pdf_dest = <pypdf._writer.PdfWriter object at 0x7ff60dc3a310>
force_duplicate = True, ignore_fields = ('/P', '/StructParent', '/Parent')
    def clone(
        self,
        pdf_dest: PdfWriterProtocol,
        force_duplicate: bool = False,
        ignore_fields: Optional[Sequence[Union[str, int]]] = (),
    ) -> "DictionaryObject":
        """Clone object into pdf_dest."""
        try:
            if self.indirect_reference.pdf == pdf_dest and not force_duplicate:  # type: ignore
                return self
        except Exception:
            pass
        visited: set[tuple[int, int]] = set()  # (idnum, generation)
        print(type(self), self)
        d__ = cast(
            "DictionaryObject",
>           self._reference_clone(self.__class__(), pdf_dest, force_duplicate),
        )
E       TypeError: Polygon.__init__() missing 1 required positional argument: 'vertices'
pypdf/generic/_data_structures.py:297: TypeError