ENH: compress PDF merging identical objects #2794

Closed
pubpub-zz opened this issue Aug 11, 2024 · 0 comments · Fixed by #2795

pubpub-zz commented Aug 11, 2024

@santhosh-sankaran reported in discussion #2728 that PDFs compressed with PyPDF2 are smaller than those generated by pypdf:
Files_Generated_Compressed
Analysis shows that PyPDF2 kept a single copy of identical objects even when the original document contained many copies of them (referenced by different indirect objects).
pypdf, by contrast, keeps the same objects as the original document, duplicates included.

A new function could be added to keep only a single copy of such objects.

Example source PDF:
Generated_wean_first_20_pages.pdf

The size of this file could be reduced by more than 50%.
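To illustrate the idea of keeping a single copy of identical objects, here is a minimal, library-free sketch. It is not pypdf's implementation: `Ref`, `rewrite` and `merge_identical` are hypothetical names, and the PDF object table is modelled as a plain `dict` keyed by object number. It assumes two objects are identical when their `repr` matches, which also depends on dictionary key order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a PDF indirect reference such as `12 0 R`."""
    num: int


def rewrite(value: Any, mapping: dict[int, int]) -> Any:
    """Return a copy of `value` with references to duplicates redirected."""
    if isinstance(value, Ref):
        return Ref(mapping.get(value.num, value.num))
    if isinstance(value, dict):
        return {k: rewrite(v, mapping) for k, v in value.items()}
    if isinstance(value, list):
        return [rewrite(v, mapping) for v in value]
    return value


def merge_identical(objects: dict[int, Any]) -> dict[int, Any]:
    """Keep one copy of each identical object (assumption: equal repr)."""
    objects = dict(objects)
    while True:
        first_seen: dict[str, int] = {}
        mapping: dict[int, int] = {}  # duplicate number -> surviving number
        for num in sorted(objects):
            key = repr(objects[num])
            if key in first_seen:
                mapping[num] = first_seen[key]
            else:
                first_seen[key] = num
        if not mapping:
            return objects
        # Drop the duplicates and redirect references to the survivors.
        # Another pass follows, because parents may now have become identical.
        objects = {
            num: rewrite(obj, mapping)
            for num, obj in objects.items()
            if num not in mapping
        }


# Two pages whose resources and fonts are stored twice under different numbers.
objects = {
    1: {"Type": "Page", "Contents": "page 1", "Resources": Ref(3)},
    2: {"Type": "Page", "Contents": "page 2", "Resources": Ref(4)},
    3: {"Font": Ref(5)},
    4: {"Font": Ref(6)},
    5: {"Type": "Font", "BaseFont": "Helvetica"},
    6: {"Type": "Font", "BaseFont": "Helvetica"},
}
compact = merge_identical(objects)
print(sorted(compact))  # [1, 2, 3, 5]
```

The loop repeats until nothing changes because merging the two font objects is what makes the two resource dictionaries identical in the next pass. A real implementation would also have to compare stream data and hash objects rather than compare `repr` strings.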

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Aug 11, 2024
add compress_identical_objects()
discovered in py-pdf#2728
closes py-pdf#2794
closes py-pdf#2768