Collaborator

pubpub-zz commented Aug 28, 2022

Fixes #1273
Fixes #1279
Fixes #1292
Fixes #1294
Fixes #1295

ROB: Cope with xref starting on \r\n
ROB: Escaped octal code followed by decimal int
ROB: Cope with some corrupted entries in xref table
ROB: Extend xref autorepair cases

fixes py-pdf#1295
includes test file adjustment
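
To illustrate the "escaped octal code followed by decimal int" case: in a PDF literal string, a backslash may be followed by one to three octal digits, so a parser has to stop at the first non-octal character (for example `8` or `9`) and treat it as ordinary text. The following is a minimal sketch of that rule, not the code from this PR. The name `decode_octal_escapes` is hypothetical, and other escapes such as `\n`, `\(` or an escaped backslash are deliberately not handled.

```python
import re

# One backslash followed by 1 to 3 octal digits (0-7), as allowed in PDF literal strings.
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace backslash-octal escapes with the byte they encode.

    Hypothetical helper for illustration only: it ignores every other
    escape sequence, including an escaped backslash.
    """

    def _to_byte(match: "re.Match[bytes]") -> bytes:
        # Three octal digits can exceed 255; keep only the low 8 bits.
        return bytes([int(match.group(1), 8) & 0xFF])

    return _OCTAL_ESCAPE.sub(_to_byte, raw)
```

With this sketch, `decode_octal_escapes(rb"\128")` decodes `\12` as a line feed and leaves the decimal digit `8` untouched, because `8` cannot be part of an octal code.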
@codecov

codecov bot commented Aug 29, 2022

Codecov Report

Merging #1297 (4edf6f8) into main (3b74312) will decrease coverage by 0.40%.
The diff coverage is 82.01%.

@@            Coverage Diff             @@
##             main    #1297      +/-   ##
==========================================
- Coverage   95.07%   94.67%   -0.41%     
==========================================
  Files          30       30              
  Lines        4973     5106     +133     
  Branches     1023     1052      +29     
==========================================
+ Hits         4728     4834     +106     
- Misses        139      157      +18     
- Partials      106      115       +9     
Impacted Files                    Coverage Δ
PyPDF2/_reader.py                 89.49% <72.52%> (-2.19%) ⬇️
PyPDF2/_page.py                   94.36% <100.00%> (+<0.01%) ⬆️
PyPDF2/_writer.py                 91.04% <100.00%> (-0.51%) ⬇️
PyPDF2/generic/_base.py           100.00% <100.00%> (+1.02%) ⬆️
PyPDF2/generic/_utils.py          100.00% <100.00%> (ø)
PyPDF2/types.py                   100.00% <100.00%> (ø)
PyPDF2/_codecs/adobe_glyphs.py    100.00% <0.00%> (ø)
... and 1 more

@pubpub-zz
Collaborator Author

@MartinThoma,
Ready for review

@pubpub-zz
Collaborator Author

Stand by.

fixes py-pdf#1279 / Status_v1_Reviewers-Guide.pdf
* if the chained xref/trailer are not valid
* if the object header ('id' 'gen' obj) is not found at the expected position, or if the object is not present in the xref table, the reader searches the file for the object.
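
The fallback described in the second bullet could look roughly like the sketch below: when the xref table gives no usable offset, scan the raw bytes for the `id gen obj` header. This is only an illustration under stated assumptions, not the code merged in this PR. The name `find_object_offset` is hypothetical, and the sketch ignores complications such as header-like bytes inside stream data or objects stored in object streams.

```python
import re
from typing import Optional

# PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
_WS = rb"[\x00\t\n\f\r ]"


def find_object_offset(data: bytes, idnum: int, generation: int = 0) -> Optional[int]:
    """Return the byte offset of the last 'idnum generation obj' header, or None.

    Hypothetical helper for illustration; a real reader has more cases to handle.
    """
    # The header must start the file or follow whitespace,
    # so searching for "2 0 obj" does not match inside "12 0 obj".
    pattern = re.compile(
        rb"(?:^|(?<=" + _WS + rb"))%d" % idnum
        + _WS + rb"+%d" % generation
        + _WS + rb"+obj\b"
    )
    offset = None
    for match in pattern.finditer(data):
        # Incremental updates append newer versions of an object,
        # so the sketch keeps the last header found.
        offset = match.start()
    return offset
```

Keeping the last match is a design choice of this sketch: it assumes that a later definition of the same object supersedes an earlier one, which is how incrementally updated files are normally laid out.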

fixes  py-pdf#1273
reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
reader.xmp_metadata
assert exc.value.args[0].startswith("XML in XmpInformation was invalid")
assert exc.value.args[0].startswith("Stream length not defined")

Member

Why did this change? I guess the reader.xmp_metadata isn't even touched, is it?

Member

Before this PR, one could at least get the number of pages:

assert len(reader.pages) == 5

I guess with this PR it no longer works?

Collaborator Author

I had to modify the expected test result. I have not analyzed it further.

Collaborator Author

> Before this PR, one could at least get the number of pages:
>
> assert len(reader.pages) == 5
>
> I guess with this PR it no longer works?

Under analysis.

Collaborator Author

The PDF is corrupted: the XRef object has a corrupted /Length key. I've changed the code to skip loading the XRef object so that the main program can recover as much information as possible: you can now get the metadata 😊
Access to the number of pages is (still?) possible.

@pubpub-zz
Collaborator Author

I had to merge iss_1292 to have a single, global PR.

This PR is now complete.

@MartinThoma MartinThoma changed the title from "ENH : Process XRefStm" to "ENH: Process XRefStm" Sep 2, 2022
pubpub-zz and others added 2 commits September 3, 2022 09:41
Co-authored-by: Martin Thoma <[email protected]>
Co-authored-by: Martin Thoma <[email protected]>
@pubpub-zz
Collaborator Author

5 sec before me 😝

@MartinThoma
Member

I'll look into applying black automatically in the CI as an extra commit today 😄

Also, I want to make flake8 run in parallel to the tests and mypy after pytest so that I can still see issues there in a failed run.

@pubpub-zz
Collaborator Author

I don't think it's worth it.
The missing line came from the code review.
One thing I've noticed is that the 3.10 check is performed twice. Do you know why? (Asking for energy-saving reasons.)

@MartinThoma
Member

> One thing I've noticed is that 3.10 check is performed twice.

It's a different test scenario. pycryptodome is removed in that test run.

@MartinThoma MartinThoma merged commit 1252a49 into py-pdf:main Sep 3, 2022
@pubpub-zz pubpub-zz deleted the XRefStm branch September 3, 2022 19:53
MartinThoma added a commit that referenced this pull request Sep 4, 2022
Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
-  Process XRefStm (#1297)
-  Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
-  Avoid scaling cropbox twice (#1314)

Robustness (ROB):
-  Fix offset correction in revised PDF (#1318)
-  Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
-  MultiLine bfrange in cmap (#1299)
-  Cope with 2 digit codes in bfchar (#1310)
-  Accept '/annn' charset as ASCII code (#1316)
-  Log errors during Float / NumberObject initialization (#1315)
-  Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
-  Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
-  Creating a coverage report (#1319)
-  Fix AnnotationBuilder.free_text example (#1311)
-  Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
-  Only run coverage for PyPDF2

Maintenance (MAINT):
-  PdfReaderProtocol (#1303)
-  Throw PdfReadError if Trailer can't be read (#1298)
-  Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5