You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
0.15.10
Enhancements
Enhance pdfminer element cleanup Expand removal of pdfminer elements to include those inside all non-pdfminer elements, not just tables.
Modified analysis drawing tools to dump to files and draw from dumps If the parameter analysis of the partition_pdf function is set to True, the layout for Object Detection, Pdfminer Extraction, OCR and final layouts will be dumped as json files. The drawers now accept dict (dump) objects instead of internal classes instances.
Vectorize pdfminer elements deduplication computation. Use numpy operations to compute IOU and sub-region membership instead of using simply loop. This improves the speed of deduplicating elements for pages with a lot of elements.