Releases: VikParuchuri/marker

Marker Bugfixes and Improvements to `pdftext`

12 Dec 19:09
9185517

What's Changed

  • Fix chunk_convert.sh to handle output_dir correctly by @Leon-Sander in #415
  • pdftext Improvements and Misc Bugfixes by @VikParuchuri and @iammosespaulr in #422
    • Blank page and TOC bugfixes
    • Fix README.md and updated examples
    • Update to the latest pdftext release, incorporating heuristic-based segmentation for enhanced performance and accuracy
    • Update surya and tabled dependencies, incorporating various bugfixes

Full Changelog: v1.0.2...v1.1.0

Bugfixes - Python 3.10 compatibility, quotes, images

03 Dec 21:12
6ded3b9
  • Fix issue with Python 3.10
  • Fix positions of quote characters
  • Change default image output type to JPEG for speed and smaller filesize with minimal quality loss

Bugfixes and parsing improvements

03 Dec 01:16
f446e56
  • Fix lots of misc bugs, including encoding, empty page problems, and image rendering
  • Improve list processing with joining and nesting
  • Add in blockquotes
  • Slightly improve performance

Full Changelog: v1.0.0...v1.0.1

Marker v1!

27 Nov 17:47
75091a0

This is the release of marker v1, a complete rewrite from scratch.

  • 2x faster due to a new layout model
  • Consistent internal schema for blocks and pages
  • Modular architecture with processors and renderers that can easily be overridden
  • JSON chunk and markdown output
  • Lots of unit tests
  • Much higher output quality

Full Changelog: v0.3.10...v1.0.0

Performance improvements, API server

31 Oct 15:20
b8a8736
  • Improve performance by 10-15%
  • Add a simple API server for local use-cases

Flatten PDF, fix page separators, fix torch/transformers bugs

25 Oct 17:04
b2cae2e
  • Fix issues with transformers 4.46 and torch 2.5
  • Improve page separators - they now appear at the start of the page, and show the page number
  • Flatten form fields into the PDF before extracting markdown
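
The page-separator change can be pictured with a small sketch. This is an illustration only: the function name `join_pages`, the `first_page` parameter, and the separator format (a page number in braces followed by a dashed rule) are assumptions, not marker's confirmed implementation or output format.

```python
def join_pages(pages, first_page=0):
    """Join per-page markdown, placing a separator at the START of each page.

    The separator format ("{n}" followed by a dashed rule) is a hypothetical
    choice for illustration; it is not taken from marker's source.
    """
    rule = "-" * 48
    parts = []
    for offset, text in enumerate(pages):
        page_number = first_page + offset
        # The separator comes before the page text and carries the page number.
        parts.append(f"{{{page_number}}}{rule}\n\n{text.strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(join_pages(["# Title\n\nIntro text.", "Second page text."]))
```

Putting the separator before the page content means each chunk of output starts with its own page number, so a consumer can split on the separator and label every piece without looking ahead.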

Fix table bug

24 Oct 17:55
1b4b413
  • Fix bug that caused conversion to fail when start_page was set and the document had tables

Undo threads

23 Oct 17:37
8e28b05

Threads cause issues on a small percentage of devices, so the threading change is reverted.

Speedups, bug fixes

23 Oct 16:30
6b25f06
  • Fix some edge case OCR bugs
  • ~20% end-to-end speedup by improving layout and text detection

Fix OCR bugs

23 Oct 02:12
2f3f0d7
  • Fix bbox issue with OCR and resizing
  • Fix issue with layout bboxes missing after OCR