Release 0.9.3 #969

Merged
merged 63 commits into from
Nov 15, 2024
Conversation

myhloli
Collaborator

@myhloli myhloli commented Nov 15, 2024

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here and update the documentation.

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

myhloli and others added 30 commits November 6, 2024 18:04
docs(README): update badges
- Implement xycut algorithm to sort blocks when layoutreader fails
- Add recursive_xy_cut function to perform the xycut algorithm
- Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
- Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
feat(model): add xycut algorithm for block sorting
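
For readers unfamiliar with the technique named in this commit, here is a minimal sketch of a recursive XY-cut ordering. It is not the `recursive_xy_cut` function from this PR: the names `xy_cut_order` and `_split_by_gaps`, the `(x0, y0, x1, y1)` box format, and the choice to cut along the y axis first are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), an assumed format


def _split_by_gaps(boxes: List[Box], indices: List[int], axis: int) -> List[List[int]]:
    """Group indices into runs separated by empty gaps along one axis (0 = x, 1 = y)."""
    ordered = sorted(indices, key=lambda i: boxes[i][axis])
    groups = [[ordered[0]]]
    reach = boxes[ordered[0]][axis + 2]  # farthest extent covered so far
    for i in ordered[1:]:
        if boxes[i][axis] >= reach:
            groups.append([i])  # clear gap: start a new group
        else:
            groups[-1].append(i)  # overlaps the current run
        reach = max(reach, boxes[i][axis + 2])
    return groups


def xy_cut_order(boxes: List[Box]) -> List[int]:
    """Return box indices in an approximate reading order (hypothetical helper)."""

    def recurse(indices: List[int], axis: int, failed_once: bool) -> List[int]:
        if len(indices) <= 1:
            return list(indices)
        groups = _split_by_gaps(boxes, indices, axis)
        if len(groups) == 1:
            if failed_once:
                # No gap on either axis: fall back to top-to-bottom, left-to-right.
                return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
            return recurse(indices, 1 - axis, True)
        ordered: List[int] = []
        for group in groups:
            ordered.extend(recurse(group, 1 - axis, False))
        return ordered

    return recurse(list(range(len(boxes))), 1, False)


# A full-width title above two columns, listed in scrambled order.
boxes = [
    (20, 27, 30, 37),  # 0: right column, bottom
    (0, 0, 30, 5),     # 1: title
    (0, 24, 10, 35),   # 2: left column, bottom
    (20, 10, 30, 25),  # 3: right column, top
    (0, 10, 10, 22),   # 4: left column, top
]
print(xy_cut_order(boxes))  # [1, 4, 2, 3, 0]
```

The sketch alternates the cut axis at each level, so the title is separated first, then the two columns, then the blocks inside each column.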
- Decrease the maximum line count from 512 to 316 for layoutreader
- Lower the line count threshold from 316 to 200 to ensure compatibility
- This change aims to prevent potential issues with layoutreader's maximum line support
refactor(pdf_parse): adjust line count threshold for layoutreader
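
One way a line-count threshold and a geometric fallback could fit together is sketched below. How this PR actually combines them is not stated in the commit messages, so the dispatch logic, the function `sort_lines`, and its parameters are hypothetical; only the value 200 comes from the commit above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

MAX_LAYOUTREADER_LINES = 200  # threshold named in the commit message


def sort_lines(
    lines: Sequence[T],
    model_sort: Callable[[Sequence[T]], List[T]],
    geometric_sort: Callable[[Sequence[T]], List[T]],
    max_lines: int = MAX_LAYOUTREADER_LINES,
) -> List[T]:
    """Use the model-based sorter when the page is small enough, else fall back."""
    if len(lines) > max_lines:
        # Too many lines for the model: skip it entirely (assumed behavior).
        return geometric_sort(lines)
    try:
        return model_sort(lines)
    except Exception:
        # Broad catch for illustration only; real code would catch narrower errors.
        return geometric_sort(lines)
```

Here `model_sort` stands in for a layoutreader call and `geometric_sort` for an xycut-style ordering; both are passed in so the sketch stays independent of either implementation.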
- Add RapidTable model support for table recognition
- Update table model configuration and initialization
- Modify table recognition process to use RapidTable when specified
- Add RapidTable dependency to setup.py
- Change the default table model from TABLE_MASTER to RAPID_TABLE
feat(table): integrate RapidTable model for table recognition
- Add missing '.jpg' file type to the list of allowed file types for upload
fix(gradio-app): add missing file type in upload
… output

- Add orig_model_list parameter to maintain original model data
- Deep copy model_json and pipe.model_list to preserve data integrity
- Update json_md_dump function call to include orig_model_list
- Improve condition check for empty model_json
refactor(magic_pdf_parse_main): optimize model data handling and JSON output
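
The reason for deep copying model output before further processing can be shown with a small sketch. The data shape and the name `snapshot_model_output` are illustrative assumptions, not the code from this commit; `copy.deepcopy` is the standard-library call.

```python
import copy
from typing import Any, Dict, List


def snapshot_model_output(model_json: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return an independent copy so later in-place edits do not leak into it."""
    return copy.deepcopy(model_json)


# Illustrative structure only; the real model output format may differ.
model_json = [{"page": 0, "layout_dets": [{"category_id": 1}]}]
orig_model_list = snapshot_model_output(model_json)

# Simulate a pipeline stage mutating the working data in place.
model_json[0]["layout_dets"].append({"category_id": 2})

print(len(orig_model_list[0]["layout_dets"]))  # 1
print(len(model_json[0]["layout_dets"]))       # 2
```

A shallow copy (`list(model_json)`) would share the nested dictionaries, so the mutation would show up in both lists.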
Modify the test directory
- Update test_image2html to use unittest framework
- Add more assertions
test(table): improve ppTableModel test coverage
- Integrate RapidOCR with RapidTable model for table recognition
- Improve memory management for devices with <= 8GB VRAM
- Update table recognition process to use RapidOCR for RapidTable
- Add rapidocr-paddle dependency in setup.py
feat(table): add RapidOCR support for RapidTable model
- Add DocLayout-YOLO repository link
- Add RapidTable repository link
myhloli and others added 27 commits November 11, 2024 15:25
docs(README_ja-JP.md): update warning message and remove outdated content
fix(para_split_v3): Fix IndexError in para_split_v3.py for empty line handling
- Update the URL for downloading the model setup script in Dockerfile
- Upgrade struct-eqtable to version 0.3.2 and remove pypandoc
- Add new dependencies: einops, accelerate, doclayout_yolo, rapidocr-paddle, and rapid_table
build(Dockerfile): update model download script and dependencies
- Add digit check for single-character content to avoid adding unnecessary spaces
fix(ocr_mkcontent): improve handling of single-character content #937
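
A rough sketch of what a digit check for single-character content might look like when joining text spans follows. The actual rule in `ocr_mkcontent` is not shown on this page, so the function `join_spans` and its exact spacing rule are assumptions made purely for illustration.

```python
from typing import List


def join_spans(spans: List[str]) -> str:
    """Join spans with spaces, but add no space after a single-digit span (assumed rule)."""
    parts: List[str] = []
    for i, text in enumerate(spans):
        parts.append(text)
        is_last = i == len(spans) - 1
        is_single_digit = len(text) == 1 and text.isdigit()
        if not is_last and not is_single_digit:
            parts.append(" ")
    return "".join(parts)


print(join_spans(["total", "4", "2"]))  # total 42
```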
Co-authored-by: xu rui <[email protected]>
…tial PDFs due to file corruption or non-standard format by forcing a re-print.
fix(parse_pipeline): Resolve post-processing exceptions caused by partial PDFs due to file corruption or non-standard format by forcing a re-print.
- Rename ppTableModel to TableMasterPaddleModel in test_tablemaster.py
refactor(model): rename and restructure model modules
docs:update docs for 0.9.3
docs(README): update project references and translations
@myhloli myhloli merged commit 845a3ff into master Nov 15, 2024
1 of 2 checks passed
Contributor


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


1 out of 3 committers have signed the CLA.
✅ [hyastar](https://github.com/hyastar)
@xu rui
@DTwz
xu rui does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 15, 2024
4 participants