You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
0.15.2
Enhancements
Improve directory handling when extracting image blocks. The figures directory is no longer created when the extract_image_block_to_payload parameter is set to True.
Features
Added per-class Object Detection metrics in the evaluation. The metrics include average precision, precision, recall, and f1-score for each class in the dataset.
Fixes
Updates NLTK data file for compatibility with nltk>=3.8.2. The NLTK data file now container punkt_tab, making it possible to upgrade to nltk>=3.8.2. The nltk==3.8.2 patches CVE-2024-39705.
Renames Astra to Astra DB Conforms with DataStax internal naming conventions.
Accommodate single-column CSV files. Resolves a limitation of partition_csv() where delimiter detection would fail on a single-column CSV file (which naturally has no delimeters).
Accommodate image/jpg in PPTX as alias for image/jpeg. Resolves problem partitioning PPTX files having an invalid image/jpg (should be image/jpeg) MIME-type in the [Content_Types].xml member of the PPTX Zip archive.
Fixes an issue in Object Detection metrics The issue was in preprocessing/validating the ground truth and predicted data for object detection metrics.
Removes dependency on unstructured.pytesseract Unstructured forked pytesseract while waiting for code to be upstreamed. Now that the new version has been released, this fork can be removed.