0.16.4
Enhancements
The value attribute of an <input/> element is now parsed into OntologyElement.text in the ontology.
The id and class attributes are removed from Table subtags during HTML partitioning.
Cleaned up to_html and introduced to_text in OntologyElement.
Elements created from V2 HTML are less granular. Added merging of adjacent text elements and inline HTML tags in the HTML partitioner to reduce the number of elements created from V2 HTML.
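To illustrate the idea behind the last enhancement, here is a minimal sketch of merging adjacent text and inline nodes. It is not the partitioner's actual code: the Node type, the INLINE_TAGS set, and merge_adjacent_inline are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Hypothetical set of inline tags; the real partitioner's list may differ.
INLINE_TAGS = {"a", "b", "em", "i", "span", "strong"}


@dataclass
class Node:
    tag: str   # HTML tag name, or "#text" for a bare text node
    text: str


def merge_adjacent_inline(nodes):
    """Join runs of neighbouring text/inline nodes into single text nodes."""
    merged = []
    run = []  # text fragments of the run currently being collected

    def flush():
        # Emit the collected run as one text node, skipping empty runs.
        joined = " ".join(fragment for fragment in run if fragment)
        if joined:
            merged.append(Node("#text", joined))
        run.clear()

    for node in nodes:
        if node.tag == "#text" or node.tag in INLINE_TAGS:
            run.append(node.text.strip())
        else:
            flush()              # a block-level node ends the current run
            merged.append(node)
    flush()
    return merged
```

With this sketch, a text node, a b node, and another text node in a row collapse into one text node, while a block-level node such as a table is passed through unchanged and starts a new run.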
Features
Add support for link extraction in the PDF hi_res strategy. The partition_pdf() function now supports link extraction when using the hi_res strategy, so hyperlinks can be extracted from PDF documents.
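A usage sketch for this feature, under stated assumptions: it assumes extracted links are exposed on each element as element.metadata.links, a list of dicts with text and url keys, which these notes do not spell out. The helpers collect_links and extract_pdf_links are hypothetical names for this example, and the PDF path is supplied by the caller.

```python
def collect_links(elements):
    """Return (link_text, url) pairs found in the elements' metadata."""
    pairs = []
    for element in elements:
        # Assumption: links live in element.metadata.links and may be None.
        links = getattr(element.metadata, "links", None) or []
        for link in links:
            pairs.append((link.get("text"), link.get("url")))
    return pairs


def extract_pdf_links(path):
    """Partition a PDF with the hi_res strategy and gather its hyperlinks."""
    # Imported lazily so collect_links works without unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=path, strategy="hi_res")
    return collect_links(elements)
```

Calling extract_pdf_links with the path to a PDF would return a list of (text, url) tuples, empty if no links were found. The hi_res strategy needs the library's optional PDF dependencies installed.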