Describe the bug
After the latest update, pdf mode no longer works. New lines seem to always get recognized as new sentences.
To Reproduce
Steps to reproduce the behavior:
Input text - "This is a sentence\ncut off in the middle because pdf."
Expected behavior
Expected output - "This is a sentence\ncut off in the middle because pdf."
@matthewmcintire Hey, it's recommended to use doc_type="pdf" together with clean=True, since the cleaner trims those intermediate newlines. Note that with clean=True you can no longer use the char_span functionality, because the original text gets modified.
Thanks for pointing this out.
I will update the tests and raise an exception to force users to follow the usage described above.
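A minimal sketch of the two ideas in the reply above. It assumes a simplified rule and is not pySBD's actual cleaner or validation code. `validate_options` (a hypothetical name) rejects the option combinations the reply describes as unsupported. `join_wrapped_lines` (also hypothetical) treats a single newline that does not follow `.`, `!` or `?` as a PDF line wrap and replaces it with a space, while leaving sentence-final newlines and blank-line paragraph breaks alone. In pySBD itself, the recommended call is `pysbd.Segmenter(language="en", clean=True, doc_type="pdf")`.

```python
import re

# Hypothetical rule (an assumption, not pySBD's cleaner): a single newline
# that does not follow sentence-final punctuation is a PDF line wrap.
# Newlines that are part of a blank-line run ("\n\n") are left untouched.
_WRAPPED_NEWLINE = re.compile(r"(?<![.!?\n])\n(?!\n)")


def validate_options(doc_type=None, clean=False, char_span=False):
    """Hypothetical option check mirroring the usage rule in the reply."""
    if clean and char_span:
        # Cleaning rewrites the text, so character offsets would not match it.
        raise ValueError("char_span must be False when clean=True, "
                         "because cleaning modifies the original text.")
    if doc_type == "pdf" and not clean:
        # PDF mode relies on cleaning to remove intermediate newlines.
        raise ValueError("doc_type='pdf' requires clean=True.")


def join_wrapped_lines(text: str) -> str:
    """Replace line-wrap newlines with a space; keep all other newlines."""
    return _WRAPPED_NEWLINE.sub(" ", text)


if __name__ == "__main__":
    validate_options(doc_type="pdf", clean=True)  # accepted
    print(join_wrapped_lines("This is a sentence\ncut off in the middle because pdf."))
```

Joining wrapped lines before segmentation is why cleaning and character spans conflict: once a newline becomes a space, offsets computed on the cleaned text no longer describe the input string exactly as the user passed it.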