Apostrophe and Double Quote Parsing Issue #86
I've been busy with work. Do you think you can write up a failing test? Have you asked the Tika project for assistance?
Not sure, but I think this may be related to how our content handler handles quotes in the content.
Hi KevM, I work with rbower54 and wanted to test this, but we install this component through NuGet in Visual Studio. I'm not sure how to get a build of the package(s) that includes this fix: the versions VS shows seem rather old (January).
I uploaded a pre-release NuGet package. Go ahead and give it a try: https://www.nuget.org/packages/TikaOnDotnet.TextExtractor/1.14.2-pre
Hi,
I'm running into a small issue with TikaOnDotNet when parsing some *.doc documents that contain quoted strings and apostrophes, such as:
Robert's famous quote of "I Love TikaOnDotNet" and some more words that follow it.
Parsing produces the following literal output:
Robert famous quote of Love TikaOnDotNetand some more words that follow it.
It strips the apostrophe and the double quotes, along with some of the characters and spaces that immediately follow them.
Any guidance would be greatly appreciated!
Thanks
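As a starting point for the failing test requested in the thread, it helps to pin down exactly which characters disappear. The sketch below is a hypothetical diagnostic helper (`dropped_segments` is not part of TikaOnDotNet). It uses Python's standard `difflib.SequenceMatcher` to compare the expected sentence from the report with the reported extraction output and lists the runs of text that went missing. It assumes straight ASCII quotes as typed in the report. A real `.doc` saved by Word may instead contain typographic quotes (U+2019, U+201C, U+201D), so a test against the actual file should check both forms.

```python
import difflib


def dropped_segments(expected: str, extracted: str) -> list[str]:
    """Return the runs of characters present in `expected` but missing from `extracted`.

    Hypothetical helper for diagnosing the quote-stripping report; it only
    compares two strings and does not call Tika itself.
    """
    # autojunk=False so frequent characters (spaces, 'e', ...) are never ignored.
    matcher = difflib.SequenceMatcher(None, expected, extracted, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both mean text from `expected` did not survive.
        if tag in ("delete", "replace"):
            dropped.append(expected[i1:i2])
    return dropped


# Sample input and output copied from the issue (straight quotes assumed).
EXPECTED = 'Robert\'s famous quote of "I Love TikaOnDotNet" and some more words that follow it.'
EXTRACTED = "Robert famous quote of Love TikaOnDotNetand some more words that follow it."

if __name__ == "__main__":
    # A correct extractor should make this list empty.
    print(dropped_segments(EXPECTED, EXTRACTED))
```

In a test suite, asserting that `dropped_segments(expected, actual_extracted_text)` is empty gives a failure message that shows the missing fragments directly, rather than two long strings to compare by eye.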