Problems with cyrillic symbols #65
Comments
?
Sorry, pressed 'Ctrl+Enter'
Can you provide or give me an idea of what I should test? I feel like this problem was solved a while ago, but there may have been a regression.
I have now found that both file types have the same problem.
test.docx And result:
If the file contains only Latin characters, then 'Textract' works correctly, as expected.
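To narrow down which characters go missing, it may help to compare the text you expect with what was extracted. The following is only a sketch using plain Node.js: `missingChars`, `expected`, and `actual` are hypothetical names, and `actual` is a simulated result with the Cyrillic part stripped, not real textract output.

```javascript
// Hypothetical helper: returns the distinct characters that appear in
// `expected` but nowhere in `actual`.
function missingChars(expected, actual) {
  const seen = new Set(actual);
  return [...new Set(expected)].filter((ch) => !seen.has(ch));
}

// Simulated data (assumption): a mixed Latin/Cyrillic document and an
// extraction result in which the Cyrillic letters were dropped.
const expected = 'Test тест';
const actual = 'Test ';

const missing = missingChars(expected, actual);
console.log(missing);
```

If the list contains only Cyrillic letters, that would be consistent with the Latin-only files working and the mixed ones failing.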
This should be good to go. Was only happening for Try
I'm having the exact same issue. This is my config.
NOTE: This only happened with .doc files. I'm on a Mac also.
Can you give me a sample doc? (And maybe open a new issue with it to track?)
Sure. Here is #71. Thanks.
When I run a JS file in Node.js with the following content (for example, with a .doc file):
var textract = require('textract');
textract.fromFileWithPath('test.doc', function (error, text) {
  if (error) throw error;
  console.log(text);
});
With a .doc file, all Cyrillic symbols are unreadable (but when I run Catdoc directly, I can read them), and with a .docx file all Cyrillic symbols are removed.
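One possible cause of unreadable output like this (an assumption, not something confirmed in this thread) is that bytes written in one encoding are decoded as another. The sketch below uses only Node.js built-ins, not textract or Catdoc, to show how UTF-8 Cyrillic text becomes garbage when decoded as Latin-1, and how a Unicode property regex can check whether Cyrillic survived. The names `sample`, `garbled`, `restored`, and `hasCyrillic` are illustrative.

```javascript
// Illustrative only: this does not call textract or catdoc.
const sample = 'Привет'; // hypothetical test string

// Encode as UTF-8, then (incorrectly) decode the same bytes as Latin-1.
const bytes = Buffer.from(sample, 'utf8');
const garbled = bytes.toString('latin1');

// Decoding with the matching encoding round-trips the text.
const restored = bytes.toString('utf8');

// True if the string contains at least one Cyrillic character.
function hasCyrillic(text) {
  return /\p{Script=Cyrillic}/u.test(text);
}

console.log(hasCyrillic(sample));  // true
console.log(hasCyrillic(garbled)); // false
console.log(restored === sample);  // true
```

A check like `hasCyrillic` could be run on the extracted text for a known Cyrillic sample file to tell apart the two symptoms: garbled output versus characters missing entirely.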