I ran into a problem trying to apply the readability tests to a block of text with some UTF-8 characters (fancy quotes).
Sample text: http://pastebin.com/eRKGMGYn
Test script: http://pastebin.com/aE2DaRvk
I'm not very familiar with nltk_contrib, so perhaps I'm just using it wrong, but it seems to fail regardless of whether I pass a bytestring or a unicode string to ReadabilityTool. I forked nltk_contrib and changed textanalyzer.py so that it takes unicode instead of bytes, and that seems to have fixed the problem for me.
My fork: https://github.com/priceonomics/nltk_contrib
Can someone confirm the issue I'm seeing and whether my fix is appropriate? Feel free to merge it back if it's useful.
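To illustrate the bytes-versus-unicode distinction behind this report, here is a minimal sketch in Python 3 syntax. It is not the code from the fork: `to_text` is a hypothetical helper, and the sample string is made up. It only shows the general idea of decoding UTF-8 bytes to a unicode string before handing the text to an analyzer that expects unicode.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a unicode string (hypothetical helper).

    Bytes are decoded with `encoding`; text is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Made-up sample with curly quotes, encoded as UTF-8 bytes.
raw = "\u201cfancy quotes\u201d".encode("utf-8")
text = to_text(raw)

# Each curly quote is one character in `text` but three bytes in `raw`,
# which is why byte-oriented processing can trip over such input.
print(len(raw), len(text))
```

Normalizing once at the boundary, rather than inside each readability test, would keep the rest of the code working on a single string type.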