python3 port ? #50
I'd like to keep supporting Python 2.5 and 2.6 too, so Python 3 support would have to live in a separate branch.
Python 2.5 and 2.6 are quite old; even Django and other big libraries don't support them anymore… But yeah, that's your choice. In that case, why use str at all? Make readability accept only unicode input and use unicode everywhere inside readability; that's way simpler. OK, OK. BTW, I'm really interested in this Python library. Before, I was using a custom solution, and I discovered that it's really not easy…
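To make the "unicode only" suggestion concrete, here is a minimal sketch, assuming Python 3 (where `str` is always text). The helper names `require_text` and `decode_page` are hypothetical and are not part of readability; the idea is only that bytes get decoded once, at the boundary, and everything after that sees text.

```python
def require_text(html):
    """Hypothetical guard: accept only decoded text, reject raw bytes."""
    if not isinstance(html, str):
        raise TypeError(
            "expected decoded text (str), got %s; decode before calling"
            % type(html).__name__
        )
    return html


def decode_page(raw, declared_encoding=None):
    """Hypothetical boundary helper: turn downloaded bytes into text once.

    Tries the declared encoding (if any), then UTF-8, and finally falls
    back to Latin-1, which can decode any byte sequence.
    """
    for encoding in (declared_encoding, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding name: try the next candidate.
            continue
    return raw.decode("latin-1")


# Decode at the edge, then pass only text onward.
page = require_text(decode_page("<p>caf\u00e9</p>".encode("utf-8")))
```

The Latin-1 fallback never raises, but it can produce mojibake for pages in other encodings, so a real port would probably want a better guess at that step.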
Well, mostly I do updates for my users. Until this year I rarely had a chance to use the package myself more than once a year, I mean on more than a few sites at a time.
Libxml, which lxml is built on, uses UTF-8 under the hood. You'll get a conversion to UTF-8 anyway; it's really just a matter of whether you want it to be implicit or explicit. Older lxml on Python 2 did no implicit UTF-8/unicode conversion, which is why I used an explicit one. Maybe things have changed a little.
Except that in real life the requests package doesn't work for a lot of real pages.
Yes, I know this, but most of all I'm interested in approaches that scale. If you parse only a few sites and some pages from them, almost any tool will do, but if you parse thousands of sites, you need a tool that won't break and won't need much customization for each specific site.
Thanks a lot!
Hi,
I'm very sad this library hasn't been ported to Python 3.
I have made a port that is quite different from @Ftzeng's, as I have removed all the encoding stuff, and the tests still seem to pass with Python 2.7 and Python 3. I use requests for downloading the web pages and detecting the correct encoding.
I'm fairly sure there is still work to do, as my tests were very shallow, but I would really like to see a port done… (and encoding.py is quite… bad, I think: if you use UTF-8 strings everywhere, there should be no problem).
What do you think? :)
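
For reference, a rough sketch of what "use requests for downloading and detecting the encoding" could look like. This is an assumption about the approach, not the code from the port. The function names are hypothetical; `Response.encoding`, `Response.apparent_encoding` and `Response.text` are real attributes of requests responses.

```python
def text_from_response(response):
    """Return the body of a requests-style response as decoded text.

    requests reports ISO-8859-1 for text/* responses whose Content-Type
    header carries no charset, so that value (or no value at all) is
    treated here as "unknown" and replaced by the encoding guessed from
    the body itself.
    """
    declared = response.encoding
    if declared is None or declared.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


def fetch_text(url, timeout=10):
    """Download a page and decode it (needs the third-party requests package)."""
    import requests  # imported lazily so the helper above works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return text_from_response(response)
```

One trade-off of this sketch: a page that genuinely declares ISO-8859-1 also gets its encoding re-guessed from the body, and that guess can be wrong on short or unusual pages.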