Support function urlparse #20

malloxpb · 2018-06-15T21:26:47Z

Hey @lopuhin, while adding more functions to the project as mentioned in #18, I realized that to support urlparse we need to add some options to the current urlsplit function, namely scheme and allow_fragments.

As far as I know, if the input URL has no scheme and the scheme option is specified, the output will use the scheme passed in that option. That's why I created this PR. Do you think that is correct? Please let me know 😄
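
For reference, the fallback behaviour described above can be checked against the standard library, which is what urlparse4 is meant to mirror. This is a minimal sketch using urllib.parse only (not the urlparse4 implementation); the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit

# The URL has no scheme of its own, so the value passed via `scheme` is used.
no_scheme = urlsplit('//www.example.com/path', scheme='https')

# The URL already carries a scheme, so the `scheme` argument is ignored.
has_scheme = urlsplit('http://www.example.com/path', scheme='https')

print(no_scheme.scheme, has_scheme.scheme)
```

The scheme argument acts only as a default; it never overrides a scheme that is present in the URL.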

lopuhin

@nctl144 based on the testing I did, this looks correct. Do we have tests that pass a scheme?

lopuhin · 2018-06-18T08:40:02Z

urlparse4/cgurl.pyx

@@ -221,14 +223,14 @@ class SplitResultNamedTuple(tuple):
        return stdlib_urlunsplit(self)


-def urlsplit(url):
+def urlsplit(url, scheme='', allow_fragments=True):


Is allow_fragments used?

Do you mean is it used in Scrapy? I included it because the urlparse function uses it (https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L373). I'm not sure we should include it in this project, since the only thing allow_fragments does is decide whether the fragment is split off or left attached to the preceding component (for example the query).
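
To make the effect of allow_fragments concrete, here is a small sketch against the standard library (not urlparse4 itself). The URL is a made-up example.

```python
from urllib.parse import urlsplit

url = 'http://example.com/path?q=1#frag'

# Default: the fragment is split off into its own field.
default = urlsplit(url)

# allow_fragments=False: '#frag' stays attached to the query.
no_frag = urlsplit(url, allow_fragments=False)

print(default.query, default.fragment)
print(no_frag.query, repr(no_frag.fragment))
```

When the URL has no query, the fragment text stays in the path instead.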

lopuhin · 2018-06-18T08:41:02Z

urlparse4/cgurl.pyx

@@ -245,3 +247,18 @@ def urljoin(base, url, allow_fragments=True):
        return joined_url

    return stdlib_urljoin(base, url, allow_fragments=allow_fragments)
+#


Did you mean to uncomment it, or add it in another PR?

I will work on implementing this function now, Konstantin!

malloxpb · 2018-06-18T18:30:22Z

Hey @lopuhin, we do have a urlsplit test that passes a scheme: https://github.com/nctl144/urlparse4/blob/master/tests/test_urlparse.py#L764

malloxpb · 2018-06-18T21:19:55Z

Hey Konstantin, I ran the performance test: urlparse takes 0.19 sec when it does not have to decode the result and 0.22 sec when it does, compared to 0.13 and 0.15 sec for urlsplit. There might still be a way to optimize this function further 😄
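
A comparison like this can be reproduced roughly with timeit. The sketch below times the standard-library functions only, on a hypothetical URL list, so the absolute numbers will not match the ones quoted above; time_parser and URLS are names invented for this example and are not part of the project's performance test.

```python
import timeit
from urllib.parse import urlparse, urlsplit

# Hypothetical sample input; the real performance test uses its own URL set.
URLS = ['http://www.example.com/a/b;params?q=1#frag'] * 200


def time_parser(func, urls, repeat=3, number=10):
    """Return the best wall-clock time (seconds) for parsing all urls `number` times."""
    timer = timeit.Timer(lambda: [func(u) for u in urls])
    return min(timer.repeat(repeat=repeat, number=number))


split_time = time_parser(urlsplit, URLS)
parse_time = time_parser(urlparse, URLS)
print('urlsplit: %.4f sec, urlparse: %.4f sec' % (split_time, parse_time))
```

Taking the minimum over several repeats reduces noise from other processes on the machine.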

malloxpb · 2018-06-18T21:20:18Z

I will work on the failing tests now 😄, in particular creating a tuple class for this function.
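
One possible shape for such a tuple class is sketched below. It is illustrative only: ParseResultSketch is a hypothetical name, not the class added in this PR, and it delegates reassembly to the standard library in the same way SplitResultNamedTuple delegates to stdlib_urlunsplit.

```python
from collections import namedtuple
from urllib.parse import urlunparse

# Six fields, matching what the stdlib urlparse returns.
_ParseFields = namedtuple(
    'ParseFields', 'scheme netloc path params query fragment')


class ParseResultSketch(_ParseFields):
    """Hypothetical result class for illustration."""
    __slots__ = ()

    def geturl(self):
        # Reassemble the URL with the stdlib helper.
        return urlunparse(self)


result = ParseResultSketch('http', 'example.com', '/a', 'p', 'q=1', 'f')
print(result.params, result.geturl())
```

Subclassing a namedtuple keeps tuple unpacking and indexing working while still allowing attribute access and extra methods.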

lopuhin

Looks good. Please have a look at using literal byte strings; I didn't comment on every instance.

lopuhin · 2018-06-26T15:06:02Z

urlparse4/cgurl.pyx


 cimport cython


+uses_params = [scheme.encode('utf-8') for scheme in ['', 'ftp', 'hdl',


This can be written as [b'', b'ftp', ...] without calling encode; I think that will be much clearer.
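
A quick check that the two spellings are equivalent. Only the first three schemes are visible in the diff hunk above, so the sketch uses just those; the rest of the list is left out rather than guessed.

```python
# Encoding at import time, as in the diff (list shortened to the visible entries).
encoded = [scheme.encode('utf-8') for scheme in ['', 'ftp', 'hdl']]

# The same values written as bytes literals; no runtime encode call needed.
literal = [b'', b'ftp', b'hdl']

print(encoded == literal)
```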

lopuhin · 2018-06-26T15:07:22Z

urlparse4/cgurl.pyx

+        ParseStandardURL(url, len(url), parsed)
+    elif CompareSchemeComponent(url, url_scheme, kMailToScheme):
+        """
+        Discuss: Is this correct?


The logic looks good to me: kMailToScheme -> ParseMailtoURL. What would you like to discuss?

I will create an issue to discuss this. I believe there's a test that urlparse4 does not pass because of this 😄
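
As a reference point for that discussion, this is how the standard library splits a mailto URL. The sketch shows stdlib behaviour only; it does not show what ParseMailtoURL produces, and the address is a made-up example.

```python
from urllib.parse import urlsplit

parts = urlsplit('mailto:user@example.com?subject=hi')

# There is no '//' after the scheme, so netloc is empty and the address lands in path.
print(parts.scheme, repr(parts.netloc), parts.path, parts.query)
```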

lopuhin · 2018-06-26T15:09:00Z

urlparse4/cgurl.pyx

+    """
+    this function can be modified to enhance the performance?
+    """
+    slash, semcol = '/'.encode('utf-8'), ';'.encode('utf-8')


'/'.encode('utf-8') can be replaced with b'/', and same below
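
With bytes literals, a params-splitting helper could look like the sketch below. It is modelled on the private _splitparams helper in the standard library, adapted to bytes; split_params is a hypothetical name and this is not the code from the PR.

```python
def split_params(path):
    """Split b'<path>;<params>' on the first ';' in the last path segment."""
    slash, semcol = b'/', b';'
    if slash in path:
        # Only a ';' after the last '/' starts the params.
        i = path.find(semcol, path.rfind(slash))
    else:
        i = path.find(semcol)
    if i < 0:
        # No params present.
        return path, b''
    return path[:i], path[i + 1:]


print(split_params(b'/a/b;p=1'))
print(split_params(b'/a;x/b'))
```

A ';' that appears in an earlier path segment is left in the path, which matches how the stdlib treats params.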

malloxpb added 3 commits June 15, 2018 15:57

add urlparse function from stdlib

4d6870a

add scheme option for urlsplit

40e96aa

recompile cython py3

893bdc3

lopuhin reviewed Jun 18, 2018

malloxpb added 7 commits June 18, 2018 13:32

add test which includes extra func input for urlsplit

2f01138

copy urlparse source from urllib

770a058

compile cython

29cb1aa

add performance test for urlparse

1570854

add urlparse into urlparse4

00a33ae

adapt urlparse to the proj

ab04c54

compile cython

bdcee5b

malloxpb added 16 commits June 19, 2018 10:33

reorganize code

56418d0

move class methods to a func

762b03a

DRY code

a2de522

prep for creating extra classes

9bb69cc

add note to the func

88ae788

add docstring to the func

f4a487c

reorganize the code

23f52f1

use the classes to inherit extra prop

39e2de0

correct function type

b2c0703

add extra attributes

01a7de6

compile in cython

7a4878c

correct class name

e05fae9

recompile cython

6d510b1

re-correct class method call

5007f00

recompile cython

088655c

correct the get attr func

0a8fbde

malloxpb added 21 commits June 19, 2018 21:33

make the parsed result named tuple work

c3fffe4

enable params property for urlparse

0d2ca05

reorganize code

005facd

compile cython

6e63818

Merge branch 'urlparse_fix' into urlparse

98d848f

recompile

33fe4dc

add some notes

37576c8

this test is failed as expected

f8c9a99

fix path parsing func

e55cd84

compile cython

42a9e2d

fix params type

505cce2

compile cython

0da8526

mark test as failed and state reason

437729e

skip no scheme test for now

6a5a13e

move func to the right place

a0a1ee8

fix hostname issue

9224e6b

cython compile

d46dfe9

marked the test as failed for now

9c46280

recompile cython

5807186

cython some functions

622d510

fix input type

1e63797

malloxpb closed this Jun 25, 2018

malloxpb reopened this Jun 26, 2018

lopuhin reviewed Jun 26, 2018

malloxpb added 4 commits June 26, 2018 11:36

fix bytes string encode and some conditions

5105245

compile cython

9a49740

fix splitparams condition

8924d3f

compile cython

9fdd4ce

malloxpb merged commit d3e8eee into master Jun 26, 2018

malloxpb deleted the urlparse branch June 26, 2018 19:41
