You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If the value of the parameter is already percent-encoded (in my scenario, this value came from a JSON response of an API), it is not possible to specify safe characters that should not be encoded. This argument is available in urlencode signature: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode
My suggestion is to include a safe argument in add_or_replace_parameter to improve the control over the encoding of your values:
Actual: w3lib.url.add_or_replace_parameter (url, name, new_value)
New: w3lib.url.add_or_replace_parameter(url, name, new_value, safe='')
In [1]: from urllib.parse import urlencode
...: from w3lib.url import add_or_replace_parameter
...: # This came from an API response already encoded
...: full_hierarchy = 'Appliances_Air+Purifiers+%26+Dehumidifiers_Air+Purifiers'
...: url = 'http://example.com'
In [2]: urlencode({'hierarchy': full_hierarchy})
Out[2]: 'hierarchy=Appliances_Air%2BPurifiers%2B%2526%2BDehumidifiers_Air%2BPurifiers'
In [3]: urlencode({'hierarchy': full_hierarchy}, safe='+%')
Out[3]: 'hierarchy=Appliances_Air+Purifiers+%26+Dehumidifiers_Air+Purifiers'
In [4]: add_or_replace_parameter(url, 'hierarchy', full_hierarchy)
Out[4]: 'http://example.com?hierarchy=Appliances_Air%2BPurifiers%2B%2526%2BDehumidifiers_Air%2BPurifiers'
The text was updated successfully, but these errors were encountered:
Do you have any other use cases for overriding safe characters?
It looks like in your case a workaround could be to unescape parameters before passing them to add_or_replace_parameter, as this function works with raw, unescaped parameter values - these are values server gets after decoding.
Allowing to override safe characters can be seen as a performance optimization, a hack to avoid unescaping and re-escaping, unless you see other use cases for it. While it may solve a problem, I find the resulting API unintuitive. E.g. does safe parameter affect only parameters, or does it affect encoding of the original URL as well? What should be its value when you want to pass already-escaped parameters, or just use something as-is?
add_or_replace_parameter uses urlencode function to convert a string to percent-encoded ASCII string:
https://github.com/scrapy/w3lib/blob/master/w3lib/url.py#L229
If the value of the parameter is already percent-encoded (in my scenario, this value came from a JSON response of an API), it is not possible to specify safe characters that should not be encoded. This argument is available in urlencode signature:
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode
My suggestion is to include a safe argument in add_or_replace_parameter to improve the control over the encoding of your values:
Actual: w3lib.url.add_or_replace_parameter (url, name, new_value)
New: w3lib.url.add_or_replace_parameter(url, name, new_value, safe='')
The text was updated successfully, but these errors were encountered: