-
-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: private tlds can be used at call-time #207
feature: private tlds can be used at call-time #207
Conversation
8b9bcc3
to
271ff63
Compare
- Adds `include_psl_private_domains` to the `__call__` method. This is now something you can choose on a per-call basis. The object level argument now is only a default value for each call. - The entire dataset from publicsuffix.org is saved to cache - Ensured no weird cache issues happen when using with different `suffix_list_urls` by using different filenames per `suffix_list_urls` - Use filelock to support multiprocessing and multithreading use cases - Updates the bundled snapshot to be the raw publicsuffix data. Need to look at performance impact of this. - Breaking change cache_file => cache_dir - various other cleanups
271ff63
to
d68e6de
Compare
@john-kurkowski for your consideration. This replaces #144. I think it's cleaner than the first attempt. Hopefully we can merge this faster than the last PR 😄 |
Let's! 👍 😅 Will look later today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Clean! Can merge in the next couple days.
Nice catches. Addressed your feedback. Left as a separate commit but will squash it once you've looked at it |
Oh one thing I should point out is I deleted that snapshot diff functionality. Admittedly was just feeling a little lazy at figuring out how to implement that again. Can figure it out if it's important though. |
The lock needed to wrap the retrieval, parsing, and storage of the suffix lists
Glad you pointed that out. I think it's fine without. I'll note in the CHANGELOG. |
Thank you very much for this much requested feature, and seeing it through years! |
Thanks! I hope i didn’t sneak that multiprocessing test and related fix by you ( the last two commits) |
Tests and fixes always welcome. I see no sneaking. 🙏 |
https://build.opensuse.org/request/show/843031 by user mia + dimstar_suse - Update to 3.0.0: This release fixes the long standing bug that public and private suffixes were generated separately and could not be switched at runtime, john-kurkowski/tldextract#66 * Breaking Changes + Rename `cache_file` to `cache_dir` as it is no longer a single file but a directory (john-kurkowski/tldextract#207) + Rename CLI arg also, from `--cache_file` to `--cache_dir` + Remove Python 2.7 support * Features + Can pass `include_psl_private_domains` on call, not only on construction + Use filelocking to support multi-processing and multithreading environments * Bugfixes + Select public or private suffixes at runtime (https://github.com/john-kurkowski/tldextract/issu
Fantastic changes, great job In [1]: from tldextract import TLDExtract
In [2]: TLDExtract.tlds
Out[2]: <property at 0x7faa2da45c50>
In [3]: t = TLDExtract()
In [4]: t.tlds
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-2bd415795d39> in <module>
----> 1 t.tlds
~/anaconda3/envs/ocean/lib/python3.7/site-packages/tldextract/tldextract.py in tlds(self)
243 @property
244 def tlds(self):
--> 245 return self._get_tld_extractor().tlds
246
247 def _get_tld_extractor(self):
AttributeError: '_PublicSuffixListTLDExtractor' object has no attribute 'tlds' |
🤨 hmm I suspect sarcasm Yes it does seem we missed that. Given that we now track public and private tlds we probably won't be able to support this ambiguous property. Perhaps a public_tlds and a private_tlds? |
Yes, thats sensible names. |
Are these results correct?
Also this:
|
I can't know whether those results are correct without knowing what calls produced them. The second issue you pointed out is another oversight. I've made a PR to address these issues here: #210 |
We were using wrapped patched version of tldextract to properly handle private domains and invoke tldextract with private parameter if the domain is private and vice versa. Above are our tests
|
This is a second attempt at doing what was done in #144.
Addresses #66
include_psl_private_domains
to the__call__
method. This is now something you can choose on a per-call basis. The object level argument now is only a default value for each call.suffix_list_urls
by using different filenames persuffix_list_urls
cache_file
=>cache_dir