cannot get info for sa.gov.au #313

Closed
gsouf opened this issue Jun 3, 2021 · 8 comments

@gsouf

gsouf commented Jun 3, 2021

Issue summary

We use pdp to determine whether a given URL uses a valid domain name.

As described in issue #251, the library cannot determine whether sa.gov.au is a valid hostname, even though it is one. You can visit it at https://sa.gov.au/

Standalone code, or other way to reproduce the problem

$manager = new Manager(new Cache(), new CurlHttpClient());
var_dump($manager->getRules()->resolve('sa.gov.au'));

Expected result

Anything that indicates that sa.gov.au is a valid hostname.

Actual result

object(Pdp\Domain)#66 (7) {
  ["domain"]=>
  string(9) "sa.gov.au"
  ["registrableDomain"]=>
  NULL
  ["subDomain"]=>
  NULL
  ["publicSuffix"]=>
  NULL
  ["isKnown"]=>
  bool(false)
  ["isICANN"]=>
  bool(false)
  ["isPrivate"]=>
  bool(false)
}

Happy to contribute if needed.

Thanks

@nyamsprod
Collaborator

nyamsprod commented Jun 3, 2021

@gsouf it seems that you are using v5, which is EOL. I recommend upgrading to v6, which should give you a clearer understanding of the expected behaviour.

In v6 the following code throws an exception:

<?php

use Pdp\Rules;

$rules = Rules::fromPath('path-to-your-local-copy-of-public-suffix-list.txt');
$rules->getICANNDomain('sa.gov.au');
// throws Pdp\UnableToResolveDomain: The public suffix and the domain name are is identical `sa.gov.au`.

As explained in #251, sa.gov.au is a public suffix in its own right (it is explicitly listed in the PSL), so you can't resolve it, just as you can't resolve ac.be or co.uk. To be resolved, a domain needs at least one more label in front of the public suffix. To summarise:

  • bbc.co.uk will be resolved, but not co.uk
  • ulb.ac.be will be resolved, but not ac.be
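
To make the rule above concrete, here is a minimal, self-contained sketch of the matching logic. It is not the pdp implementation: `toyResolve` and the hand-picked `$suffixes` excerpt are hypothetical names introduced for illustration, and the wildcard and exception rules of the real PSL are deliberately left out.

```php
<?php

declare(strict_types=1);

/**
 * Toy public-suffix matcher (hypothetical helper, NOT part of pdp).
 * Handles plain rules only: no wildcard or exception rules.
 *
 * @param string[] $suffixes plain public-suffix rules, lowercase
 * @return array{publicSuffix: ?string, registrableDomain: ?string}
 */
function toyResolve(string $host, array $suffixes): array
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);

    // Try the longest candidate first so that sa.gov.au wins over gov.au and au.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (!in_array($candidate, $suffixes, true)) {
            continue;
        }

        // $i === 0 means the whole host is itself a public suffix:
        // no label is left over to form a registrable domain.
        $registrable = $i === 0
            ? null
            : implode('.', array_slice($labels, $i - 1));

        return ['publicSuffix' => $candidate, 'registrableDomain' => $registrable];
    }

    // No rule matched at all (assumption: treated as unknown in this sketch).
    return ['publicSuffix' => null, 'registrableDomain' => null];
}

// Hand-picked excerpt for illustration; the real list is far larger.
$suffixes = ['com', 'uk', 'co.uk', 'be', 'ac.be', 'au', 'gov.au', 'sa.gov.au'];

foreach (['bbc.co.uk', 'co.uk', 'www.sa.gov.au', 'sa.gov.au'] as $host) {
    $result = toyResolve($host, $suffixes);
    printf(
        "%-14s suffix=%-10s registrable=%s\n",
        $host,
        $result['publicSuffix'] ?? 'null',
        $result['registrableDomain'] ?? 'null'
    );
}
```

With this excerpt, sa.gov.au and co.uk come back with a null registrable domain because the whole name is consumed by the suffix, while www.sa.gov.au and bbc.co.uk each have one label left over to form it.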

Hope this clarifies your issue.

PS: as things stand, a way to resolve this is either to have sa.gov.au removed from the list or maybe to add a resolveSuffix method to the Rules class 🤔.

@gsouf
Author

gsouf commented Jun 3, 2021

@nyamsprod Thanks for the clear explanation and for the version notice. For the moment we bypass pdp for sa.gov.au to get things working.

I understand that sa.gov.au is in the PSL. That probably makes sense because it is a registrable domain (https://www.domainname.gov.au/apply-new-sagovau-domain-name).

My knowledge of this topic is certainly limited, but there seems to be an inconsistency in this list: it mixes entries that you cannot browse and that exist purely for registering an FQDN beneath them, such as the TLD com or a generic second-level domain like co.uk (i.e. the URLs https://com and https://co.uk, or the email address someone@com, do not work), with specialized domains that resolve to something, like gov.au, sa.gov.au or even ngrok.io ❓❓

Do you know if there is a way to distinguish between them somehow?

What exactly do you have in mind for the resolveSuffix method?

@nyamsprod
Collaborator

nyamsprod commented Jun 3, 2021

Do you know if there is a way to distinguish between them somehow?

Again, if you upgrade to v6 you will hopefully get your answer, as it exposes stricter methods:

  • Rules::getICANNDomain
  • Rules::getPrivateDomain
  • Rules::getCookieDomain

Check https://github.com/jeremykendall/php-domain-parser#resolving-domains for more information.

The resolveSuffix method was just a thought, but I think it does not make sense to implement it. The PSL is not just a suffix list; it really is a collection of rules that can validate or invalidate suffixes. Hence the name of the class, Rules.

@SaschaMai

@gsouf Are you able to download www.sa.gov.au instead of sa.gov.au?

@gsouf
Author

gsouf commented Jun 11, 2021

@SaschaMai that would definitely work, but it's not the desired behavior, because the issue is not limited to this domain: it affects any domain in the PSL. If I added a www in front of each domain, then things like www.com would be validated too. The tricky part is that technically we wouldn't want things like co.uk to be validated either. But it seems there is no easy way to achieve this.

I'll first upgrade to v6, probably next week, and report back here on how I solved the problem as soon as I have it working.

@nyamsprod
Collaborator

@gsouf what is the problem with www.com being a valid domain? AFAIK, when registering a domain you are in fact registering a second-level domain.

In other words, you can never register sa.gov.au; you must register www.sa.gov.au.

That is why I said that the issue, if there is one, must be taken to the PSL repo and not to the current package 😉

@gsouf
Author

gsouf commented Jun 11, 2021

@nyamsprod because the suggestion was to add "www" in front of the string to validate.

That means that if I'm trying to validate the string "com", I'd end up validating "www.com", which is valid, but that does not make "com" alone valid.

As for sa.gov.au, it is an actual website and people have email addresses with sa.gov.au ([email protected]), so even if it's not registrable it's still used and it is a valid domain name.

I'll try to open a ticket in the PSL repo, but I'm not sure it will go anywhere.

@nyamsprod
Collaborator

@gsouf it seems a relevant, yet complicated, issue is already open on the PSL repo: see publicsuffix/list#788.

TL;DR: definitely an issue on the upstream public suffix list and not one this package can fix/resolve.
