
Links with "_" in the domain name are not regarded as links #95

Open
ZibanPirate opened this issue Jan 17, 2021 · 2 comments

Comments

@ZibanPirate

What is the issue?

Links with "_" in the domain name, for example:

are not regarded as links, which is not correct; see:
https://stackoverflow.com/a/2183140/8113942
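The usual distinction here is between host names, whose labels RFC 1123 limits to letters, digits and hyphens, and DNS labels in general, on which RFC 2181 places no such character restriction. A minimal sketch of the two label rules, assuming ASCII-only names and interior-only underscores (both simplifications, not linkify-it's actual regexps):

```javascript
// Simplified, ASCII-only label checks for illustration (assumed rules, not linkify-it's regexps).
// Strict: RFC 1123-style host name label: letters/digits, interior hyphens, 1-63 chars.
const STRICT_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;
// Permissive: same shape, but underscores are also accepted between the first and last character.
const PERMISSIVE_LABEL = /^[a-z0-9](?:[a-z0-9_-]{0,61}[a-z0-9])?$/i;

// A name passes when every dot-separated label passes the given label check.
function isValidName(name, labelRe) {
  return name.split('.').every((label) => labelRe.test(label));
}

for (const name of ['api-stage.dzcode.io', 'api_stage.dzcode.io', 'api_stage.dz_code.io']) {
  console.log(name, isValidName(name, STRICT_LABEL), isValidName(name, PERMISSIVE_LABEL));
}
```

Run with Node, this prints `true true` for `api-stage.dzcode.io` and `false true` for both underscore names: they are fine as DNS labels but not as RFC 1123 host names.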

The same goes for fuzzy links, for example:

  • api_stage.dz_code.io
@rlidwka
Member

rlidwka commented Apr 18, 2022

As far as I've been able to research, api_stage.dzcode.io is an alias for api-stage.dzcode.io, and dz_code.io simply isn't a thing.

Please provide an example of widely used domains with underscores in them.

Underscores in domain names are very rare because:

Linkify-it isn't meant to find every single link (which is impossible), so we have to restrict ourselves to the most common cases. I'm not sure domains with underscores are worth supporting, especially given the potential for false positives if underscores are allowed in fuzzy links.
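One plausible class of false positives can be sketched with a toy fuzzy matcher. This is not linkify-it's algorithm: the function name, the whitespace tokenization and the small TLD sample are assumptions for illustration. It treats a token as a link when every label passes a label check and the last label is a known TLD. Because `py`, `sh` and `md` are real country-code TLDs (Paraguay, Saint Helena, Moldova), allowing underscores also turns snake_case file names into "links".

```javascript
// Toy fuzzy-link matcher, for illustration only (not linkify-it's implementation).
// Assumption: a small hand-picked TLD sample; real matchers use a full list.
const TLDS = new Set(['com', 'org', 'io', 'py', 'sh', 'md']);

const STRICT_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;
const UNDERSCORE_LABEL = /^[a-z0-9](?:[a-z0-9_-]{0,61}[a-z0-9])?$/i;

// Returns the whitespace-separated tokens whose labels all pass the label
// check and whose last label is in TLDS. Punctuation is not stripped.
function fuzzyHosts(text, { allowUnderscore = false } = {}) {
  const labelRe = allowUnderscore ? UNDERSCORE_LABEL : STRICT_LABEL;
  return text.split(/\s+/).filter((token) => {
    const labels = token.split('.');
    if (labels.length < 2) return false; // need at least a name and a TLD
    const tld = labels[labels.length - 1].toLowerCase();
    return TLDS.has(tld) && labels.every((label) => labelRe.test(label));
  });
}

const text = 'see api_stage.dz_code.io and run test_utils.py or deploy_app.sh';
console.log(fuzzyHosts(text));
console.log(fuzzyHosts(text, { allowUnderscore: true }));
```

The first call returns an empty array; the second returns `api_stage.dz_code.io`, `test_utils.py` and `deploy_app.sh`. In this toy, `utils.py` already matches in strict mode, so allowing underscores widens an existing class of false positives to snake_case names rather than creating a new one.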

@domakas

domakas commented Dec 27, 2023

Is it possible to get this resolved? It seems we are still discussing whether this is a valid case, but it's obvious that such cases exist on the web. This library has 100% test coverage, so it's safe to add this change without worrying that it will break something. "False-positive potential" has been mentioned before, but what exactly are the cases that could produce false positives?

Another option that has been suggested is to use onCompile to override the src_domain regexp. However, since most of the regexps depend on one another, this simple change has to be applied like this:

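const LinkifyIt = require('linkify-it');

// onCompile() runs after linkify-it has built its default string patterns in
// `this.re` and before the %TLDS% templates are compiled into RegExp objects,
// so overriding the strings here changes the compiled matchers.
LinkifyIt.prototype.onCompile = function onCompile() {
  const re = this.re;
  // Assumed to match the non-exported text_separators constant in lib/re.js.
  const text_separators = '[><\uff5c]';

  // Domain label: punycode (xn--...), a single pseudo-letter, or pseudo-letters
  // with "-" or "_" allowed between the first and the last one.
  re.src_domain =
    '(?:' +
    re.src_xn +
    '|' +
    '(?:' + re.src_pseudo_letter + ')' +
    '|' +
    '(?:' + re.src_pseudo_letter + '(?:-|_|' + re.src_pseudo_letter + '){0,61}' + re.src_pseudo_letter + ')' +
    ')';

  // Every pattern below embedded the old src_domain string when it was first
  // concatenated, so each one has to be rebuilt from the new value.
  re.src_host =
    '(?:' +
    '(?:(?:(?:' + re.src_domain + ')\\.)*' + re.src_domain/* _root */ + ')' +
    ')';

  re.tpl_host_fuzzy =
    '(?:' +
    re.src_ip4 +
    '|' +
    '(?:(?:(?:' + re.src_domain + ')\\.)+(?:%TLDS%))' +
    ')';

  // Fuzzy host without the IP alternative. link_no_ip_fuzzy is the matcher used
  // while the fuzzyIP option is off (its default), so this must be rebuilt too.
  re.tpl_host_no_ip_fuzzy =
    '(?:(?:(?:' + re.src_domain + ')\\.)+(?:%TLDS%))';

  re.src_host_strict =
    re.src_host + re.src_host_terminator;

  re.tpl_host_fuzzy_strict =
    re.tpl_host_fuzzy + re.src_host_terminator;

  re.src_host_port_strict =
    re.src_host + re.src_port + re.src_host_terminator;

  re.tpl_host_port_fuzzy_strict =
    re.tpl_host_fuzzy + re.src_port + re.src_host_terminator;

  re.tpl_host_port_no_ip_fuzzy_strict =
    re.tpl_host_no_ip_fuzzy + re.src_port + re.src_host_terminator;

  re.tpl_email_fuzzy =
    '(^|' + text_separators + '|"|\\(|' + re.src_ZCc + ')' +
    '(' + re.src_email_name + '@' + re.tpl_host_fuzzy_strict + ')';

  re.tpl_link_fuzzy =
    '(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|' + re.src_ZPCc + '))' +
    '((?![$+<=>^`|\uff5c])' + re.tpl_host_port_fuzzy_strict + re.src_path + ')';

  re.tpl_link_no_ip_fuzzy =
    '(^|(?![.:/\\-_@])(?:[$+<=>^`|\uff5c]|' + re.src_ZPCc + '))' +
    '((?![$+<=>^`|\uff5c])' + re.tpl_host_port_no_ip_fuzzy_strict + re.src_path + ')';
};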

I don't think that's maintainable in our codebase.
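The reason so many patterns have to be restated is that, as the override above implies, each derived pattern is a plain string concatenated from earlier ones when the patterns are first built, so it keeps whatever `src_domain` looked like at that moment. A sketch of the effect, using a hypothetical `buildPatterns` function with three heavily simplified patterns (the property names mirror the override; the regexps and the terminator are assumptions, not linkify-it's):

```javascript
// Hypothetical, heavily simplified pattern builder (not linkify-it's lib/re.js).
// Each derived pattern is a plain string concatenated from earlier ones.
function buildPatterns() {
  const re = {};
  re.src_domain = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';
  re.src_host = '(?:(?:' + re.src_domain + ')\\.)*' + re.src_domain;
  re.src_host_strict = re.src_host + '(?![\\w.-])'; // simplified terminator
  return re;
}

const re = buildPatterns();

// Overriding only the leaf: the derived strings were already built, so they
// still contain the old, underscore-free domain pattern.
re.src_domain = '[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?';
console.log(new RegExp('^' + re.src_host_strict, 'i').test('api_stage.dzcode.io')); // false

// Rebuilding every dependent pattern from the new leaf makes the change take effect.
re.src_host = '(?:(?:' + re.src_domain + ')\\.)*' + re.src_domain;
re.src_host_strict = re.src_host + '(?![\\w.-])';
console.log(new RegExp('^' + re.src_host_strict, 'i').test('api_stage.dzcode.io')); // true
```

With dozens of interdependent patterns instead of three, every intermediate string between the leaf and the compiled matchers has to be copied in the same way.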

I see a couple of options here:

  1. Merge #96 ("Bugfix for: Links with "_" in the domain name are not regarded as links"), which adds test coverage for these cases and fixes the issue.
  2. Make this library extensible/configurable in a better way that doesn't require keeping half of the regexp codebase on the consumer side, while maintaining backward compatibility.
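Option 2 could look roughly like the following sketch, in which every host pattern is derived from a single configurable piece. The `compileHostPatterns` function and its `domainLabel` and `tlds` options are hypothetical names invented for illustration; they are not existing linkify-it settings, and the patterns are heavily simplified.

```javascript
// Hypothetical sketch of option 2: every host pattern is derived from one
// configurable piece. `domainLabel` and `tlds` are invented option names,
// not existing linkify-it settings.
const DEFAULT_DOMAIN_LABEL = '[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?';

function compileHostPatterns({ domainLabel = DEFAULT_DOMAIN_LABEL, tlds = ['com', 'io'] } = {}) {
  // Simplified terminator: the host must not continue with a word char, "." or "-".
  const terminator = '(?![\\w.-])';
  const host = '(?:(?:' + domainLabel + ')\\.)*(?:' + domainLabel + ')';
  const fuzzyHost = '(?:(?:' + domainLabel + ')\\.)+(?:' + tlds.join('|') + ')';
  return {
    host: new RegExp('^' + host + terminator, 'i'),
    fuzzyHost: new RegExp('^' + fuzzyHost + terminator, 'i'),
  };
}

// Defaults are unchanged, so existing consumers keep rejecting underscores.
const defaults = compileHostPatterns();
// A consumer opts in by overriding one piece instead of copying the whole chain.
const relaxed = compileHostPatterns({
  domainLabel: '[a-z0-9](?:[a-z0-9_-]{0,61}[a-z0-9])?',
});

console.log(defaults.fuzzyHost.test('api_stage.dz_code.io')); // false
console.log(relaxed.fuzzyHost.test('api_stage.dz_code.io'));  // true
```

The design keeps the default output identical for existing users while letting the derived patterns be recomputed inside the library, which is what the onCompile override currently forces consumers to do by hand.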

Please make some kind of decision, as doing nothing and ignoring open-source community issues for years is not a valid solution.
