Skip to content
This repository has been archived by the owner on Apr 19, 2024. It is now read-only.
This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

URL regex doesn't recognise important TLDs like .cat #3

Closed
FauxFaux opened this issue Oct 9, 2011 · 7 comments
Closed

URL regex doesn't recognise important TLDs like .cat #3

FauxFaux opened this issue Oct 9, 2011 · 7 comments
Labels

Comments

@FauxFaux
Copy link
Owner

FauxFaux commented Oct 9, 2011

Hovering URLs like http://nyan.cat/ and clicking take you to http://nyan.ca/ , as the .cat TLD isn't recognised.

TLD specific code should probably be removed anyway, due to the introduction of arbitrary TLDs.

@ljani
Copy link

ljani commented Jan 25, 2012

There are also other problems with the regex. Hosts such as http://localhostr.com get detected as http://localhost/

@joosera
Copy link

joosera commented Jan 29, 2012

Also .pro is affected.
.[a-zA-Z][a-zA-Z] could possibly be changed to something like .[a-zA-Z]{3} (not sure what regex markup putty tray uses, this didn't seem to work when i manually edited the regex, though it didn't break the old behaviour of only hilighting the .pr part so not really sure what's going on there)

@wodim
Copy link

wodim commented Oct 17, 2012

Bump.

FauxFaux added a commit that referenced this issue Apr 30, 2013
FauxFaux added a commit that referenced this issue Apr 30, 2013
@incognico
Copy link

Why not just simplify the regexp? The current one leads to nowhere. Why not use something like (pcre syntax that is) \b((?:[hH][tT][tT][pP]://|[wW][wW][wW]\.)[^\s)'"]+) or even \b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+) for any protocol? I don't know if the used libary can do word boundaries but in general this regexp needs to be minimal. Imagine the new gTLDs are starting :p

@FauxFaux
Copy link
Owner Author

FauxFaux commented May 1, 2013

The aim of the regex has previously been to match only plausible URLs, not anything that could conceivably be a URL; I'm guessing this is what people expect (it's what I expect).

It's customisable; if you want to use something more liberal feel free.

Also, no, the regex engine supports basically nothing. #4 is to fix that, but it's hard work.

@incognico
Copy link

Everything that conceivably is a URL is also a plausible URL in my opinion.

So, if anyone is interested:
I am now using (([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+).

In contrast to the very complex default regexp which needs a lot of maintence it allows me to click the following URIs:

www.example.newgtld
http://foo.intern
http://hostname
http://user:[email protected]:8081/foo
whatever://protocol

I also giggled at * @(#)regexp.c 1.3 of 18 April 87

FauxFaux added a commit that referenced this issue Jun 1, 2013
FauxFaux added a commit that referenced this issue Jun 1, 2013
FauxFaux added a commit that referenced this issue Jul 13, 2013
FauxFaux added a commit that referenced this issue Jul 13, 2013
FauxFaux added a commit that referenced this issue Jul 14, 2013
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
@FauxFaux
Copy link
Owner Author

I've added a default option of @incognico's suggestion, and liberalised the "classic" default a bit.

I've additionally removed the nasty browser detection code, as this was just generally broken even launching urls like "www.google.com".

FauxFaux added a commit that referenced this issue Jul 14, 2013
FauxFaux added a commit that referenced this issue Jul 14, 2013
FauxFaux added a commit that referenced this issue Jul 14, 2013
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
FauxFaux added a commit that referenced this issue Aug 6, 2013
FauxFaux added a commit that referenced this issue Aug 6, 2013
FauxFaux added a commit that referenced this issue Aug 6, 2013
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
FauxFaux added a commit that referenced this issue Aug 7, 2013
FauxFaux added a commit that referenced this issue Aug 7, 2013
FauxFaux added a commit that referenced this issue Aug 7, 2013
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
FauxFaux added a commit that referenced this issue Aug 11, 2013
FauxFaux added a commit that referenced this issue Aug 11, 2013
FauxFaux added a commit that referenced this issue Aug 11, 2013
Finally found a case where this breaks horribly:

* attempt launch an invalid url
* we panic, and think we can never launch a url again
* future non-http urls get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code,
which should never be being hit anyway.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants