URL regex doesn't recognise important TLDs like .cat #3

FauxFaux · 2011-10-09T00:28:47Z

Hovering over URLs like http://nyan.cat/ and clicking them takes you to http://nyan.ca/, as the .cat TLD isn't recognised.

TLD-specific code should probably be removed anyway, given the introduction of arbitrary TLDs.
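A minimal sketch of how this truncation can happen. It assumes, hypothetically, that the default pattern ends in a short TLD whitelist with a two-letter fallback. `TLD_PATTERN` below is a simplified stand-in written for illustration, not the regex PuTTYTray actually ships. Because `.cat` is not in the list, the engine backtracks until the two-letter branch matches `ca`:

```python
import re
from typing import Optional

# Hypothetical, simplified stand-in for a TLD-whitelisting URL regex:
# a few explicit TLDs, then a catch-all for two-letter country codes.
TLD_PATTERN = re.compile(r"http://[a-zA-Z0-9.-]*\.(com|net|org|[a-zA-Z][a-zA-Z])")


def highlighted(text: str) -> Optional[str]:
    """Return the part of `text` this pattern would highlight, if any."""
    m = TLD_PATTERN.search(text)
    return m.group(0) if m else None


print(highlighted("see http://nyan.cat/ now"))  # http://nyan.ca
print(highlighted("see http://example.com/ now"))  # http://example.com
```

Adding `cat` to the alternation fixes this one case but not the next unlisted TLD, which is the argument for dropping TLD-specific code.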

ljani · 2012-01-25T13:56:46Z

There are also other problems with the regex: hosts such as http://localhostr.com get detected as http://localhost/.
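This looks like an alternation that tries a bare `localhost` branch first, with nothing forcing the hostname to end there. Below is a hypothetical reproduction in Python's `re` (both patterns are illustrative, not the shipped regex), plus one possible fix using `\b`. Note that, per FauxFaux's later comment, the bundled engine may not support `\b` at all:

```python
import re

# Hypothetical: a "localhost" branch listed before the general host branch.
NAIVE = re.compile(r"http://(localhost|[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+)")
# Requiring a word boundary after "localhost" rejects "localhostr", so the
# engine falls through to the general host branch instead.
FIXED = re.compile(r"http://(localhost\b|[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+)")

url = "http://localhostr.com"
print(NAIVE.match(url).group(0))  # http://localhost
print(FIXED.match(url).group(0))  # http://localhostr.com
```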

joosera · 2012-01-29T22:51:27Z

Also, .pro is affected.
.[a-zA-Z][a-zA-Z] could possibly be changed to something like .[a-zA-Z]{3}. I'm not sure what regex syntax PuTTYTray uses: this didn't seem to work when I manually edited the regex, though it didn't break the old behaviour of highlighting only the .pr part, so I'm not really sure what's going on there.
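One plausible explanation, offered as an inference rather than something confirmed in this thread: the bundled engine (the 1987 `regexp.c` mentioned later, which appears to be Henry Spencer's classic library) has no `{n}` interval syntax. It would therefore read `{3}` as literal characters, and that branch would never match, leaving the two-letter behaviour intact. An engine that supports `?` can spell the repetition out instead. The check below uses Python's `re` only to confirm that the two spellings accept the same strings. Note the escaped dot; an unescaped `.` matches any character. `{2,3}` is used so that both .pr and .pro are covered:

```python
import re

# Interval syntax: two or three letters after a literal dot.
WITH_INTERVAL = re.compile(r"\.[a-zA-Z]{2,3}")
# The same language spelled with only "?", for engines without {m,n}.
SPELLED_OUT = re.compile(r"\.[a-zA-Z][a-zA-Z][a-zA-Z]?")

for tld in [".pr", ".pro", ".cat", ".uk", ".c", ".info"]:
    a = WITH_INTERVAL.fullmatch(tld) is not None
    b = SPELLED_OUT.fullmatch(tld) is not None
    assert a == b, tld  # both spellings must agree on every sample
    print(tld, a)
```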

wodim · 2012-10-17T22:28:04Z

Bump.

incognico · 2013-05-01T12:45:42Z

Why not just simplify the regexp? The current one leads nowhere. Why not use something like (PCRE syntax, that is) \b((?:[hH][tT][tT][pP]://|[wW][wW][wW]\.)[^\s)'"]+), or even \b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+) for any protocol? I don't know if the library used can do word boundaries, but in general this regexp needs to be minimal. Imagine when the new gTLDs start :p
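For reference, here is the second (any-protocol) pattern run through Python's `re`, which accepts this PCRE-style syntax including `\b` and `\s`. This only shows what the pattern itself matches; whether PuTTYTray's engine can run it is a separate question. The sample line is made up:

```python
import re

# incognico's any-protocol pattern, verbatim: a scheme such as "foo://" or a
# leading "www.", then everything up to whitespace, a quote or ")".
LIBERAL = re.compile(r"""\b((?:[a-zA-Z]+://|[wW][wW][wW]\.)[^\s'")]+)""")

line = "see (http://nyan.cat/) or www.example.com via ssh://host"
print(LIBERAL.findall(line))
```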

FauxFaux · 2013-05-01T22:27:11Z

The aim of the regex has previously been to match only plausible URLs, not anything that could conceivably be a URL; I'm guessing this is what people expect (it's what I expect).

It's customisable; if you want to use something more liberal feel free.

Also, no, the regex engine supports basically nothing. #4 is to fix that, but it's hard work.

incognico · 2013-05-02T13:04:49Z

Everything that conceivably is a URL is also a plausible URL in my opinion.

So, if anyone is interested:
I am now using (([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+).

In contrast to the very complex default regexp, which needs a lot of maintenance, it allows me to click the following URIs:

www.example.newgtld
http://foo.intern
http://hostname
http://user:[email protected]:8081/foo
whatever://protocol
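A quick way to check the pattern above against most of these URIs is sketched below. Python's `re` is only a stand-in for the terminal's engine; the pattern uses nothing beyond groups, alternation, character classes and `+`, so it should behave the same in a minimal engine, though that is an assumption:

```python
import re

# incognico's engine-friendly pattern: no \b, \s or (?:...), just groups,
# character classes and "+".
PATTERN = re.compile(r"""(([a-zA-Z]+://|[wW][wW][wW]\.)[^ '")>]+)""")

samples = [
    "www.example.newgtld",
    "http://foo.intern",
    "http://hostname",
    "whatever://protocol",
]

for s in samples:
    # fullmatch: the whole sample must be highlighted, not just a prefix.
    assert PATTERN.fullmatch(s), s

# In running text the match stops at a space, quote, ")" or ">".
print(PATTERN.search("<http://foo.intern/x> done").group(1))  # http://foo.intern/x
```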

I also giggled at * @(#)regexp.c 1.3 of 18 April 87


FauxFaux · 2013-07-14T17:49:47Z

I've added a default option of @incognico's suggestion, and liberalised the "classic" default a bit.

I've additionally removed the nasty browser detection code, as it was generally broken, even when launching URLs like "www.google.com".

Finally found a case where this breaks horribly:
* attempt to launch an invalid URL
* we panic, and think we can never launch a URL again
* future non-http URLs get run with the browser, with mixed results

While I could fix that actual bug, I'd rather remove the panic code, which should never be hit anyway.
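The failure mode reads like a sticky error flag. The following is a hypothetical model: the class names, the `ok` parameter and the "default handler" label are invented for illustration and are not PuTTYTray's actual code. It shows why one bad launch affects every later one, and why dropping the flag makes each launch independent again:

```python
class StickyLauncher:
    """Hypothetical model of the reported bug: one failure is remembered forever."""

    def __init__(self):
        self.panicked = False
        self.log = []  # (handler, url) pairs, in launch order

    def launch(self, url, ok=True):
        # `ok` stands in for whatever made a real launch attempt fail.
        if self.panicked:
            # After the first failure every URL goes to the browser,
            # including non-http ones such as "ssh://host".
            self.log.append(("browser", url))
        elif not ok:
            self.panicked = True  # the "panic": never reset
            self.log.append(("failed", url))
        else:
            self.log.append(("default handler", url))


class StatelessLauncher:
    """The same model with the panic flag removed: each launch stands alone."""

    def __init__(self):
        self.log = []

    def launch(self, url, ok=True):
        self.log.append(("default handler" if ok else "failed", url))


for launcher in (StickyLauncher(), StatelessLauncher()):
    launcher.launch("not a url", ok=False)
    launcher.launch("ssh://host")
    print(type(launcher).__name__, launcher.log)
```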

stfnm mentioned this issue Nov 9, 2012

.cat domain breaks link detection #58

Closed

FauxFaux added a commit that referenced this issue Apr 30, 2013

GH-3: Reformat regex

199180d

FauxFaux added a commit that referenced this issue Apr 30, 2013

GH-3: Add some more popular TLDs

8861c24

FauxFaux added a commit that referenced this issue Jun 1, 2013

GH-3: Reformat regex

e5b5388

FauxFaux added a commit that referenced this issue Jun 1, 2013

GH-3: Add some more popular TLDs

3bd571d

FauxFaux added a commit that referenced this issue Jul 13, 2013

GH-3: Reformat regex

8f9106d

FauxFaux added a commit that referenced this issue Jul 13, 2013

GH-3: Add some more popular TLDs

898e260

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Add very liberal URL regex as a default option

80f4e4c

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Thinking about it, not allowing http://foo seems dumb

a27f247

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: http://google.com stopped working, regex engine bug?

8ba1f9d

FauxFaux closed this as completed Jul 14, 2013

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Reformat regex

534efe1

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Add some more popular TLDs

b56ddd3

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Add very liberal URL regex as a default option

1f4312e

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: Thinking about it, not allowing http://foo seems dumb

2325613

FauxFaux added a commit that referenced this issue Jul 14, 2013

GH-3: http://google.com stopped working, regex engine bug?

36c65bc

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: Reformat regex

c5ae841

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: Add some more popular TLDs

67063fc

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: Add very liberal URL regex as a default option

b544774

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: Thinking about it, not allowing http://foo seems dumb

44ac4f3

FauxFaux added a commit that referenced this issue Aug 6, 2013

GH-3: http://google.com stopped working, regex engine bug?

4a4115b

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: Reformat regex

18198fb

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: Add some more popular TLDs

b4df52d

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: Add very liberal URL regex as a default option

3ef0bfc

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: Thinking about it, not allowing http://foo seems dumb

72e3e66

FauxFaux added a commit that referenced this issue Aug 7, 2013

GH-3: http://google.com stopped working, regex engine bug?

cdb9b42

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: Reformat regex

781afde

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: Add some more popular TLDs

4a8995c

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: Add very liberal URL regex as a default option

4a781b4

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: Thinking about it, not allowing http://foo seems dumb

5f71c14

FauxFaux added a commit that referenced this issue Aug 11, 2013

GH-3: http://google.com stopped working, regex engine bug?

76fa3de
