Commit 776b0d8

allow Relaxed to match punycode TLDs
For example, it should match "test.xn--8y0a063a" just like it matches "test.联通". Instead of doubling the size of the regexp by adding the punycode version of every known TLD, simply match any valid punycode string which follows "xn--". It's highly unlikely that this would cause false positives. Fixes #27.
1 parent 32cda0c commit 776b0d8

2 files changed: +8 -1 lines changed

xurls.go

Lines changed: 3 additions & 1 deletion
@@ -72,7 +72,9 @@ func strictExp() string {
 }
 
 func relaxedExp() string {
-	site := domain + `(?i)` + anyOf(append(TLDs, PseudoTLDs...)...) + `(?-i)`
+	punycode := `xn--[a-z0-9-]+`
+	knownTLDs := anyOf(append(TLDs, PseudoTLDs...)...)
+	site := domain + `(?i)(` + punycode + `|` + knownTLDs + `)(?-i)`
 	hostName := `(` + site + `|` + ipAddr + `)`
 	webURL := hostName + port + `(/|/` + pathCont + `?|\b|$)`
 	return strictExp() + `|` + webURL

xurls_test.go

Lines changed: 5 additions & 0 deletions
@@ -188,6 +188,11 @@ func TestRegexes(t *testing.T) {
 		{`foo.onion`, true},
 		{`中国.中国`, true},
 		{`中国.中国/foo中国`, true},
+		{`test.联通`, true},
+		{`test.xn--8y0a063a`, true},
+		{`test.xn--8y0a063a/foobar`, true},
+		{`test.xn-foo`, nil},
+		{`test.xn--`, nil},
 		{`foo.com/`, true},
 		{`1.1.1.1`, true},
 		{`10.50.23.250`, true},
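
The approach from the commit message (accept any string that follows "xn--" instead of listing the punycode form of every known TLD) can be illustrated in isolation. The sketch below is not the xurls implementation: `matchesSite`, the anchored pattern, the ASCII-only label pattern, and the three-entry `knownTLDs` list are assumptions made for illustration. Only the `xn--[a-z0-9-]+` pattern and the `(?i)...(?-i)` wrapping are taken from the diff.

```go
package main

import (
	"fmt"
	"regexp"
)

// punycodeTLD is the pattern added in the commit: "xn--" followed by
// one or more letters, digits, or hyphens.
const punycodeTLD = `xn--[a-z0-9-]+`

// knownTLDs is a tiny hypothetical stand-in for the real TLD list.
const knownTLDs = `(?:com|org|联通)`

// siteRe is a simplified, fully anchored host pattern. The label pattern
// before the TLD is an assumption; xurls uses its own `domain` expression.
var siteRe = regexp.MustCompile(
	`^(?:[a-z0-9-]+\.)+(?i)(?:` + punycodeTLD + `|` + knownTLDs + `)(?-i)$`)

// matchesSite reports whether s is a host name ending in a known TLD
// or in a punycode-looking TLD.
func matchesSite(s string) bool {
	return siteRe.MatchString(s)
}

func main() {
	for _, in := range []string{
		"test.联通",
		"test.xn--8y0a063a",
		"test.xn-foo", // single hyphen: not a punycode prefix
		"test.xn--",   // nothing after the prefix
	} {
		fmt.Printf("%-20s %v\n", in, matchesSite(in))
	}
}
```

Because the TLD alternation sits between `(?i)` and `(?-i)`, an upper-case form such as `test.XN--8Y0A063A` is accepted as well, while the rest of the expression stays case-sensitive.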
