Description
#103848 updated the URL parsing algorithm to handle IPv6 and IPvFuture addresses.
However, the algorithm is incomplete. [ and ] are only permitted in the hostname portion if they are the first and last characters and only if they then contain an IPv6 or IPvFuture address. The current implementation ignores everything before the first [ and everything after the first ] found in the netloc portion.
The WHATWG URL standard states that [ and ] are forbidden characters in a hostname, and the host parser only looks for IPv6 or IPvFuture if the [ and ] characters are the first and last characters of the section, respectively.
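The rule described above can be sketched as a small standalone check. This is only an illustration, not the code used in CPython: the helper name check_host_brackets is hypothetical, the IPvFuture pattern is taken from the RFC 3986 grammar, and IPv6 validation is delegated to the ipaddress module.

```python
import ipaddress
import re

# RFC 3986 IPvFuture: "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
_IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def check_host_brackets(host: str) -> None:
    """Raise ValueError if host misuses square brackets (hypothetical helper)."""
    if "[" not in host and "]" not in host:
        return  # no brackets at all, nothing to check
    # Brackets are only allowed as the very first and very last characters.
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError(f"brackets must enclose the whole host: {host!r}")
    inner = host[1:-1]
    if _IPVFUTURE.fullmatch(inner):
        return  # looks like an IPvFuture literal
    # Raises AddressValueError (a ValueError subclass) if inner is not IPv6.
    ipaddress.IPv6Address(inner)
```

Under these assumptions, "[::1]" and "[v1.example]" pass, while "prefix.[v1.example]" and "[v1.example].postfix" raise ValueError.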
The current implementation thus accepts such bizarre hostnames as:
http://prefix.[v1.example]/
http://[v1.example].postfix/
but then only reports the portion between the brackets as the hostname:
>>> urlparse('http://prefix.[v1.example]/').hostname
'v1.example'
>>> urlparse('http://[v1.example].postfix/').hostname
'v1.example'
The .netloc attribute, in both cases, contains the whole string.
Both URLs should have been rejected instead.
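Until a fix is available on a given interpreter, a caller could guard against such inputs with a thin wrapper. The sketch below is an assumption about what such a guard might look like, not part of urllib.parse; the name strict_urlsplit is hypothetical. It only checks where the brackets sit in the netloc and leaves validation of the bracketed content to urlsplit.

```python
from urllib.parse import SplitResult, urlsplit


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper: reject netlocs with text outside a bracketed host."""
    parts = urlsplit(url)  # patched interpreters may already raise ValueError here
    hostport = parts.netloc.rpartition("@")[2]  # drop any userinfo
    if "[" in hostport or "]" in hostport:
        before, _, rest = hostport.partition("[")
        _, closed, after = rest.partition("]")
        # Nothing may precede "[", and only an optional ":port" may follow "]".
        if before or not closed or (after and not after.startswith(":")):
            raise ValueError(f"invalid bracketed host in {url!r}")
    return parts
```

With this wrapper, both example URLs above raise ValueError, while a URL such as http://[::1]:8080/ is still split normally.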
Your environment
- CPython versions tested on: 3.12.0b1
- Operating system and architecture: Darwin M1
Linked PRs
- gh-105704: Disallow IPv6 URLs with invalid prefix/suffix #111261
- gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs #129418
- [3.13] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) #129526
- [3.12] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) #129527
- [3.11] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) #129528
- [3.10] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) #129529
- [3.9] gh-105704: Disallow square brackets ([ and ]) in domain names for parsed URLs (GH-129418) #129530