[CVE-2025-0938] urlparse does not flag hostname *containing* [ or ] as incorrect #105704

Closed
@mjpieters

Description

#103848 updated the URL parsing algorithm to handle IPv6 and IPvFuture addresses when parsing URLs.

However, the algorithm is incomplete. The characters [ and ] are only permitted in the hostname portion if they are its first and last characters, respectively, and only if they enclose an IPv6 or IPvFuture address. The current implementation instead ignores everything before the first [ and everything after the first ] found in the netloc portion.
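The lenient extraction can be modelled roughly as follows. This is a simplified sketch of the behavior described above, not CPython's actual source; lenient_hostname is a hypothetical name, and its input is assumed to be the netloc with any userinfo already removed.

```python
def lenient_hostname(hostinfo: str) -> str:
    """Hypothetical model of the lenient extraction: keep only the text
    between the first '[' and the first ']' after it."""
    _, have_open_br, bracketed = hostinfo.partition("[")
    if have_open_br:
        # Everything before '[' and from ']' onward is silently dropped.
        hostname, _, _ = bracketed.partition("]")
    else:
        # No brackets: just strip an optional ':port' suffix.
        hostname, _, _ = hostinfo.partition(":")
    return hostname.lower()


for host in ("prefix.[v1.example]", "[v1.example].postfix", "example.com:8080"):
    print(host, "->", lenient_hostname(host))
```

Both malformed inputs collapse to the same v1.example, which is exactly the information loss the report describes.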

The WHATWG URL Standard lists [ and ] as forbidden host code points, and its host parser only attempts to parse an IPv6 address when the host starts with [ and ends with ]. RFC 3986 is equally strict: its IP-literal rule, which covers both IPv6 and IPvFuture addresses, requires the brackets to enclose the entire host.
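A stricter check following that rule might look like the sketch below. check_bracketed_host is a hypothetical helper, its host argument is assumed to exclude userinfo and port, and the IPvFuture pattern is transcribed from the RFC 3986 grammar (IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )).

```python
import ipaddress
import re

# RFC 3986: IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
_IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def check_bracketed_host(host: str) -> None:
    """Hypothetical helper: raise ValueError unless '[' and ']' are absent,
    or are exactly the first and last characters and enclose an IPv6 or
    IPvFuture address."""
    if "[" not in host and "]" not in host:
        return  # ordinary registered name or IPv4 address
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError(f"misplaced bracket in host: {host!r}")
    inner = host[1:-1]
    if _IPVFUTURE.fullmatch(inner):
        return
    try:
        ipaddress.IPv6Address(inner)
    except ValueError:
        raise ValueError(f"invalid IP literal in host: {host!r}") from None
```

With this check, [::1] and [v1.example] pass, while prefix.[v1.example] and [v1.example].postfix fail on bracket position before their contents are even inspected.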

The current implementation thus accepts such bizarre hostnames as:

  • http://prefix.[v1.example]/
  • http://[v1.example].postfix/

but then only reports the portion between the brackets as the hostname:

>>> from urllib.parse import urlparse
>>> urlparse('http://prefix.[v1.example]/').hostname
'v1.example'
>>> urlparse('http://[v1.example].postfix/').hostname
'v1.example'

In both cases, the .netloc attribute contains the whole string (prefix.[v1.example] and [v1.example].postfix, respectively).

Both URLs should have been rejected instead.
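Until the parser itself rejects them, an application could enforce this with a thin wrapper that inspects .netloc, which still holds the full host. This is a sketch under assumptions: strict_urlsplit is a hypothetical name, it only checks where the brackets sit, and it leaves validation of the bracketed contents to urlsplit.

```python
from urllib.parse import SplitResult, urlsplit


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper around urlsplit that rejects a host containing
    '[' or ']' anywhere other than a leading '[...]' pair, which may only
    be followed by an optional ':port'."""
    parts = urlsplit(url)  # any ValueError raised by urlsplit propagates unchanged
    # Drop any userinfo; what remains is host[:port].
    hostport = parts.netloc.rpartition("@")[2]
    if hostport.startswith("["):
        close = hostport.find("]")
        rest = hostport[close + 1:]
        # Only ':port' (or nothing) may follow the closing bracket.
        if (close == -1 or "[" in rest or "]" in rest
                or (rest and not rest.startswith(":"))):
            raise ValueError(f"Invalid bracketed host in {url!r}")
    elif "[" in hostport or "]" in hostport:
        # Brackets that do not open the host are never valid.
        raise ValueError(f"Invalid bracketed host in {url!r}")
    return parts
```

For example, strict_urlsplit('http://[::1]:8080/').port is 8080, while both URLs from this report raise ValueError.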

Your environment

  • CPython versions tested on: 3.12.0b1
  • Operating system and architecture: Darwin M1

Labels: stdlib, type-bug
