FILTER_VALIDATE_URL returns false when underscore present in URL #17842

Open
@eelkefierstra

Description

The following code:

<?php
var_dump(filter_var('https://sub_domain.example.com', FILTER_VALIDATE_URL));
var_dump(filter_var('https://ex_ample.com', FILTER_VALIDATE_URL));

Resulted in this output:

bool(false)
bool(false)

But I expected this output instead:

string(30) "https://sub_domain.example.com"
string(20) "https://ex_ample.com"

The underscore is a valid character according to RFC 2396, section 2.3:

Unreserved Characters

Data characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include upper and lower case
letters, decimal digits, and a limited set of punctuation marks and
symbols.

  unreserved  = alphanum | mark

  mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Unreserved characters can be escaped without changing the semantics
of the URI, but this should not be done unless the URI is being used
in a context that does not allow the unescaped character to appear.

However, this filter fails if an underscore is present in the domain or subdomain portion of the URL.

RFC 2396 has been superseded by RFC 3986, but the underscore is still among the unreserved characters:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
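
In the meantime, a possible workaround is sketched below. It is only a sketch, under the assumption that the host check is the only part of the validation that rejects the underscore. The helper name `validate_url_allowing_underscores` is hypothetical and not part of PHP. It swaps underscores in the host for a plain letter, so the validated copy has the same length, and then runs the regular filter on that copy.

```php
<?php
// Hypothetical helper, not part of PHP: accepts URLs whose host contains underscores.
function validate_url_allowing_underscores(string $url): bool
{
    // parse_url() is lenient and returns the host even when it contains "_".
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Replace "_" with a letter so the host check no longer trips on it.
    $normalizedHost = str_replace('_', 'x', $host);

    // Substitute the first occurrence of the host in the URL.
    // Assumption: the host string does not also appear earlier (e.g. in userinfo).
    $pos = strpos($url, $host);
    if ($pos === false) {
        return false;
    }
    $candidate = substr_replace($url, $normalizedHost, $pos, strlen($host));

    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_url_allowing_underscores('https://sub_domain.example.com'));
var_dump(validate_url_allowing_underscores('https://ex_ample.com'));
```

This only loosens the check for underscores in the host; every other rule of FILTER_VALIDATE_URL still applies to the rewritten copy.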

PHP Version

PHP 8.4.4

Operating System

No response
