
glob("*") does not support matching non-utf8 filenames #11916

Closed
@lilyball


glob::glob() does not have any support right now for matching non-utf8 filenames. Not only are its patterns restricted to strings, but it also explicitly skips any non-utf8 filenames it encounters (which should at least be able to match a * pattern).

Tasks that need to be done:

  • glob() needs to accept both strings and byte vectors. It can do this using std::path::BytesContainer.
  • glob() needs to process its pattern as a byte vector instead of a string, which will allow it to process filenames as byte vectors. This includes matching non-UTF-8 filenames against the * and ? tokens. For ?, matching a single byte is acceptable; ideally, it would match however many bytes are supposed to be consumed to produce one U+FFFD REPLACEMENT CHARACTER as per the Unicode standard.
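To illustrate the two tasks above, here is a minimal sketch of byte-oriented wildcard matching. It is not glob's actual implementation: the function name `matches_bytes` is hypothetical, `AsRef<[u8]>` is used as a present-day stand-in for the `BytesContainer` idea of accepting both strings and byte vectors, and only `*` and `?` are handled (no character classes, no path separators). `?` takes the simple option of consuming exactly one byte.

```rust
/// Hypothetical helper: match `name` against `pattern`, both treated as raw bytes.
/// `*` matches any run of bytes (including none); `?` matches exactly one byte.
/// Accepting `AsRef<[u8]>` lets callers pass `&str`, `Vec<u8>`, or byte slices.
fn matches_bytes(pattern: impl AsRef<[u8]>, name: impl AsRef<[u8]>) -> bool {
    let pattern = pattern.as_ref();
    let name = name.as_ref();
    let (mut p, mut n) = (0, 0);
    // Index of the most recent `*` in the pattern, and the name index
    // at which that `*` was last assumed to stop matching.
    let mut backtrack: Option<(usize, usize)> = None;

    while n < name.len() {
        if p < pattern.len() && pattern[p] == b'*' {
            // Record the star; first try letting it match zero bytes.
            backtrack = Some((p, n));
            p += 1;
        } else if p < pattern.len() && (pattern[p] == b'?' || pattern[p] == name[n]) {
            // `?` or a literal byte consumes exactly one byte of the name.
            p += 1;
            n += 1;
        } else if let Some((star, tried)) = backtrack {
            // Mismatch: let the last `*` absorb one more byte and retry.
            backtrack = Some((star, tried + 1));
            p = star + 1;
            n = tried + 1;
        } else {
            return false;
        }
    }
    // The name is exhausted; only trailing `*` tokens may remain.
    pattern[p..].iter().all(|&b| b == b'*')
}
```

With this sketch, a pattern of `"*"` matches the invalid-UTF-8 name `b"\xff\xfe"`, and `"a?c"` matches `b"a\xffc"`. Because `?` is byte-wise here, it would not match a multi-byte UTF-8 character as a single unit, which is the refinement the second task describes.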

This is a sub-task of #9639.


Labels: A-Unicode, E-easy, E-mentor
