
Transliteration table has several mistakes and more gaps, should use standard library #37802

Open
@Crissov


Preconditions and environment

Steps to reproduce

  1. Set up a new product whose name includes the characters ſ (long s), þ (thorn) and ð (eth).
  2. Save it so that a slugified URL key is generated for SEO.

Expected result

In the generated URL key,

  • ſ becomes s
  • þ becomes th
  • ð becomes d, although dh and even th would also be acceptable
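
The expected mappings above can be sketched in a few lines of Python. This is only an illustration of the desired behaviour under stated assumptions, not Magento's code: the `EXPECTED` table and the `slugify` function are hypothetical names, and the table covers just the three characters from this report.

```python
import re
import unicodedata

# Hypothetical override table with only the three characters named in this report.
# Thorn and eth have no Unicode decomposition, so they need explicit rules.
EXPECTED = {"ſ": "s", "þ": "th", "ð": "d"}


def slugify(name: str) -> str:
    """Illustrative URL-key generator; not Magento's implementation."""
    # Lowercase first, then apply the explicit overrides character by character.
    text = "".join(EXPECTED.get(ch, ch) for ch in name.lower())
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace every run of characters outside a-z and 0-9 with a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(slugify("ſ þ ð"))
```

A real fix would take its rules from the Unicode data rather than from a hand-written table like this one.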

Actual result

  • ſ becomes z
  • þ becomes p
  • ð is removed

Additional information

These are just some mistakes I easily spotted by looking at the file. I’m pretty sure there are also errors (or questionable choices) in the romanisation of Cyrillic, Greek, Hebrew and Devanagari. The selection of fewer than 500 characters to be transliterated seems arbitrary, so people have created modules to properly support languages like Romanian and Vietnamese.

For Japanese alone, magento2-jp already uses PHP’s Transliterator, which is the right tool for the job. Its data comes from ICU, which in turn uses CLDR data. Both are maintained by Unicode, so it is as reliable as it gets (and will continue to be improved).

If Transliterator is not to be used for some reason, Magento should at least use the Unicode data for Latin-ASCII and …-Latn.

PS: Ideally, Magento would support setting a language per store view, which would then be respected for cases like German umlauts (ä → ae) that deviate from the script default (ä → a). CLDR offers de-ASCII for that; also see #23292. Administrators should also be able to opt into UTF-8 percent-encoding in all cases, but let’s keep this a bug report and not a feature request.
PPS: This won’t cover cases like ½″, which would ideally become half-inch but at best will be 1-2, or 0.5 cm, which would better become 5mm than 0-5-cm.
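
To make the language-specific point concrete, here is a minimal Python sketch, assuming a per-language override table that is consulted before the script default. `LANGUAGE_OVERRIDES` and `fold` are hypothetical names, the table holds only the lowercase German umlauts, and the fallback is plain NFKD decomposition rather than the CLDR rules.

```python
import unicodedata
from typing import Optional

# Hypothetical per-language overrides, applied before the script default.
# Only lowercase German umlauts are listed, purely for illustration.
LANGUAGE_OVERRIDES = {"de": {"ä": "ae", "ö": "oe", "ü": "ue"}}


def fold(text: str, language: Optional[str] = None) -> str:
    """Apply language overrides, then strip diacritics as the default."""
    overrides = LANGUAGE_OVERRIDES.get(language, {})
    text = "".join(overrides.get(ch, ch) for ch in text)
    # Default: compatibility-decompose and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("bär", "de"))  # German store view
print(fold("bär"))        # script default
```

With this fallback, ½ decomposes to 1, a fraction slash and 2, which is why a slug built on top of it would end up as 1-2 rather than half.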

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

Metadata

Assignees

No one assigned

    Labels

    • Area: SEO
    • Component: Url
    • Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
    • Priority: P3 (May be fixed according to the position in the backlog.)
    • Reported on 2.4.x (Indicates original Magento version for the Issue report.)
    • Reproduced on 2.4.x (The issue has been reproduced on latest 2.4-develop branch)

    Type

    No type

    Projects

    Status

    Ready for Development

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
