Allow ISO8859-1 encoding in properties #26

Merged: 1 commit, Nov 8, 2022

Conversation

@cmaglie (Member) commented Nov 8, 2022

A .properties file is parsed as UTF-8, but if it contains byte sequences that are not valid UTF-8, we now assume the file is ISO8859-1 encoded and convert it to UTF-8. ISO8859-1 encoded .properties files are common in old Java projects.

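The ISO8859-1 range is 0x00-0xFF, and each byte value in that range equals the Unicode code point of the same character, so the conversion is done by simply mapping each byte to the corresponding code point and encoding it as UTF-8: https://stackoverflow.com/a/13511463/1655275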

rune is an alias for int32, and when it comes to encoding, a rune is assumed to have a Unicode character value (code point). So the value b in rune(b) should be a unicode value. For 0x00 - 0xFF this value is identical to Latin-1, so you don't have to worry about it.

Then you need to encode the runes into UTF8. But this encoding is simply done by converting a []rune to string.
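
For illustration, the conversion described above can be sketched roughly as follows. This is a minimal standalone example, not the code from the merged commit: the function name `toUTF8` is hypothetical, and it assumes the UTF-8 check is done on the whole input with `utf8.Valid` from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8 is a hypothetical helper (not taken from the commit).
// It returns data unchanged if it is already valid UTF-8; otherwise it
// treats data as ISO8859-1 and re-encodes it as UTF-8.
func toUTF8(data []byte) string {
	if utf8.Valid(data) {
		return string(data)
	}
	// Every ISO8859-1 byte value (0x00-0xFF) is also the Unicode code
	// point of the same character, so each byte maps directly to a rune.
	runes := make([]rune, len(data))
	for i, b := range data {
		runes[i] = rune(b)
	}
	// Converting []rune to string encodes the code points as UTF-8.
	return string(runes)
}

func main() {
	// "café" in ISO8859-1: the byte 0xE9 alone is not valid UTF-8.
	latin1 := []byte{'c', 'a', 'f', 0xE9}
	fmt.Println(toUTF8(latin1)) // prints "café"
}
```

One consequence of this kind of fallback is that an ISO8859-1 file whose bytes happen to form valid UTF-8 sequences is still read as UTF-8, since detection relies only on the input failing UTF-8 validation.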

@cmaglie merged commit fc70991 into master on Nov 8, 2022
@cmaglie deleted the non-utf-8 branch on November 8, 2022 14:20
@per1234 added the labels "type: enhancement" (Proposed improvement) and "topic: code" (Related to content of the project itself) on Nov 9, 2022