Examples #433

Closed
wants to merge 2 commits into from
Conversation

bobzhang
Member

@bobzhang bobzhang commented Jun 2, 2016

No description provided.

Hongbo Zhang added 2 commits June 1, 2016 16:19
@bobzhang bobzhang closed this Jun 27, 2019
@bobzhang bobzhang deleted the examples branch June 27, 2019 01:58
kevinbarabash pushed a commit to kevinbarabash/rescript-compiler that referenced this pull request Dec 24, 2021
* ## Unicode support

This PR adds support for Unicode code points at the syntax level: ReScript source code is now Unicode text encoded in UTF-8.

Fixes rescript-lang/syntax#397

### Codepoint literals

A codepoint literal represents an integer value identifying a Unicode code point. It is expressed as one or more characters enclosed in single quotes. Examples are `'x'`, `'\n'` or `'\u{00A9}'`. A single integer value may be encoded as multiple UTF-8 bytes in the source text.
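To illustrate how several UTF-8 bytes map to one integer value, here is a minimal sketch in Python (used only for illustration; it is not part of the ReScript toolchain). The helper name `codepoint_of` is hypothetical, and the sketch assumes the bytes between the single quotes have already been extracted.

```python
def codepoint_of(source_bytes: bytes) -> int:
    """Decode the UTF-8 bytes of one character and return its code point."""
    text = source_bytes.decode("utf-8")
    if len(text) != 1:
        # Python's len() on str counts code points, not bytes.
        raise ValueError("expected exactly one code point")
    return ord(text)


copyright_bytes = "©".encode("utf-8")
print(list(copyright_bytes))               # the two UTF-8 bytes of U+00A9
print(hex(codepoint_of(copyright_bytes)))  # the single integer value
```

Here the literal occupies two bytes in the source file but denotes the single integer 0xA9.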

### String literals

String literals are (possibly multi-byte) UTF-8 encoded character sequences between double quotes, as in `"fox"`.
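As a rough illustration of "possibly multi-byte", the following Python sketch compares the number of UTF-8 bytes in a string with the number of code points it contains. The sample string is an assumption chosen for the example and does not come from the PR.

```python
# Hypothetical sample: four ASCII characters plus one three-byte character.
source_bytes = "fox ♥".encode("utf-8")
text = source_bytes.decode("utf-8")

byte_count = len(source_bytes)  # bytes stored in the UTF-8 source file
codepoint_count = len(text)     # code points in the decoded string

print(byte_count, codepoint_count)
```

The byte count exceeds the code point count because `♥` (U+2665) takes three bytes in UTF-8.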

### New escape sequences

Both codepoint and string literals accept the following new escape sequences:

1) Unicode escape sequences
Any character with a character code lower than 65536 (0x10000) can be escaped using the hexadecimal value of its character code, prefixed with `\u`. Unicode escapes are six characters long: `\u` followed by exactly four hexadecimal digits. If the hexadecimal character code is only one, two or three digits long, pad it with leading zeroes.
Example: `'\u2665'` (represents ♥)

2) Unicode codepoint escape sequences
Any code point or character can be escaped using the hexadecimal value of its character code, prefixed with `\u{` and suffixed with `}`. This allows for code points up to 0x10FFFF, which is the highest code point defined by Unicode. Unicode code point escapes consist of at least five characters, because at least one hexadecimal digit must be wrapped in `\u{…}`. There is no upper limit on the number of hex digits (for example, `'\u{000000000061}' == 'a'`).
Example: `'\u{2318}'` (represents ⌘)
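The two escape forms above can be sketched as a small decoder. This is an illustrative Python sketch under stated assumptions, not the parser from this PR: the function name `decode_unicode_escape`, the regular expression, and the error messages are all hypothetical, and the sketch does not decide how surrogate values (0xD800 to 0xDFFF) should be treated.

```python
import re

MAX_CODEPOINT = 0x10FFFF  # highest code point defined by Unicode

# Either \u{X...} with one or more hex digits, or \uXXXX with exactly four.
_ESCAPE = re.compile(r"\\u(?:\{([0-9a-fA-F]+)\}|([0-9a-fA-F]{4}))")


def decode_unicode_escape(escape: str) -> int:
    """Return the code point denoted by a single \\uXXXX or \\u{...} escape."""
    match = _ESCAPE.fullmatch(escape)
    if match is None:
        raise ValueError(f"malformed unicode escape: {escape!r}")
    braced, fixed = match.groups()
    # Exactly one of the two groups is set; both hold hexadecimal digits.
    value = int(braced if braced is not None else fixed, 16)
    if value > MAX_CODEPOINT:
        raise ValueError(f"code point out of range: {escape!r}")
    return value


print(chr(decode_unicode_escape(r"\u2665")))
print(chr(decode_unicode_escape(r"\u{2318}")))
print(chr(decode_unicode_escape(r"\u{000000000061}")))
```

Leading zeroes are harmless in the braced form because the digits are interpreted as one hexadecimal number, which is why `\u{000000000061}` and `\u{61}` denote the same code point.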

* Rename Character token to Codepoint token.

Codepoint makes more sense with Unicode.

* Add comment about codepoint literal encoding for printer.

* Parse all normal strings as {js||js} strings.

The compiler processes these strings with JS semantics.

Previously, {js||js} strings were interpreted as template literal strings.
The internal encoding has been changed to use an attribute (@res.template) to detect template literal strings.