Merge syntax repo #5347

Closed
wants to merge 321 commits into from

@kevinbarabash (Contributor) commented Dec 24, 2021

This is a possible alternative to #5268. I've copied the history from the syntax repo using the approach outlined here.

I tested the change by running ./scripts/ninja.js config and ./scripts/ninja.js build and both completed successfully.

In future PRs we can figure out how to better organize things, but in the meantime this should make it a bit easier to author/review changes involving the parser/printer.

TODO:

  • move syntax/.github/workflows/ up a level or port it to the circle-ci config

IwanKaramazow and others added 30 commits July 26, 2020 11:08
* Implement outcome printing of polymorphic variants

* Omit printing of leading Ptyp_variant bar when layout doesn't break.

type color = [ #Red | #Blue | #Green ]
VS
type color = [
  | #Red
  | #Blue
  | #Green
]

Should be consistent with outcome printer

* Improve consistency spacing brackets surrounding poly vars in outcome printer

[ #red ] should be [#red]

* Improve consistency spacing brackets surrounding poly vars in typexpr printer

[ #red ] should be [#red]

* Print exotic names escaped in poly-var outcome printer

* Document meaning of outcome printer
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.19.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.15...4.17.19)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Main is misleading since the cli part is only used for this repo, for
ease of testing.
Also clean up some other rules
if you code using vscode, run the task, it has a watch mode, enjoy the better editing experience
- -absname for precise error reporting
lib/text.exe was hanging because of
rescript-lang/syntax@0ec18d1#diff-b67911656ef5d18c4ae36cb6741b7965R39
which accidentally bundled the cli (which waits for user input) into the
test binary
…rting than BS super-errors (rescript-lang#87)

* First pass at using logic & display similar to BS super-errors for terminal error reporting

* Update snapshots
2 newlines turn into 1 inside bsb. Not sure why. Format is temporary, so
this'll do until we remove it completely.
Forces the `exit` annotation to `let ()` binding.
* Enable roundtrip-tests on ci.

It provides stronger tests:
- Napkin is bootstrapped
- Equality check between the Parsetree from different parsers
- prints napkin code twice to check for inconsistencies.

* Comment out Refmt bug to make roundtrip-tests pass.

We should really fix this at the refmt level…
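The two roundtrip checks listed above (comparing parse trees from different parsers, and printing twice to look for inconsistencies) can be sketched generically. This is only an illustration, not the repo's test harness: `check_print_twice`, `check_same_tree`, and the toy whitespace "parser" in the usage lines are hypothetical names introduced here.

```python
from typing import Callable, TypeVar

Tree = TypeVar("Tree")


def check_print_twice(
    source: str,
    parse: Callable[[str], Tree],
    print_tree: Callable[[Tree], str],
) -> str:
    """Print the parsed source, re-parse the output, and print again.

    A stable printer must produce the same text both times.
    """
    once = print_tree(parse(source))
    twice = print_tree(parse(once))
    if once != twice:
        raise AssertionError("printer output changed on the second pass")
    return once


def check_same_tree(
    source_a: str,
    parse_a: Callable[[str], Tree],
    source_b: str,
    parse_b: Callable[[str], Tree],
) -> bool:
    """Compare the trees that two different parsers produce for equivalent sources."""
    return parse_a(source_a) == parse_b(source_b)


# Toy stand-ins (assumptions): the "tree" is a list of words and the
# "printer" joins the words with single spaces.
toy_parse = lambda text: text.split()
toy_print = lambda tree: " ".join(tree)

formatted = check_print_twice("let   x =  1", toy_parse, toy_print)
```

With the toy stand-ins, `formatted` is `"let x = 1"`, and printing it a second time leaves it unchanged.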
The previous approach put the scanner in "Template" mode when the parser
was going to parse template literals. This resulted in some very awkward
code; upon scanning a token there was logic checking whether we were
in "Template" mode. Every non-template literal token would pay for this
extra branch…

The new parsing strategy works differently: when the parser needs to
parse template literals, it just asks the scanner immediately for a
template literal token. This is both more performant and easier to reason
about. Template literals are a different language, so it makes sense to
split this into a separate scanning function.
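The strategy described above can be sketched as follows. This is a simplified illustration, not the actual scanner from the syntax repo: the `Scanner` class, the `Token` type, and the token kinds are hypothetical, and escape sequences inside the template are ignored. The point is that `scan` contains no "am I in template mode?" branch; the parser calls `scan_template_token` explicitly when it knows template text comes next.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str


class Scanner:
    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0

    def scan(self) -> Token:
        """Scan one regular token. No template-mode check on this path."""
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.src):
            return Token("eof", "")
        ch = self.src[self.pos]
        if ch == "`":
            self.pos += 1
            return Token("backtick", ch)
        if ch.isalnum() or ch == "_":
            start = self.pos
            while self.pos < len(self.src) and (
                self.src[self.pos].isalnum() or self.src[self.pos] == "_"
            ):
                self.pos += 1
            return Token("ident", self.src[start:self.pos])
        self.pos += 1
        return Token("punct", ch)

    def scan_template_token(self) -> Token:
        """Scan raw template text up to the next `${` or the closing backtick.

        Only called by the parser when it expects template text.
        """
        start = self.pos
        while self.pos < len(self.src):
            if self.src[self.pos] == "`":
                text = self.src[start:self.pos]
                self.pos += 1  # consume the closing backtick
                return Token("template_tail", text)
            if self.src.startswith("${", self.pos):
                text = self.src[start:self.pos]
                self.pos += 2  # consume "${"
                return Token("template_part", text)
            self.pos += 1
        return Token("eof", self.src[start:])
```

For the input `` `hi ${name}!` ``, a parser would call `scan` (backtick), `scan_template_token` (text `hi `), `scan` twice (`name`, `}`), and `scan_template_token` again (text `!`).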
* Implement parsing of lightweight syntax for poly variants containing exotic idents

Example:
  #"ease-in"

grammar:
 HASH STRING

* Implement printing of light weight poly var syntax
chenglou and others added 25 commits May 1, 2021 01:23
* Clean up outcome printer test infra

It's just another snapshot. We'll reuse the previous rock solid snapshot "infra" aka `git diff`
Module types don't use a `=`, it should be a `:`.

```
module Expr: {
  …
}
```
rescript-lang#410)

Fixes rescript-lang/syntax#409

The parser should parse `type queryDelta = Compute({"blocked_ids": unit} => unit)` without parens surrounding the object type-expr `{"blocked_ids": unit}`. The parens are optional in this case.
In other places like `{"blocked_ids": unit} => unit`, this was already the case.
…escript-lang#408)

* Don't parse Int token with suffixes as hash ident for poly variants

`#10s` should not be accepted as a numeric polyvariant identifier.

Fixes rescript-lang/syntax#407

* Tweak `variantIdent` error message based on feedback from bloodyowl

Co-authored-by: Matthias Le Brun <[email protected]>

* Update hashIdent error tests

* Add printer test to verify that int tokens with suffix aren't printed as numeric polyvars

* Refine error message for numeric polyvars followed by a letter.

Co-authored-by: Matthias Le Brun <[email protected]>
…script-lang#414)

Fixes GH413

`Array.get(_, 0)` shouldn't be printed as `_[0]`
…ang#416)

`{path as p}` formats to `{path: p}`, which is the right syntax. The `as` syntax is unnecessarily confusing sugar.
Fixes rescript-lang/syntax#412

Currently the grammar allows for a list of primitives in an external declaration: i.e. `"hi" "hx"` in `external f: (int, int) => int = "hi" "hx"`. This stems from the fact that user primitives with arity greater than 5 should be implemented by two C functions. The first function, to be used in conjunction with the bytecode compiler ocamlc, receives two arguments: a pointer to an array of OCaml values (the values for the arguments), and an integer which is the number of arguments provided. The other function, to be used in conjunction with the native-code compiler ocamlopt, takes its arguments directly. However in the case of compiling to JS, we don't need to deal with this. In order to reduce some complexity, we'll now parse just one primitive.
…rescript-lang#139)

* implement syntax for arity zero vs arity one in uncurried application

Since there is no syntax space for arity zero vs arity one,
we parse
  `fn(. ())` into
  `fn(. {let __res_unit = (); __res_unit})`
  when the parsetree is intended for type checking

`fn(.)` is treated as zero arity application

* add CHANGELOG entry
* Handle Windows CRLF correctly.

`\r\n` should be picked up as one line break, not two.

* Add comment about CRLF

Co-authored-by: Iwan <[email protected]>
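The CRLF rule from the commit above (`\r\n` is one line break, not two) can be illustrated with a small counter. This is a sketch rather than the scanner's real code: `count_line_breaks` is a hypothetical helper, and treating a lone `\r` as a line break is an assumption made here, not something the commit message states.

```python
def count_line_breaks(text: str) -> int:
    """Count line breaks, treating "\r\n" as a single break."""
    count = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # CRLF: skip the "\n" so the pair is counted only once.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            count += 1  # assumption: a lone "\r" also counts as a break
        elif ch == "\n":
            count += 1
        i += 1
    return count
```

A naive scanner that counts `\r` and `\n` independently would report two breaks for `"a\r\nb"`, which throws off line numbers in error locations on Windows-formatted files.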
* Refactor parsing of string literals with a state machine

* Only log string escape sequence errors during scanning

* Remove unused string escape error messages in parsing.

Errors are reported during scanning; we don't need to report them twice.
Ideally we should do all of this in one pass…
…pt-lang#446)

Fixes rescript-lang/syntax#445

**before**
```
(~?x: 'a, ~y: 'b) => option<'a>
```

**after**
```
(~?x: 'a, ~y: 'b) => option<'a>
```

Co-authored-by: Iwan <[email protected]>
* Fix syntax error in tests

* Add tests for illegal identifier

* Fix parsing lident
* ## Unicode support

This PR adds support for Unicode codepoints at the syntax level: ReScript source code is now unicode text encoded in UTF-8.

Fixes rescript-lang/syntax#397

### Codepoint literals

A codepoint literal represents an integer value identifying a Unicode code point. It is expressed as one or more characters enclosed in single quotes. Examples are `'x'`, `'\n'` or `'\u{00A9}'`. Multiple UTF-8-encoded bytes may represent a single integer value.

### String literals

String literals are (possibly multi-byte) UTF-8 encoded character sequences between double quotes, as in `"fox"`.

### New escape sequences

Both codepoint and string literals accept the following new escape sequences:

1) Unicode escape sequences
Any character with a character code lower than 65536 can be escaped using the hexadecimal value of its character code, prefixed with `\u`. Unicode escapes are six characters long: they require exactly four hexadecimal digits following `\u`. If the hexadecimal character code is only one, two or three digits long, you’ll need to pad it with leading zeroes.
Example: `'\u2665'` (Represents ♥)

2) Unicode codepoint escape sequences
Any code point or character can be escaped using the hexadecimal value of its character code, prefixed with `\u{` and suffixed with `}`. This allows for code points up to 0x10FFFF, which is the highest code point defined by Unicode. Unicode code point escapes consist of at least five characters: at least one hexadecimal digit must be wrapped in `\u{…}`. There is no upper limit on the number of hex digits in use (for example `'\u{000000000061}' == 'a'`).
Example: `'\u{2318}'` (Represents ⌘)
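The two escape forms described above can be sketched as a small decoder. This is an illustration under stated assumptions, not the scanner's implementation: `decode_unicode_escapes` is a hypothetical helper, malformed escapes are simply left untouched instead of being reported, and other escapes such as `\\` are not handled.

```python
import re

# \u{H...}  : one or more hex digits in braces (first alternative)
# \uHHHH    : exactly four hex digits (second alternative)
_UNICODE_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})")


def decode_unicode_escapes(body: str) -> str:
    """Replace \\uHHHH and \\u{H...} escapes in `body` with their characters."""

    def replace(match: "re.Match[str]") -> str:
        hex_digits = match.group(1) or match.group(2)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            # 0x10FFFF is the highest code point defined by Unicode.
            raise ValueError(f"code point out of range: {hex_digits}")
        return chr(code_point)

    return _UNICODE_ESCAPE.sub(replace, body)
```

Leading zeroes in the braced form do not change the value, so `\u{000000000061}` and `\u{61}` both decode to `a`.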

* Rename Character token to Codepoint token.

Codepoint makes more sense with unicode

* Add comment about codepoint literal encoding for printer.

* Parse all normal strings as {js||js} strings.

The compiler processes these strings with js semantics.

Previously {js||js} were interpreted as template literal strings.
The internal encoding has been changed to use an attribute (@res.template) to detect template literal strings
…ng#455)

Fixes rescript-lang/syntax#451.

`type call =  CleanStart` should be printed as `type call = CleanStart`. Notice the correct space after `=`.