Clean up strings in the compiler toolchain #5522

Closed
@cristianoc

See #5521 and rescript-lang/syntax#602


Just writing down some notes here.

Relying on indirect type checking by putting "" at either end of a string concatenation seems brittle.
There also seem to be several layers of sediment, presumably left over from older ways of doing things.

So `stuff` is the same as js`stuff` by convention, I think.

Then there is j`stuff`, which somehow obeys different rules. In j`$x`, the $x actually has a meaning (it interpolates x), and x does not need to be a string. The consequence is that "" + x is not the same as x when x is not a string. So removing empty-string concatenation in the back-end of the compiler is also delicate, as it's easy to get wrong (the "" in 3 + "" can't be removed, because the result is the string "3", not the number 3).
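To make the pitfall concrete, here is a minimal sketch in TypeScript of when dropping an empty-string operand is safe. The `Expr` type, `isKnownString`, and `simplify` are hypothetical names invented for illustration; they are not the compiler's actual IR or passes. The only rule it encodes is the one above: `"" + x` may become `x` only when `x` is already known to be a string.

```typescript
// Hypothetical mini-IR for illustration only; not the compiler's real data structures.
type Expr =
  | { kind: "str"; value: string }
  | { kind: "num"; value: number }
  | { kind: "var"; name: string; isString: boolean }
  | { kind: "concat"; left: Expr; right: Expr }; // models JavaScript's `+`

// True only when the expression is statically known to evaluate to a string.
function isKnownString(e: Expr): boolean {
  if (e.kind === "str") return true;
  if (e.kind === "num") return false;
  if (e.kind === "var") return e.isString;
  // In JavaScript, `+` yields a string as soon as one operand is a string.
  return isKnownString(e.left) || isKnownString(e.right);
}

function isEmptyString(e: Expr): boolean {
  return e.kind === "str" && e.value === "";
}

// Drop `"" + e` or `e + ""` only when `e` is already a string;
// otherwise the empty string is doing a coercion and has to stay.
function simplify(e: Expr): Expr {
  if (e.kind !== "concat") return e;
  const left = simplify(e.left);
  const right = simplify(e.right);
  if (isEmptyString(left) && isKnownString(right)) return right;
  if (isEmptyString(right) && isKnownString(left)) return left;
  return { kind: "concat", left, right };
}

// The runtime behaviour that makes the check necessary:
const n = 3;
const coerced = "" + n; // a string, not a number
```

With this rule, `3 + ""` is left alone while `"" + s` collapses to `s` for a variable `s` known to be a string. A real pass would need actual type information rather than an `isString` flag.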

Then there's json`stuff`, which I don't know much about. Maybe it's the same as j but older; not really sure. Are they really treated the same way at every stage of the compiler? Not sure.

All this is represented internally as strings that carry a tag: "j", "js", or "json".
On top of that, strings produced by the parser are now Unicode by default, which uses the tag "*j". But there's also the OCaml parser for .ml files, which never generates "*j" for normal strings.
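One possible direction for a cleanup, sketched in TypeScript: replace the raw tag strings with a closed type, so every stage has to say what it does with each kind. The tag values come from the notes above; the names `Delim`, `parseDelim`, `delimNotes`, and `describeDelim` are hypothetical, and the descriptions only restate what is written here, including the parts that are uncertain.

```typescript
// Tags mentioned in the notes. Treating them as a closed union is an assumption
// made for illustration, not how the compiler currently represents them.
type Delim = "j" | "js" | "json" | "*j";

// Turn a raw tag into a Delim, rejecting anything unexpected instead of
// letting an unknown tag flow silently through later stages.
function parseDelim(raw: string): Delim {
  switch (raw) {
    case "j":
      return "j";
    case "js":
      return "js";
    case "json":
      return "json";
    case "*j":
      return "*j";
    default:
      throw new Error(`unknown string delimiter: ${raw}`);
  }
}

// Record<Delim, ...> makes the type checker complain if a tag is left out.
const delimNotes: Record<Delim, string> = {
  j: "interpolating: $x has a meaning and x need not be a string",
  js: "by convention the same as an untagged template string (to be confirmed)",
  json: "unclear: possibly an older variant of j",
  "*j": "default for strings produced by the parser (Unicode)",
};

// null stands for a string with no tag at all,
// e.g. a normal string coming from the OCaml parser for .ml files.
function describeDelim(d: Delim | null): string {
  return d === null ? "untagged string" : delimNotes[d];
}
```

The point of the sketch is the shape, not the names: with a closed type, adding or removing a string kind becomes a compile-time question in every stage that inspects it.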

On top of all this, there's a half-hearted attempt to also use a type "unicode" inside the back-end of the compiler, which seems incomplete.

Also, there's a quoting mechanism applied on dump (code generation), and how it quotes depends on the kind of string.

Needless to say, all this needs a good cleanup.
