Skip to content

DSLTree #134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 2, 2022
Merged

DSLTree #134

merged 7 commits into from
Feb 2, 2022

Conversation

milseman
Copy link
Member

@milseman milseman commented Jan 27, 2022

DSLTree is an internal representation shared between literals and result builders.

  • Lowering/transforming AST -> DSLTree
    • TODO: More atoms, built-in character classes, etc
  • Pretty-printing DSLTree
    • At rough parity of the original PrintAsPattern
  • Switch compiler over to DSLTree
  • Switch result builders over to DSLTree
  • Legacy VM support for everything except the new extensions
  • CaptureStructure support

Comment on lines 38 to 39
/// An embedded literal
case regexLiteral(AST.Node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an embedded literal be able to contain global options?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, global options that pertain only to that fragment I suppose. How would global options interact with the ability to interpolate regex literals?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, I guess they would have to form their own scope in which they're only active in that portion of the regex

Comment on lines 15 to 17
/// (?(cond) true-branch | false-branch)
case conditional(
AST.Conditional.Condition.Kind, Node, Node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the condition kind get converted too (as it can contain a recursive AST.Group)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't look into that part far enough yet. Good catch.

Comment on lines 197 to 200
// No need to nest single children concatenations
if astChildren.count == 1 {
return astChildren.first!.dslTreeNode
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do single-child concatenations occur? I kind of wonder whether AST.Concatenation should enforce that children.count >= 2 (and maybe have a utility that produces concat-or-self), or is it better for compatibility with the DSL to allow them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m in favor of enforcing a plural count.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we enforce that in the DSL? Would we error out if given a single member? I worry there could be impact with sensible refactoring like quoting a literal run of characters vs listing each separately.

Copy link
Contributor

@rxwei rxwei Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builder DSL always supports having a singular element in a block, e.g. Regex { "a" }. It's just that buildBlock(_:) (unary) never creates a concatenation in this scenario, but returns the argument instead. As such, I'm not sure a singular concatenation would ever be created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with the invariant on the parser's AST, though we could still argue that we want to coalesce runs of characters into quotes anyways, so that abc|def is

Alternation {
  "abc"
  "def"
}

but I agree it doesn't have to be part of this transformation, could be part of printing, etc.

We'll want to do some analysis and optimization, and doing so on a tree is a classic compiler smell, but I doubt we'll have a real IR in the near future.

@milseman milseman marked this pull request as ready for review February 2, 2022 00:37
@milseman
Copy link
Member Author

milseman commented Feb 2, 2022

@swift-ci please test linux platform

@milseman
Copy link
Member Author

milseman commented Feb 2, 2022

@hamishknight @rxwei I had to move some stuff around, can you take a look at this PR again to see if it's mergeable?

@milseman milseman merged commit 612f104 into swiftlang:main Feb 2, 2022
@milseman milseman deleted the reforestation branch February 2, 2022 17:55
milseman added a commit to milseman/swift-experimental-string-processing that referenced this pull request Feb 12, 2022
DSLTree, a shared representation for compilation and printing

Bug fix in matching engine for reading subsequences out of bounds

DSLTree hooked up to legacy VMs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants