Skip to content

Better coalesce adjacent scalars #574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 26, 2022

Conversation

hamishknight
Copy link
Contributor

@hamishknight hamishknight commented Jul 13, 2022

Previously we would only coalesce adjacent scalars in regex literals outside of custom character classes. Change the behavior such that we start coalescing scalars inside custom character classes in grapheme mode, e.g [e\u{301}] matches e\u{301}, and start coalescing adjacent scalars in DSL.

Previously for the DSL we would emit a series of scalars as a series of individual characters in grapheme semantic mode. Change the behavior such that we coalesce any adjacent scalars and characters, including those in regex literals and nested concatenations. We then perform grapheme breaking over the result, and can emit character matches for scalars that coalesced into a grapheme.

This transform subsumes a similar transform we performed for regex literals when converting them to a DSLTree. This has the nice side effect of allowing us to better preserve scalar syntax in the DSL transform.

Resolves #572 (rdar://96942688)
Resolves #586 (rdar://97209131)
Resolves #573

@hamishknight hamishknight requested a review from milseman July 13, 2022 16:39
Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adjacent content in a custom character class?

@hamishknight
Copy link
Contributor Author

@rctcwyvrn mind taking a look at 203868f?

@hamishknight hamishknight requested a review from rctcwyvrn July 18, 2022 19:24
Copy link
Contributor

@rctcwyvrn rctcwyvrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hamishknight hamishknight changed the title Coalesce adjacent scalars and characters in the DSL Better coalesce adjacent scalars Jul 19, 2022
@hamishknight hamishknight force-pushed the better-together branch 2 times, most recently from ef29854 to 3595070 Compare July 19, 2022 12:30
Previously we would emit a series of scalars
written in the DSL as a series of individual
characters in grapheme semantic mode. Change the
behavior such that we coalesce any adjacent
scalars and characters, including those in regex
literals and nested concatenations. We then
perform grapheme breaking over the result, and can
emit character matches for scalars that coalesced
into a grapheme.

This transform subsumes a similar transform we
performed for regex literals when converting them
to a DSLTree. This has the nice side effect of
allowing us to better preserve scalar syntax in
the DSL transform.

rdar://96942688
Previously we would only match entire characters.
Update to use the generic Character consumer logic
that can handle scalar semantic mode.

rdar://97209131
In grapheme semantic mode, coalesce adjacent
character and scalar members of a custom character
class, over which we can perform grapheme breaking.
This involves potentially re-writing ranges such
that they contain a complete grapheme of adjacent
scalars.
Make sure we throw the right error for ranges
that are invalid in grapheme mode, but are valid
in scalar mode.
I also noticed that `lexQuantifier` could silently
eat trivia if it failed to lex a quantification,
so also fix that.
@hamishknight
Copy link
Contributor Author

@swift-ci please test

@hamishknight hamishknight merged commit e47cc63 into swiftlang:main Jul 26, 2022
@hamishknight hamishknight deleted the better-together branch July 26, 2022 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants