-
Notifications
You must be signed in to change notification settings - Fork 49
Allow unbounded quoted sequences \Q...
#432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c8c21aa
to
1bcd2cb
Compare
\Q...
\Q...
\Q...
@swift-ci please test |
PCRE and ICU both support quoted sequences that don't have a terminating `\E`. Update the parsing to allow this. Additionally, allow empty quoted sequences outside of custom character classes, which is consistent with ICU. Finally, don't allow quoted sequences to span multiple lines in extended syntax literals.
0029e7c
to
57a5092
Compare
@swift-ci please test |
@@ -579,7 +579,7 @@ extension Source { | |||
|
|||
/// Try to consume quoted content | |||
/// | |||
/// Quote -> '\Q' (!'\E' .)* '\E' | |||
/// Quote -> '\Q' (!'\E' .)* '\E'? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the quote get terminated by a group or does it have to be terminated by the end of the regex? In that case, it seems more like:
Quote -> '\Q' (!'\E' .)* ('\E' | <end-of-input>)
(or whatever our notational convention was for end of input)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be terminated by the end of the regex yeah, I can update the grammar to be explicit about that. I don't believe we ever added a convention for it, <end-of-input>
seems reasonable to me tho.
} | ||
if context.experimentalQuotes, src.tryEat("\"") { | ||
// TODO: Can experimental quotes be empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we'll be dropping experimental mode in favor of (?"quoted")
#/ | ||
\Q | ||
/# | ||
"""#, .quoteMayNotSpanMultipleLines) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be the regex /\Q/
, i.e. it's not spanning multiple lines because there's no additional line? I'm ok either way, and perhaps reporting an error is helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Currently we don't do any whitespace stripping, so if we just allowed it as-is, it would parse as \Q\n \E
. That being said, it seems reasonable enough to strip everything after-and-including the last newline given we enforce that the closing delimiter must appear on a newline. Does that sound reasonable to you, or should be leave it as an error for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... for now, if someone wants verbatim multi-line content, they have to use a string inside builder for that. We'll eventually get at least interpolation if not quoting as well. Otherwise the multi-line literal ignores the lines and whitespaces. It might be the case these are all the same regex:
Regex { "re gex" }
#/
re\Q gex
/#
#/
re\Q gex\E
/#
#/
\Qre gex
/#
#/
re\ gex
/#
/re\Q gex/
/re\Q gex\E/
For errors:
#/
re\Q # error: it spans lines (same logic as isolated option setting?)
gex
/#
What about:
#/
re\Q gex
/#
Newlines are a consideration of the host language. So perhaps \Q
in a literal cannot quote a newline and a matching \E
has to appear on the same line or it's the last semantic line in the literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By semantic line, do you mean any line that doesn't consist entirely of whitespace? So:
#/
re\Q gex
/#
would be legal as we'd strip all trailing whitespace after the x
character, forming the quote \Q gex\E
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related question here is whether (?-x)
should be allowed if it's used on the last semantic line, e.g:
// Okay
#/
(?-x) abc
/#
// Not okay
#/
(?-x)abc
d
/#
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably have a consistent behavior between (?-x)
and unterminated \Q
whichever way we go. For now at least, I'm tempted to say we should keep everything using the rule where semantic whitespace cannot span multiple lines anywhere in the literal, so (?-x)
and unterminated \Q
are always invalid. So this test case would remain as-is. Given this rule only affects literals, we'd be free to lift in the future if we wanted. Does that sound reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
@swift-ci please test |
Going to merge this for now, happy to look into the trailing semantic whitespace rule in a follow-up |
PCRE and ICU both support quoted sequences that don't have a terminating
\E
. Update the parsing to allow this. Additionally, allow empty quoted sequences outside of custom character classes, which is consistent with ICU. Finally, don't allow quoted sequences to span multiple lines in extended syntax literals.