
Commit ac99645

Fix GH-10634: Lexing memory corruption (#10866)
We're not relying on re2c's bounds-checking mechanism, because re2c:yyfill:check = 0; is set. We just return 0 if we read past the end of the input in YYFILL.

The comment regexes used to rely on the "any character" wildcard. That means that if we run past the end of the input while matching a comment regex, we don't notice, and the trailing 0 bytes are treated as if they were part of the token. Since a 0 byte is already considered end-of-file, we can simply exclude it in those regexes.

For the regexes that end in a newline, I had to add not only \x00 to the denylist but also \n and \r. Otherwise the character class would greedily match them and let a single-line comment run over multiple lines.
1 parent: 4da0da7


2 files changed: +31 -3 lines changed


Zend/tests/gh10634.phpt

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+--TEST--
+GH-10634 (Lexing memory corruption)
+--FILE--
+<?php
+function test_input($input) {
+    try {
+        eval($input);
+    } catch(Throwable $e) {
+        var_dump($e->getMessage());
+    }
+}
+
+test_input("y&/*");
+test_input("y&/**");
+test_input("y&#");
+test_input("y&# ");
+test_input("y&//");
+?>
+--EXPECT--
+string(36) "Unterminated comment starting line 1"
+string(36) "Unterminated comment starting line 1"
+string(36) "syntax error, unexpected end of file"
+string(36) "syntax error, unexpected end of file"
+string(36) "syntax error, unexpected end of file"

Zend/zend_language_scanner.l

Lines changed: 7 additions & 3 deletions
@@ -1369,9 +1369,13 @@ TOKENS [;:,.|^&+-/*=%!~$<>?@]
 ANY_CHAR [^]
 NEWLINE ("\r"|"\n"|"\r\n")
 OPTIONAL_WHITESPACE [ \n\r\t]*
-MULTI_LINE_COMMENT "/*"([^*]*"*"+)([^*/][^*]*"*"+)*"/"
-SINGLE_LINE_COMMENT "//".*[\n\r]
-HASH_COMMENT "#"(([^[].*[\n\r])|[\n\r])
+/* We don't use re2c with bounds checking, we just return 0 bytes if we read past the input.
+ * If we use wildcard matching for comments, we can read past the input, which crashes
+ * once we try to report a syntax error because the 0 bytes are not actually part of
+ * the token. We prevent this by not allowing 0 bytes, which already aren't valid anyway. */
+MULTI_LINE_COMMENT "/*"([^*\x00]*"*"+)([^*/\x00][^*\x00]*"*"+)*"/"
+SINGLE_LINE_COMMENT "//"[^\x00\n\r]*[\n\r]
+HASH_COMMENT "#"(([^[\x00][^\x00\n\r]*[\n\r])|[\n\r])
 WHITESPACE_OR_COMMENTS ({WHITESPACE}|{MULTI_LINE_COMMENT}|{SINGLE_LINE_COMMENT}|{HASH_COMMENT})+
 OPTIONAL_WHITESPACE_OR_COMMENTS ({WHITESPACE}|{MULTI_LINE_COMMENT}|{SINGLE_LINE_COMMENT}|{HASH_COMMENT})*
