Skip to content

Tokenize yield from as T_YIELD+T_FROM #15041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bwoebi
Copy link
Member

@bwoebi bwoebi commented Jul 20, 2024

Fixes GH-14926.

Changing it in PHP 8.3 would be a BC break, but for PHP 8.4 it should be a proper fix.

@iluuu1994
Copy link
Member

That looks like the better solution, thank you!

@bwoebi bwoebi force-pushed the fix-yield-from-tokenization branch from 3af8fdc to 42e9f66 Compare July 20, 2024 17:06
@bwoebi
Copy link
Member Author

bwoebi commented Jul 20, 2024

@iluuu1994 Do you think it's reasonable to merge this soon? After all, it's strictly superior to the current implementation - even if we decide to revert anything, we'd just revert this as well.

@iluuu1994
Copy link
Member

Given there's an ongoing discussion on the list, let's not jump the gun and cause more drama. But we should make sure to merge it before feature freeze.

@bwoebi
Copy link
Member Author

bwoebi commented Jul 20, 2024

Sure, soon, not now :-) So I agree.

REGISTER_LONG_CONSTANT("T_YIELD_FROM", T_YIELD_FROM, CONST_PERSISTENT);
REGISTER_LONG_CONSTANT("T_FROM", T_FROM, CONST_PERSISTENT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should T_YIELD_FROM be deprecated first (as unused as of this PR) and removed in 9.0 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't really apply for tokenizer constants. They always match what exists for the specific PHP version.
(It's also a generated file.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, just makes the BC break larger again.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 21, 2024

@bwoebi I appreciate you opening this PR to improve the situation.

Was there any bike-shedding done on what the tokenization should be ? I've seen several different ideas floating around, so is there a pertinent reason why this implementation is favoured over the other possible ones ?

yield from gen();
yield /*comment*/ from gen();
  • PHP 7.0 - 8.2:
    Example 1: T_YIELD_FROM (including the whitespace)
    Example 2: Parse error - T_YIELD + whitespace + comment + whitespace + T_STRING (from)
  • PHP 8.3.0 - 8.3.9:
    Example 1: T_YIELD_FROM (including the whitespace)
    Example 2: T_YIELD_FROM (including the comment and whitespace)
  • Proposed implementation in this PR:
    Example 1: T_YIELD + whitespace + T_FROM
    Example 2: T_YIELD + whitespace + comment + whitespace + T_FROM (or T_FROM_FOR_YIELD_FROM as suggested in the Internals mail thread by @TimWolla)

Some alternative options:

  • Both keywords are tokenized as T_YIELD_FROM if separated by more than just whitespace. This would be a smaller BC-break for consumers of the token stream.
    Example 1: T_YIELD_FROM (including the whitespace)
    Example 2: T_YIELD_FROM + whitespace + comment + whitespace + T_YIELD_FROM
  • Keep the T_YIELD_FROM for the yield keyword in yield from, new token for from. This, again, would be a smaller BC-break.
    Example 1: T_YIELD_FROM + whitespace + T_FROM_FOR_YIELD
    Example 2: T_YIELD_FROM + whitespace + comment + whitespace + T_FROM_FOR_YIELD

Other questions:

  • What branch of PHP will this target ? PHP 8.3 or 8.4 ?
    The description mentions 8.4, but as the 8.3 change was not documented, this can still be regarded as a bug fix for PHP 8.3, especially as it doesn't break code which has started to add comments between the keywords.
  • Maybe it should get a couple of changelog (UPGRADING) entries ?
    Based on the current PR:
    • Removal/deprecation of T_YIELD_FROM constant
    • New constant T_FROM
    • Entry in "Changed" for the Tokenizer extension

@bwoebi
Copy link
Member Author

bwoebi commented Jul 21, 2024

I don't care about T_FROM_FOR_YIELD vs T_FROM.

Also, tokenization is very explicitly going to change from minor to minor. I very clearly do not care about BC with respect to tokenization between minors. It's an internal detail of the parser, which we expose and guarantee stability for within a minor.

From that perspective I am going towards the approach which does not carry any technical debt (like a T_YIELD_FROM token which occurs only in specific scenarios).

I'm targeting PHP 8.4 due to the BC break. I personally don't mind changing it in PHP 8.3 too. However, you'll have to persuade internals about this.

@jrfnl
Copy link
Contributor

jrfnl commented Jul 22, 2024

Also, tokenization is very explicitly going to change from minor to minor. I very clearly do not care about BC with respect to tokenization between minors. It's an internal detail of the parser, which we expose and guarantee stability for within a minor.

Sorry, but I find that statement a little disingenuous.

Yes, tokens change in every minor due to new features - which always go via RFCs.
Other than that, it is very rare for tokenization of tokens to change once they are in PHP.

The Tokenizer is part of the public API and is used by all sorts of tooling (yes, PHPCS, but also PHPUnit, Composer, PHP Parallel Lint, phpDocumentor etc etc), so BC-breaks should be treated the same as anywhere else and be clearly annotated and documented. How can tooling be expected to keep up and deal with it, if not ?

@bwoebi
Copy link
Member Author

bwoebi commented Jul 22, 2024

Yes, the change should have been documented. But doing the change in the first place was perfectly fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PHP 8.3: "yield /*comment*/ from" is no longer a parse error ?
4 participants