Skip to content

Align highlight_string|file with HTML standard and modern browsers #11913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

kamil-tekiela
Copy link
Member

See php/phd#79 (comment)

This PR fixes:

  • Instead of <br> output \n
  • Instead of &nbsp; output
  • Instead of <code> use <pre>
  • Instead of <span> use <code>
  • Remove the first enclosing <code> (old <span>)
  • Remove the enclosing newlines

@nielsdos
Copy link
Member

nielsdos commented Aug 8, 2023

I think this can't target stable branches due to the BC break, both for the internal function exposed as ZEND_API and also for the userland behaviour.

@kamil-tekiela
Copy link
Member Author

According to the documentation the HTML markup can change at any time. https://www.php.net/manual/en/function.highlight-string.php#refsect1-function.highlight-string-notes

The HTML markup generated is subject to change.

So I think it's ok to go into minor. There should be no noticeable change in the visual output. The only difference is that the output is now block instead of inline.

@nielsdos
Copy link
Member

nielsdos commented Aug 8, 2023

Ah ok, wasn't aware of the note.

@thg2k
Copy link
Contributor

thg2k commented Aug 8, 2023

I see no reason to target a stable branch with something that hasn't been touched in decades. What's wrong with 8.3?

@kamil-tekiela
Copy link
Member Author

I see no reason to target a stable branch with something that hasn't been touched in decades. What's wrong with 8.3?

Because technically it's a bug fix. People did complain some time ago that the output of this when displayed in the browser, cannot be cleanly copied. This isn't just a stylistic change, but rather meant to fix the bug.

@nielsdos
Copy link
Member

nielsdos commented Aug 8, 2023

I have a quick question: why is \t translated to four spaces instead of just a tab? If we're wrapping it in a pre anyway, we might as well use a tab, or am I forgetting something?

@kamil-tekiela
Copy link
Member Author

I have a quick question: why is \t translated to four spaces instead of just a tab? If we're wrapping it in a pre anyway, we might as well use a tab, or am I forgetting something?

Excellent question. I thought of doing the same, but I noticed that we used 4 NBSPs before. If I changed it now to a proper tab, it would change the output, because the tab is a wider character than 4 NBSP.

Test it yourself.

<pre>
&#09;HTML Tab
&nbsp;&nbsp;&nbsp;&nbsp;Four NBSPs
<?="    "?>Four spaces
<?="\t"?>Tab
</pre>

@TimWolla
Copy link
Member

TimWolla commented Aug 9, 2023

The size of a tab in HTML is configurable using https://developer.mozilla.org/en-US/docs/Web/CSS/tab-size

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if using a separate <code> tag per line is actually more correct than the current version. MDN says:

To represent multiple lines of code, wrap the <code> element within a <pre> element.

Which is phrased as a single <code> element.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the proper way is to do <pre><code>code goes here where <span class="bold">highlighting</span> works with span elements</code></pre>

Also this can't change in a stable release, I think this is fine for 8.3 if RMs agree @bukka @ericmann

Comment on lines 137 to 130
zend_printf("</span>");
zend_printf("</code>");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those should still be span elements

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you are right. I checked how it works on Stack Overflow and they do it exactly like this. I changed it now.

@kamil-tekiela
Copy link
Member Author

Also this can't change in a stable release

But why not? The documentation clearly states that the HTML cannot be relied upon. So as long as the actual visual output isn't changed, there should be no issue. Of course, if RMs do not agree, then let's do it in 8.3.

@kamil-tekiela
Copy link
Member Author

kamil-tekiela commented Aug 9, 2023

I am not sure if using a separate <code> tag per line is actually more correct than the current version. MDN says:

To represent multiple lines of code, wrap the <code> element within a \<pre\> element.

Which is phrased as a single <code> element.

Correct, but then it says:

The <code> element by itself only represents a single phrase of code or line of code.

@Girgias
Copy link
Member

Girgias commented Aug 9, 2023

Also this can't change in a stable release

But why not? The documentation clearly states that the HTML cannot be relied upon. So as long as the actual visual output isn't changed, there should be no issue. Of course, if RMs do not agree, then let's do it in 8.3.

The documentation is not the spec, the documentation reflects, and always will, the behaviour of the engine. So the argument "the documentation says something" does not hold up. Because if it did, dynamics properties should/could have been removed whenever, as it wasn't documented until they were deprecated.

And yes the output should not be relied upon, but changing this in a patch version makes no sense. We already have people complain about tests breaking to output changes when fixing actual bugs, so I really don't want to hear this for something that is just a cosmetic change.

@Girgias
Copy link
Member

Girgias commented Aug 9, 2023

I am not sure if using a separate <code> tag per line is actually more correct than the current version. MDN says:

To represent multiple lines of code, wrap the <code> element within a \<pre\> element.

Which is phrased as a single <code> element.

Correct, but then it says:

The <code> element by itself only represents a single phrase of code or line of code.

Nothing in the living spec document seems to indicate this being valid markup: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-code-element and https://html.spec.whatwg.org/multipage/grouping-content.html#the-pre-element

@bukka
Copy link
Member

bukka commented Aug 11, 2023

I agree that this should not go to stable branches. I'm fine with 8.3 though.

@kamil-tekiela
Copy link
Member Author

I rebased it against master, since this is what everyone agrees should be the target branch.

@kamil-tekiela kamil-tekiela marked this pull request as draft August 11, 2023 19:59
@kamil-tekiela kamil-tekiela marked this pull request as ready for review August 11, 2023 23:23
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM now.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add an entry into UPGRADING before merging (or after) about the format change.

Comment on lines 40 to +41
case '\t':
ZEND_PUTS("&nbsp;&nbsp;&nbsp;&nbsp;");
ZEND_PUTS(" ");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm a fan of replacing tabs by 4 spaces, but that's by the by.

ju1ius pushed a commit to ju1ius/php-src that referenced this pull request Aug 15, 2023
jorgsowa pushed a commit to jorgsowa/php-src that referenced this pull request Aug 16, 2023
cmb69 added a commit to cmb69/phd that referenced this pull request Oct 5, 2024
PHP 8.3 changed the details of highlighting PHP code[1], basically like
we did[2] at roughly the same time.  However, both don't work well
together, so we avoid that.

[1] <php/php-src#11913>
[2] <php#79>
cmb69 added a commit to php/phd that referenced this pull request Oct 5, 2024
PHP 8.3 changed the details of highlighting PHP code[1], basically like
we did[2] at roughly the same time.  However, both don't work well
together, so we avoid that.

[1] <php/php-src#11913>
[2] <#79>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants