Skip to content

improve pseudo-html entity conversion #2932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 4, 2018
Merged

improve pseudo-html entity conversion #2932

merged 4 commits into from
Sep 4, 2018

Conversation

alexcjohnson
Copy link
Collaborator

  • Fix Some HTML entities are double-decoded. #2927 - Move entity conversion later in the convertToTspans pipeline, so we don't try to interpret &lt; as < and make a tag out of it
  • Support all numbered entities, decimal and hex - including the correct ampersand code!
  • Improve performance - instead of running the string through a separate regex replace for every entity, just run it through one regex replace that finds all entities. Note that this would make it feasible, performance-wise, to include ALL named html entities, but the full list is over 2000 items, and there are other options (unicode literals, or numbered entities if you want your code/JSON to be pure ASCII) so I suggest we keep it at just the existing set.
  • Make the GL version work the same as the SVG version. Rather than fix the fixEntities function there, which was overly aggressive about finding entities, I just pointed it to the SVG version. It's possible that the simple loop-based function there would be faster, but making it work correctly would have slowed it down and made it more complicated, and anyway the other transformations in html2unicode use regex replacements so I guess this wasn't hyper-optimized anyway.

cc @etpinard

- move it later in the convertToTspans pipeline
- support all numbered entities (including the correct ampersand code!)
});

it('decodes some HTML entities in text (number case)', function() {
var node = mockTextSVGElement(
'100&#956; &#28; &#60; 10 &#62; 0 &#160;' +
'100&#956; &#38; &#60; 10 &#62; 0 &#160;' +
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the ampersand is not &#28;...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. My bad.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh haha I assumed I did that 🙈

var stringFromCodePoint = String.fromCodePoint;
if(stringFromCodePoint) return stringFromCodePoint(code);

// IE doesn't have String.fromCodePoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, not just IE doesn't support fromCodePoint, 2015 Chrome and 2014 FF didn't have it either.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough - though we don't support such old versions of Chrome and FF, do we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we?

We used to have a blob on https://plot.ly/javascript/ (that I can't find right now) saying plotly.js is supported in all SVG-enabled browsers. But yeah, the fraction of users still using Chrome41 and FF29 must be tiny.

@@ -10,7 +10,7 @@
'use strict';

var toSuperScript = require('superscript-text');
var stringMappings = require('../constants/string_mappings');
var fixEntities = require('./svg_text_utils').convertEntities;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice here.

Now string_mapping is only required in svg_text_utils.js. Would you mind moving the entityToUnicode lookup table to svg_text_utils.js and 🔪 string_mapping.js?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call -> 🔪 string_mapping.js in a943db2

var outChar;
if(innerMatch.charAt(0) === '#') {
// cannot use String.fromCodePoint in IE
outChar = fromCodePoint(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support all numbered entities, decimal and hex - including the correct ampersand code!

Brilliant ✨

@etpinard
Copy link
Contributor

Very nice PR. 🥇

I tried to find a case where the new convertEntities logic would contradict the works of #804, but to no avail.

I'll give you a 💃 , but I wouldn't mind waiting a few days before merging it in case we need to patch 1.40.0.


Now about,

Note that this would make it feasible, performance-wise, to include ALL named html entities, but the full list is over 2000 items, and there are other options (unicode literals, or numbered entities if you want your code/JSON to be pure ASCII) so I suggest we keep it at just the existing set.

If someone really wants us to decode all named html entities, I suggest we make an optional component (similar to calendars) that would require some third-party module (e.g. he) to handle the conversions.

@etpinard etpinard added this to the v1.41.0 milestone Aug 23, 2018
@alexcjohnson alexcjohnson merged commit b3dd749 into master Sep 4, 2018
@alexcjohnson alexcjohnson deleted the text-entities branch September 4, 2018 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken feature something new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some HTML entities are double-decoded.
2 participants