Closed
Description
Here's a piece of MathML that breaks the parser:
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<semantics>
<mrow>
<mrow>
<mtext>Mass of electron</mtext>
<mo>=</mo>
<mn>1.602</mn>
<mspace width="0.2em" />
<mo>×</mo>
<mspace width="0.2em" />
<msup>
<mrow>
<mn>10</mn>
</mrow>
<mrow>
<mn>−19</mn>
</mrow>
</msup>
<mspace width="0.2em" />
<mtext>C</mtext>
<mspace width="0.2em" />
<mo>×</mo>
<mspace width="0.4em" />
<mfrac>
<mrow>
<mn>1</mn>
<mspace width="0.2em" />
<mtext>kg</mtext>
</mrow>
<mrow>
<mn>1.759</mn>
<mspace width="0.2em" />
<mo>×</mo>
<mspace width="0.2em" />
<msup>
<mrow>
<mn>10</mn>
</mrow>
<mrow>
<mn>11</mn>
</mrow>
</msup>
<mspace width="0.2em" />
<mtext>C</mtext>
</mrow>
</mfrac>
<mspace width="0.2em" />
<mo>=</mo>
<mn>9.107</mn>
<mspace width="0.2em" />
<mo>×</mo>
<mspace width="0.2em" />
<msup>
<mrow>
<mn>10</mn>
</mrow>
<mrow>
<mn>−31</mn>
</mrow>
</msup>
<mspace width="0.2em" />
<mtext>kg</mtext>
</mrow>
</mrow>
<annotation-xml encoding="MathML-Content">
<mrow><mtext>Mass of electron</mtext><mo>=</mo><mn>1.602</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−19</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.4em"></mspace><mfrac><mrow><mn>1</mn><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow><mrow><mn>1.759</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>11</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext></mrow></mfrac><mspace width="0.2em"></mspace><mo>=</mo><mn>9.107</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−31</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow>
</annotation-xml>
</semantics>
</math>
When I pass it to html5lib.parse, I get the following traceback:
.../html5lib/html5parser.pyc in mainLoop(self)
173 (currentNodeNamespace == namespaces["mathml"] and
174 currentNodeName == "annotation-xml" and
--> 175 token["name"] == "svg") or
176 (self.isHTMLIntegrationPoint(currentNode) and
177 type in (StartTagToken, CharactersToken, SpaceCharactersToken))):
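
The traceback above is cut off before the exception type, so the following is only a guess at the cause. The arrow points at `token["name"] == "svg"`, and that lookup would fail if the current token were not a tag token. For example, the newline right after `<annotation-xml encoding="MathML-Content">` would presumably arrive as a whitespace character token, which carries data rather than a name. The sketch below models that situation with plain dicts and no html5lib import. The token layout, the `START_TAG` / `SPACE_CHARACTERS` constants, the `"mathml"` namespace placeholder and both function names are hypothetical stand-ins, not html5lib's actual internals.

```python
# Hypothetical stand-ins for the parser's token type constants.
START_TAG = "StartTag"
SPACE_CHARACTERS = "SpaceCharacters"


def unguarded(token, node_name, node_namespace):
    # Same shape as the condition in the traceback: "name" is read
    # without first checking that the token is a start tag.
    return (node_namespace == "mathml" and
            node_name == "annotation-xml" and
            token["name"] == "svg")


def guarded(token, node_name, node_namespace):
    # One possible guard: only look at "name" for start-tag tokens.
    return (node_namespace == "mathml" and
            node_name == "annotation-xml" and
            token["type"] == START_TAG and
            token["name"] == "svg")


# Assumed shape of the token produced by the newline after <annotation-xml ...>:
# it has "data" but no "name" key.
whitespace = {"type": SPACE_CHARACTERS, "data": "\n"}

try:
    unguarded(whitespace, "annotation-xml", "mathml")
    unguarded_result = "no error"
except KeyError as exc:
    unguarded_result = "KeyError: %s" % exc

guarded_result = guarded(whitespace, "annotation-xml", "mathml")

print(unguarded_result)
print(guarded_result)
```

If this reading is right, any `annotation-xml` element whose encoding is not an HTML integration point and whose first child is text or whitespace would trigger the same failure, so a much shorter input than the one above might be enough to reproduce it.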