annotation-xml in MathML breaks the parser #258

Closed
@andreyfedoseev

Here's a piece of MathML that breaks the parser:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <semantics>
    <mrow>
      <mrow>
        <mtext>Mass of electron</mtext>
        <mo>=</mo>
        <mn>1.602</mn>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.2em" />
        <msup>
          <mrow>
            <mn>10</mn>
          </mrow>
          <mrow>
            <mn>&#x2212;19</mn>
          </mrow>
        </msup>
        <mspace width="0.2em" />
        <mtext>C</mtext>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.4em" />
        <mfrac>
          <mrow>
            <mn>1</mn>
            <mspace width="0.2em" />
            <mtext>kg</mtext>
          </mrow>
          <mrow>
            <mn>1.759</mn>
            <mspace width="0.2em" />
            <mo>&#xD7;</mo>
            <mspace width="0.2em" />
            <msup>
              <mrow>
                <mn>10</mn>
              </mrow>
              <mrow>
                <mn>11</mn>
              </mrow>
            </msup>
            <mspace width="0.2em" />
            <mtext>C</mtext>
          </mrow>
        </mfrac>
        <mspace width="0.2em" />
        <mo>=</mo>
        <mn>9.107</mn>
        <mspace width="0.2em" />
        <mo>&#xD7;</mo>
        <mspace width="0.2em" />
        <msup>
          <mrow>
            <mn>10</mn>
          </mrow>
          <mrow>
            <mn>&#x2212;31</mn>
          </mrow>
        </msup>
        <mspace width="0.2em" />
        <mtext>kg</mtext>
      </mrow>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <mrow><mtext>Mass of electron</mtext><mo>=</mo><mn>1.602</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−19</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.4em"></mspace><mfrac><mrow><mn>1</mn><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow><mrow><mn>1.759</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>11</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>C</mtext></mrow></mfrac><mspace width="0.2em"></mspace><mo>=</mo><mn>9.107</mn><mspace width="0.2em"></mspace><mo>×</mo><mspace width="0.2em"></mspace><msup><mrow><mn>10</mn></mrow><mrow><mn>−31</mn></mrow></msup><mspace width="0.2em"></mspace><mtext>kg</mtext></mrow>
    </annotation-xml>
  </semantics>
</math>

When I pass this markup to html5lib.parse, I get the following traceback:

.../html5lib/html5parser.pyc in mainLoop(self)
    173                         (currentNodeNamespace == namespaces["mathml"] and
    174                          currentNodeName == "annotation-xml" and
--> 175                          token["name"] == "svg") or
    176                         (self.isHTMLIntegrationPoint(currentNode) and
    177                          type in (StartTagToken, CharactersToken, SpaceCharactersToken))):
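A minimal sketch of how the report could be reproduced in a script. It relies only on `html5lib.parse`, which the report names. The reduced `SNIPPET` and the helpers `is_well_formed` and `try_parse` are hypothetical and not part of the original report, and it is an assumption that the reduced markup exercises the same code path as the full example. The exception type is cut off in the traceback above, so the sketch reports whatever is raised instead of assuming one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical reduced version of the markup from the report: a <semantics>
# element whose <annotation-xml> child contains whitespace and child elements.
SNIPPET = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">\n'
    '  <semantics>\n'
    '    <mrow><mn>1</mn></mrow>\n'
    '    <annotation-xml encoding="MathML-Content">\n'
    '      <mrow><mn>1</mn></mrow>\n'
    '    </annotation-xml>\n'
    '  </semantics>\n'
    '</math>\n'
)


def is_well_formed(markup):
    """Return True if the markup parses as XML with a MathML <math> root."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    return root.tag == "{%s}math" % MATHML_NS


def try_parse(markup):
    """Return a short status string describing what html5lib.parse did."""
    try:
        import html5lib  # third-party; may not be installed
    except ImportError:
        return "html5lib not installed"
    try:
        html5lib.parse(markup)
    except Exception as exc:  # report the failure type, whatever it is
        return "raised " + type(exc).__name__
    return "parsed"


if __name__ == "__main__":
    # Check the input first so a failure can be attributed to the HTML parser
    # and not to malformed markup.
    print("well-formed XML:", is_well_formed(SNIPPET))
    print("html5lib.parse:", try_parse(SNIPPET))
```

Replacing `SNIPPET` with the full markup from the report should show whether a given html5lib version is still affected.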
