
TestReadHtml.test_computer_sales_page - what's it doing? #17074

Closed
@jowens

@jowens

@chris-b1 or anyone else, help a brother out? Can you tell me what this test does? Is it just expecting the parser to raise an error? The output from the failing test is at the bottom. The HTML file itself is pretty weird.

computer_sales_page_html

Now, if I call it with my current in-progress code as dfs = pd.read_html('computer_sales_page.html', header=[0, 1]), I see:

Index([         (u'Unnamed: 0_level_0', u'Unnamed: 0_level_1'),
                (u'Unnamed: 1_level_0', u'Unnamed: 1_level_1'),
                     (u'Three months ended April?30', u'2013'),
              u'(u'Three months ended April\xa030', '2013').1',
       (u'Three months ended April?30', u'Unnamed: 4_level_1'),
                     (u'Three months ended April?30', u'2012'),
              u'(u'Three months ended April\xa030', '2012').1',
                (u'Unnamed: 7_level_0', u'Unnamed: 7_level_1'),
                       (u'Six months ended April?30', u'2013'),
                u'(u'Six months ended April\xa030', '2013').1',
        (u'Six months ended April?30', u'Unnamed: 10_level_1'),
                       (u'Six months ended April?30', u'2012'),
                u'(u'Six months ended April\xa030', '2012').1',
              (u'Unnamed: 13_level_0', u'Unnamed: 13_level_1')],
      dtype='object')

and if I call it without a header argument (dfs = pd.read_html('computer_sales_page.html')), I see:

Index([   (u'Unnamed: 0_level_0', u'Unnamed: 0_level_1', u'Unnamed: 0_level_2'),
          (u'Unnamed: 1_level_0', u'Unnamed: 1_level_1', u'Unnamed: 1_level_2'),
                      (u'Three months ended April?30', u'2013', u'In millions'),
                u'(u'Three months ended April\xa030', '2013', 'In millions').1',
        (u'Three months ended April?30', u'Unnamed: 4_level_1', u'In millions'),
                      (u'Three months ended April?30', u'2012', u'In millions'),
                u'(u'Three months ended April\xa030', '2012', 'In millions').1',
                 (u'Unnamed: 7_level_0', u'Unnamed: 7_level_1', u'In millions'),
                        (u'Six months ended April?30', u'2013', u'In millions'),
                  u'(u'Six months ended April\xa030', '2013', 'In millions').1',
         (u'Six months ended April?30', u'Unnamed: 10_level_1', u'In millions'),
                        (u'Six months ended April?30', u'2012', u'In millions'),
                  u'(u'Six months ended April\xa030', '2012', 'In millions').1',
       (u'Unnamed: 13_level_0', u'Unnamed: 13_level_1', u'Unnamed: 13_level_2')],
      dtype='object')

These seem like OK outputs to me. I'm not sure what the original test is supposed to show. If its only purpose is to assert that parsing this file raises a ParserError (which it no longer does with my code), I'd like to just delete the test.

____________________ TestReadHtml.test_computer_sales_page _____________________

self = <pandas.tests.io.test_html.TestReadHtml object at 0x1120aa390>

    def test_computer_sales_page(self):
        data = os.path.join(DATA_PATH, 'computer_sales_page.html')
        with tm.assert_raises_regex(ParserError,
                                    r"Passed header=\[0,1\] are "
                                    r"too many rows for this "
                                    r"multi_index of columns"):
>           self.read_html(data, header=[0, 1])

pandas/tests/io/test_html.py:778:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pandas.util.testing._AssertRaisesContextmanager object at 0x1120aab50>
exc_type = None, exc_value = None, trace_back = None

    def __exit__(self, exc_type, exc_value, trace_back):
        expected = self.exception

        if not exc_type:
            exp_name = getattr(expected, "__name__", str(expected))
>           raise AssertionError("{0} not raised.".format(exp_name))
E           AssertionError: ParserError not raised.

pandas/util/testing.py:2491: AssertionError
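
Reading the traceback above: the test passes only if read_html raises a ParserError whose message matches the regex, so a successful parse is reported as a failure. Here is a minimal, standard-library-only sketch of that mechanism. It is not the pandas implementation: AssertRaisesRegex, the ParserError stand-in, old_parser, new_parser and run_test are all hypothetical names made up for illustration, and the message raised by old_parser is simply constructed to match the test's regex.

```python
import re


class ParserError(ValueError):
    """Stand-in for pandas' ParserError (hypothetical, illustration only)."""


class AssertRaisesRegex:
    """Simplified sketch of an assert-raises context manager."""

    def __init__(self, exception, pattern):
        self.exception = exception
        self.pattern = re.compile(pattern)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, trace_back):
        if exc_type is None:
            # The block finished without raising: the failure seen in the log.
            name = getattr(self.exception, "__name__", str(self.exception))
            raise AssertionError("{0} not raised.".format(name))
        if not issubclass(exc_type, self.exception):
            return False  # let unexpected exception types propagate
        if not self.pattern.search(str(exc_value)):
            raise AssertionError(
                "message did not match: {0!r}".format(str(exc_value)))
        return True  # expected exception with a matching message: swallow it


PATTERN = (r"Passed header=\[0,1\] are "
           r"too many rows for this "
           r"multi_index of columns")


def old_parser(path, header):
    # Hypothetical stand-in for the behaviour the test was written against.
    raise ParserError("Passed header=[0,1] are too many rows "
                      "for this multi_index of columns")


def new_parser(path, header):
    # Hypothetical stand-in for in-progress code that parses successfully.
    return ["<DataFrame>"]


def run_test(parser):
    """Run the assertion against a parser and report the outcome as text."""
    try:
        with AssertRaisesRegex(ParserError, PATTERN):
            parser("computer_sales_page.html", header=[0, 1])
    except AssertionError as err:
        return "FAILED: {0}".format(err)
    return "passed"


print(run_test(old_parser))
print(run_test(new_parser))
```

Under these assumptions, the first call reports a pass and the second reproduces the "ParserError not raised." failure, which is the situation described in this issue: the test encodes the old error as the expected behaviour.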

Metadata


    Labels

    IO HTML (read_html, to_html, Styler.apply, Styler.applymap), Testing (pandas testing functions or related to the test suite)
