Skip to content

split up pandas/tests/indexes/test_multi.py #18644 #21514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 3, 2018

Conversation

elmq0022
Copy link
Contributor

@elmq0022 elmq0022 commented Jun 17, 2018

The file is split out, the number of tests remains 377 (2 skips on my machine; all other test cases pass), the code uses fixtures in all instances, new fixtures are in a local conftest.py file, and there are no dependencies on classes.

I'm happy to receive and address any feedback.

@codecov
Copy link

codecov bot commented Jun 17, 2018

Codecov Report

Merging #21514 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21514   +/-   ##
=======================================
  Coverage   91.92%   91.92%           
=======================================
  Files         158      158           
  Lines       49703    49703           
=======================================
  Hits        45689    45689           
  Misses       4014     4014
Flag Coverage Δ
#multiple 90.3% <ø> (ø) ⬆️
#single 41.99% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f32214...baaace3. Read the comment docs.



@pytest.fixture
def _index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to prefix with '_'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-fixture variable "index" is used in many of these tests. Several of these also use the fixture "index" which I dubbed _index in an attempt to distinguish them.

I suggest using something like m_index or index_fixture to provide some separation.

Thoughts?

from pandas.core.indexes.datetimelike import DatetimeIndexOpsMixin


def verify_pickle(indices):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't create functions like this, inline it instead (unless there is a lot of usage about this, but indicate that)

# GH11193, when an existing index is passed, and a new name is not
# specified, the new index should inherit the previous object name
expected = _index
if not isinstance(expected, MultiIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would this NOT be a MI?

@@ -0,0 +1,938 @@
# -*- coding: utf-8 -*-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as you are splitting, would not create a test_base, rather create functional named files, similar to how pandas/tests/indexes/datetimes; since you already have some functional area files below, just distribute these

assert not _index.is_numeric()


def test_bounds(_index):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did this come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the file itself, correct? I somewhat started over, and inretrospect, I really shouldn't have. I can more closely follow Xavier's orginal stucture or that of another similar directory.

expected = mi.get_level_values(level)


def test_multiindex_compare():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move compare/equal/identical to a new file


idx = MultiIndex.from_arrays([['qux', 'baz', 'foo', 'bar'], np.arange(
4)])
result = idx.isin(values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isin should be in indexing

level='first')


def test_reindex_preserves_names_when_target_is_list_or_ndarray(_index):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this test_indexing

@jreback jreback added Testing pandas testing functions or related to the test suite MultiIndex labels Jun 18, 2018
@elmq0022
Copy link
Contributor Author

@jreback thank you for the quick feedback. I will make updates over the next day or two.

elmq0022 added a commit to elmq0022/pandas that referenced this pull request Jun 21, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really good. a couple of comments / questions.

assert result == i


def test_dims():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next PR can come back and remove this

first.tolist() - idx[-3:]


def test_nlevels(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be in integrity

# GH 21149
# Ensure comparison operations for MultiIndex with nlevels == 1
# behave consistently with those for MultiIndex with nlevels > 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to equivalence

tm.assert_raises_regex(ValueError, "Duplicated level name:",
mi.rename, names[:2], level=[0, 2])


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split out duplicated & uniques tests to a separate file

@@ -0,0 +1,88 @@
# -*- coding: utf-8 -*-
import numpy as np
import pandas.util.testing as tm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have a couple of tests for unsorted in integrity, can you move here

return index


@pytest.fixture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you are iterating over this in a number of tests. why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more or less a holdover from how the inherited base test class was implemented. I was thinking I'd leave it in case anyone wanted to extend some of these cases with another test multi-index, but yes it looks kind of sloppy. I'll refactor it and remove the duplicate fixture.

Copy link
Contributor Author

@elmq0022 elmq0022 Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done with that, but some tests are not relevant to the MultiIndex class or will need to be refactored. I marked those tests with a TODO. The tests that I could refactor have been. 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that's ok. i expect after your refactor will need to go over this again and clean a bit

@jreback jreback added this to the 0.24.0 milestone Jun 22, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a number of comments, but should be straightforward. ping on green.


@pytest.fixture
def idx():
major_axis = Index(['foo', 'bar', 'baz', 'qux'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment describing the fixture (always do this for fixtures)

return ['first', 'second']


@pytest.fixture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok w/o using these a leading _ (e.g. holder) is ok

def f():
if idx:
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change to do this with with

Copy link
Contributor Author

@elmq0022 elmq0022 Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following here. What is the need to use a context manager here? None of the functions seems to have a valid __enter__ or __exit__ methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want something like with pytest.raises(ValueError):?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep exactly

def f():
if common:
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above



def test_reconstruct_sort():

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these 2 should be in test_sorting (reconstrut_*)

assert not ind.is_monotonic


def test_partial_string_timestamp_multiindex():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial string should be in test_indexing, or test_timeseries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to test_indexing.

@pytest.mark.parametrize('first_type,second_type', [
('int64', 'int64'),
('datetime64[D]', 'str')])
def test_remove_unused_levels_large(first_type, second_type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to test_sorting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

['a', 'd', 'b', 'unused']])
@pytest.mark.parametrize('level1', [['w', 'x', 'y', 'z'],
['w', 'x', 'y', 'z', 'unused']])
def test_remove_unused_nan(level0, level1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_sorting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it, but why not in test_missing.py? Is it because of the implementation of the underlying algorithm?

(u('x'), u('out'), u('z'), 7, u('y'), u('in'), u('z'), 119),
(u('x'), u('out'), u('z'), 9, u('y'), u('in'), u('z'), 135),
(u('x'), u('out'), u('z'), 13, u('y'), u('in'), u('z'), 145),
(u('x'), u('out'), u('z'), 14, u('y'), u('in'), u('z'), 158),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just rename module to: test_unique_and_duplicates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assert idx[:0].empty


def test_unique_na():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move test_unique_and_duplicates

elmq0022 added a commit to elmq0022/pandas that referenced this pull request Jun 26, 2018
@elmq0022
Copy link
Contributor Author

elmq0022 commented Jun 26, 2018

So, someone removed the get_data_path from the pandas.utils.testing. I forgot to run pytest after I rebased. I can inline the function if need be.

The bigger issue is the rebase broke my ability to run pytest even on master. Any thoughts on how to deal with the below?

Traceback (most recent call last):
  File "/home/zero/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/_pytest/config.py", line 334, in _getconftestmodules
    return self._path2confmods[path]
KeyError: local('/home/zero/pandas/pandas/tests/indexes/test_multi.py')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/zero/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/_pytest/config.py", line 334, in _getconftestmodules
    return self._path2confmods[path]
KeyError: local('/home/zero/pandas/pandas/tests/indexes')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):  File "/home/zero/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/_pytest/config.py", line 365, in _importconftest
    return self._conftestpath2mod[conftestpath]
KeyError: local('/home/zero/pandas/pandas/conftest.py')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/zero/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/_pytest/config.py", line 371, in _importconftest
    mod = conftestpath.pyimport()
  File "/home/zero/anaconda3/envs/pandas-dev/lib/python3.6/site-packages/py/_path/local.py", line 668, in pyimport
    __import__(modname)
  File "/home/zero/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/home/zero/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby.groupby import Grouper
  File "/home/zero/pandas/pandas/core/groupby/__init__.py", line 2, in <module>
    from pandas.core.groupby.groupby import (
  File "/home/zero/pandas/pandas/core/groupby/groupby.py", line 46, in <module>
    from pandas.core.index import (Index, MultiIndex,
  File "/home/zero/pandas/pandas/core/index.py", line 2, in <module>
    from pandas.core.indexes.api import *
  File "/home/zero/pandas/pandas/core/indexes/api.py", line 4, in <module>
    from pandas.core.indexes.base import (Index,
  File "/home/zero/pandas/pandas/core/indexes/base.py", line 7, in <module>
    from pandas._libs import (lib, index as libindex, tslib as libts,
  File "pandas/_libs/index.pyx", line 28, in init pandas._libs.index
    from pandas._libs.tslibs import period as periodlib
  File "pandas/_libs/tslibs/period.pyx", line 59, in init pandas._libs.tslibs.period
  File "/home/zero/pandas/pandas/tseries/offsets.py", line 2327, in <module>
    offset=BDay(), time_rule=None):
  File "/home/zero/pandas/pandas/tseries/offsets.py", line 484, in __init__
    BaseOffset.__init__(self, n, normalize)
TypeError: object.__init__() takes no parameters
ERROR: could not load /home/zero/pandas/pandas/conftest.py

@WillAyd
Copy link
Member

WillAyd commented Jun 26, 2018

You need to rebuild the Cython extensions after pulling from master as there have been recent changes with those

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last round! thanks for this. ping on green.



@pytest.mark.xfail
def test_hasnans_isnans(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to test_missing

if len(idx) == 0:
pass
elif isinstance(idx, DatetimeIndexOpsMixin):
values[1] = iNaT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can simplify this here

assert result.names == index.names


@pytest.mark.skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be @pytest.mark.skipif(PY3, reason=......)

from pandas.compat import lrange, range


def test_get_loc(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_loc* also goes to test_indexing


pytest.raises(KeyError, idx.get_loc, (False, True))
pytest.raises(KeyError, idx.get_loc, (True, False))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thru here


arrays = [[], []]
index = pd.MultiIndex.from_arrays(arrays)
result = index.get_level_values(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i indicated some more to move

assert result == (0, len(index))


def test_shift(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_analytics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the "I indicated some more to move" above, my impression is to move all the test_get_level_* to test_indexing.py. Is that correct?

assert indexer.dtype == np.intp


def test_partial_string_timestamp_multiindex():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move test_partial to a new file: test_partial_indexing.py



def test_nan_stays_float():

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_missing

tm.assert_numpy_array_equal(index.notna(), ~result)


@pytest.mark.parametrize('level', [0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these last 2 here move to test_indexing

@elmq0022
Copy link
Contributor Author

elmq0022 commented Jun 26, 2018 via email

@elmq0022
Copy link
Contributor Author

I think the datapath fixture is a step backward from the get_data_path which was relative to the test directory. I can't run pytest from the directly from the tests/indexes/mulit directory successfully anymore. I now have to run it from the root directory if I want the test to pass. This should probably be relative to the director of the test like before.

@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

I think the datapath fixture is a step backward from the get_data_path which was relative to the test directory. I can't run pytest from the directly from the tests/indexes/mulit directory successfully anymore. I now have to run it from the root directory if I want the test to pass. This should probably be relative to the director of the test like before.

hmm. I almost always run at the root directory. but I suppose this could be addressed as well. can you open a separate issue for this?

cc @TomAugspurger

@TomAugspurger
Copy link
Contributor

How does pytest handle fixtures when you don't run from the root of the tests dir?

@elmq0022
Copy link
Contributor Author

elmq0022 commented Jun 26, 2018 via email

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 26, 2018 via email

@elmq0022
Copy link
Contributor Author

I will try to get to the last round of comments later this week. I pushed some fixes prior to seeing the comments.

@elmq0022
Copy link
Contributor Author

@TomAugspurger I see where you're going with the fixture. The approach definitely makes sense, but I just like that absolute file paths let me run code regardless of where I am on the command line. I opened an issue, #21646, and submitted a pull request to address it. Thanks.

@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

thanks @elmq0022 ping when you want me to look.

@elmq0022
Copy link
Contributor Author

@jreback looks like everything is green. I believe all the comments are addressed. I also included the last updates to test_multi.py in test unique and duplicates.

@elmq0022
Copy link
Contributor Author

elmq0022 commented Jul 3, 2018

@jreback it looks like there were a couple of tests added that I need to include and rebase. Do you have any additional feedback. I would like to get this merged as soon as possible so everything can stay in sync. Thanks!

@jreback
Copy link
Contributor

jreback commented Jul 3, 2018

yes I agree. pls rebase. ping as soon as green and can merge.

@elmq0022 elmq0022 force-pushed the split_multi_test branch from 8f360d9 to baaace3 Compare July 3, 2018 22:24
@elmq0022
Copy link
Contributor Author

elmq0022 commented Jul 3, 2018

@jreback I just pushed the rebased code. Will check back in a half hour or so to see if everything is green. Thanks!

@jreback
Copy link
Contributor

jreback commented Jul 3, 2018

great thanks! I will go thru and possibly make some comments. but take them as for a followup.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bunch of comments. but let's do these in a follow up please. thanks!

tm.assert_index_equal(result, expected)


def test_from_arrays_invalid_input():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize

pytest.raises(TypeError, MultiIndex.from_arrays, arrays=i)


def test_from_arrays_different_lengths():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize

labels=[[], [], []], names=names)
tm.assert_index_equal(result, expected)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke into several parameterized tests as the length of the names and the levels were different in each. Trying to do it all in one level seemed clunky to me.



def test_from_product_index_series_categorical():
# GH13743
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize

index = idx
index.names = ['foo', 'bar']
result = pd.Index(index)
tm.assert_index_equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write the expected out on a separate line

idx.astype('category')


def test_repeat():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_analytics

first.tolist() - idx[-3:]


def test_argsort(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the sorting tests

tm.assert_numpy_array_equal(result, expected)


def test_map(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move map tests to test_analytics

with np.errstate(all='ignore'):
func(idx)

for func in [np.isfinite, np.isinf, np.isnan, np.signbit]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to split & parameterize these tests

tm.assert_index_equal(result, expected)


def test_numpy_ufuncs(idx):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just combine test_operations and test_analytics (call it test_analytics)

@jreback jreback merged commit e77e428 into pandas-dev:master Jul 3, 2018
@jreback
Copy link
Contributor

jreback commented Jul 3, 2018

thanks @elmq0022 very very nice! great job!

if you can do a followup to address comments (and some TODO's you had in the code would be much appreciated)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MultiIndex Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TST: split up pandas/tests/indexes/test_multi.py
4 participants