
TST: corrwith and tshift in groupby/groupby.transform #32069

Closed
83 changes: 83 additions & 0 deletions pandas/tests/groupby/test_transform.py
@@ -23,6 +23,16 @@
from pandas.core.groupby.groupby import DataError


@pytest.fixture
def df_for_transformation_func():
    return DataFrame(
        {
            "A": [121, 121, 121, 121, 231, 231, 676],
            "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
        }
    )


def assert_fp_equal(a, b):
    assert (np.abs(a - b) < 1e-12).all()

@@ -346,6 +356,79 @@ def test_transform_transformation_func(transformation_func):
    tm.assert_frame_equal(result, expected)


def test_groupby_corrwith(df_for_transformation_func):

    # GH 27905
    df = df_for_transformation_func
    g = df.groupby("A")

    op = lambda x: getattr(x, "corrwith")(df)
    result = op(g)
Member

This is just g.corrwith(df).

Should we be testing g.transform("corrwith", df) to address #27905? That call raises AttributeError: 'Series' object has no attribute 'corrwith'.

g.corrwith(df) is tested in pandas\tests\groupby\test_groupby.py.

Contributor Author

The only use of corrwith in test_groupby.py is in test_dup_labels_output_shape, which doesn't seem to fully test the output of corrwith the way this test does.

    expected = pd.DataFrame(dict(B=[1, np.nan, np.nan], A=[np.nan] * 3))
    expected.index = pd.Index([121, 231, 676], name="A")
    tm.assert_frame_equal(result, expected)
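
To see where the expected B values of 1, NaN, NaN come from, the following is a minimal sketch that computes the correlation group by group with Series.corr instead of calling the groupby corrwith method. The loop and the names per_group and corr_b are illustrative and not part of the PR. It assumes Series.corr aligns the two Series on their index, so each group is compared only with its own rows.

```python
import warnings

import numpy as np
import pandas as pd

# Same data as the df_for_transformation_func fixture.
df = pd.DataFrame(
    {
        "A": [121, 121, 121, 121, 231, 231, 676],
        "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
    }
)

per_group = {}
with warnings.catch_warnings():
    # Constant or single-row groups can make NumPy emit warnings; silence them here.
    warnings.simplefilter("ignore")
    for key, chunk in df.groupby("A"):
        # Series.corr aligns on the index, so only the group's own rows are compared.
        per_group[key] = chunk["B"].corr(df["B"])

corr_b = pd.Series(per_group, name="B")
print(corr_b)
```

Group 121 has varying values and correlates perfectly with itself. Group 231 is constant and group 676 has a single row, so neither has a defined correlation.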


def test_groupby_transform_nan(df_for_transformation_func):

    # GH 27905
    df = df_for_transformation_func
    g = df.groupby("A")

    df["B"] = [1, np.nan, np.nan, 3, np.nan, 3, 4]
    result = g.transform("fillna", value=1)
    expected = pd.DataFrame({"B": [1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 4.0]})
    tm.assert_frame_equal(result, expected)
    op = lambda x: getattr(x, "fillna")(1)
    result = op(g)
    tm.assert_frame_equal(result, expected)


def test_groupby_tshift(df_for_transformation_func):
Member

Should this test be testing g.transform("tshift", ...)?

Contributor Author

@ryankarlos, Feb 29, 2020

The output of g.transform("tshift", ...) seems to be buggy; I've just raised issue #32344. I've added a case for this and marked it as pytest.xfail for now.


    # GH 27905
    df = df_for_transformation_func
    dt_periods = pd.date_range("2013-11-03", periods=7, freq="D")
    df["C"] = dt_periods
    g = df.set_index("C").groupby("A")

    op = lambda x: getattr(x, "tshift")(2, "D")
    result = op(g)
    df["C"] = dt_periods + dt_periods.freq * 2
    expected = df
    tm.assert_frame_equal(
        result.reset_index().reindex(columns=["A", "B", "C"]), expected
    )
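
The test expects tshift(2, "D") to move each group's DatetimeIndex forward by two days while leaving the values alone. As a rough sketch of that behaviour, the snippet below applies DataFrame.shift with a freq argument to each group explicitly and concatenates the pieces. Using shift(..., freq=...) in place of tshift, the explicit loop, and the name shifted are assumptions made for illustration; this is not the code path the PR exercises.

```python
import pandas as pd

# Fixture data, indexed by the same daily date range the test stores in column "C".
df = pd.DataFrame(
    {
        "A": [121, 121, 121, 121, 231, 231, 676],
        "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
    },
    index=pd.date_range("2013-11-03", periods=7, freq="D", name="C"),
)

# Shift each group's index by two days; with freq given, the values are not moved.
shifted = pd.concat([chunk.shift(2, freq="D") for _, chunk in df.groupby("A")])
print(shifted)
```

Because the groups are contiguous and already sorted here, the concatenated result keeps the original row order, with every timestamp two days later.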


def test_check_original_and_transformed_index(transformation_func):

    # GH 27905
    df = DataFrame(
        {
            "A": [121, 121, 121, 121, 231, 231, 676],
            "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
        }
    )

    df = DataFrame({"A": [0, 0, 0, 1, 1, 1], "B": [0, 1, 2, 3, 4, 5]})
    g = df.groupby("A")

    if transformation_func in [
        "cummax",
        "cummin",
        "cumprod",
        "cumsum",
        "diff",
        "ffill",
        "pct_change",
        "rank",
        "shift",
    ]:
        result = g.transform(transformation_func)
        tm.assert_index_equal(result.index, df.index)
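
The property this test checks is that a transformation returns one row per input row, on the original index, unlike a reduction, which returns one row per group. A small sketch with "cumsum", using the same frame as the test (the name reduced is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0, 1, 1, 1], "B": [0, 1, 2, 3, 4, 5]})
g = df.groupby("A")

# Transformation: running sum within each group, aligned to df's index.
result = g.transform("cumsum")

# Reduction, for contrast: one row per group, indexed by the group keys.
reduced = g.sum()

print(result)
print(reduced)
```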


def test_transform_select_columns(df):
    f = lambda x: x.mean()
    result = df.groupby("A")[["C", "D"]].transform(f)