BUG: Groupby.transform with tshift giving incorrect result #32344

Open
@ryankarlos

I came across this inconsistency while writing a test for tshift in #32069.

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "A": [121, 121, 121, 121, 231, 231, 676],
...     "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
...     "C": pd.date_range("2013-11-03", periods=7),
... })

>>> df
     A    B          C
0  121  1.0 2013-11-03
1  121  2.0 2013-11-04
2  121  2.0 2013-11-05
3  121  3.0 2013-11-06
4  231  3.0 2013-11-07
5  231  3.0 2013-11-08
6  676  4.0 2013-11-09

>>> g = df.set_index("C").groupby("A")

>>> g.transform(lambda x: x.tshift(2, "D"))
              B
C              
2013-11-03  1.0
2013-11-04  2.0
2013-11-05  2.0
2013-11-06  3.0
2013-11-07  3.0
2013-11-08  3.0
2013-11-09  4.0

>>> g.transform("tshift", *[2, "D"])
              B
C              
2013-11-03  1.0
2013-11-04  2.0
2013-11-05  2.0
2013-11-06  3.0
2013-11-07  3.0
2013-11-08  3.0
2013-11-09  4.0

Problem description

Using tshift inside groupby.transform drops A from the index and leaves the dates unshifted, as both results above show. The lambda form and the string form ("tshift" with positional arguments) give the same incorrect output.
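
A possible explanation, offered as an assumption rather than a confirmed diagnosis: transform is documented to return a result indexed like its input, so an index change made inside the function may be discarded when the pieces are put back together. The sketch below uses a plain arithmetic lambda instead of tshift to show only that like-indexed behaviour; the names indexed and doubled are illustrative and not from the report.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [121, 121, 121, 121, 231, 231, 676],
    "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
    "C": pd.date_range("2013-11-03", periods=7),
})
indexed = df.set_index("C")
g = indexed.groupby("A")

# transform hands back a frame aligned to the input index;
# the grouping column A is not part of the result.
doubled = g.transform(lambda x: x * 2)

print(doubled.index.equals(indexed.index))
print(list(doubled.columns))
```

If that reading is right, the missing A level is expected from transform, and the open question is whether an index-shifting operation should be rejected or handled specially rather than silently returning unshifted dates.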

Expected Output

I would expect something like the output below, which groupby.tshift produces correctly:

>>> g.tshift(2, "D")
                  B
A   C              
121 2013-11-05  1.0
    2013-11-06  2.0
    2013-11-07  2.0
    2013-11-08  3.0
231 2013-11-09  3.0
    2013-11-10  3.0
676 2013-11-11  4.0
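
For comparison, the same shape can be built by hand without transform. This is a minimal sketch, not how pandas implements groupby.tshift: it shifts each group's index with DataFrame.shift and its freq argument (standing in for tshift, which moves the index in the same way) and stitches the groups together with pd.concat. The names pieces and expected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [121, 121, 121, 121, 231, 231, 676],
    "B": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0],
    "C": pd.date_range("2013-11-03", periods=7),
})
indexed = df.set_index("C")

# Shift each group's DatetimeIndex forward by two days; values are untouched.
pieces = {
    key: grp[["B"]].shift(2, freq="D")
    for key, grp in indexed.groupby("A")
}

# The dict keys become the outer index level, labelled "A".
expected = pd.concat(pieces, names=["A"])
print(expected)
```

The per-group loop is slower than a built-in groupby method, but it makes both the shifted dates and the extra A level explicit.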

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 94befe6
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.0.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.25.0.dev0+3348.g94befe6.dirty
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191201
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.56.3
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.2
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.46.0
