Groupby creates empty groups depending on base parameter #25161

Closed
@LucaAmerio

Code Sample, a copy-pastable example if possible

import datetime as dt
import numpy as np
import pandas as pd

# Generate test dataframe (case 1 reproduces the empty group, case 2 does not)
case = 1
if case == 1:
    start = '2018-11-26 16:17:43.510000'
else:
    start = '2018-11-26 16:17:43.500000'

rng = pd.date_range(start, periods=10, freq='1S')
df = pd.DataFrame({'a': np.random.randn(len(rng)), 'b': np.random.randn(len(rng))}, index=rng)

# Set interval and start time of the buckets
interval = dt.timedelta(minutes=10)
t0 = df.index[0]
# base is the offset of the first sample, in fractional minutes
base = t0.minute + (t0.second + t0.microsecond / 1e6) / 60
groups = df.groupby(pd.Grouper(freq=interval, base=base))

print(groups.size())

Problem description

The code above generates either one or two groups depending on whether the dataframe starts at '2018-11-26 16:17:43.510000' (case 1) or '2018-11-26 16:17:43.500000' (case 2).

The correct output is clearly the one obtained in case 2. Case 1, instead, creates an empty group at the end of the dataframe. This can cause trouble with groupby.apply() if the applied function does not handle empty dataframes.
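
One possible contributing factor, offered here as an unverified assumption rather than a diagnosis: `base` is computed as a floating-point number of minutes, so it cannot represent the first timestamp's offset exactly. The sketch below uses only the standard library to compare the float value with an exact rational one for the case 1 start time. The names `base_float`, `base_exact` and `NANOS_PER_MINUTE` are illustrative and do not come from pandas, and the sketch makes no claim about how pandas uses `base` internally.

```python
from fractions import Fraction

# Fields of the case 1 start time, 16:17:43.510000
minute, second, microsecond = 17, 43, 510000
interval_minutes = 10  # bucket width used in the report

# Float value, computed the same way as `base` in the code sample
base_float = minute + (second + microsecond / 1e6) / 60

# The same quantity in exact rational arithmetic
base_exact = Fraction(minute) + (Fraction(second) + Fraction(microsecond, 10**6)) / 60

NANOS_PER_MINUTE = 60 * 10**9

# Exact offset of the first sample inside its 10-minute bucket, in nanoseconds
offset_exact_ns = (base_exact % interval_minutes) * NANOS_PER_MINUTE

# Representation error of the float, expressed in nanoseconds
error = Fraction(base_float) - base_exact
error_ns = float(error * NANOS_PER_MINUTE)

print("exact offset (ns):", offset_exact_ns)
print("float error (ns): ", error_ns)
```

The error is far below one nanosecond, but it is not zero, so any code that converts `base` back to an integer number of nanoseconds has to round or truncate it. Whether that rounding is what produces the extra bin has not been verified here.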

Actual Output

2018-11-26 16:17:43.510 10
2018-11-26 16:27:43.510 0
dtype: int64

Expected Output

2018-11-26 16:17:43.510 10
dtype: int64
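
As a workaround on the consuming side, empty bins can be skipped before any per-group function runs. The following is a minimal sketch under stated assumptions: it uses hypothetical toy data (two samples 25 minutes apart) instead of the frame above, and it does not pass `base`, so it only illustrates the guard and does not reproduce the bug.

```python
import pandas as pd

# Toy data (hypothetical): two samples 25 minutes apart, so the middle
# 10-minute bin contains no rows.
idx = pd.to_datetime(['2018-11-26 16:00:00', '2018-11-26 16:25:00'])
df = pd.DataFrame({'a': [1.0, 2.0]}, index=idx)

groups = df.groupby(pd.Grouper(freq='10min'))

# Option 1: drop empty bins from the size summary
sizes = groups.size()
nonempty = sizes[sizes > 0]

# Option 2: guard explicitly while iterating instead of relying on apply()
means = {key: chunk['a'].mean() for key, chunk in groups if not chunk.empty}

print(nonempty)
print(means)
```

The explicit loop keeps the emptiness check in one visible place, which helps when the per-group function cannot be modified to handle empty dataframes itself.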

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
