Description
Code Sample, a copy-pastable example if possible
import datetime as dt

import numpy as np
import pandas as pd

# Generate test dataframe
case = 1
if case == 1:
    start = '2018-11-26 16:17:43.510000'
else:
    start = '2018-11-26 16:17:43.500000'
rng = pd.date_range(start, periods=10, freq='1S')
df = pd.DataFrame({'a': np.random.randn(len(rng)), 'b': np.random.randn(len(rng))}, index=rng)

# Set interval and start time of the buckets
interval = dt.timedelta(minutes=10)
t0 = df.index[0]
# Offset of the first timestamp in fractional minutes, so that the buckets start at t0
base = t0.minute + (t0.second + t0.microsecond / 1e6) / 60
groups = df.groupby(pd.Grouper(freq=interval, base=base))
print(groups.size())
Problem description
The code above generates either one or two groups depending on whether the dataframe starts at '2018-11-26 16:17:43.510000' (case 1) or '2018-11-26 16:17:43.500000' (case 2).
The correct output is clearly the one obtained in case 2. Case 1 instead creates an empty group at the end of the dataframe. This can cause trouble with groupby.apply() if the applied function does not handle empty dataframes.
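A possible workaround, sketched here under assumptions, is to skip empty groups explicitly instead of relying on groupby.apply(). The helper name `apply_nonempty` and the function `first_a` are hypothetical and not part of pandas. The example produces an empty bucket with a gap in the data, so it does not depend on the `base` argument.

```python
import pandas as pd


def apply_nonempty(groups, func):
    """Call func on every non-empty group and collect the results in a Series.

    Hypothetical helper (not pandas API): empty buckets are skipped so that
    func never receives an empty dataframe.
    """
    results = {label: func(frame) for label, frame in groups if len(frame) > 0}
    return pd.Series(results)


def first_a(frame):
    # Raises IndexError on an empty frame, like many user-written functions.
    return frame["a"].iloc[0]


# Two samples 21 minutes apart: the 10-minute bucket between them is empty.
index = pd.to_datetime(["2018-11-26 16:10:00", "2018-11-26 16:31:00"])
df = pd.DataFrame({"a": [1.0, 3.0]}, index=index)
groups = df.groupby(pd.Grouper(freq="10min"))

print(groups.size())                    # includes the empty 16:20 bucket
result = apply_nonempty(groups, first_a)
print(result)                           # only the two non-empty buckets
```

This only hides the symptom: the empty trailing group is still reported by groups.size().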
Actual Output
2018-11-26 16:17:43.510 10
2018-11-26 16:27:43.510 0
dtype: int64
Expected Output
2018-11-26 16:17:43.510 10
dtype: int64
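The expected bucketing can be cross-checked without pandas. The sketch below is an illustration, not how pandas computes its bins: it anchors the buckets at the first timestamp and uses exact integer division on `timedelta` objects, so no floating-point `base` is involved. The function name `bucket_sizes` is hypothetical. By construction it only reports non-empty buckets.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_sizes(timestamps, interval):
    """Count samples per bucket, with buckets anchored at the first timestamp."""
    t0 = timestamps[0]
    counts = Counter()
    for t in timestamps:
        # timedelta // timedelta is exact integer floor division.
        k = (t - t0) // interval
        counts[t0 + k * interval] += 1
    return dict(sorted(counts.items()))


interval = timedelta(minutes=10)
sizes_by_start = {}
for start in ("2018-11-26 16:17:43.510000", "2018-11-26 16:17:43.500000"):
    t0 = datetime.strptime(start, "%Y-%m-%d %H:%M:%S.%f")
    stamps = [t0 + timedelta(seconds=i) for i in range(10)]  # 10 samples, 1 s apart
    sizes_by_start[start] = bucket_sizes(stamps, interval)
    print(start, sizes_by_start[start])
```

For both start times, all 10 samples fall into a single bucket labelled with the first timestamp, which matches the expected output above.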
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None