why resample with group by not pad null value #13151

Closed
@starplanet

Description

Code Sample, a copy-pastable example if possible

I combine groupby with resample in the following code:

import pandas as pd
data = [{'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'}, {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'}, {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'}, {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'}, {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'}, {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'}]
# Build the frame from the records (this step was missing from the snippet as posted)
df = pd.DataFrame(data)
# Parse the date strings so pay_time can serve as a DatetimeIndex
df['pay_time'] = pd.to_datetime(df['pay_time'])
# Count rows per buyer per calendar day
df.set_index('pay_time').groupby('buyer_id').resample('1D').count()

The program prints the following:

                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-03         1    1
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-05         1    1

But I want the output to pad the missing days with 0, like this:

                     buyer_id  tid
buyer_id pay_time
1        2016-01-01         1    1
         2016-01-02         0    0
         2016-01-03         1    1
         2016-01-04         0    0
         2016-01-05         1    1
2        2016-01-01         1    1
         2016-01-02         1    1
         2016-01-03         0    0
         2016-01-04         0    0
         2016-01-05         1    1

If I don't use groupby, resample does pad the missing days with 0, like this:

df.set_index('pay_time').resample('1D').count()

            buyer_id  tid
pay_time
2016-01-01         2    2
2016-01-02         1    1
2016-01-03         1    1
2016-01-04         0    0
2016-01-05         2    2

I don't know why resample on its own behaves differently from resample combined with groupby.
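
A possible workaround for getting the padded per-buyer counts, offered as a sketch rather than as a statement of how pandas is meant to behave: resample each buyer's rows separately, then stitch the pieces back together with pd.concat. The names `pieces` and `result` are hypothetical, and only `tid` is counted here so that the column does not clash with the `buyer_id` index level. Note that each buyer is padded only between that buyer's own first and last pay_time.

```python
import pandas as pd

data = [
    {'buyer_id': 1, 'pay_time': '2016-01-01', 'tid': '11'},
    {'buyer_id': 1, 'pay_time': '2016-01-03', 'tid': '12'},
    {'buyer_id': 1, 'pay_time': '2016-01-05', 'tid': '13'},
    {'buyer_id': 2, 'pay_time': '2016-01-01', 'tid': '21'},
    {'buyer_id': 2, 'pay_time': '2016-01-02', 'tid': '22'},
    {'buyer_id': 2, 'pay_time': '2016-01-05', 'tid': '23'},
]
df = pd.DataFrame(data)
df['pay_time'] = pd.to_datetime(df['pay_time'])

# Resample one buyer at a time; a plain (ungrouped) resample(...).count()
# yields 0 for days that have no rows.
pieces = {}
for buyer_id, group in df.set_index('pay_time').groupby('buyer_id'):
    pieces[buyer_id] = group[['tid']].resample('1D').count()

# The dict keys become the outer index level, which we name buyer_id.
result = pd.concat(pieces, names=['buyer_id'])
print(result)
```

If every buyer should instead cover one shared date range, the pieces could be reindexed against a common pd.date_range before concatenating; that variant is not shown here.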

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8

pandas: 0.18.0
nose: None
pip: 8.1.1
setuptools: 19.4
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.3.5
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
