Description
Code Sample, a copy-pastable example if possible
import pandas as pd
from datetime import datetime
data = [
{
't': datetime(2017,6,1,0),
'x': 1.0,
'y': 2.0
},
{
't': datetime(2017,6,1,1),
'x': 2.0,
'y': 2.0
},
{
't': datetime(2017,6,1,2),
'x': 3.0,
'y': 1.5
}
]
df = pd.DataFrame(data)
df = df.set_index('t')
# Perform a resample is get a binned time series DataFrame... this works fine
ts = df.resample('30T').agg({'x':['mean'],'y':['median']})
print ts['x'].shape
# (5,1)
# What if I put a field in there that doesn't exist?
ts = df.resample('30T').agg({'x':['mean'],'y':['median'],'z':['sum']})
print ts['x'].shape
# (5,2) ??? I don't understand why the shape isn't (5,1)
print ts['x'].values
#[[ 1. 2. ]
# [ nan nan]
# [ 2. 2. ]
# [ nan nan]
# [ 3. 1.5]]
# Looks like a copy of the full aggregation even though I only requested the 'x' column
# Furthermore, now ts['z'] exists
Output:
(5, 1)
[[ 1.]
[ nan]
[ 2.]
[ nan]
[ 3.]]
(5, 2)
[[ 1. 2. ]
[ nan nan]
[ 2. 2. ]
[ nan nan]
[ 3. 1.5]]
Problem description
I am using pandas on records from an Elasticsearch database. The queries are pulling from multiple indices with overlapping column/field names. Most records will have most of their data in common, but when available, I want to know the values of other fields. I'm essentially creating a time series for each column with a specific time-based binning and a per-column aggregation.
I think this should either ignore columns that don't exist or raise an exception. If it silently ignores these columns, ts['x']
should return the same result as the first example. I spent several hours on a workaround today that required that I check for which columns were available for each aggregation and remove those from my agg dictionary. I feel like the current behavior doesn't have a purpose, but perhaps I'm missing something.
Expected Output
(5, 1)
[[ 1.]
[ nan]
[ 2.]
[ nan]
[ 3.]]
(5, 1)
[[ 1.]
[ nan]
[ 2.]
[ nan]
[ 3.]]
OR
ValueError("Column 'z' does not exist!")
Output of pd.show_versions()
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 0.9.8
Cython: None
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 1.5
pytz: 2012d
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None