Description
When I insert or append a subclass (or composition) of a pandas Series into a DataFrame, all of the extra methods that come with my subclass (or composition) are stripped and a plain Series is stored instead:
In [7]: df = read_csv('some/data/from/file.csv')
In [8]: sp = SpatialSeries(df.the_geom) # SpatialSeries is subclass, the_geom is spatial location (WKT)
In [9]: type(sp)
Out[9]: spseries.SpatialSeries
In [10]: type(df)
Out[10]: pandas.core.frame.DataFrame
In [11]: df['geoms'] = sp
In [12]: type(df['geoms'])
Out[12]: pandas.core.series.Series
I suspect this behaviour is useful for the most part. However, I need the extra functions and classes associated with SpatialSeries, and I'd rather not have to subclass DataFrame to create a special DataFrame that allows this. It looks like the culprit is here in frame.py at lines 1761-1772:
def _set_item(self, key, value):
    """
    Add series to DataFrame in specified column.

    If series is a numpy-array (not a Series/TimeSeries), it must be the
    same length as the DataFrame's index or an error will be thrown.

    Series/TimeSeries will be conformed to the DataFrame's index to
    ensure homogeneity.
    """
    value = self._sanitize_column(key, value)
    NDFrame._set_item(self, key, value)
In particular, I'm looking at value = self._sanitize_column(key, value), which appears to call np.asarray(value) before it returns the input array (even if the input column is a Series). Is there any way to avoid this behaviour? Alternatively, is there a better way to implement this so that useful subclasses can be used within a DataFrame? I hope I'm not missing something simple or vital here.
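To illustrate what I think is happening, here is a minimal sketch using plain NumPy rather than pandas itself. It assumes that Series is an ndarray subclass in this pandas version, so np.asarray would treat it the same way. TaggedArray and describe_self are hypothetical stand-ins for SpatialSeries and its extra methods, not anything from pandas or my actual code.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass standing in for SpatialSeries."""

    def describe_self(self):
        # An 'extra' method that only the subclass provides.
        return "TaggedArray with %d items" % self.size


# view() reinterprets an existing array as the subclass without copying.
tagged = np.array(["POINT (0 0)", "POINT (1 1)"]).view(TaggedArray)

plain = np.asarray(tagged)    # returns a base-class ndarray
kept = np.asanyarray(tagged)  # passes ndarray subclasses through unchanged

print(type(tagged).__name__)
print(type(plain).__name__)
print(type(kept).__name__)
print(hasattr(plain, "describe_self"))
```

If that assumption holds, the subclass would be lost at the np.asarray step regardless of what happens afterwards, which matches what I see in Out[12]. I don't know whether swapping in something like np.asanyarray would be safe inside _sanitize_column; this only shows the NumPy-level difference.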
FYI:
In [13]: pandas.__version__
Out[13]: '0.8.0b1'