Description
I am opening this suggestion as per @jorisvandenbossche's recommendation.
This issue follows in the footsteps of #18213 and #19850.
As noted in #18213, _from_array differs from the Series constructor in only one way, namely how it handles SparseArrays:
    # return a sparse series here
    if isinstance(arr, ABCSparseArray):
        from pandas.core.sparse.series import SparseSeries
        cls = SparseSeries
The same dispatch could be done in Series.__new__; something along the lines of:
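    def __new__(cls, *args, **kwargs):
        # The data is the first positional argument or the `data` keyword
        # (the Series constructor names it `data`, not `arr`).
        data = args[0] if args else kwargs.get('data')
        if isinstance(data, ABCSparseArray):
            from pandas.core.sparse.series import SparseSeries
            cls = SparseSeries
        # No explicit __init__ call: Python runs it on the returned object
        # because SparseSeries is a subclass of Series.
        return object.__new__(cls)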
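To make the dispatch mechanism concrete without depending on pandas internals, here is a minimal, pandas-free sketch. The classes `Base`, `SparseBase` and `SparseData` are hypothetical stand-ins for Series, SparseSeries and SparseArray; they are not pandas APIs. It shows that `__new__` can swap the class based on the input type, and that `__init__` then runs exactly once on the swapped-in subclass.

```python
class SparseData(list):
    """Hypothetical stand-in for SparseArray."""


class Base:
    """Hypothetical stand-in for Series."""

    def __new__(cls, data=None, *args, **kwargs):
        # Swap the class when the input is "sparse".
        if isinstance(data, SparseData):
            cls = SparseBase
        # Python calls __init__ afterwards, because the returned
        # object is still an instance of Base.
        return object.__new__(cls)

    def __init__(self, data=None):
        # Count calls to confirm __init__ is not run twice.
        self.init_calls = getattr(self, "init_calls", 0) + 1
        self.data = list(data) if data is not None else []


class SparseBase(Base):
    """Hypothetical stand-in for SparseSeries."""


dense = Base([1, 0, 0, 2, 0])
sparse = Base(SparseData([1, 0, 0, 2, 0]))
print(type(dense).__name__, type(sparse).__name__, sparse.init_calls)
```

Calling `__init__` explicitly inside `__new__` would initialize the object twice, which is why the sketch leaves initialization to Python.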
What's the issue?
As @jorisvandenbossche pointed out, a change like this will result in a change of the API, as this:
>>> s = pd.Series(pd.SparseArray([1, 0, 0, 2, 0]))
>>> type(s)
<class 'pandas.core.series.Series'>
would become this (demonstrated here with from_array, which already returns a SparseSeries):
>>> s = pd.Series.from_array(pd.SparseArray([1, 0, 0, 2, 0]))
>>> type(s)
<class 'pandas.core.sparse.series.SparseSeries'>
I'm not familiar with sparse data structures, but according to the docs all functionality is kept between Series and SparseSeries. Furthermore, a simple
>>> s = s.to_dense()
>>> type(s)
<class 'pandas.core.series.Series'>
should be enough to go back to Series.
Why change it, then?
Currently, Series._from_array is called only inside two functions: DataFrame._ixs and DataFrame._box_col_values. With the proposed change, those calls could be replaced by the default constructor.
That being the case, when subclassing pandas objects, one would be able to declare a more complex _constructor_sliced, such as this:
    @property
    def _constructor_sliced(self):
        def f(*args, **kwargs):
            # adapted from https://github.com/pandas-dev/pandas/issues/13208#issuecomment-326556232
            return DerivedSeries(*args, **kwargs).__finalize__(self, method='inherit')
        return f
This would allow for a more complex relationship between the subclassed DataFrame and its sliced version, including the transfer of metadata according to the user's specification in __finalize__.
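The key point of the closure-based _constructor_sliced is that the inner function captures the parent frame, so metadata can flow from parent to slice. A toy, pandas-free sketch of that idea follows. `Frame`, `Column` and the `units` attribute are hypothetical names chosen for illustration; real pandas slicing goes through more machinery than this `__getitem__`.

```python
class Column:
    """Hypothetical stand-in for a Series subclass."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name
        self.units = None

    def __finalize__(self, other, method=None):
        # Copy user-defined metadata from the parent object.
        self.units = getattr(other, "units", None)
        return self


class Frame:
    """Hypothetical stand-in for a DataFrame subclass."""

    def __init__(self, columns, units=None):
        self.columns = dict(columns)
        self.units = units

    @property
    def _constructor_sliced(self):
        # The closure captures `self`, so the slice can inherit from it.
        def build(*args, **kwargs):
            return Column(*args, **kwargs).__finalize__(self, method="inherit")
        return build

    def __getitem__(self, key):
        # Slicing goes through the plain constructor call only.
        return self._constructor_sliced(self.columns[key], name=key)


frame = Frame({"a": [1, 2, 3]}, units="m")
col = frame["a"]
print(col.name, col.units, col.values)
```

Because slicing here needs nothing beyond calling the constructor, any callable works as _constructor_sliced. That is the property the proposal aims to give the real DataFrame by dropping the _from_array calls.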