Deprecate Series._from_array ? #19883

Closed
@jaumebonet

I am opening this suggestion on @jorisvandenbossche's recommendation.

This issue follows on from #18213 and #19850.

As noted in #18213, _from_array differs from the Series constructor in only one respect, namely how it handles SparseArrays:

        # return a sparse series here
        if isinstance(arr, ABCSparseArray):
            from pandas.core.sparse.series import SparseSeries
            cls = SparseSeries

The same result could be achieved in Series.__new__ with something along the lines of:

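def __new__(cls, *args, **kwargs):
    # The data is the first positional argument or the `data` keyword;
    # look it up without indexing into an empty `args`.
    data = args[0] if args else kwargs.get('data')
    if isinstance(data, ABCSparseArray):
        from pandas.core.sparse.series import SparseSeries
        cls = SparseSeries
    # Python runs __init__ on the returned object automatically because
    # SparseSeries subclasses Series, so no explicit call is needed.
    return object.__new__(cls)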
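
The dispatch idea can be tried without pandas. The following is a minimal sketch with hypothetical stand-in classes (`Base`, `SparseLike` and `SparseData` are illustrative names, not pandas ones). It suggests that when `__new__` returns an instance of a subclass, Python still runs `__init__` on it exactly once, so an explicit `__init__` call inside `__new__` would initialise the object twice.

```python
class SparseData(list):
    """Hypothetical stand-in for a sparse array type."""


class Base:
    def __new__(cls, data=None, **kwargs):
        # Swap in the subclass when the input is "sparse".
        # The `cls is Base` guard is an assumption added here so that
        # explicit subclass construction is left untouched.
        if cls is Base and isinstance(data, SparseData):
            cls = SparseLike
        return object.__new__(cls)

    def __init__(self, data=None, **kwargs):
        self.data = list(data or [])
        # Count initialisations to show __init__ runs only once.
        self.init_calls = getattr(self, "init_calls", 0) + 1


class SparseLike(Base):
    """Hypothetical stand-in for a sparse subclass."""


dense = Base([1, 2, 3])
sparse = Base(SparseData([1, 0, 0, 2, 0]))
print(type(dense).__name__, type(sparse).__name__)  # Base SparseLike
print(sparse.init_calls)  # 1
```

Whether the real Series constructor could adopt this pattern without side effects is a separate question that this toy example does not answer.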

What's the issue?

As @jorisvandenbossche pointed out, a change like this would alter the public API, because this:

>>> s = pd.Series(pd.SparseArray([1, 0, 0, 2, 0]))
>>> type(s)
<class 'pandas.core.series.Series'>

would start behaving like this, which is what Series.from_array returns today:

>>> s = pd.Series.from_array(pd.SparseArray([1, 0, 0, 2, 0]))
>>> type(s)
<class 'pandas.core.sparse.series.SparseSeries'>

I'm not familiar with sparse data structures, but according to the docs SparseSeries keeps all the functionality of Series. Furthermore, a simple

>>> s = s.to_dense()
>>> type(s)
<class 'pandas.core.series.Series'>

should be enough to get back to a Series.

Why change it, then?

Currently, Series._from_array is called inside only two functions: DataFrame._ixs and DataFrame._box_col_values. With the proposed change, those calls could be replaced by the default constructor.
That being the case, when subclassing pandas objects, one would be able to declare a more complex _constructor_sliced such as this:

    @property
    def _constructor_sliced(self):
        def f(*args, **kwargs):
            # adapted from https://github.com/pandas-dev/pandas/issues/13208#issuecomment-326556232
            return DerivedSeries(*args, **kwargs).__finalize__(self, method='inherit')
        return f

This would allow for a richer relationship between the subclassed DataFrame and its sliced version, including the transfer of metadata according to the user's specification in __finalize__.
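
To make the mechanism concrete, here is a toy model that does not use pandas at all. `ToyFrame` and `ToySeries` are hypothetical stand-ins, and the `units` attribute is an invented piece of metadata. The sketch only illustrates the pattern: a closure returned by `_constructor_sliced` is a plain function with no `_from_array` attribute, so it can only work if the caller invokes it like an ordinary constructor.

```python
class ToySeries:
    """Hypothetical stand-in for a Series subclass carrying metadata."""
    _metadata = ["units"]

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name
        self.units = None

    def __finalize__(self, other, method=None):
        # Copy every declared metadata attribute from the parent object.
        for attr in self._metadata:
            setattr(self, attr, getattr(other, attr, None))
        return self


class ToyFrame:
    """Hypothetical stand-in for a DataFrame subclass."""

    def __init__(self, columns, units=None):
        self.columns = dict(columns)
        self.units = units

    @property
    def _constructor_sliced(self):
        # Return a closure so the new slice can see the parent frame.
        def f(*args, **kwargs):
            return ToySeries(*args, **kwargs).__finalize__(self, method="inherit")
        return f

    def __getitem__(self, key):
        # Column access calls the constructor directly, not a classmethod.
        return self._constructor_sliced(self.columns[key], name=key)


frame = ToyFrame({"a": [1, 2], "b": [3, 4]}, units="cm")
col = frame["a"]
print(type(col).__name__, col.name, col.units)  # ToySeries a cm
```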
