ENH: implement clip for sparse structures #9265

Closed
@wholmgren

Recognizing that this is probably not very high on the priority list, I'll document that a user tried to use clip on sparse structures and wished that it worked.

Some ideas for what to do, ordered from best to worst:

  1. Make it work in a memory- and CPU-efficient way.
  2. Raise a NotImplementedError (possibly suggesting converting to a dense structure).
  3. Make clip first convert to a dense structure, apply the limits, and return a sparse structure.
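For illustration, options 2 and 3 could look roughly like the sketch below. It uses a toy, pure-Python stand-in rather than the real pandas classes: `ToySparseSeries` and `clip_via_dense` are hypothetical names invented for this example, and nothing here reflects how pandas stores sparse data internally.

```python
import math


class ToySparseSeries:
    """Hypothetical stand-in for a sparse series (not a pandas class).

    Only non-NaN entries are stored; NaN is the implicit fill value.
    """

    def __init__(self, values):
        self.length = len(values)
        self.stored = {i: v for i, v in enumerate(values) if not math.isnan(v)}

    def to_dense(self):
        return [self.stored.get(i, math.nan) for i in range(self.length)]

    def clip(self, lower=None, upper=None):
        # Option 2: fail with one consistent, self-explanatory exception.
        raise NotImplementedError(
            "clip is not implemented for sparse structures; "
            "convert with to_dense() first"
        )


def clip_via_dense(sparse, lower, upper):
    # Option 3: densify, clip, re-sparsify. Simple, but it allocates the
    # full dense data, which defeats the purpose of a sparse structure.
    clipped = [
        v if math.isnan(v) else min(max(v, lower), upper)
        for v in sparse.to_dense()
    ]
    return ToySparseSeries(clipped)


s = ToySparseSeries([0, 1, math.nan, 3, 4])
print(clip_via_dense(s, 1, 2).to_dense())
```

The dense round trip in `clip_via_dense` is what makes option 3 unattractive for large, mostly empty data.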

I am willing to submit PRs for 2 and 3, although I think that 3 is a bad idea. I need to understand the source better before I can consider volunteering to help with 1.
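As a rough idea of why option 1 seems feasible in principle: clip acts element by element, so it should be enough to clip the explicitly stored values and the fill value separately, leaving the sparse index untouched. The sketch below shows this on a toy representation. `SparseVector`, `clip_scalar`, and `clip_sparse` are hypothetical names for this example, and the position-to-value dict is an assumption made for clarity, not the layout pandas actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SparseVector:
    """Hypothetical sparse 1-D container: stored entries plus a fill value."""

    length: int
    fill_value: float
    stored: Dict[int, float] = field(default_factory=dict)  # position -> value

    def to_dense(self) -> List[float]:
        return [self.stored.get(i, self.fill_value) for i in range(self.length)]


def clip_scalar(x: float, lower: Optional[float], upper: Optional[float]) -> float:
    # NaN passes through unchanged, matching the isnull() guard in clip_lower.
    if math.isnan(x):
        return x
    if lower is not None and x < lower:
        return lower
    if upper is not None and x > upper:
        return upper
    return x


def clip_sparse(vec: SparseVector, lower=None, upper=None) -> SparseVector:
    # Work is proportional to the number of stored entries, not the length:
    # the implicit entries are handled by clipping the fill value once.
    return SparseVector(
        length=vec.length,
        fill_value=clip_scalar(vec.fill_value, lower, upper),
        stored={i: clip_scalar(v, lower, upper) for i, v in vec.stored.items()},
    )


# Same data as the SparseSeries example: [0, 1, NaN, 3, 4] with NaN as fill.
vec = SparseVector(length=5, fill_value=math.nan,
                   stored={0: 0.0, 1: 1.0, 3: 3.0, 4: 4.0})
print(clip_sparse(vec, 1, 2).to_dense())
```

With a non-NaN fill value (say 0) and a lower limit of 1, the fill value itself becomes 1 and the result stays just as sparse as the input.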

SparseSeries, SparseDataFrame, and SparsePanel each raise a different exception when clip is called:

pd.SparseSeries([0,1,np.nan,3,4]).clip(1,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-159-4c91e2de7ae6> in <module>()
----> 1 pd.SparseSeries([0,1,np.nan,3,4]).clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in wrapper(self, other)
    661             return self._constructor(na_op(self.values, other.values),
    662                                      index=self.index,
--> 663                                      name=name).fillna(False).astype(bool)
    664         elif isinstance(other, pd.DataFrame):
    665             return NotImplemented

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in na_op(x, y)
    627     def na_op(x, y):
    628         try:
--> 629             result = op(x, y)
    630         except TypeError:
    631             if isinstance(y, list):

ValueError: operands could not be broadcast together with shapes (5,) (0,)

pd.SparseDataFrame([[0,1,np.nan,3,4],[0,1,np.nan,3,4]]).clip(1,2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-160-b399e57148ac> in <module>()
----> 1 pd.SparseDataFrame([[0,1,np.nan,3,4],[0,1,np.nan,3,4]]).clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in f(self, other)
    917             # straight boolean comparisions we want to allow all columns
    918             # (regardless of dtype to pass thru) See #4537 for discussion.
--> 919             res = self._combine_const(other, func, raise_on_error=False)
    920             return res.fillna(True).astype(bool)
    921 

TypeError: _combine_const() got an unexpected keyword argument 'raise_on_error'

pd.Panel(np.zeros((3,3,3))).to_sparse().clip(1,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-171-ec53f9e1dd24> in <module>()
----> 1 pd.Panel(np.zeros((3,3,3))).to_sparse().clip(1,2)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip(self, lower, upper, out)
   2805         result = self
   2806         if lower is not None:
-> 2807             result = result.clip_lower(lower)
   2808         if upper is not None:
   2809             result = result.clip_upper(upper)

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/generic.py in clip_lower(self, threshold)
   2843             raise ValueError("Cannot use an NA value as a clip threshold")
   2844 
-> 2845         return self.where((self >= threshold) | isnull(self), threshold)
   2846 
   2847     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,

/home/will/.virtualenvs/dev3/lib/python3.4/site-packages/pandas/core/ops.py in f(self, other)
    960             raise ValueError('Simple arithmetic with %s can only be '
    961                              'done with scalar values' %
--> 962                              self._constructor.__name__)
    963 
    964         return self._combine(other, op)

ValueError: Simple arithmetic with SparsePanel can only be done with scalar values
