Description
leftover from #23623
- Signature for .to_numpy(): @jorisvandenbossche proposed copy=True, which I think is good. Beyond that, we may want to control the "fidelity" of the conversion. Should Series[datetime64[ns, tz]].to_numpy() be an ndarray of Timestamp objects or an ndarray of datetime64[ns] normalized to UTC (by default, and should we allow that to be controlled)? Can we hope for a set of keywords appropriate for all subtypes, or do we need to allow kwargs? Perhaps to_numpy(copy=True, dtype=None) will suffice?
- Make .array always an ExtensionArray (via @shoyer). This gives pandas a bit more freedom going forward, since the type of .array will be stable if / when we flip over to Arrow arrays by default. We'll just swap out the data backing the ExtensionArray. A generic "NumpyBackedExtensionArray" is pretty easy to write (I had one in cyberpandas). My main concern here is that it makes the statement ".array is the actual data stored in the Series / Index" falseish, but that's OK.
- Revert the breaking changes to Series.values for period and interval dtype data (cc @jschendel)? I think we should do this.
In [3]: sper = pd.Series(pd.period_range('2000', periods=4))
In [4]: sper.values # on master this is the PeriodArray
Out[4]:
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)
In [5]: sper.array
Out[5]:
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]
In terms of LOC, it's a simple change:
@@ -1984,6 +1984,16 @@ class ExtensionBlock(NonConsolidatableMixIn, Block):
         return blocks, mask
+class ObjectValuesExtensionBlock(ExtensionBlock):
+    """Block for Interval / Period data.
+
+    Only needed for backwards compatibility to ensure that
+    Series[T].values is an ndarray of objects.
+    """
+    def external_values(self, dtype=None):
+        return self.values.astype(object)
+
+
 class NumericBlock(Block):
     __slots__ = ()
     is_numeric = True
@@ -3004,6 +3014,8 @@ def get_block_type(values, dtype=None):
     if is_categorical(values):
         cls = CategoricalBlock
+    elif is_interval_dtype(dtype) or is_period_dtype(dtype):
+        cls = ObjectValuesExtensionBlock
There are a couple of other places (like Series._ndarray_values) that assume "extension dtype means .values is an ExtensionArray", which I've surfaced on my DatetimeArray branch. We'll need to update those to use .array anyway.
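To make the "fidelity" question for .to_numpy() concrete, here is a minimal sketch of the two candidate results for tz-aware data. It assumes a pandas release in which Series.to_numpy accepts a dtype keyword (the signature floated above); the sample dates, the "Europe/Berlin" zone, and the variable names are illustrative choices, not part of the proposal.

```python
import numpy as np
import pandas as pd

# Illustrative tz-aware data; the dates and the zone are arbitrary choices.
ser = pd.Series(pd.date_range("2000-01-01", periods=2, tz="Europe/Berlin"))

# Option 1: keep the timezone by boxing each value in a Timestamp object.
as_objects = ser.to_numpy(dtype=object)

# Option 2: drop the timezone and return datetime64[ns] normalized to UTC.
as_utc = ser.to_numpy(dtype="datetime64[ns]")

print(as_objects.dtype, type(as_objects[0]).__name__)
print(as_utc.dtype, as_utc[0])
```

If a single dtype keyword can express both outcomes like this, that would suggest subtype-specific keywords are not needed for the datetime case at least.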
- Series.to_numpy() signature
- Series.array is always an EA
- Revert breaking changes to Series.values for Period / Interval (API: Revert breaking .values changes #24163)
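The .array and .values items in this checklist can be spot-checked with a short script. This is only a sketch: it assumes a pandas release in which both changes have landed, and the variable names (snum, sper, sival) are made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# NumPy-backed data: .array is expected to be an ExtensionArray wrapper.
snum = pd.Series([1, 2, 3])
# Period and interval data, the two dtypes affected by the .values revert.
sper = pd.Series(pd.period_range("2000", periods=4))
sival = pd.Series(pd.interval_range(0, 3))

for s in (snum, sper, sival):
    # .array should always be an ExtensionArray, whatever the dtype.
    print(type(s.array).__name__, isinstance(s.array, ExtensionArray))
    # .values should be a plain ndarray (object dtype for period / interval).
    print(type(s.values).__name__, s.values.dtype)
```

The loop prints the concrete classes rather than asserting on them, since the name of the NumPy-backed wrapper may differ between releases.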