Description
1. Inconsistent comparisons versus string 'interval':
In [2]: IntervalDtype() == 'interval'
Out[2]: True
In [3]: IntervalDtype('interval') == 'interval'
Out[3]: False
In [4]: IntervalDtype('int64') == 'interval'
Out[4]: False
I'd expect all of these to return True, like how CategoricalDtype(*, *) == 'category' always returns True.
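As a rough illustration of the expected behavior, here is a minimal sketch using a hypothetical stand-in class, `ToyIntervalDtype` (not pandas code, and not a proposed patch). It assumes the simplest rule: any instance compares equal to the bare string 'interval', regardless of subtype.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; illustrative only."""

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __eq__(self, other):
        if isinstance(other, str):
            # Assumed rule: every interval dtype matches the bare string.
            return other == "interval"
        if isinstance(other, ToyIntervalDtype):
            return self.subtype == other.subtype
        return NotImplemented

    def __hash__(self):
        # Constant hash keeps __hash__ consistent with the string equality.
        return hash("interval")


results = [ToyIntervalDtype(s) == "interval" for s in (None, "interval", "int64")]
print(results)  # [True, True, True]
```

This sketch only handles the bare string; whether strings such as 'interval[int64]' should also match is left open here.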
2. Inconsistent comparisons versus IntervalDtype(None):
In [5]: IntervalDtype(None) == IntervalDtype('interval')
Out[5]: False
In [6]: IntervalDtype(None) == IntervalDtype('int64')
Out[6]: False
I'd expect both of these to return True, like how CDT(None, None) == CDT(*, *) always returns True (CDT being shorthand for CategoricalDtype).
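One way to picture this is to treat a missing subtype as a wildcard. The sketch below uses a hypothetical `ToyIntervalDtype` class (an assumption for illustration, not the pandas implementation) in which an instance with `subtype=None` compares equal to any other instance.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; illustrative only."""

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __eq__(self, other):
        if not isinstance(other, ToyIntervalDtype):
            return NotImplemented
        # Assumed rule: a missing subtype on either side matches anything.
        if self.subtype is None or other.subtype is None:
            return True
        return self.subtype == other.subtype

    def __hash__(self):
        # Must be constant: the wildcard instance equals every other
        # instance, and equal objects are required to hash equally.
        return hash("interval")


generic = ToyIntervalDtype(None)
print(generic == ToyIntervalDtype("interval"))  # True
print(generic == ToyIntervalDtype("int64"))     # True
```

One consequence worth keeping in mind: wildcard equality of this kind is not transitive, since `ToyIntervalDtype("int64")` and `ToyIntervalDtype("float64")` both equal `generic` but not each other.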
3. IntervalDtype.name attribute changes:
In [7]: IntervalDtype().name
Out[7]: 'interval'
In [8]: IntervalDtype('interval').name
Out[8]: 'interval[]'
In [9]: IntervalDtype('int64').name
Out[9]: 'interval[int64]'
CategoricalDtype.name attribute is always the same:
In [10]: CategoricalDtype(list('abc'), True).name
Out[10]: 'category'
In [11]: CategoricalDtype(list('wxyz'), False).name
Out[11]: 'category'
I'd expect IntervalDtype.name to always return 'interval', like how CDT.name always returns 'category'. This would make the code for checking equality against strings (i.e. what I described in 1) simpler. I don't think the behavior of str(IntervalDtype) should change; it is currently the same as IntervalDtype.name, so I'd still have it return strings specifying the subtype.
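A minimal sketch of that split, again with a hypothetical `ToyIntervalDtype` class rather than the real one: `name` is a fixed class attribute, while `__str__` keeps the subtype. The exact string formats are assumptions for illustration.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; illustrative only."""

    # Fixed for every instance, mirroring CategoricalDtype.name.
    name = "interval"

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __str__(self):
        # str() still carries the subtype when one is given.
        if self.subtype is None:
            return "interval"
        return f"interval[{self.subtype}]"

    def __eq__(self, other):
        if isinstance(other, str):
            # With a constant name, the string check is a one-liner.
            return other == self.name
        if isinstance(other, ToyIntervalDtype):
            return self.subtype == other.subtype
        return NotImplemented

    def __hash__(self):
        return hash(self.name)


dtype = ToyIntervalDtype("int64")
print(dtype.name)           # interval
print(str(dtype))           # interval[int64]
print(dtype == "interval")  # True
```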
4. (No longer an issue due to #19022) CategoricalDtype gets cached incorrectly:
In [12]: idt1 = IntervalDtype(CategoricalDtype(list('abc'), True))
In [13]: idt2 = IntervalDtype(CategoricalDtype(list('wxyz'), False))
In [14]: idt2.subtype
Out[14]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)
This looks to be caused by the caching being done by string representation, and str(CDT(*, *)) always returns 'category':
pandas/pandas/core/dtypes/dtypes.py, lines 673 to 679 in e1d5a27
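The collision can be reproduced in isolation. The sketch below does not reproduce the referenced pandas lines; it uses two hypothetical classes, `ToyCategoricalDtype` and `ToyCachedIntervalDtype`, and assumes only what is described above: instances are cached under `str(subtype)`, and every categorical subtype stringifies to 'category'.

```python
class ToyCategoricalDtype:
    """Hypothetical subtype whose str() ignores its parameters."""

    def __init__(self, categories, ordered):
        self.categories = list(categories)
        self.ordered = ordered

    def __str__(self):
        return "category"

    def __repr__(self):
        return (f"ToyCategoricalDtype(categories={self.categories}, "
                f"ordered={self.ordered})")


class ToyCachedIntervalDtype:
    """Hypothetical dtype that caches instances by str(subtype)."""

    _cache = {}

    def __new__(cls, subtype):
        key = str(subtype)  # two different subtypes can share this key
        try:
            return cls._cache[key]
        except KeyError:
            obj = object.__new__(cls)
            obj.subtype = subtype
            cls._cache[key] = obj
            return obj


idt1 = ToyCachedIntervalDtype(ToyCategoricalDtype("abc", True))
idt2 = ToyCachedIntervalDtype(ToyCategoricalDtype("wxyz", False))
print(idt2 is idt1)   # True
print(idt2.subtype)   # ToyCategoricalDtype(categories=['a', 'b', 'c'], ordered=True)
```

Both constructor calls map to the key 'category', so the second call returns the first instance along with its subtype.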
Can caching be removed entirely for IntervalDtype, or is there some need/advantage that I'm not seeing? Looking at the other dtypes, CategoricalDtype appears to have had the caching code removed, but PeriodDtype and DatetimeTZDtype are using it.