IntervalDtype inconsistencies and bugs

**1. Inconsistent comparisons versus string `'interval'`:**
```python
In [2]: IntervalDtype() == 'interval'
Out[2]: True

In [3]: IntervalDtype('interval') == 'interval'
Out[3]: False

In [4]: IntervalDtype('int64') == 'interval'
Out[4]: False
```
I'd expect all of these to return `True`, like how `CategoricalDtype(*, *) == 'category'` always returns `True`.
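
Roughly what I have in mind, as a toy sketch. `ToyIntervalDtype` is a hypothetical stand-in and not the pandas implementation; the fixed `name` attribute is an assumption that ties in with item 3.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; not pandas code."""

    name = "interval"  # assumption: fixed name, independent of subtype

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __eq__(self, other):
        if isinstance(other, str):
            # Comparison against a string ignores the subtype entirely.
            return other == self.name
        if isinstance(other, ToyIntervalDtype):
            return self.subtype == other.subtype
        return NotImplemented

    def __hash__(self):
        # Constant hash, so equal instances always hash equally.
        return hash(self.name)


results = [ToyIntervalDtype(s) == "interval" for s in (None, "interval", "int64")]
print(results)  # [True, True, True]
```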

<br />

**2. Inconsistent comparisons versus `IntervalDtype(None)`:**

```python
In [5]: IntervalDtype(None) == IntervalDtype('interval')
Out[5]: False

In [6]: IntervalDtype(None) == IntervalDtype('int64')
Out[6]: False
```
I'd expect both of these to return `True`, like how `CategoricalDtype(None, None) == CategoricalDtype(*, *)` always returns `True`.
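
A minimal sketch of the "unspecified subtype matches anything" comparison, again using a hypothetical `ToyIntervalDtype` class rather than actual pandas code. Treating `None` as a wildcard is the assumption here.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; not pandas code."""

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __eq__(self, other):
        if not isinstance(other, ToyIntervalDtype):
            return NotImplemented
        # Assumption: an unspecified subtype (None) matches any subtype.
        if self.subtype is None or other.subtype is None:
            return True
        return self.subtype == other.subtype

    def __hash__(self):
        # Constant hash keeps hashing consistent with the wildcard equality.
        return hash("interval")


generic = ToyIntervalDtype(None)
print(generic == ToyIntervalDtype("interval"))  # True
print(generic == ToyIntervalDtype("int64"))     # True
```

Two fully specified dtypes with different subtypes would still compare unequal under this scheme.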

<br />

**3. `IntervalDtype.name` attribute changes:**

```python
In [7]: IntervalDtype().name
Out[7]: 'interval'

In [8]: IntervalDtype('interval').name
Out[8]: 'interval[]'

In [9]: IntervalDtype('int64').name
Out[9]: 'interval[int64]'
```
The `CategoricalDtype.name` attribute is always the same:

```python
In [10]: CategoricalDtype(list('abc'), True).name
Out[10]: 'category'

In [11]: CategoricalDtype(list('wxyz'), False).name
Out[11]: 'category'
```
I'd expect `IntervalDtype.name` to always return `'interval'`, like how `CategoricalDtype.name` always returns `'category'`. This would simplify the code for checking equality against strings (i.e. what I described in 1). I don't think the behavior of `str(IntervalDtype)` should change: it currently returns the same thing as `IntervalDtype.name`, and I'd still have it return a string specifying the subtype.
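
To illustrate the split between `name` and `str()`, here is a small sketch with a hypothetical `ToyIntervalDtype` class (not pandas code). The exact string format for the subtype is an assumption modeled on the output shown above.

```python
class ToyIntervalDtype:
    """Hypothetical stand-in for IntervalDtype; not pandas code."""

    name = "interval"  # always the same, regardless of subtype

    def __init__(self, subtype=None):
        self.subtype = subtype

    def __str__(self):
        # str() still reports the subtype when one is given.
        if self.subtype is None:
            return "interval"
        return f"interval[{self.subtype}]"


dtype = ToyIntervalDtype("int64")
print(dtype.name)  # interval
print(str(dtype))  # interval[int64]
```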

<br />

**4. ~`CategoricalDtype` gets cached incorrectly:~** (No longer an issue due to #19022)
```python
In [12]: idt1 = IntervalDtype(CategoricalDtype(list('abc'), True))

In [13]: idt2 = IntervalDtype(CategoricalDtype(list('wxyz'), False))

In [14]: idt2.subtype
Out[14]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=True)
```
This looks to be caused by the cache being keyed on the string representation of the subtype, and `str(CategoricalDtype(*, *))` always returns `'category'`:

https://github.com/pandas-dev/pandas/blob/e1d5a2738235fec22f3cfad4814e09e3e3786f8c/pandas/core/dtypes/dtypes.py#L673-L679

Can caching be removed entirely for `IntervalDtype`, or is there some need/advantage that I'm not seeing?  Looking at the other dtypes, `CategoricalDtype` appears to have had the caching code removed, but `PeriodDtype` and `DatetimeTZDtype` are using it.
  

For reference, the linked caching code:

```python
try:
    return cls._cache[str(subtype)]
except KeyError:
    u = object.__new__(cls)
    u.subtype = subtype
    cls._cache[str(subtype)] = u
    return u
```
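
The collision can be reproduced outside pandas with a small sketch. `Category` and `CachedDtype` are hypothetical stand-ins for `CategoricalDtype` and `IntervalDtype`; the only parts taken from the real code are the `str(subtype)` cache key and the fact that `str()` of a categorical dtype does not include its parameters.

```python
class Category:
    """Hypothetical stand-in for CategoricalDtype: str() hides the parameters."""

    def __init__(self, categories, ordered):
        self.categories = categories
        self.ordered = ordered

    def __str__(self):
        return "category"


class CachedDtype:
    """Hypothetical dtype that caches instances keyed on str(subtype)."""

    _cache = {}

    def __new__(cls, subtype):
        try:
            return cls._cache[str(subtype)]
        except KeyError:
            u = object.__new__(cls)
            u.subtype = subtype
            cls._cache[str(subtype)] = u
            return u


first = CachedDtype(Category(list("abc"), True))
second = CachedDtype(Category(list("wxyz"), False))

# Both subtypes stringify to 'category', so the second call hits the cache.
print(second is first)            # True
print(second.subtype.categories)  # ['a', 'b', 'c']
```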
