Closed
xref #14937 (comment)
A number of indexing / conversion issues arise because we are treating uint as if it were plain int, rather than as a sub-class. If we make UIntBlock a sub-class of IntBlock, I think we can easily handle this with a few small overrides, for instance checking for negative values when indexing.
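To make the sub-class idea concrete, here is a rough sketch. `ToyIntBlock`, `ToyUIntBlock`, `_can_hold` and `setitem` are hypothetical names made up for illustration; they are not the real pandas internals, only a toy model of "override one check in the unsigned sub-class".

```python
import numpy as np


class ToyIntBlock:
    """Hypothetical stand-in for an integer block (not pandas internals)."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def _can_hold(self, value):
        # Any Python or NumPy integer is acceptable for a signed block.
        return isinstance(value, (int, np.integer))

    def setitem(self, indexer, value):
        if not self._can_hold(value):
            raise ValueError(f"cannot set {value!r} into dtype {self.values.dtype}")
        self.values[indexer] = value


class ToyUIntBlock(ToyIntBlock):
    """Unsigned variant: the only override is the extra sign check."""

    def _can_hold(self, value):
        return super()._can_hold(value) and value >= 0


blk = ToyUIntBlock(np.array([1, 2, 3], dtype="uint64"))
blk.setitem(1, 5)       # fine, non-negative
try:
    blk.setitem(1, -1)  # rejected instead of silently wrapping around
except ValueError as err:
    print(err)
```

In this toy version a negative value raises. Whether the real fix should raise or upcast the column is a separate design decision that this sketch does not settle.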
In [1]: df = pd.DataFrame({'A' : np.array([1,2,3],dtype='uint64'), 'B': range(3)})
In [2]: df
Out[2]:
A B
0 1 0
1 2 1
2 3 2
In [4]: df.dtypes
Out[4]:
A uint64
B int64
dtype: object
Buggy
In [5]: df.iloc[1] = -1
In [6]: df
Out[6]:
A B
0 1 0
1 18446744073709551615 -1
2 3 2
In [7]: df.iloc[1] = np.nan
In [8]: df
Out[8]:
A B
0 1.0 0.0
1 NaN NaN
2 3.0 2.0
This is correct
In [9]: df.A.astype('uint64')
---------------------------------------------------------------------------
ValueError: Cannot convert non-finite values (NA or inf) to integer
However, this is not
In [10]: df.iloc[1] = -1
In [11]: df
Out[11]:
A B
0 1.0 0.0
1 -1.0 -1.0
2 3.0 2.0
In [12]: df.dtypes
Out[12]:
A float64
B float64
dtype: object
In [13]: df.A.astype('uint64')
Out[13]:
0 1
1 18446744073709551615
2 3
Name: A, dtype: uint64
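For the float to uint64 conversion above, a guarded cast could look roughly like this. `safe_astype_uint64` is a hypothetical helper written for this issue, not an existing pandas or NumPy function, and it only checks for non-finite and negative values (it does not handle fractional values or values too large for uint64).

```python
import numpy as np


def safe_astype_uint64(values):
    """Cast float data to uint64, refusing values that would wrap or be undefined.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(values, dtype="float64")
    if not np.isfinite(arr).all():
        # Same message as the existing check shown in the traceback above.
        raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
    if (arr < 0).any():
        # The missing check: negative floats must not be cast to unsigned.
        raise ValueError("Cannot convert negative values to uint64")
    return arr.astype("uint64")


print(safe_astype_uint64([1.0, 3.0]))
try:
    safe_astype_uint64([1.0, -1.0, 3.0])
except ValueError as err:
    print(err)
```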
Construction with invalid values
In [1]: Series([-1], dtype='uint64')
Out[1]:
0 18446744073709551615
dtype: uint64
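The construction case could be guarded by checking signed integer input before the cast. This is only a sketch of the idea: `as_uint64_checked` is a hypothetical name, and raising `ValueError` is an assumption about the desired behavior, not something decided in this issue.

```python
import numpy as np


def as_uint64_checked(data):
    """Build a uint64 array from integer data, rejecting negative input.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(data)
    # dtype.kind == "i" means a signed integer dtype, where negatives are possible.
    if arr.dtype.kind == "i" and (arr < 0).any():
        raise ValueError("negative values cannot be represented as uint64")
    return arr.astype("uint64")


print(as_uint64_checked([1, 2, 3]))
try:
    as_uint64_checked([-1])
except ValueError as err:
    print(err)
```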