I've found some odd behaviour when assigning to columns with a MultiIndex. I'm assigning an array with a float32 dtype, but under some circumstances the data ends up stored as float64. For large arrays this is accompanied by a significant slowdown.
>>> import sys; sys.version
'3.6.3 (default, Oct 11 2017, 14:49:33) [GCC]'
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> import numpy as np
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A
     1                        2
     0    1    2    3    4    0    1    2    3    4
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>>> A.loc[:,(1,1)] = np.ones((6,),dtype=np.float32) # index a single column - doesn't change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[:,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # Index multiple columns - changes dtypes
>>> (A.dtypes==np.float32).all()
False
So indexing a single column keeps the dtype as float32 (as I would expect), but indexing multiple columns changes it to float64. The behaviour is also different if you write to part of a column (doesn't change) vs a whole column (does change):
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.loc[2:3,(1,slice(2,3))] = np.ones((2,2),dtype=np.float32) # index a section of multiple columns - doesn’t change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[0:5,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # but indexing a complete section does change dtypes
>>> (A.dtypes==np.float32).all()
False
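The `(A.dtypes==np.float32).all()` check only says whether anything changed, not which columns. A small helper along these lines could list the affected column labels. This is only a sketch: `non_matching_columns` is a hypothetical name, not a pandas function, and the demo frame is built with mixed dtypes on purpose so that it does not depend on the behaviour reported here.

```python
import numpy as np
import pandas as pd


def non_matching_columns(df, expected=np.float32):
    """Return the column labels whose dtype differs from `expected`.

    Hypothetical helper for diagnosing the report; not part of pandas.
    """
    expected = np.dtype(expected)
    # df.dtypes holds one dtype per column, in the same order as df.columns.
    return [label for label, dtype in zip(df.columns, df.dtypes) if dtype != expected]


# Demo frame: float32 columns under key 1, float64 columns under key 2.
f32 = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
f64 = f32.astype(np.float64)
mixed = pd.concat([f32, f64], axis=1, keys=[1, 2])

# Lists only the labels under key 2.
print(non_matching_columns(mixed))
```

Calling `non_matching_columns(A)` after each assignment in the examples would show whether only the assigned columns are upcast or the whole frame.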
If the MultiIndex is on axis 0 rather than axis 1, the dtypes do not change:
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A = A.T
>>> A.loc[(1,slice(2,3)),:] = np.ones((6,2),dtype=np.float32).T # doesn’t change any dtypes
>>> (A.dtypes==np.float32).all()
True
This odd behaviour only applies to a MultiIndex; with a plain column index the dtypes are preserved:
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32))
>>> A.loc[:,2:3] = np.ones((6,2),dtype=np.float32) # does not change dtypes
>>> (A.dtypes==np.float32).all()
True
Finally, it applies to iloc as well as loc:
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.iloc[:,2:4] = np.ones((6,2),dtype=np.float32) # changes dtypes
>>> (A.dtypes==np.float32).all()
False
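Since assigning to a single column kept float32 in the first example, one possible stopgap is to write the block one column at a time. The following is a sketch of that idea only, not a fix for the underlying behaviour; `assign_columnwise` is a hypothetical helper name, and it assumes the column key selects at least one column and that `values` has one column per selected label.

```python
import numpy as np
import pandas as pd


def assign_columnwise(df, column_key, values):
    """Write `values` into the columns selected by `column_key`, one at a time.

    Hypothetical workaround helper; not part of pandas.
    """
    # Resolve the key (e.g. (1, slice(2, 3))) to concrete column labels.
    labels = df.loc[:, column_key].columns
    values = np.asarray(values)
    if values.shape != (len(df), len(labels)):
        raise ValueError("values must have shape (n_rows, n_selected_columns)")
    for j, label in enumerate(labels):
        # Single-column assignment, the case the report found to keep float32.
        df.loc[:, label] = values[:, j]
    return df


A = pd.DataFrame(np.zeros((6, 5), dtype=np.float32))
A = pd.concat([A, A], axis=1, keys=[1, 2])
assign_columnwise(A, (1, slice(2, 3)), np.ones((6, 2), dtype=np.float32))
print((A.dtypes == np.float32).all())
```

Looping over columns in Python will be slower than a single block assignment for wide selections, so this trades speed for dtype stability.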