Description
Code Sample, a copy-pastable example if possible
import pandas as pd
dfA = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],'colA':[3,4,2,4,3,4,5,4,5,6],'colB':[7,6,5,6,5,7,8,7,6,7],'colC':[False,True,True,False,False,True,False,True,True,True]})
# Convert colC to an ordered categorical (pandas 0.20 astype syntax); with this line present, the merge below fails
dfA['colC'] = dfA['colC'].astype('category',categories=[True,False],ordered=True)
dfB = pd.DataFrame({'id':[2,5,7,8],'colD':[1,9,7,3]})
print("Before\n====")
print('dfA dtypes\n------')
print(dfA.dtypes)
print('\ndfA\n---')
print(dfA)
print('\ndfB\n---')
print(dfB)
# Left merge on 'id'; in pandas 0.20.1 this raises ValueError: Wrong number of dimensions
dfA = pd.merge(left=dfA,right=dfB,how='left',on='id')
print("\nAfter\n=====")
print(dfA)
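The astype('category', categories=..., ordered=...) call used above is the pandas 0.20-era syntax; it was deprecated in pandas 0.21 in favour of CategoricalDtype and later removed. A minimal sketch of the same reproduction written for a recent pandas, assuming pd.CategoricalDtype is available, might look like this (the names cat_type and merged are only illustrative):

```python
import pandas as pd

# Same data as the reproduction above
dfA = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'colA': [3, 4, 2, 4, 3, 4, 5, 4, 5, 6],
                    'colB': [7, 6, 5, 6, 5, 7, 8, 7, 6, 7],
                    'colC': [False, True, True, False, False, True, False, True, True, True]})

# Ordered categorical with True ranked before False, expressed via CategoricalDtype
cat_type = pd.CategoricalDtype(categories=[True, False], ordered=True)
dfA['colC'] = dfA['colC'].astype(cat_type)

dfB = pd.DataFrame({'id': [2, 5, 7, 8], 'colD': [1, 9, 7, 3]})

# Left merge on 'id': every row of dfA is kept; colD is NaN where dfB has no matching id
merged = pd.merge(left=dfA, right=dfB, how='left', on='id')
print(merged.dtypes)
print(merged)
```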
Problem description
This problem was raised on Stack Overflow at https://stackoverflow.com/questions/45538092/merging-pandas-dataframes-containing-a-categorical-variable-fails-with-valueerr, where it was suggested that it was a bug.
Two DataFrames containing different columns can be combined using the pandas.merge() function. This works well, but in the example above, converting one of the columns (colC) to a categorical variable causes the merge to fail with the following error:
/Users/.../env3/lib/python3.4/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
104 ndim = values.ndim
105 elif values.ndim != ndim:
--> 106 raise ValueError('Wrong number of dimensions')
107 self.ndim = ndim
108
ValueError: Wrong number of dimensions
Checking df.ndim (a property, not a method) shows that both DataFrames have 2 dimensions.
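A short sketch of that check, using a hypothetical three-row subset of the example data rather than the full frames; pd.Categorical is used for the conversion because it accepts categories and ordered arguments in both old and new pandas:

```python
import pandas as pd

# Trimmed-down (hypothetical) subset of the example data
dfA = pd.DataFrame({'id': [1, 2, 3], 'colC': [False, True, True]})
dfA['colC'] = pd.Categorical(dfA['colC'], categories=[True, False], ordered=True)
dfB = pd.DataFrame({'id': [2, 3], 'colD': [1, 9]})

# ndim is accessed without parentheses; both frames report 2 dimensions
print(dfA.ndim, dfB.ndim)
# The categorical column on its own is one-dimensional
print(dfA['colC'].ndim, dfA['colC'].dtype)
```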
Expected Output
The expected output can be generated simply by commenting out the line in the code above that converts colC to a categorical variable (the astype('category', ...) call):
   colA  colB   colC  id  colD
0     3     7  False   1   NaN
1     4     6   True   2   1.0
2     2     5   True   3   NaN
3     4     6  False   4   NaN
4     3     5  False   5   9.0
5     4     7   True   6   NaN
6     5     8  False   7   7.0
7     4     7   True   8   3.0
8     5     6   True   9   NaN
9     6     7   True  10   NaN
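One possible workaround, assuming the categorical dtype is only needed after the merge, is to merge while colC is still a plain bool column and convert it afterwards. Since commenting out the conversion produces the expected output, doing the conversion after pd.merge() should avoid the failing path; this is a sketch, not a fix for the underlying bug, and the name merged is only illustrative:

```python
import pandas as pd

dfA = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'colA': [3, 4, 2, 4, 3, 4, 5, 4, 5, 6],
                    'colB': [7, 6, 5, 6, 5, 7, 8, 7, 6, 7],
                    'colC': [False, True, True, False, False, True, False, True, True, True]})
dfB = pd.DataFrame({'id': [2, 5, 7, 8], 'colD': [1, 9, 7, 3]})

# Merge first, while colC still has a plain bool dtype
merged = pd.merge(left=dfA, right=dfB, how='left', on='id')

# Then convert colC to an ordered categorical with True ranked before False
merged['colC'] = pd.Categorical(merged['colC'], categories=[True, False], ordered=True)
print(merged)
```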
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 34.1.0
Cython: None
numpy: 1.12.1
scipy: 0.16.1
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.5.3
openpyxl: 2.4.7
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None