Skip to content

Appending Pandas dataframes in for loop results in ValueError #13524

Closed
@lvphj

Description

@lvphj

I recently posted this on StackOverflow. It seems to be a bug so I am posting here as well.

I want to generate a dataframe that is created by appended several separate dataframes generated in a for loop. Each individual dataframe consists of a name column, a range of integers and a column identifying a category to which the integer belongs (e.g. quintile 1 to 5). If I generate each dataframe individually and then append one to the other to create a 'master' dataframe then there are no problems. However, when I use a loop to create each individual dataframe then trying to append a dataframe to the master dataframe results in:

ValueError: incompatible categories in categorical concat

A work-around (suggested by jezrael) involved appending each dataframe to a list of dataframes and concatenating them using pd.concat.

I've written a simplified loop to illustrate:

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Define column names
colNames = ('a','b','c')

# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns = colNames)

# A list of the group names
names = ['Group1','Group2','Group3']

# Create a dataframe for each group
for i in names:
    tempDF = pd.DataFrame(columns = colNames)
    tempDF['a'] = np.arange(1,11,1)
    tempDF['b'] = i
    tempDF['c'] = pd.cut(np.arange(1,11,1),
                        bins = np.linspace(0,10,6),
                        labels = [1,2,3,4,5])
    print(tempDF)
    print('\n')

    # Try to append temporary DF to master DF
    masterDF = masterDF.append(tempDF,ignore_index=True)

print(masterDF)

Expected Output

     a       b  c
 0   1  Group1  1
 1   2  Group1  1
 2   3  Group1  2
 3   4  Group1  2
 4   5  Group1  3
 5   6  Group1  3
 6   7  Group1  4
 7   8  Group1  4
 8   9  Group1  5
 9  10  Group1  5
10  11  Group2  1
11  12  Group2  1
12  13  Group2  2
13  14  Group2  2
...
28  29  Group3  5
29  30  Group3  5

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.18.1
nose: None
pip: 1.5.6
setuptools: 20.1.1
Cython: None
numpy: 1.11.0
scipy: 0.16.1
statsmodels: None
xarray: None
IPython: 4.1.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.4.None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions