.concat crashes Python #16111

Closed
@topper-123

Description

pandas.concat crashes the Python interpreter

The snippet below crashes the Python interpreter. I have run it with:

  • Pandas 0.19.2
  • Python 3.6.1 and Python 3.5.2
  • Windows 10 and Windows 8.1
  • a plain Python interpreter and IPython

all with the same result (sometimes it only crashes on the second run; I don't know why).

import pandas as pd

categories = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Antigua & Deps', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Benin', 'Bhutan', 'Bolivia', 'Bosnia Herzegovina', 'Botswana', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'China', 'Colombia', 'Congo {Democratic Rep}', 'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Estonia', 'Ethiopia', 'Finland', 'France', 'Gabon', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland {Republic}', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Korea North', 'Korea South', 'Kosovo', 'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macedonia', 'Madagascar', 'Malaysia', 'Maldives', 'Malta', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova', 'Mongolia', 'Montenegro', 'Morocco', 'Mozambique', 'Myanmar, {Burma}', 'Namibia', 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Nigeria', 'Norway', 'Oman', 'Other', 'Pakistan', 'Panama', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'San Marino', 'Saudi Arabia', 'Senegal', 'Serbia', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand', 'Togo', 'Trinidad & Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan', 'Vanuatu', 'Vatican City', 'Venezuela', 'Vietnam', 'Zambia', 'Zimbabwe', "Didn't answer"]

series0_values = [9, 1, 8, 18, 8, 85, 49, 3, 1, 2, 14, 1, 7, 40, 2, 5, 64, 1, 15, 5, 1, 116, 7, 43, 8, 6, 13, 2, 2, 32, 35, 2, 1, 23, 1, 5, 1, 43, 112, 1, 9, 319, 1, 25, 3, 2, 30, 4, 455, 23, 60, 1, 37, 37, 106, 1, 13, 6, 5, 5, 1, 32, 1, 10, 8, 14, 5, 1, 11, 1, 6, 1, 39, 4, 1, 4, 2, 7, 124, 30, 1, 4, 28, 1, 31, 2, 20, 130, 39, 29, 97, 7, 17, 1, 17, 11, 15, 23, 1, 146, 11, 1, 75, 55, 4, 9, 10, 1, 1, 4, 54, 1, 3, 34, 3, 299, 587, 7, 1, 10, 17, 2, 65]
series0_index = ['Afghanistan', 'Albania', 'Algeria', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Bolivia', 'Bosnia Herzegovina', 'Brazil', 'Brunei', 'Bulgaria', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Estonia', 'Ethiopia', 'Finland', 'France', 'Gabon', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland {Republic}', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Korea North', 'Korea South', 'Kyrgyzstan', 'Latvia', 'Lebanon', 'Lithuania', 'Macedonia', 'Madagascar', 'Malaysia', 'Maldives', 'Malta', 'Mauritius', 'Mexico', 'Moldova', 'Mongolia', 'Morocco', 'Myanmar, {Burma}', 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Nigeria', 'Norway', 'Oman', 'Pakistan', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Romania', 'Russian Federation', 'Saudi Arabia', 'Serbia', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa', 'South Sudan', 'Spain', 'Sri Lanka', 'Sudan', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Thailand', 'Togo', 'Trinidad & Tobago', 'Tunisia', 'Turkey', 'Turkmenistan', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan', 'Venezuela', 'Vietnam', 'Zimbabwe', "Didn't answer"]

i0 = pd.CategoricalIndex(series0_index, ordered=True, categories=categories)
s0 = pd.Series(series0_values, index=i0)

series1_values = [4, 18, 16, 65, 33, 2, 8, 1, 12, 35, 3, 3, 46, 6, 1, 89, 4, 17, 8, 6, 3, 2, 1, 22, 30, 3, 13, 1, 9, 1, 35, 109, 7, 197, 1, 14, 2, 17, 233, 9, 28, 16, 36, 56, 5, 1, 3, 4, 6, 14, 3, 8, 3, 3, 3, 14, 2, 1, 3, 1, 3, 5, 86, 11, 1, 3, 20, 12, 2, 1, 1, 7, 93, 20, 18, 61, 3, 14, 1, 5, 9, 5, 1, 19, 80, 4, 1, 75, 29, 2, 6, 11, 1, 2, 34, 45, 6, 281, 580, 6, 1, 5, 10, 1, 36]
series1_index = ['Afghanistan', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Bolivia', 'Bosnia Herzegovina', 'Brazil', 'Bulgaria', 'Cambodia', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Ecuador', 'Egypt', 'El Salvador', 'Estonia', 'Ethiopia', 'Finland', 'France', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Hungary', 'India', 'Indonesia', 'Iran', 'Ireland {Republic}', 'Israel', 'Italy', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Korea South', 'Latvia', 'Lebanon', 'Lithuania', 'Macedonia', 'Malaysia', 'Malta', 'Mexico', 'Moldova', 'Mongolia', 'Morocco', 'Mozambique', 'Myanmar, {Burma}', 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Nigeria', 'Norway', 'Pakistan', 'Panama', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Romania', 'Russian Federation', 'Saudi Arabia', 'Serbia', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'Solomon Islands', 'South Africa', 'Spain', 'Sri Lanka', 'Swaziland', 'Sweden', 'Switzerland', 'Syria', 'Taiwan', 'Thailand', 'Trinidad & Tobago', 'Tunisia', 'Turkey', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan', 'Venezuela', 'Vietnam', 'Zimbabwe', "Didn't answer"]

i1 = pd.CategoricalIndex(series1_index, ordered=True, categories=categories)
s1 = pd.Series(series1_values, index=i1)

series2_values = [7, 1, 6, 2, 49, 26, 1, 1, 1, 25, 3, 18, 9, 63, 1, 2, 5, 1, 3, 2, 17, 23, 4, 3, 15, 55, 2, 2, 172, 7, 1, 17, 1, 41, 4, 6, 13, 8, 61, 2, 4, 1, 4, 2, 3, 1, 3, 2, 9, 1, 1, 63, 13, 16, 3, 1, 4, 59, 11, 3, 22, 2, 5, 1, 2, 2, 7, 8, 50, 2, 39, 35, 2, 1, 14, 6, 1, 191, 312, 5, 1, 1, 1, 32]
series2_index = ['Afghanistan', 'Angola', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bangladesh', 'Belarus', 'Belgium', 'Bosnia Herzegovina', 'Brazil', 'Bulgaria', 'Canada', 'Chile', 'China', 'Colombia', 'Costa Rica', 'Croatia', 'Cyprus', 'Czech Republic', 'Denmark', 'Egypt', 'Estonia', 'Finland', 'France', 'Gabon', 'Georgia', 'Germany', 'Greece', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Ireland {Republic}', 'Israel', 'Italy', 'Kazakhstan', 'Korea South', 'Laos', 'Latvia', 'Lebanon', 'Lithuania', 'Luxembourg', 'Malaysia', 'Malta', 'Mexico', 'Mongolia', 'Nepal', 'Netherlands', 'New Zealand', 'Norway', 'Pakistan', 'Paraguay', 'Philippines', 'Poland', 'Portugal', 'Romania', 'Russian Federation', 'Saudi Arabia', 'Serbia', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa', 'Spain', 'Sri Lanka', 'Sweden', 'Switzerland', 'Taiwan', 'Thailand', 'Turkey', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Venezuela', 'Vietnam', 'Zimbabwe', "Didn't answer"]

i2 = pd.CategoricalIndex(series2_index, ordered=True, categories=categories)
s2 = pd.Series(series2_values, index=i2)

print("Before")
x = pd.concat([s0, s1, s2], axis=1)  # crash!
print("After")

pd.concat([s0, s1, s2], axis=1) crashes the interpreter for me. If only one or two of the series are concatenated, no crash occurs.
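
Because the crash is intermittent and takes the whole interpreter down, one way to narrow it down is to run each combination of series in a fresh child process and compare exit codes. This is only a sketch: `run_isolated` is a hypothetical helper name, not part of pandas, and the strings passed to it here are stand-ins for the reproduction code above with the pd.concat call varied.

```python
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a fresh Python process and return its exit code.

    A hard interpreter crash in the child shows up as a nonzero
    return code instead of killing the calling session.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return result.returncode


# Stand-in snippets; swap in the reproduction code to test real cases.
clean_exit = run_isolated("print('ok')")
forced_exit = run_isolated("import sys; sys.exit(3)")
print(clean_exit, forced_exit)
```

Calling the helper several times with the same snippet would also show how often the crash occurs on a given machine.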

Problem description

The interpreter exits after showing a message box (in Danish; it translates roughly to "An error caused Python to exit"), and the program shuts down. There is no traceback or other information about the cause that I know how to get at. If someone tells me how to obtain it, I may be able to fetch it.
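
One standard-library option for getting a traceback out of a hard crash is `faulthandler`, which dumps the Python stack when the process hits a fatal error such as a segmentation fault. A minimal sketch, assuming this crash is one of the fault types `faulthandler` can catch; the log file name is made up for illustration:

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "concat_crash.log")

# Keep the file object open for the life of the process:
# faulthandler writes to its file descriptor at crash time.
crash_log = open(log_path, "w")
faulthandler.enable(file=crash_log, all_threads=True)

# ... run the reproduction snippet (the pd.concat call) after this point ...
```

The same handler can be switched on without editing the script by starting it as `python -X faulthandler script.py` (`script.py` being a placeholder name), in which case the traceback goes to stderr.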

Expected Output

pd.concat should return a three-column DataFrame. Instead the interpreter crashes.
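
To illustrate the expected shape on a much smaller input, here is a sketch with made-up categories and values (not the data from this report). On a pandas version where the call works, it should align the three series on the union of their index labels and return one column per series:

```python
import pandas as pd

# Toy data, invented for illustration only.
categories = ["a", "b", "c", "d"]


def make_series(labels, values):
    """Build a Series indexed by an ordered CategoricalIndex."""
    index = pd.CategoricalIndex(labels, ordered=True, categories=categories)
    return pd.Series(values, index=index)


s0 = make_series(["a", "b"], [1, 2])
s1 = make_series(["b", "c"], [3, 4])
s2 = make_series(["a", "c"], [5, 6])

# Outer-joins the three indexes; labels missing from a series become NaN.
result = pd.concat([s0, s1, s2], axis=1)
print(result)
```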

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: 2.4.5
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None
