DataFrame constructor misinterprets columns argument if nested list is passed in as the data parameter. #14467

Closed
@madphysicist

This issue is based on Stack Overflow question http://stackoverflow.com/q/40182072/2988730.

A small, complete example of the issue

import pandas as pd

# Nested list as data, with two-level index and two-level columns
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=[['gibberish']*2, [0, 1]],
                  columns=[['baldersash']*3, [10, 20, 30]])

The result is

  File "<ipython-input-321-2695882ac68b>", line 3, in <module>
    columns=[['baldersash']*3, [10, 20, 30]])

  File "/home/jfoxrabi/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 263, in __init__
    arrays, columns = _to_arrays(data, columns, dtype=dtype)

  File "/home/jfoxrabi/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5352, in _to_arrays
    dtype=dtype)

  File "/home/jfoxrabi/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5431, in _list_to_arrays
    coerce_float=coerce_float)

  File "/home/jfoxrabi/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5489, in _convert_object_array
    'columns' % (len(columns), len(content)))

AssertionError: 2 columns passed, passed data had 3 columns
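
To check whether a given pandas installation shows this behaviour, the failing call can be wrapped so that the outcome is reported instead of raised. This is only a sketch: the helper name `try_nested_list_columns` is made up for illustration, and catching `ValueError` alongside `AssertionError` is an assumption to cover versions that may report the mismatch differently.

```python
import pandas as pd


def try_nested_list_columns():
    """Return the DataFrame, or the exception raised by the constructor.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        return pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                            index=[['gibberish'] * 2, [0, 1]],
                            columns=[['baldersash'] * 3, [10, 20, 30]])
    except (AssertionError, ValueError) as exc:
        # pandas 0.18.1 raises AssertionError here (see traceback above)
        return exc


result = try_nested_list_columns()
print("pandas", pd.__version__)
print(type(result).__name__)
print(result)
```

On an affected version the second line printed should be `AssertionError`; on a version without the problem it should be `DataFrame`.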

Expected Output

            baldersash   
                    10 20 30
gibberish 0          1  2  3
          1          4  5  6

The surprising thing here is that each of the following variations works just fine:

  1. Supplying a numpy array as data:

    import numpy as np

    df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                      index=[['gibberish']*2, [0, 1]],
                      columns=[['baldersash']*3, [10, 20, 30]])

    Results in

                baldersash      
                        10 20 30
    gibberish 0          1  2  3
              1          4  5  6
    
  2. Reducing the size of the input array to have two columns:

    df = pd.DataFrame([[1, 2], [3, 4]],
                      index=[['gibberish']*2, [0, 1]],
                      columns=[['baldersash']*2, [10, 20]])

    Results in

                baldersash   
                        10 20
    gibberish 0          1  2
              1          3  4
    
  3. Omitting the columns argument:

    df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                      index=[['gibberish']*2, [0, 1]])

    Results in

                 0  1  2
    gibberish 0  1  2  3
              1  4  5  6
    
  4. Using a single-level list for the columns argument:

    df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                      index=[['gibberish']*2, [0, 1]],
                      columns=[10, 20, 30])

    Results in

                 10  20  30
    gibberish 0   1   2   3
              1   4   5   6
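
Given that the variations above work, one possible workaround is to stop relying on the constructor to interpret nested lists and build the `MultiIndex` objects explicitly. The sketch below assumes that an already-constructed `MultiIndex` is length-checked against the data as a three-entry index rather than as a two-element list; that matches the traceback above but has not been verified against pandas 0.18.1 here.

```python
import pandas as pd

# Build both axes explicitly so there is no nested list left to interpret
index = pd.MultiIndex.from_arrays([['gibberish'] * 2, [0, 1]])
columns = pd.MultiIndex.from_arrays([['baldersash'] * 3, [10, 20, 30]])

# Nested list as data, same as in the failing example
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=index, columns=columns)
print(df)
```

Wrapping the data in `np.array(...)`, as in item 1, is the other option shown in this report; the explicit `MultiIndex` route keeps the data as a plain list.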
    

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.29.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: None
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
