Allow broadcasting vertically a 1-dim input to pd.DataFrame(), and document it #20837

Open
@toobaz

Code Sample, a copy-pastable example if possible

In [2]: pd.DataFrame([1, 2], columns=range(3))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4844                 blocks = [make_block(values=blocks[0],
-> 4845                                      placement=slice(0, len(axes[0])))]
   4846 

/home/nobackup/repo/pandas/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
   3192 
-> 3193     return klass(values, ndim=ndim, placement=placement)
   3194 

/home/nobackup/repo/pandas/pandas/core/internals.py in __init__(self, values, placement, ndim)
    124                 'Wrong number of items passed {val}, placement implies '
--> 125                 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
    126 

ValueError: Wrong number of items passed 1, placement implies 3

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-2-4ad51ebcfae4> in <module>()
----> 1 pd.DataFrame([1, 2], columns=range(3))

/home/nobackup/repo/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    403                 else:
    404                     mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 405                                              copy=copy)
    406             else:
    407                 mgr = self._init_dict({}, index, columns, dtype=dtype)

/home/nobackup/repo/pandas/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
    536             values = maybe_infer_to_datetimelike(values)
    537 
--> 538         return create_block_manager_from_blocks([values], [columns, index])
    539 
    540     @property

/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
   4852         blocks = [getattr(b, 'values', b) for b in blocks]
   4853         tot_items = sum(b.shape[0] for b in blocks)
-> 4854         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   4855 
   4856 

/home/nobackup/repo/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4829         raise ValueError("Empty data passed with indices specified.")
   4830     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4831         passed, implied))
   4832 
   4833 

ValueError: Shape of passed values is (1, 2), indices imply (3, 2)

Problem description

(From #18626 (comment))

#18819 (now fixed) disabled calls such as pd.Series([1], index=range(3)). The same result can be obtained with pd.Series(1, index=range(3)), which is less ambiguous.
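As a minimal illustration of the unambiguous scalar form mentioned above (the variable name is only for this sketch):

```python
import pandas as pd

# A scalar is broadcast along the whole index, so there is no ambiguity
# about how many values were intended.
broadcast = pd.Series(1, index=range(3))
print(broadcast.tolist())  # [1, 1, 1]
```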

In principle, the same reasoning should lead us to disable pd.DataFrame([[1, 2]], index=range(3)). But that can't be replaced as comfortably, because pd.DataFrame([1, 2], index=range(3)) aligns vertically. It could not be otherwise: 1d objects are treated as Series, and Series in DataFrames are mainly columns, not rows. Moreover, the nested-list form is probably widely used in existing code, and also in tests:

expected = DataFrame([self.frame.mean()], index=self.frame.index)

df0 = pd.DataFrame([[1, 2]], index=idx0)

df = DataFrame([[10, 11]], index=midx)
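To make the column-versus-row distinction above concrete, here is a small sketch that passes no index or columns, so no broadcasting is involved. The names col and row are only illustrative:

```python
import pandas as pd

# A 1d list is laid out vertically: one column, one row per element.
col = pd.DataFrame([1, 2])

# A nested list of length 1 is laid out horizontally: one row, one column per element.
row = pd.DataFrame([[1, 2]])

print(col.shape)  # (2, 1)
print(row.shape)  # (1, 2)
```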

So I think the best way to proceed is:

  • allow 1d objects to be broadcast horizontally (not just aligned vertically)
  • clearly document the above, and the fact that 2d objects of length 1 are broadcast vertically instead
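Until something like the first point is supported by the constructor, the horizontal broadcast can be done by hand. The helper below is a hypothetical sketch (frame_from_1d is not a pandas function) that repeats a 1d input across the requested columns with NumPy before building the frame:

```python
import numpy as np
import pandas as pd


def frame_from_1d(values, columns):
    """Hypothetical helper: repeat a 1d input across every column label."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError("expected 1d input")
    columns = list(columns)
    # Reshape the input into a single column, then repeat it once per label.
    data = np.repeat(arr.reshape(-1, 1), len(columns), axis=1)
    return pd.DataFrame(data, columns=columns)


result = frame_from_1d([1, 2], columns=range(3))
print(result)
```

This only emulates the proposal from the outside; how the constructor itself would decide between aligning and broadcasting is left open here.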

Expected Output

In [3]: pd.DataFrame([[1]*3, [2]*3], columns=range(3))
Out[3]: 
   0  1  2
0  1  1  1
1  2  2  2

Output of pd.show_versions()

In [3]: pd.show_versions()

INSTALLED VERSIONS

commit: 7ec74e5
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+798.g7ec74e5f7
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
