Description
Code Sample, a copy-pastable example if possible
In [2]: pd.DataFrame([1, 2], columns=range(3))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4844 blocks = [make_block(values=blocks[0],
-> 4845 placement=slice(0, len(axes[0])))]
4846
/home/nobackup/repo/pandas/pandas/core/internals.py in make_block(values, placement, klass, ndim, dtype, fastpath)
3192
-> 3193 return klass(values, ndim=ndim, placement=placement)
3194
/home/nobackup/repo/pandas/pandas/core/internals.py in __init__(self, values, placement, ndim)
124 'Wrong number of items passed {val}, placement implies '
--> 125 '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
126
ValueError: Wrong number of items passed 1, placement implies 3
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-2-4ad51ebcfae4> in <module>()
----> 1 pd.DataFrame([1, 2], columns=range(3))
/home/nobackup/repo/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
403 else:
404 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 405 copy=copy)
406 else:
407 mgr = self._init_dict({}, index, columns, dtype=dtype)
/home/nobackup/repo/pandas/pandas/core/frame.py in _init_ndarray(self, values, index, columns, dtype, copy)
536 values = maybe_infer_to_datetimelike(values)
537
--> 538 return create_block_manager_from_blocks([values], [columns, index])
539
540 @property
/home/nobackup/repo/pandas/pandas/core/internals.py in create_block_manager_from_blocks(blocks, axes)
4852 blocks = [getattr(b, 'values', b) for b in blocks]
4853 tot_items = sum(b.shape[0] for b in blocks)
-> 4854 construction_error(tot_items, blocks[0].shape[1:], axes, e)
4855
4856
/home/nobackup/repo/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4829 raise ValueError("Empty data passed with indices specified.")
4830 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4831 passed, implied))
4832
4833
ValueError: Shape of passed values is (1, 2), indices imply (3, 2)
Problem description
(From #18626 (comment))
#18819 (now fixed) disabled calls such as pd.Series([1], index=range(3)): the same result can be obtained with pd.Series(1, index=range(3)), which is less ambiguous.
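For reference, a minimal sketch of the unambiguous scalar form mentioned above. It only uses the public Series constructor; the variable name `s` is illustrative.

```python
import pandas as pd

# A scalar is broadcast to every label of the index.
s = pd.Series(1, index=range(3))

print(s.tolist())
```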
In principle, the same reasoning should lead us to disable pd.DataFrame([[1, 2]], index=range(3)). But that cannot be replaced as comfortably, because pd.DataFrame([1, 2], index=range(3)) aligns vertically, and it could not be otherwise, since 1d objects are treated as Series, and Series in DataFrames are mainly columns, not rows. Moreover, this is probably widely used in existing code, and also in tests:
- pandas/pandas/tests/frame/test_apply.py, line 139 in 6cacdde
- pandas/pandas/tests/indexes/test_multi.py, line 3248 in 6cacdde
- pandas/pandas/tests/reshape/test_reshape.py, line 499 in 6cacdde
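To illustrate the orientation argument above, here is a small sketch (variable names are illustrative) showing that a flat list is laid out as a column, while a nested list of length 1 is laid out as a row:

```python
import pandas as pd

# 1d input is treated like a Series: two rows, one column.
as_column = pd.DataFrame([1, 2])

# 2d input of length 1: one row, two columns.
as_row = pd.DataFrame([[1, 2]])

print(as_column.shape)
print(as_row.shape)
```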
So I think the best way to proceed is:
- allow 1d objects to be broadcast horizontally (not just aligned vertically)
- clearly document the above, and the fact that 2d objects of length 1 are broadcast vertically instead
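Until something like the first point is supported by the constructor, the horizontal broadcast can be spelled out by hand. The following is only a sketch of the intended result, not a proposed implementation: `broadcast_columns` is a hypothetical helper, not part of pandas, and it assumes the input is a flat sequence.

```python
import numpy as np
import pandas as pd


def broadcast_columns(values, columns):
    """Hypothetical helper: repeat a 1d sequence across every column."""
    col = np.asarray(values).reshape(-1, 1)     # shape (n_rows, 1)
    block = np.tile(col, (1, len(columns)))     # shape (n_rows, n_columns)
    return pd.DataFrame(block, columns=columns)


df = broadcast_columns([1, 2], range(3))
print(df)
```

This should give the same frame as pd.DataFrame([[1]*3, [2]*3], columns=range(3)).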
Expected Output
In [3]: pd.DataFrame([[1]*3, [2]*3], columns=range(3))
Out[3]:
   0  1  2
0  1  1  1
1  2  2  2
Output of pd.show_versions()
In [3]: pd.show_versions()
INSTALLED VERSIONS
commit: 7ec74e5
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.23.0.dev0+798.g7ec74e5f7
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1