Description
Extract of discussion from #16823
Code Sample, a copy-pastable example if possible
In [2]: df = pd.DataFrame()
In [3]: df['dummy'] = 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-3122daad6dab> in <module>()
----> 1 df['dummy'] = 1
/home/nobackup/repo/pandas/pandas/core/frame.py in __setitem__(self, key, value)
2515 else:
2516 # set column
-> 2517 self._set_item(key, value)
2518
2519 def _setitem_slice(self, key, value):
/home/nobackup/repo/pandas/pandas/core/frame.py in _set_item(self, key, value)
2584 """
2585
-> 2586 self._ensure_valid_index(value)
2587 value = self._sanitize_column(key, value)
2588 NDFrame._set_item(self, key, value)
/home/nobackup/repo/pandas/pandas/core/frame.py in _ensure_valid_index(self, value)
2561 if not is_list_like(value):
2562 # GH16823, Raise an error due to loss of information
-> 2563 raise ValueError('If using all scalar values, you must pass'
2564 ' an index')
2565 try:
ValueError: If using all scalar values, you must pass an index
Problem description
Previously, the above would just add a new (obviously empty) column.
@jreback objects that if this is allowed, then we should also allow initialization with only scalars (as in pd.DataFrame({'a' : 1, 'b' : 2})).
I'm not 100% sure of what @jorisvandenbossche suggests, but he agrees with me that the current state is inconsistent.
My view is that previously things were just (almost) fine:
- at initialization, a DataFrame needs to have an index. You can avoid providing one explicitly only if it can be automatically built from the values you pass (i.e. 1-dimensional objects of the same length, or a single 2-dimensional block of data). Scalars clearly do not satisfy this requirement, so the constructor will raise if passed only scalars (but pd.DataFrame({'A' : range(3), 'B' : 23}) works, which is cool).
- at assignment, there is already an index, and in particular, when assigning a(n entire) column you know you'll never alter the index. More specifically, when you assign a scalar to a column, you know it will alter all existing rows, which means "none" if the index is empty. And if the column does not exist, it will just be added, clearly empty as well.
In both cases, scalars/empty indexes represent no exception to the general behavior.
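To make the two cases concrete, here is a minimal sketch of the behavior described above. The variable names (scalars_only_raised, mixed, explicit, df) are illustrative only, and the assignment example deliberately uses a non-empty index so that it does not depend on the behavior under discussion.

```python
import pandas as pd

# Initialization with only scalars: no index can be inferred, so this raises.
try:
    pd.DataFrame({'a': 1, 'b': 2})
    scalars_only_raised = False
except ValueError:
    scalars_only_raised = True

# Initialization with a scalar next to a 1-dimensional value:
# the index is built from the range and the scalar fills every row.
mixed = pd.DataFrame({'A': range(3), 'B': 23})

# Initialization with only scalars plus an explicit index works too.
explicit = pd.DataFrame({'a': 1, 'b': 2}, index=[0])

# Assignment: the index already exists, so the scalar fills all existing rows
# and the index itself is left untouched.
df = pd.DataFrame(index=range(3))
df['dummy'] = 1

print(scalars_only_raised)
print(mixed)
print(explicit)
print(df)
```

With an empty index, the same assignment rule would fill "all existing rows", that is, none.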
For consistency, we might want to fix the following too:
In [2]: pd.DataFrame().loc[1] = 0
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-20bba0809d0b> in <module>()
----> 1 pd.DataFrame().loc[1] = 0
/home/nobackup/repo/pandas/pandas/core/indexing.py in __setitem__(self, key, value)
192 key = com._apply_if_callable(key, self.obj)
193 indexer = self._get_setitem_indexer(key)
--> 194 self._setitem_with_indexer(indexer, value)
195
196 def _has_valid_type(self, k, axis):
/home/nobackup/repo/pandas/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
421 # no columns and scalar
422 if not len(self.obj.columns):
--> 423 raise ValueError("cannot set a frame with no defined "
424 "columns")
425
ValueError: cannot set a frame with no defined columns
but I will detail this in a separate issue.
Expected Output
None, but a new column "dummy" is added to df.
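A small check along these lines could tell which behavior a given pandas build has. This is only a sketch: the outcome variable is an illustrative name, and which branch runs depends on the installed version.

```python
import pandas as pd

df = pd.DataFrame()
try:
    # On the commit reported here this raises ValueError.
    df['dummy'] = 1
except ValueError as exc:
    outcome = "raised: {}".format(exc)
else:
    # Expected behavior: the column is added and the frame still has no rows.
    outcome = "columns={}, rows={}".format(list(df.columns), len(df))

print(outcome)
```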
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 5bf7f9a
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.21.0rc1+18.g5bf7f9a4f
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1