
Creating DataFrame throws: data type "bytes512" not understood #20734

Closed
@stephenmartindale

Description


Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

index = pd.Series(name='id', dtype='S24')
df = pd.DataFrame(index=index)
df['a'] = pd.Series(name='a', index=index, dtype=np.uint32)
df['b'] = pd.Series(name='b', index=index, dtype='S64')
df['c'] = pd.Series(name='c', index=index, dtype='S64')
df['d'] = pd.Series(name='d', index=index, dtype=np.uint8)

Problem description

The code above, which attempts to create an empty pandas.DataFrame with an index and four typed columns, fails with the following error:

[... snip ...]\appdata\local\programs\python\python36\lib\site-packages\pandas\core\internals.py in _vstack(to_stack, dtype)
   4912 
   4913     # work around NumPy 1.6 bug
-> 4914     if dtype == _NS_DTYPE or dtype == _TD_DTYPE:
   4915         new_values = np.vstack([x.view('i8') for x in to_stack])
   4916         return new_values.view(dtype)

TypeError: data type "bytes512" not understood
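The name `bytes512` in the message appears to be NumPy's own name for the `S64` dtype used for columns b and c: for fixed-width byte strings, `dtype.name` reports the item size in bits (64 bytes * 8 = 512), not in bytes. A minimal check, using only NumPy:

```python
import numpy as np

s64 = np.dtype("S64")  # fixed-width byte string: 64 bytes per item
print(s64.itemsize)    # item size in bytes: 64
print(s64.name)        # name encodes the size in bits: bytes512
```

So the error is most likely raised while pandas is handling the two `S64` columns, which matches the observation that the order of those columns matters.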

Why?

Changing the order in which the columns are added works fine:

index = pd.Series(name='id', dtype='S24')
df = pd.DataFrame(index=index)
df['a'] = pd.Series(name='a', index=index, dtype=np.uint32)
df['d'] = pd.Series(name='d', index=index, dtype=np.uint8)
df['b'] = pd.Series(name='b', index=index, dtype='S64')
df['c'] = pd.Series(name='c', index=index, dtype='S64')

In fact, it seems that adding any Series after the two S64 Series raises the error; I tried both np.float and np.bool.
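The traceback points at `_vstack`, which stacks same-dtype blocks together. The sketch below is NumPy-only and assumes (it is not taken from the pandas source) that the two `S64` columns are held as 2-D blocks of shape (1, n_rows) that get stacked when the next column is added. Given a real dtype object, NumPy stacks them without complaint, which suggests the failure comes from the dtype value passed alongside the data rather than from the data itself:

```python
import numpy as np

# Hypothetical stand-ins for the two empty S64 columns (b and c),
# each shaped as a 2-D block: 1 column x 0 rows.
b = np.empty((1, 0), dtype="S64")
c = np.empty((1, 0), dtype="S64")

# Stack them into a single (2 x 0) block, as a consolidation step would.
stacked = np.vstack([b, c])
print(stacked.shape, stacked.dtype)
```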

Expected Output

I would expect the order in which the Series are added not to matter or, if it does matter, a clearer error message.

I first tried with older versions of Python 3.6, NumPy and pandas, then updated them, suspecting a bug. The latest versions I tested were CPython 3.6.5, NumPy 1.14.2 and pandas 0.22.0.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
