Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
ary = np.array([[1, 0, 0, 3],
                [1, 0, 2, 0],
                [0, 4, 0, 0]])
df = pd.DataFrame(ary)
df.columns = [1, 2, 3, 4]
dfs = pd.SparseDataFrame(df, default_fill_value=0)
# DOES NOT WORK:
dfs.to_coo() # raises KeyError: 0
# WORKS (1)
dfs2 = dfs.copy()
dfs2.columns = [0, 1, 2, 3]
dfs2.to_coo()
# WORKS (2)
dfs3 = dfs.copy()
dfs3.columns = [str(i) for i in dfs3.columns]
dfs3.to_coo()
Problem description
In the example above, the pandas SparseDataFrame method to_coo() (and possibly others) raises KeyError: 0 when the column labels are integers that do not start at 0. Relabeling the columns so they start at 0, or converting the labels to strings, avoids the error.
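Until the label handling is fixed, one possible workaround is to skip to_coo() and build the COO matrix directly from the frame's values with SciPy. This is only a sketch under stated assumptions: it uses a plain DataFrame so that it does not depend on SparseDataFrame being available, and it goes through a dense array first, which may be too costly for large frames.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

ary = np.array([[1, 0, 0, 3],
                [1, 0, 2, 0],
                [0, 4, 0, 0]])
# Same integer column labels as in the report (they do not start at 0).
df = pd.DataFrame(ary, columns=[1, 2, 3, 4])

# coo_matrix only sees the 2-D array of values, so the column labels
# play no role; only the non-zero entries are stored.
coo = coo_matrix(df.values)
print(coo.shape, coo.nnz)
```

The column labels are lost in the conversion, so keep df.columns around if the positions need to be mapped back to names later.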
Expected Output
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in COOrdinate format>
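For comparison, here is a sketch of how the same matrix might be obtained on newer pandas releases, assuming the DataFrame.sparse accessor and pd.SparseDtype are available (they are not part of the 0.23.4 install reported here) and that SciPy is installed. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ary = np.array([[1, 0, 0, 3],
                [1, 0, 2, 0],
                [0, 4, 0, 0]])
df = pd.DataFrame(ary, columns=[1, 2, 3, 4])

# Assumption: a pandas version with sparse dtypes and the .sparse accessor.
# Each column becomes a sparse array with fill value 0.
sdf = df.astype(pd.SparseDtype("int64", fill_value=0))

# Convert to a scipy.sparse COO matrix; requires SciPy.
coo = sdf.sparse.to_coo()
print(coo.shape, coo.nnz)
```

If this runs, the shape and the number of stored elements should match the expected output above (3x4 with 5 stored elements).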
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None