Description
Calling to_csv directly on a sparse DataFrame is roughly 15x slower than converting it to dense first and then writing it.
Code to reproduce:
import timeit

import numpy as np
import pandas as pd
from scipy import sparse as sc
np.random.seed(42)
vals = np.random.randint(0, 10, size=(1000, 1000))
# Zero out every value above 3, so roughly 70% of the entries end up as 0.
to_zero = vals > 3
vals[to_zero] = 0
sparse_mtx = sc.coo_matrix(vals)
sparse_pd = pd.DataFrame.sparse.from_spmatrix(sparse_mtx)
num_tries = 30
# t1: write the sparse DataFrame directly.
t1 = timeit.timeit(lambda: sparse_pd.to_csv('sparse_pd.csv'), number=num_tries)
# t2: convert to dense first, then write.
t2 = timeit.timeit(lambda: sparse_pd.sparse.to_dense().to_csv('sparse_pd.csv'), number=num_tries)
overhead = t1/t2
print(t1, t2, overhead)
Output:
56.591012510471046 3.7841985523700714 14.954556883657089
Versions:
- python == 3.9.2
- pandas == 1.2.4
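Possible workaround:
Until the sparse path is faster, the timings above suggest densifying before writing. Below is a minimal sketch of that idea, not a fix for the underlying issue. The helper name `sparse_to_csv` is hypothetical (it is not part of pandas), the sparse frame is built with `astype(pd.SparseDtype(...))` instead of scipy to keep the example small, and the approach assumes the dense copy fits in memory.

```python
import io

import numpy as np
import pandas as pd


def sparse_to_csv(df, path_or_buf=None, **kwargs):
    """Densify an all-sparse DataFrame, then write it with DataFrame.to_csv.

    Hypothetical helper, not part of pandas. The dense copy must fit in memory.
    """
    return df.sparse.to_dense().to_csv(path_or_buf, **kwargs)


# Small stand-in for the 1000 x 1000 matrix used in the report.
rng = np.random.default_rng(42)
vals = rng.integers(0, 10, size=(50, 50), dtype=np.int64)
vals[vals > 3] = 0
sparse_df = pd.DataFrame(vals).astype(pd.SparseDtype("int64", 0))

# With path_or_buf=None, to_csv returns the CSV text as a string.
csv_text = sparse_to_csv(sparse_df)

# Read the text back to check that no values were lost along the way.
round_trip = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

Extra keyword arguments are passed straight through to `to_csv`, so a call such as `sparse_to_csv(sparse_df, 'sparse_pd.csv', index=False)` behaves like the dense call timed as t2 in the report.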