Closed
Description
After #19664, to_sql() performance is highly dependent on the number of columns.
Problem description
For a single column, 0.23 is faster than 0.22. Adding more columns has little influence on 0.22 but quickly degrades performance on 0.23: with 20 columns, 0.23 takes 1.5 times longer than 0.22.
The performance degradation occurs regardless of the chunksize parameter.
import time
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

start = time.time()
N_COLS = 20

# Build a DataFrame with N_COLS integer columns of 20,000 rows each
df = pd.DataFrame({n: np.arange(20_000) for n in range(N_COLS)})

# Create the engine to connect pandas with PostgreSQL
engine = create_engine('postgresql://user:@localhost/db')
conn = engine.connect()

# Write the DataFrame to a SQL table
df.to_sql('test', engine, if_exists='replace', chunksize=1_000)
result = conn.execute("select * from test")
conn.close()

print('WRITE: {}'.format(time.time() - start))
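For a quicker reproduction that does not require a running PostgreSQL server, the same measurement can be sketched against an in-memory SQLite database. The helper name and the column counts below are illustrative, and absolute timings will differ from PostgreSQL, but the scaling with column count should still be visible:

```python
import time
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

def time_to_sql(n_cols, n_rows=20_000, chunksize=1_000):
    # Build a DataFrame with n_cols integer columns of n_rows rows each
    df = pd.DataFrame({str(n): np.arange(n_rows) for n in range(n_cols)})
    engine = create_engine('sqlite://')  # in-memory SQLite, no server needed
    start = time.time()
    df.to_sql('test', engine, if_exists='replace', chunksize=chunksize)
    return time.time() - start

# Time the write for an increasing number of columns
for n_cols in (1, 5, 10, 20):
    print('{:>2} columns: {:.3f}s'.format(n_cols, time_to_sql(n_cols)))
```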
Tested on Linux with PostgreSQL 9.6, Python 3.6, and SQLAlchemy 1.2.2.