Closed
Description
Suppose you have an existing SQL table called person_age
, where id
is the primary key:
age
id
1 18
2 42
and you also have new data in a DataFrame
called extra_data
age
id
2 44
3 95
then it would be useful to have an option on extra_data.to_sql()
that allows to pass the DataFrame to SQL with an INSERT
or UPDATE
option on the rows, based on the primary key
.
In this case, the id=2
row would get updated to age=44
and the id=3
row would get added
Expected Output
age
id
1 18
2 44
3 95
(Maybe) helpful code references
- Use
merge
from SQLAlchemy? - The query: '''INSERT or REPLACE into person_age (id, age) values (?,?,?) ''' in this question
I looked at pandas
sql.py
sourcecode to come up with a solution, but I couldn't follow.
Code to replicate the example above
(Apologies for mixing sqlalchemy
and sqlite
import pandas as pd
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
CREATE TABLE person_age
(id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
''')
conn.commit()
conn.close()
##### Create original table
engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})
sql_df.to_sql('person_age', engine, if_exists='append', index=False)
#### Extra data to insert/update
extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)
#### extra_data.to_sql() with row update or insert option
expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)