Skip to content

Adding (Insert or update if key exists) option to .to_sql #14553

Closed
@cdagnino

Description

@cdagnino

Suppose you have an existing SQL table called person_age, where id is the primary key:

    age
id	
1	18
2	42

and you also have new data in a DataFrame called extra_data

	age
id	
2	44
3	95

then it would be useful to have an option on extra_data.to_sql() that allows to pass the DataFrame to SQL with an INSERT or UPDATE option on the rows, based on the primary key.

In this case, the id=2 row would get updated to age=44 and the id=3 row would get added

Expected Output

	age
id	
1	18
2	44
3	95

(Maybe) helpful code references

I looked at pandas sql.py sourcecode to come up with a solution, but I couldn't follow.

Code to replicate the example above

(Apologies for mixing sqlalchemy and sqlite

import pandas as pd
from sqlalchemy import create_engine
import sqlite3
conn = sqlite3.connect('example.db')
 
c = conn.cursor()
c.execute('''DROP TABLE IF EXISTS person_age;''')
c.execute('''
          CREATE TABLE person_age
          (id INTEGER PRIMARY KEY ASC, age INTEGER NOT NULL)
          ''')
conn.commit()
conn.close()

##### Create original table

engine = create_engine("sqlite:///example.db")
sql_df = pd.DataFrame({'id' : [1, 2], 'age' : [18, 42]})

sql_df.to_sql('person_age', engine, if_exists='append', index=False)


#### Extra data to insert/update

extra_data = pd.DataFrame({'id' : [2, 3], 'age' : [44, 95]})
extra_data.set_index('id', inplace=True)

#### extra_data.to_sql()  with row update or insert option

expected_df = pd.DataFrame({'id': [1, 2, 3], 'age': [18, 44, 95]})
expected_df.set_index('id', inplace=True)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions