DOC: Add docs for read_sql to avoid sql injection #56546

Merged: 9 commits merged on Jan 17, 2024. Showing changes from 2 commits.
31 changes: 31 additions & 0 deletions pandas/io/sql.py
@@ -644,6 +644,37 @@ def read_sql(
read_sql_table : Read SQL database table into a DataFrame.
read_sql_query : Read SQL query into a DataFrame.

Notes
-----
Using string interpolation (e.g. ``f-strings``, ``%-formatting``,
Review comment (Member):
I was thinking something along the lines of:

pandas does not attempt to sanitize SQL statements; instead it simply forwards the statement you are executing to the underlying driver, which may or may not sanitize from there. Please refer to the underlying driver documentation for any details. Generally, be wary when accepting statements from arbitrary sources.

is all that we need to say.

``str.format()``, etc.) to build a SQL query can lead to SQL injection.
For example, the code below inserts an unexpected row into the ``test_data`` table.

>>> from sqlalchemy import create_engine
>>> engine = create_engine('postgresql:///test_db')
>>> conn = engine.connect()

>>> df = pd.DataFrame(data=[[0, '10/11/12'], [1, '12/11/10']],
...                   columns=['int_column', 'date_column'])
>>> df.to_sql(name='test_data', con=conn)
2

>>> # DON'T DO THIS
>>> query_int = "1; INSERT INTO test_data VALUES (2, 2, '09/11/12') RETURNING *;"
>>> pd.read_sql(f'SELECT * FROM test_data WHERE int_column={query_int}', conn)
   index  int_column date_column
0      2           2    09/11/12
>>> conn.commit()

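Instead, pass the value through the ``params`` argument so that the driver binds it as a parameter rather than splicing it into the SQL text: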

>>> from sqlalchemy import text
>>> sql = text('SELECT * FROM test_data WHERE int_column=:int_val')
>>> pd.read_sql(sql, conn, params={'int_val': 1})
   index  int_column date_column
0      1           1    12/11/10

Examples
--------
Read data from SQL via either a SQL query or a SQL tablename.
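Because pandas forwards both the statement and ``params`` to the driver, the contrast between interpolation and parameter binding can be reproduced at the driver level alone. The sketch below is a minimal illustration under stated assumptions. It uses the standard-library ``sqlite3`` driver and an in-memory database in place of the PostgreSQL ``test_db`` from the docstring. It also swaps the second-statement ``INSERT`` payload for an ``OR 1=1`` tautology, because ``sqlite3`` executes only one statement per ``execute()`` call. The variable names are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL
# ``test_db`` used in the docstring, so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (int_column INTEGER, date_column TEXT)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?)",
    [(0, "10/11/12"), (1, "12/11/10")],
)

# Hypothetical untrusted input. sqlite3 will not run two statements in one
# execute() call, so this payload widens the WHERE clause instead.
user_value = "1 OR 1=1"

# DON'T DO THIS: the input becomes part of the SQL text, the condition is
# always true, and every row is returned.
unsafe_rows = conn.execute(
    f"SELECT * FROM test_data WHERE int_column={user_value}"
).fetchall()

# Safer: the driver binds the "?" placeholder, so the whole string is
# compared as one value and matches no row.
safe_rows = conn.execute(
    "SELECT * FROM test_data WHERE int_column = ?", (user_value,)
).fetchall()

# Legitimate input works through the same placeholder.
normal_rows = conn.execute(
    "SELECT * FROM test_data WHERE int_column = ?", (1,)
).fetchall()

print(unsafe_rows)
print(safe_rows)
print(normal_rows)
conn.close()
```

Here ``?`` is the ``qmark`` placeholder style that ``sqlite3`` accepts. Other drivers use other PEP 249 ``paramstyle`` values (psycopg2, for instance, uses ``%s`` and ``%(name)s``). That is why the placeholder syntax passed to ``read_sql`` together with ``params`` depends on the driver, unless SQLAlchemy's ``text()`` construct translates named ``:name`` placeholders, as in the docstring example.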