Skip to content

DOC: Add docs for read_sql to avoid sql injection #56546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 17, 2024

Conversation

erichxchen
Copy link
Contributor

Added examples and notes to the documents of pandas.read_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html) as a warning to avoid SQL injection.

Copy link
Member

@lithomas1 lithomas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, this looks correct at a first glance to me!

I left a small comment.

cc @WillAyd

@lithomas1 lithomas1 added this to the 2.2 milestone Dec 22, 2023
@WillAyd
Copy link
Member

WillAyd commented Dec 22, 2023

Does this really matter for any other drivers? I think it is the responsibility of the driver to properly escape strings

@erichxchen
Copy link
Contributor Author

Does this really matter for any other drivers? I think it is the responsibility of the driver to properly escape strings

Thanks @WillAyd. Yeah, I also think it is the responsibility of the driver to properly escape strings. When I test with other drivers, I do see that some of them will raise exceptions if there are multiple statements in one query.

But I think it would be nice to have a note/warning along with this API.

@erichxchen erichxchen requested a review from lithomas1 December 26, 2023 21:36
@WillAyd
Copy link
Member

WillAyd commented Dec 26, 2023

I think a note is fine but directing users to use params to solve this is not always the purpose of that argument, so I would remove any reference to that.

@erichxchen
Copy link
Contributor Author

I think a note is fine but directing users to use params to solve this is not always the purpose of that argument, so I would remove any reference to that.

Removed the reference directing users to use params

pandas/io/sql.py Outdated
@@ -644,6 +644,30 @@ def read_sql(
read_sql_table : Read SQL database table into a DataFrame.
read_sql_query : Read SQL query into a DataFrame.

Notes
-----
Using string interpolation (e.g. ``f-strings``, ``%-formatting``,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something along the lines of

pandas does not attempt to sanitize SQL statements; instead it simply forwards the statement you are executing to the underlying driver, which may or may not sanitize from there. Please refer to the underlying driver documentation for any details. Generally, be wary when accepting statements from arbitrary sources

Is all that we need to say

@mroeschke mroeschke modified the milestones: 2.2, 2.3 Jan 8, 2024
@mroeschke mroeschke added Docs IO SQL to_sql, read_sql, read_sql_query labels Jan 8, 2024
@mroeschke mroeschke removed this from the 2.3 milestone Jan 8, 2024
@datapythonista
Copy link
Member

@WillAyd I do think we should be pointing users to use params. In the current version of this PR we are telling them what not to do, but we don't provide any solution. The right way to deal with user input in SQL is to use params. I don't think it's so useful to explain in the docs that regular string formatting is not great for SQL. I think we should simply mention that users should use params instead of string formatting, and show an example.

@WillAyd
Copy link
Member

WillAyd commented Jan 12, 2024

My hesitation in doing that is that we are making promises about params doing something that isn't true. I don't see anything in PEP 249 about params helping in this case and our code base does nothing to change that.

Is the promise of that avoiding injection made somewhere else? I am still under the impression that that would be driver-specific behavior

@WillAyd
Copy link
Member

WillAyd commented Jan 12, 2024

In the current version of this PR we are telling them what not to do, but we don't provide any solution

My latest feedback was that we should get rid of the DO NOT DO THIS comment and replace with a general warning around injection

@mroeschke
Copy link
Member

I agree with Will's comment by just adding #56546 (comment) and showing an example of using param. AFAICT in the docstring we don't have one.

Since pandas essentially uses execute + user supplied input instead of ORM methods, pandas can't defend against SQL injection.

@datapythonista
Copy link
Member

Makes sense. I guess pandas users are still expected to use params. Fully agree that we can't make promises on how that will protect from SQL injection. If what you both are proposing is to make clearer that for user input params should be used, and then warn about possible SQL injection and recommend users to check the driver documentation for more information, and see if they need to sanitize the output themselves, I fully agree on that.

@erichxchen
Copy link
Contributor Author

Even though I still think it would be useful to add some examples to tell users what kind of actions are not safe, and suggest them to use params. But yes, we cannot make a promise for this.

I changed the docs to the general warning mentioned in #56546 (comment)

@datapythonista
Copy link
Member

Thank you for the changes and adding the warning. I think adding an example of params and providing more info that makes clear that it's the way to provide data not directly hardcoded in the SQL is fine. I think the feedback above was to make sure we don't make users believe that using params will prevent SQL injection, which seems to be false for some drivers.

Maybe it would even be helpful to mention examples of drivers tbat do and don't sanitize params. I expected all them to do, I think it may be helpful for users to have an idea.

In any case, great changes, if you are happy with the PR as is, happy to get it merged, but I think you can add what you mention about params.

@mroeschke
Copy link
Member

Yeah I would support an example in the Examples section that uses params noting it's the recommended way to parameterize a query over f-strings

@datapythonista datapythonista changed the title Add docs for read_sql to avoid sql injection DOC: Add docs for read_sql to avoid sql injection Jan 15, 2024
@datapythonista
Copy link
Member

Do you want to add an example with params in this PR @erichxchen ?

@erichxchen
Copy link
Contributor Author

Do you want to add an example with params in this PR @erichxchen ?

Yes, I added an example with params in the Example Section.

In the example, I only say that using params is recommended over string interpolation. We do not make the promise that it will prevent the injection. So I think it should be okay.

@mroeschke mroeschke added this to the 3.0 milestone Jan 17, 2024
@mroeschke mroeschke merged commit 0ffb7e9 into pandas-dev:main Jan 17, 2024
@mroeschke
Copy link
Member

Thanks @erichxchen

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* add docs for read_sql to avoid sql injection

* resolve formatting

* re-word the docs

* formatting

* refactor the example due to the length constraints

* remove reference directing users to use params

* changed to the general warning

* add the example for using params

---------

Co-authored-by: Hanxin Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants