Description
Feature Type
-
Adding new functionality to pandas
-
Changing existing functionality in pandas
-
Removing existing functionality in pandas
Problem Description
I wish I could use Pandas to query Salesforce using SOQL (Salesforce Object Query Language) just as you do with relational databases using pd.read_sql()
. Currently, to retrieve Salesforce data into a Pandas DataFrame, users must execute queries using simple_salesforce and manually convert results to DataFrames, which in case of multi-level queries can become quite inefficient (due to the nested format of the outputs of Salesforce REST API).
Feature Description
The function would parallel pd.read_sql()
and would look like this:
def read_soql(
query: str, # Equivalent to `query` in `pd.read_sql()`, representing the SOQL query string.
con, # Expects a `simple_salesforce.Salesforce` object instead of an SQLAlchemy connection.
index_col: str | list[str] | None = None, # Same as in `pd.read_sql()`
parse_dates=None, # Same as in `pd.read_sql()`
dtype: DtypeArg | None = None, # Same as in `pd.read_sql()`
dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default # Same as in `pd.read_sql()`
) -> DataFrame:
# validate connection is Salesforce object
# validate dtype_backend is valid option
# execute SOQL query and get all records
# flatten output and remove metadata
# convert records to DataFrame
# if dtype specified: convert columns to specified types
# if parse_dates specified: convert date columns
# if index_col specified: set DataFrame index
# if dtype_backend != 'numpy': convert to nullable types
# return DataFrame
Alternative Solutions
Alternative naming
Since pd.read_sql()
is a convenience wrapper around read_sql_table
and read_sql_query
, a more "formally" correct name might be read_soql_query
, as there is no corresponding read_soql_table
. This would maintain a closer parallel to Pandas' SQL functions.
However, I propose read_soql
for brevity, and for consistency with other I/O functions such as pd.read_excel()
, pd.read_parquet()
, pd.read_feather()
, pd.read_orc()
etc.
Additional Context
I am interested in developing this feature as I have already done some work towards its implementation.