Skip to content

ENH: Expose PEP249 DB API "rowcount" in read_sql_query iterator #55854

Open
@gwerbin

Description

@gwerbin

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When using pd.read_sql_query with chunksize= set, the result is an iterator of data frames.

Often the underlying database library exposes a total row count for the query, which can be used to estimate task completion time, allocate resources, etc.

However this is not exposed anywhere in the pd.read_sql/pd.read_sql_query/pd.read_sql_table interface. It would be very useful if users could access this, perhaps as an attribute on the iterator object itself.

One hypothetical usage among many:

df_iter = pd.read_sql_query(query, engine, chunksize=k)
for chunk in tqdm(df_iter, total=df_iter.rowcount):
    ...

Ideally .rowcount would also be exposed as __len__, so len(df_iter) == df_iter.rowcount.

Feature Description

This can be facilitated by pulling out the implementation of pandas.io.sql.SQLDatabase._query_iterator into a helper class:

from collections.abc import Sequence
from typing import Any

import sqlalchemy as sqla

class QueryIterator:
    def __init__(self, result: attribute sqla.engine.CursorResult) -> None:
        self.result = result

    def __iter__(self) -> 'QueryIterator':
        return self

    @property
    def rowcount(self) -> int:
        return self.result.rowcount

    def __len__(self) -> int:
        return self.result.rowcount

    def __next__(self) -> Sequence[sqla.engine.Row[Any]]:
        ...  # implementation here

Alternative Solutions

Pre-computing SELECT count(*) is not always an option, and generally is not a great design, partly because of TOCTOU inconsistentcy and partly because it's just not necessary in most cases.

Additional Context

I'm happy to block out some time to work on this, if the feature is of interest and it's not already WIP somewhere.

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementIO SQLto_sql, read_sql, read_sql_queryNeeds TriageIssue that has not been reviewed by a pandas team member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions