Skip to content

Slow SELECT query on large table #741

Closed
@mlexs

Description

@mlexs
  • asyncpg version: 0.22.0
  • PostgreSQL version: 11.10
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : No, and I was able to reproduce it locally.
  • Python version: 3.7.9
  • Platform: Fedora 31
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?:
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : We use Starlette for serving API requests.

We have a large (~30 million rows) database table. I have noticed the SQL would run terribly slow when executed via asyncpg but very fast via psql.

Schema for table:

CREATE TABLE addresses(
    zipcode VARCHAR(12),
    line1 text,
    -- 13x more fields of text type
);
CREATE INDEX z_idx ON addresses(zipcode, line1);

Query to execute.

SELECT line_1, ..., zipcode FROM addresses WHERE REPLACE(zipcode, ' ', '')=$1 GROUP BY A, B, line1 ORDER BY A, B

This query never results in more than 100 rows (out of mentioned 30 million).

Time to execute via psql: less than 1 ms
Same query via asyncpg: more than 6 seconds (!)

I have not looked at source code for asyncpg so not sure what's going on here.

Can someone tell me why asyncpg runs this query so slow? Thanks!

FYI I have fixed this "temporarily" by some quick data normalisation where I added zipcode_x that contains no space so I could got rid of the SQL's REPLACE function.

SELECT line_1, ..., zipcode FROM addresses WHERE zipcode=$1 GROUP BY A, B, line1 ORDER BY A, B

and now asyncpg is very fast (as it should be).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions