Skip to content

Pdo\pgSql needs an efficient fetch() mode #15287

Closed
@outtersg

Description

@outtersg

Description

It would be expected that:

<?php
$statement = $db->query("select * from that_table");
while(($row = $statement->fetch()))
    fputcsv(…, $row);

be an efficient way to dump a table with PHP being a simple no-store pass-through.

But in fact fetch is not lazy as one could hope: the whole resultset is loaded in the query() before the script gets a chance to read the first rows.

E.g.: on a 24 million entries table of ~ 35 bytes per row, php consumes 2,5 GB of RAM during the whole process (that is, 3 bytes per effective data byte. It could have been worse).

In addition to this memory consumption, this prevents maximizing throughput, as we first have to wait the whole network part to finish, then only we do the whole disk I/O.

Cursors?

Here and there I see suggestions to use (server-side) cursors. But:

  • I'm still looking for how to use the internal ones (cursor_name does not want to get defined)
  • That's still a lot of overhead on network (and code? Because each call to the cursor is considered a new query, it may have to load column descriptions and so on from scratch)

Client laziness implementation?

libpq official docs says that PQexec is too high-level for that, and directs to PQsetSingleRowMode after a PQsendQuery.

As laziness modifies the script's responsibilities (error handling is probably different), this should not be the default.
But maybe PDO::ATTR_PREFETCH, used as a boolean (1 being the default to continue using PQexec, so an explicit set to 0 would be required to use PQsetSingleRowMode), could be used to this effect.

As for the code structure: the cursors implementation is a great starting point, because its workflow is very similar (necessity to handle interrupts after partial results, to close a previous unclosed cursor…).

Feature!

The discussions below led to requalify as a "new feature",
adding an optional mode:
both the Pdo and the Statement will accept a PDO::ATTR_PREFETCH which, when explicitely set to 0, enters this pass-through, efficient, lazy-fetch mode.

By default, without it, the fetch() keeps its traditional behaviour (of an once-at-all loading of the resultset, which eats memory but brings peace of mind, in that it guarantees that the loop will reach all results without the riskiness of being interrupted).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions