Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

cmb69 · 2021-02-22T14:20:58Z

It is not guaranteed, that the driver inserts only a single NUL byte at
the end of the buffer. Apparently, there is no way to find out the
actual data length in the buffer after calling SQLGetData(), so we
adjust after the next SQLGetData() call.

Note that the output of the IMAGE column looks fishy (it is bin2hex() encoded), but that would be different issue.

It is not guaranteed, that the driver inserts only a single NUL byte at the end of the buffer. Apparently, there is no way to find out the actual data length in the buffer after calling `SQLGetData()`, so we adjust after the next `SQLGetData()` call.

nikic · 2021-02-22T15:12:21Z

It is not guaranteed, that the driver inserts only a single NUL byte at
the end of the buffer. Apparently, there is no way to find out the
actual data length in the buffer after calling SQLGetData(), so we
adjust after the next SQLGetData() call.

Just to double check, is the problem really multiple nul bytes, or a missing nul terminator? That's how the trace reads to me at least.

For reference, from https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function?view=sql-server-ver15:

For character or binary data, this is the length of the data after conversion and before truncation due to BufferLength. If the driver cannot determine the length of the data after conversion, as is sometimes the case with long data, it returns SQL_SUCCESS_WITH_INFO and sets the length to SQL_NO_TOTAL. (The last call to SQLGetData must always return the length of the data, not zero or SQL_NO_TOTAL.) If data was truncated due to the SQL_ATTR_MAX_LENGTH statement attribute, the value of this attribute - as opposed to the actual length - is placed in *StrLen_or_IndPtr. This is because this attribute is designed to truncate data on the server before conversion, so the driver has no way of figuring out what the actual length is. When SQLGetData is called multiple times in succession for the same column, this is the length of the data available at the start of the current call; that is, the length decreases with each subsequent call.

So C->fetched_len is the total (remaining) length of the data, not the length of the chunk. Really not obvious reading the code.

cmb69 · 2021-02-22T15:36:40Z

Just to double check, is the problem really multiple nul bytes, or a missing nul terminator?

With the test and SQLServer, there have actually been two NUL bytes at the end (and only the first 254 bytes of the buffer where actually part of the column). But indeed, the provided ODBC traces indicate that there is no trailing NUL byte. I'm not sure whether that is a driver glitch, or fully conforming to the spec. Anyhow, I think we need to cater to that as well.

OTOH, I do not understand

php-src/ext/pdo_odbc/odbc_stmt.c

Lines 678 to 680 in 963e50c

    
           			 this loop has to work whether or not SQLGetData() provides the total column length. 
        
           			 calling SQLDescribeCol() or other, specifically to get the column length, then doing a single read 
        
           			 for that size would be slower except maybe for extremely long columns.*/

This comment has replaced a single fetch in 10251b2. Why would chunked fetching be faster than a single fetch?

codecov-io · 2021-02-23T02:03:14Z

Codecov Report

Merging #6716 (963e50c) into PHP-7.4 (46d6dae) will decrease coverage by 0.00%.
The diff coverage is 67.72%.

@@             Coverage Diff             @@
##           PHP-7.4    #6716      +/-   ##
===========================================
- Coverage    72.99%   72.99%   -0.01%     
===========================================
  Files          803      803              
  Lines       283735   283798      +63     
===========================================
+ Hits        207114   207153      +39     
- Misses       76621    76645      +24

Impacted Files	Coverage Δ
ext/mysqlnd/mysqlnd_auth.c	`56.99% <0.00%> (ø)`
ext/opcache/ZendAccelerator.c	`72.34% <0.00%> (-0.06%)`	⬇️
ext/readline/readline.c	`65.50% <0.00%> (-0.29%)`	⬇️
ext/standard/streamsfuncs.c	`86.32% <ø> (-0.06%)`	⬇️
ext/zip/php_zip.c	`77.04% <ø> (ø)`
Zend/zend_vm_execute.h	`66.75% <11.11%> (-0.08%)`	⬇️
ext/soap/php_sdl.c	`80.77% <15.38%> (+0.04%)`	⬆️
ext/mysqlnd/mysqlnd_wireprotocol.c	`81.17% <25.00%> (-0.07%)`	⬇️
ext/pgsql/pgsql.c	`62.58% <33.33%> (ø)`
ext/opcache/zend_accelerator_blacklist.c	`80.57% <50.00%> (+0.11%)`	⬆️
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 073b6ea...ce49c7c. Read the comment docs.

cmb69 · 2021-02-25T12:54:34Z

I did some more testing. SQLServer with VARBINARY(MAX) column fetches 254 valid bytes (2 trailing NULs) as well. PostgreSQL with BYTEA column fetches 255 valid bytes (1 trailing NUL). MySQL with VARBINARY(4000) fetches 255 valid bytes (1 trailing NUL), but returns SQL_SUCCESS, so only a single SQLGetData() is executed; the rest of the string is arbitrary garbage.

One might think of "all these broken driver implementations", but it seems to me the real problem is that we fetch binary data as SQL_C_CHAR. Fetching as SQL_C_BINARY would yield 256 valid bytes (no trailing NUL) for all three drivers. However, changing this would be a massive BC break (results would no longer be bin2hex() encoded, but plain binary data). Well, except we would fetch as binary, and do the bin2hex() conversion ourselves. I'm unsure … thoughts?

This is actually already catered to, but we need to adjust the assertion.

This is actually an own issue, but can't be fixed without fixing #80783 first, so I add this to this PR. The point is that we need to fetch all chunks with the same C type.

cmb69 · 2021-02-25T13:57:21Z

Why would chunked fetching be faster than a single fetch?

I haven't done any performance tests regarding this issue, but it doesn't really matter, because a single fetch is not really possible (would need to determine the proper length in advance, what is a headache; cf. odbc_bindcols()), so we need at least two SQLGetData() calls, and having possibly more doesn't really hurt.

Gpapi13 · 2021-03-06T11:11:05Z

[ ]

cmb69 · 2021-03-08T14:25:30Z

If there are no objections, I'll merge this by the end of the week.

Gpapi13 · 2021-03-10T06:31:38Z

[ ]

Gpapi13 · 2021-03-10T06:31:48Z

``

Gpapi13

``

cmb69 added the Bug label Feb 22, 2021

cmb69 added 2 commits February 25, 2021 14:42

Cater to drivers which don't insert trailing NUL bytes

a691791

This is actually already catered to, but we need to adjust the assertion.

Prevent PDO::ODBC_ATTR_ASSUME_UTF8 from fetching garbage

fab2e5f

This is actually an own issue, but can't be fixed without fixing #80783 first, so I add this to this PR. The point is that we need to fetch all chunks with the same C type.

Gpapi13 approved these changes Mar 10, 2021

View reviewed changes

mvorisek mentioned this pull request Mar 11, 2021

Fix and reintroduce support for 4000+ bytes CLOB/BLOB for Oracle atk4/dsql#274

Closed

php-pulls closed this in bccca0b Mar 18, 2021

cmb69 deleted the cmb/80783 branch March 18, 2021 14:32

NattyNarwhal mentioned this pull request Feb 28, 2023

Drivers can provide PDO_ODBC incorrectly converted data for long columns with stateful encodings #10733

Open

cmb69 mentioned this pull request Oct 15, 2024

PDO_ODBC can inject garbage into field values #16450

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

cmb69 commented Feb 22, 2021

nikic commented Feb 22, 2021

cmb69 commented Feb 22, 2021

codecov-io commented Feb 23, 2021

cmb69 commented Feb 25, 2021

cmb69 commented Feb 25, 2021

Gpapi13 commented Mar 6, 2021

cmb69 commented Mar 8, 2021

Gpapi13 commented Mar 10, 2021

Gpapi13 commented Mar 10, 2021

Gpapi13 left a comment

Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

Conversation

cmb69 commented Feb 22, 2021

nikic commented Feb 22, 2021

cmb69 commented Feb 22, 2021

codecov-io commented Feb 23, 2021

Codecov Report

cmb69 commented Feb 25, 2021

cmb69 commented Feb 25, 2021

Gpapi13 commented Mar 6, 2021

cmb69 commented Mar 8, 2021

Gpapi13 commented Mar 10, 2021

Gpapi13 commented Mar 10, 2021

Gpapi13 left a comment

Choose a reason for hiding this comment