Skip to content

Fix #80783: PDO ODBC truncates BLOB records at every 256th byte #6716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Feb 22, 2021

It is not guaranteed, that the driver inserts only a single NUL byte at
the end of the buffer. Apparently, there is no way to find out the
actual data length in the buffer after calling SQLGetData(), so we
adjust after the next SQLGetData() call.


Note that the output of the IMAGE column looks fishy (it is bin2hex() encoded), but that would be different issue.

It is not guaranteed, that the driver inserts only a single NUL byte at
the end of the buffer.  Apparently, there is no way to find out the
actual data length in the buffer after calling `SQLGetData()`, so we
adjust after the next `SQLGetData()` call.
@cmb69 cmb69 added the Bug label Feb 22, 2021
@nikic
Copy link
Member

nikic commented Feb 22, 2021

It is not guaranteed, that the driver inserts only a single NUL byte at
the end of the buffer. Apparently, there is no way to find out the
actual data length in the buffer after calling SQLGetData(), so we
adjust after the next SQLGetData() call.

Just to double check, is the problem really multiple nul bytes, or a missing nul terminator? That's how the trace reads to me at least.

For reference, from https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function?view=sql-server-ver15:

For character or binary data, this is the length of the data after conversion and before truncation due to BufferLength. If the driver cannot determine the length of the data after conversion, as is sometimes the case with long data, it returns SQL_SUCCESS_WITH_INFO and sets the length to SQL_NO_TOTAL. (The last call to SQLGetData must always return the length of the data, not zero or SQL_NO_TOTAL.) If data was truncated due to the SQL_ATTR_MAX_LENGTH statement attribute, the value of this attribute - as opposed to the actual length - is placed in *StrLen_or_IndPtr. This is because this attribute is designed to truncate data on the server before conversion, so the driver has no way of figuring out what the actual length is. When SQLGetData is called multiple times in succession for the same column, this is the length of the data available at the start of the current call; that is, the length decreases with each subsequent call.

So C->fetched_len is the total (remaining) length of the data, not the length of the chunk. Really not obvious reading the code.

@cmb69
Copy link
Member Author

cmb69 commented Feb 22, 2021

Just to double check, is the problem really multiple nul bytes, or a missing nul terminator?

With the test and SQLServer, there have actually been two NUL bytes at the end (and only the first 254 bytes of the buffer where actually part of the column). But indeed, the provided ODBC traces indicate that there is no trailing NUL byte. I'm not sure whether that is a driver glitch, or fully conforming to the spec. Anyhow, I think we need to cater to that as well.

OTOH, I do not understand

this loop has to work whether or not SQLGetData() provides the total column length.
calling SQLDescribeCol() or other, specifically to get the column length, then doing a single read
for that size would be slower except maybe for extremely long columns.*/

This comment has replaced a single fetch in 10251b2. Why would chunked fetching be faster than a single fetch?

@codecov-io
Copy link

Codecov Report

Merging #6716 (963e50c) into PHP-7.4 (46d6dae) will decrease coverage by 0.00%.
The diff coverage is 67.72%.

Impacted file tree graph

@@             Coverage Diff             @@
##           PHP-7.4    #6716      +/-   ##
===========================================
- Coverage    72.99%   72.99%   -0.01%     
===========================================
  Files          803      803              
  Lines       283735   283798      +63     
===========================================
+ Hits        207114   207153      +39     
- Misses       76621    76645      +24     
Impacted Files Coverage Δ
ext/mysqlnd/mysqlnd_auth.c 56.99% <0.00%> (ø)
ext/opcache/ZendAccelerator.c 72.34% <0.00%> (-0.06%) ⬇️
ext/readline/readline.c 65.50% <0.00%> (-0.29%) ⬇️
ext/standard/streamsfuncs.c 86.32% <ø> (-0.06%) ⬇️
ext/zip/php_zip.c 77.04% <ø> (ø)
Zend/zend_vm_execute.h 66.75% <11.11%> (-0.08%) ⬇️
ext/soap/php_sdl.c 80.77% <15.38%> (+0.04%) ⬆️
ext/mysqlnd/mysqlnd_wireprotocol.c 81.17% <25.00%> (-0.07%) ⬇️
ext/pgsql/pgsql.c 62.58% <33.33%> (ø)
ext/opcache/zend_accelerator_blacklist.c 80.57% <50.00%> (+0.11%) ⬆️
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 073b6ea...ce49c7c. Read the comment docs.

@cmb69
Copy link
Member Author

cmb69 commented Feb 25, 2021

I did some more testing. SQLServer with VARBINARY(MAX) column fetches 254 valid bytes (2 trailing NULs) as well. PostgreSQL with BYTEA column fetches 255 valid bytes (1 trailing NUL). MySQL with VARBINARY(4000) fetches 255 valid bytes (1 trailing NUL), but returns SQL_SUCCESS, so only a single SQLGetData() is executed; the rest of the string is arbitrary garbage.

One might think of "all these broken driver implementations", but it seems to me the real problem is that we fetch binary data as SQL_C_CHAR. Fetching as SQL_C_BINARY would yield 256 valid bytes (no trailing NUL) for all three drivers. However, changing this would be a massive BC break (results would no longer be bin2hex() encoded, but plain binary data). Well, except we would fetch as binary, and do the bin2hex() conversion ourselves. I'm unsure … thoughts?

This is actually already catered to, but we need to adjust the
assertion.
This is actually an own issue, but can't be fixed without fixing
#80783 first, so I add this to this PR.

The point is that we need to fetch all chunks with the same C type.
@cmb69
Copy link
Member Author

cmb69 commented Feb 25, 2021

Why would chunked fetching be faster than a single fetch?

I haven't done any performance tests regarding this issue, but it doesn't really matter, because a single fetch is not really possible (would need to determine the proper length in advance, what is a headache; cf. odbc_bindcols()), so we need at least two SQLGetData() calls, and having possibly more doesn't really hurt.

@Gpapi13
Copy link

Gpapi13 commented Mar 6, 2021

  • [ ]

@cmb69
Copy link
Member Author

cmb69 commented Mar 8, 2021

If there are no objections, I'll merge this by the end of the week.

@Gpapi13
Copy link

Gpapi13 commented Mar 10, 2021

  • [ ]

@Gpapi13
Copy link

Gpapi13 commented Mar 10, 2021

``

Copy link

@Gpapi13 Gpapi13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

``

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants