-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix GH-13452: Fixed handshake response [mysqlnd] #13470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e771f74
to
f92bb21
Compare
Looks like I need to check about |
@@ -642,15 +642,6 @@ MYSQLND_METHOD(mysqlnd_command, handshake)(MYSQLND_CONN_DATA * const conn, const | |||
conn->protocol_version = greet_packet.protocol_version; | |||
conn->server_version = mnd_pestrdup(greet_packet.server_version, conn->persistent); | |||
|
|||
conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MariaDB 11.3 Docker images have been patched - MariaDB/mariadb-docker@d7a950d
but the GH-13452 can be tested using 11.3.1-rc image - https://hub.docker.com/layers/library/mariadb/11.3.1-rc/images/sha256-f37b1ba4f15fb90679f8f85fd7300815ef4c5ff7ada668bccbdff33e2ad8ecd6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note can also test by placing --character-set-collations=utf8mb4=uca1400_ai_ci
as the command on the 11.3+ image to enforce the setting.
If the mysqlnd structs are part of the public API (which I think is the case), then we should not remove the field in stable branches to avoid breaking binary compatibility with 3rd party drivers. |
@nielsdos |
I wonder what this actually does. From a quick look, it seems like we just resend the same value back to MySQL. Presumably, we should not be doing that. We should only be sending the charset number that the client set, which can probably happen with PDO, but not mysqli. Although, I am not sure how that works. Why do we even send the charset number? I'll look into it more, but it seems to me like we could just send harcoded 0 or the value from |
This always happens on the first handshake of a connection, so it probably should happen on mysqli as well. What I'm trying to understand now is how a single byte of charset that I send from the client to the database server works. If there is any impact, a BC Break will occur at the same time as the bug is fixed. However, I have no idea how it would be handled correctly by sending only the lower 8 bits of values greater than 255. |
I tried this. php-src/ext/mysqlnd/mysqlnd_wireprotocol.c Line 395 in 171e398
Change as follows:
php:
result:
The ID of |
I don't think changing The value in |
At least the Since the value of the lower 8 bits is sent, if a charset with an ID of e.g. /etc/mysql/my.cnf
This is an error because the ID of |
@kamil-tekiela result:
CLI result:
We can see that it has been changed to latin5. In other words, the charset_no sent by the server during the handshake and the charset_no that should be returned from the client in the handshake response have the same name but are completely different. (edit)
This is actually a collation character set. |
To fix this issue while avoiding BC Break:
How do you think? |
f92bb21
to
75b33b2
Compare
First, I fixed it like that. It seems better to do the change to master as a follow-up in another PR. |
75b33b2
to
0e50b96
Compare
ext/mysqlnd/mysqlnd_commands.c
Outdated
@@ -642,7 +643,7 @@ MYSQLND_METHOD(mysqlnd_command, handshake)(MYSQLND_CONN_DATA * const conn, const | |||
conn->protocol_version = greet_packet.protocol_version; | |||
conn->server_version = mnd_pestrdup(greet_packet.server_version, conn->persistent); | |||
|
|||
conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no); | |||
conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no != 0 ? greet_packet.charset_no : MYSQLND_DEFAULT_PROTOCOL_CHARSET); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left L647〜 as a fallback, but it looks like it could be UNEXPECTED.
I see what you are trying to do here, Saki, and it makes sense, but I am worried we can't actually do that. We don't know if the server supports this charset. The reason for this check was that we allow the user to select their preferred character set, but if they don't (in case of mysqli it's even impossible) then we need to provide the server with some default character set ID. What better way than resending the one that we received from the server in the handshake? But as we can see here, this value cannot be relied upon. MariaDB might add a gracious fallback for this issue, but we should get rid of this feature in mysqlnd anyway. So if we cannot use the value received from the server, how do we pick a default character set that will work? The line you are changing is actually the fallback for when the character set wasn't chosen by the user. What you are suggesting is that we add a fallback to this fallback. We could probably cut the middle out completely. After all, this value has proven to be unreliable and could cause other bugs too. Let's not read the Unfortunately, I don't have a better idea as of now. Another issue is that many PHP developers have come to rely on this feature. It would actually be a pretty significant breaking change for recklessly designed projects. |
To minimize the impact of this issue we could change the code to only check the value of |
Thanks for the confirmation. Overwrite it after this processing, probably with a user-specified charset. Here we are simply validating the value sent by the server. I think the safest way to do this is to look up all the past MySQL and MariaDB default values and set the appropriate default values for each server version. Fortunately the version information is included in the handshake packet, so we might be able to do this. However, we should only do this as a fallback in stable branches when It's true that we shouldn't read the charset of the handshake packet, so we should avoid doing so in master. (edit) |
When connecting to a MySQL8.0 server with libmysql, the connection character set is |
Maybe we can do something similar. Can you check how they do it in their C code? |
libmysql is hard-coding default values as constants. This is possible because there is a driver for each version of MySQL... However, if we could provide some default values for each version, it would be possible to do something similar. |
I thought about this for a while, and I think it's better to use |
0e50b96
to
489438f
Compare
I consulted the MariaDB team on #13452 In conclusion, regarding the character set in question here, both Therefore, we can purely use The remaining issue is compatibility with older versions. MySQL introduced @kamil-tekiela How do you think that? |
mnd_sprintf_free(msg); | ||
goto err; | ||
const MYSQLND_CHARSET *read_charset = mysqlnd_find_charset_nr(greet_packet.charset_no); | ||
if (!read_charset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look closely at mysqlnd_charset.c
, there are some missing numbers in the character set ID. (e.g. 76, 100-127)
So, checking for 0
is not enough here. We need to check whether the corresponding character set exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Saki. That's why I feel that the way the current code works is correct. It raises an error whenever an unexpected charset is provided. 0 doesn't have any special meaning here.
What you are suggesting would probably be ok, but not ideal. utf8
and utf8mb4
should be a viable fallback value. The advantage is that we open the connection despite receiving an invalid charset value from the server. The disadvantage is that we silently switch to a different charset than a user might have expected. A properly designed PHP application will set the desired charset anyway.
So I would suggest using the two constant values as a fallback if we receive an unknown value from the server. Then we can think if we can drop this feature from mysqlnd in a major release. Maybe 9.0 would be good to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for confirmation.
So I would suggest using the two constant values as a fallback if we receive an unknown value from the server.
If I didn't make a mistake, it should already work as a fallback like you mention. Two constants are used when mysqlnd_find_charset_nr()
cannot find a character set. If it can find, uses the charset it finds in L651.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can think if we can drop this feature from mysqlnd in a major release. Maybe 9.0 would be good to do this.
Agree. Initially I was thinking of using 8.4, but 9.x might be ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the logic looks fine. I doubt we can write a test for this, though.
We can create the same condition as this problem by rewriting my.cnf and restarting MySQL, but it can be an expensive test... I can't think of a realistic way to test this, so I might just commit as is for now. |
The character set ID included in the handshake data at the time of connection actually only includes the lower 8 bits of the ID, so if try to use this to specify a character set, the corresponding character set may not exist. In case of an invalid character set, the default character set is now used without an error. Fixes #13452 Closes #13470
The character set ID included in the handshake data at the time of connection actually only includes the lower 8 bits of the ID, so if try to use this to specify a character set, the corresponding character set may not exist. In case of an invalid character set, the default character set is now used without an error. Fixes #13452 Closes #13470
* PHP-8.2: Fixed handshake response charset. (#13470) [skip ci] Fixed NEWS
please see #13452
It's probably safe to use MariaDB if the MySQL test passes, but ideally it would be safer to have a MariaDB test as well.
By the way, this problem also occurs in MySQL when I make
utf8mb4_de_pb_0900_ai_ci
the default collation charset.In conclusion, regarding the character set in question here, both
utf8mb4_general_ci
andutf8mb4_unicode_ci
can be considered the sameutf8mb4
, since collation is not relevant.Therefore, we can purely use
???_general_ci
.The remaining issue is compatibility with older versions. MySQL introduced
utf8mb4
from 5.5. MariaDB5.5 is compatible with MySQL5.5. All later Maria versions are 10 or higher, so I decided to check whether the version value is50500
or higher as a criterion for whether to useutf8mb4
.