Skip to content

Fix GH-13452: Fixed handshake response [mysqlnd] #13470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

SakiTakamachi
Copy link
Member

@SakiTakamachi SakiTakamachi commented Feb 22, 2024

please see #13452

It's probably safe to use MariaDB if the MySQL test passes, but ideally it would be safer to have a MariaDB test as well.


By the way, this problem also occurs in MySQL when I make utf8mb4_de_pb_0900_ai_ci the default collation charset.


In conclusion, regarding the character set in question here, both utf8mb4_general_ci and utf8mb4_unicode_ci can be considered the same utf8mb4, since collation is not relevant.

Therefore, we can purely use ???_general_ci.

The remaining issue is compatibility with older versions. MySQL introduced utf8mb4 from 5.5. MariaDB5.5 is compatible with MySQL5.5. All later Maria versions are 10 or higher, so I decided to check whether the version value is 50500 or higher as a criterion for whether to use utf8mb4.

@SakiTakamachi SakiTakamachi changed the base branch from master to PHP-8.2 February 22, 2024 00:40
@SakiTakamachi SakiTakamachi marked this pull request as draft February 22, 2024 00:50
@SakiTakamachi
Copy link
Member Author

Looks like I need to check about packet->charset_no as well.

@@ -642,15 +642,6 @@ MYSQLND_METHOD(mysqlnd_command, handshake)(MYSQLND_CONN_DATA * const conn, const
conn->protocol_version = greet_packet.protocol_version;
conn->server_version = mnd_pestrdup(greet_packet.server_version, conn->persistent);

conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note can also test by placing --character-set-collations=utf8mb4=uca1400_ai_ci as the command on the 11.3+ image to enforce the setting.

@SakiTakamachi SakiTakamachi marked this pull request as ready for review February 22, 2024 15:25
@SakiTakamachi SakiTakamachi requested review from nielsdos and kamil-tekiela and removed request for kamil-tekiela and nielsdos February 22, 2024 15:25
@SakiTakamachi SakiTakamachi marked this pull request as draft February 22, 2024 15:51
@nielsdos
Copy link
Member

If the mysqlnd structs are part of the public API (which I think is the case), then we should not remove the field in stable branches to avoid breaking binary compatibility with 3rd party drivers.
We should keep the field and keep it set to NULL in that case, and only remove it in master.

@SakiTakamachi
Copy link
Member Author

@nielsdos
Thanks, you are right. I forgot about the API

@kamil-tekiela
Copy link
Member

I wonder what this actually does. From a quick look, it seems like we just resend the same value back to MySQL. Presumably, we should not be doing that. We should only be sending the charset number that the client set, which can probably happen with PDO, but not mysqli. Although, I am not sure how that works. Why do we even send the charset number? I'll look into it more, but it seems to me like we could just send harcoded 0 or the value from conn object if it's somehow set.

@SakiTakamachi
Copy link
Member Author

@kamil-tekiela

This always happens on the first handshake of a connection, so it probably should happen on mysqli as well.

What I'm trying to understand now is how a single byte of charset that I send from the client to the database server works. If there is any impact, a BC Break will occur at the same time as the bug is fixed.

However, I have no idea how it would be handled correctly by sending only the lower 8 bits of values ​​greater than 255.

@SakiTakamachi
Copy link
Member Author

I tried this.

packet->charset_no = uint1korr(p);

Change as follows:

packet->charset_no = 30;

php:

<?php

$pdo = new PDO(/* mysql */);

$pdo->exec('CREATE DATABASE CTEST');
$r = $pdo->query("SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'CTEST';");

var_dump($r->fetchAll(PDO::FETCH_ASSOC));

result:

array(1) {
  [0]=>
  array(2) {
    ["DEFAULT_CHARACTER_SET_NAME"]=>
    string(7) "utf8mb4"
    ["DEFAULT_COLLATION_NAME"]=>
    string(18) "utf8mb4_0900_ai_ci"
  }
}

The ID of utf8mb4_0900_ai_ci is 255. My environment is MySQL8.0, so this result means that the default value of charset has not changed anything.

@kamil-tekiela
Copy link
Member

I don't think changing charset_no would have any impact on the charset of your tables. In fact, I believe it should NOT affect it in any way as that would be wrong.

The value in charset_no specifies which charset is used in transit between the client and the server. It's known as the connection charset. It's not the same as the table/db/column charset.

@SakiTakamachi
Copy link
Member Author

At least the charset_no included in the handshake initially sent from the database server involves the charset of the collation.

Since the value of the lower 8 bits is sent, if a charset with an ID of 256 is specified as the collation order, the value sent will be 0 and an error will occur during mysqlnd validation.

e.g. /etc/mysql/my.cnf

[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_de_pb_0900_ai_ci

This is an error because the ID of utf8mb4_de_pb_0900_ai_ci is 256.

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Feb 23, 2024

@kamil-tekiela
But your comment is very helpful. As before, I checked the connection character set with ID30 specified.

result:

array(8) {
  [0]=>
  array(2) {
    ["Variable_name"]=>
    string(20) "character_set_client"
    ["Value"]=>
    string(6) "latin5"
  }
  [1]=>
  array(2) {
    ["Variable_name"]=>
    string(24) "character_set_connection"
    ["Value"]=>
    string(6) "latin5"
  }
  [2]=>
  array(2) {
    ["Variable_name"]=>
    string(22) "character_set_database"
    ["Value"]=>
    string(7) "utf8mb4"
  }
  [3]=>
  array(2) {
    ["Variable_name"]=>
    string(24) "character_set_filesystem"
    ["Value"]=>
    string(6) "binary"
  }
  [4]=>
  array(2) {
    ["Variable_name"]=>
    string(21) "character_set_results"
    ["Value"]=>
    string(6) "latin5"
  }
  [5]=>
  array(2) {
    ["Variable_name"]=>
    string(20) "character_set_server"
    ["Value"]=>
    string(7) "utf8mb4"
  }
  [6]=>
  array(2) {
    ["Variable_name"]=>
    string(20) "character_set_system"
    ["Value"]=>
    string(7) "utf8mb3"
  }
  [7]=>
  array(2) {
    ["Variable_name"]=>
    string(18) "character_sets_dir"
    ["Value"]=>
    string(26) "/usr/share/mysql/charsets/"
  }
}

CLI result:

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

We can see that it has been changed to latin5. In other words, the charset_no sent by the server during the handshake and the charset_no that should be returned from the client in the handshake response have the same name but are completely different.


(edit)

default server a_protocol_character_set, only the lower 8-bits

https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase_packets_protocol_handshake_v10.html

This is actually a collation character set.

@SakiTakamachi
Copy link
Member Author

To fix this issue while avoiding BC Break:

  • In the stable branch, only if charset_no is 0, it will be reset to 255, which is the default value of MySQL8 series.
  • In the master branch, we stop reading the charset_no of handshake packets and standardize them all to 255.

How do you think?

@SakiTakamachi
Copy link
Member Author

In the stable branch, only if charset_no is 0, it will be reset to 255, which is the default value of MySQL8 series.

First, I fixed it like that. It seems better to do the change to master as a follow-up in another PR.

@@ -642,7 +643,7 @@ MYSQLND_METHOD(mysqlnd_command, handshake)(MYSQLND_CONN_DATA * const conn, const
conn->protocol_version = greet_packet.protocol_version;
conn->server_version = mnd_pestrdup(greet_packet.server_version, conn->persistent);

conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no);
conn->greet_charset = mysqlnd_find_charset_nr(greet_packet.charset_no != 0 ? greet_packet.charset_no : MYSQLND_DEFAULT_PROTOCOL_CHARSET);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left L647〜 as a fallback, but it looks like it could be UNEXPECTED.

@SakiTakamachi SakiTakamachi changed the title Fix GH-13452: Removed conn->greet_charset [mysqlnd] Fix GH-13452: Fixed handshake response [mysqlnd] Feb 23, 2024
@kamil-tekiela
Copy link
Member

I see what you are trying to do here, Saki, and it makes sense, but I am worried we can't actually do that. We don't know if the server supports this charset.

The reason for this check was that we allow the user to select their preferred character set, but if they don't (in case of mysqli it's even impossible) then we need to provide the server with some default character set ID. What better way than resending the one that we received from the server in the handshake? But as we can see here, this value cannot be relied upon. MariaDB might add a gracious fallback for this issue, but we should get rid of this feature in mysqlnd anyway. So if we cannot use the value received from the server, how do we pick a default character set that will work?

The line you are changing is actually the fallback for when the character set wasn't chosen by the user. What you are suggesting is that we add a fallback to this fallback. We could probably cut the middle out completely. After all, this value has proven to be unreliable and could cause other bugs too. Let's not read the greet_packet.charset_no at all. However, passing some hardcoded ID is not a good idea either. I don't know what happens when the server receives an ID that is not supported, but if at best the server replies with an error then we leave our PHP users in a situation without a fix. We could use an ID that is more likely to be supported everywhere, e.g. 8 but we still run a risk of providing a value which could cause issues.

Unfortunately, I don't have a better idea as of now. Another issue is that many PHP developers have come to rely on this feature. It would actually be a pretty significant breaking change for recklessly designed projects.

@kamil-tekiela
Copy link
Member

To minimize the impact of this issue we could change the code to only check the value of greet_packet.charset_no if the user hasn't already provided a valid charset. This would at least help PDO users.

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Feb 23, 2024

@kamil-tekiela

Thanks for the confirmation.

Overwrite it after this processing, probably with a user-specified charset. Here we are simply validating the value sent by the server.

I think the safest way to do this is to look up all the past MySQL and MariaDB default values ​​and set the appropriate default values ​​for each server version. Fortunately the version information is included in the handshake packet, so we might be able to do this.

However, we should only do this as a fallback in stable branches when charset_no is 0, as we risk breaking user environments that happen to rely on the current bad behavior.

It's true that we shouldn't read the charset of the handshake packet, so we should avoid doing so in master.


(edit)
I'm really curious how libmysql handles this...

@SakiTakamachi
Copy link
Member Author

When connecting to a MySQL8.0 server with libmysql, the connection character set is utf8mb4. I got same result if I default the server collation to latin1. Presumably, libmysql automatically sets the initial value of the corresponding version if the connection character set is unspecified.

@kamil-tekiela
Copy link
Member

Maybe we can do something similar. Can you check how they do it in their C code?

@SakiTakamachi
Copy link
Member Author

libmysql is hard-coding default values ​​as constants. This is possible because there is a driver for each version of MySQL...

However, if we could provide some default values ​​for each version, it would be possible to do something similar.

@SakiTakamachi
Copy link
Member Author

SakiTakamachi commented Feb 28, 2024

I thought about this for a while, and I think it's better to use utf8mb4_general_ci(45) for MySQL 5.5 or higher, and utf8mb3_general_ci(33) for older versions.

@grooverdan
Copy link
Contributor

utf8mb4_general_ci

In 5.5 there got added utf8mb4_unicode_ci (224) which seems a better default base on SO.

@SakiTakamachi
Copy link
Member Author

I consulted the MariaDB team on #13452

In conclusion, regarding the character set in question here, both utf8mb4_general_ci and utf8mb4_unicode_ci can be considered the same utf8mb4, since collation is not relevant.

Therefore, we can purely use ???_general_ci.

The remaining issue is compatibility with older versions. MySQL introduced utf8mb4 from 5.5. MariaDB5.5 is compatible with MySQL5.5. All later Maria versions are 10 or higher, so I decided to check whether the version value is 50500 or higher as a criterion for whether to use utf8mb4.

@kamil-tekiela How do you think that?

@SakiTakamachi SakiTakamachi marked this pull request as ready for review February 29, 2024 16:16
mnd_sprintf_free(msg);
goto err;
const MYSQLND_CHARSET *read_charset = mysqlnd_find_charset_nr(greet_packet.charset_no);
if (!read_charset) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look closely at mysqlnd_charset.c, there are some missing numbers in the character set ID. (e.g. 76, 100-127)

So, checking for 0 is not enough here. We need to check whether the corresponding character set exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Saki. That's why I feel that the way the current code works is correct. It raises an error whenever an unexpected charset is provided. 0 doesn't have any special meaning here.

What you are suggesting would probably be ok, but not ideal. utf8 and utf8mb4 should be a viable fallback value. The advantage is that we open the connection despite receiving an invalid charset value from the server. The disadvantage is that we silently switch to a different charset than a user might have expected. A properly designed PHP application will set the desired charset anyway.

So I would suggest using the two constant values as a fallback if we receive an unknown value from the server. Then we can think if we can drop this feature from mysqlnd in a major release. Maybe 9.0 would be good to do this.

Copy link
Member Author

@SakiTakamachi SakiTakamachi Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for confirmation.

So I would suggest using the two constant values as a fallback if we receive an unknown value from the server.

If I didn't make a mistake, it should already work as a fallback like you mention. Two constants are used when mysqlnd_find_charset_nr() cannot find a character set. If it can find, uses the charset it finds in L651.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can think if we can drop this feature from mysqlnd in a major release. Maybe 9.0 would be good to do this.

Agree. Initially I was thinking of using 8.4, but 9.x might be ideal.

Copy link
Member

@kamil-tekiela kamil-tekiela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the logic looks fine. I doubt we can write a test for this, though.

@SakiTakamachi
Copy link
Member Author

Yeah, the logic looks fine. I doubt we can write a test for this, though.

We can create the same condition as this problem by rewriting my.cnf and restarting MySQL, but it can be an expensive test... I can't think of a realistic way to test this, so I might just commit as is for now.

SakiTakamachi added a commit that referenced this pull request Mar 4, 2024
The character set ID included in the handshake data at the time of connection
actually only includes the lower 8 bits of the ID, so if  try to use this to specify
a character set, the corresponding character set may not exist.

In case of an invalid character set, the default character set is now used
without an error.

Fixes #13452
Closes #13470
SakiTakamachi added a commit that referenced this pull request Mar 4, 2024
The character set ID included in the handshake data at the time of connection
actually only includes the lower 8 bits of the ID, so if  try to use this to specify
a character set, the corresponding character set may not exist.

In case of an invalid character set, the default character set is now used
without an error.

Fixes #13452
Closes #13470
SakiTakamachi added a commit that referenced this pull request Mar 4, 2024
* PHP-8.2:
  Fixed handshake response charset. (#13470)
  [skip ci] Fixed NEWS
SakiTakamachi added a commit that referenced this pull request Mar 4, 2024
* PHP-8.3:
  NEWS
  Fixed handshake response charset. (#13470)
  Fixed handshake response charset. (#13470)
  [skip ci] Fixed NEWS
@SakiTakamachi SakiTakamachi deleted the fix/gh-13452 branch March 15, 2024 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants