Fix #76093: Format strings w/o loss of precision w/ FORMAT_TYPE_DECIMAL #12432

lucaswerkmeister · 2023-10-13T17:58:04Z

Resubmit of #9909, which was itself a resubmit of #7782. Rebased onto latest master. Request #76093 is classified as a feature/change request rather than a bug, which I think I agree with, so IIUC this is supposed to be based on master rather than PHP-8.0.

lucaswerkmeister · 2023-10-13T19:33:11Z

Hm, there’s a CI error I’m not sure I understand :/

runtime error: 1e+19 is outside the range of representable values of type 'long' – UndefinedBehaviorSanitizer

========DIFF========
--
       ["currency"]=>
       string(29) "$9,999,999,999,999,999,999.00"
     }
029+ /__w/php-src/php-src/ext/intl/formatter/formatter_format.c:105:13: runtime error: 1e+19 is outside the range of representable values of type 'long'
030+ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /__w/php-src/php-src/ext/intl/formatter/formatter_format.c:105:13 in 
     array(6) {
       ["input"]=>
       float(1.0E+19)
--
========DONE========
FAIL Bug #76093 (NumberFormatter::format loses precision) [ext/intl/tests/bug76093.phpt]

If I understand correctly, the UB is in the if branch of the following part of the diff:

@@ -79 +103,7 @@ PHP_FUNCTION( numfmt_format )
-			int64_t value = (Z_TYPE_P(number) == IS_DOUBLE)?(int64_t)Z_DVAL_P(number):Z_LVAL_P(number);
+			int64_t value;
+			if (Z_TYPE_P(number) == IS_DOUBLE) {
+				value = (int64_t)Z_DVAL_P(number);
+			} else {
+				convert_to_long(number);
+				value = Z_LVAL_P(number);
+			}

But according to that diff, the cast already existed (in a ternary instead of if/else). So is the error only showing up because this previously wasn’t hit by a test?
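For reference, the sanitizer report is consistent with the C rule that converting a floating-point value to an integer type is undefined when the truncated value does not fit the target type, so moving the cast from a ternary into an if/else would not change anything. A minimal sketch of a guarded conversion follows; `double_to_int64_checked` is a hypothetical helper for illustration, not a php-src function, and what the caller does on failure (error, deprecation, fallback) is left open.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of php-src: convert a double to int64_t
 * only when the truncated value is representable. */
static bool double_to_int64_checked(double d, int64_t *out)
{
    /* -2^63 and 2^63 are both exactly representable as doubles, so these
     * comparisons are exact. NaN fails every range check, so test it first. */
    if (isnan(d) || d < -9223372036854775808.0 || d >= 9223372036854775808.0) {
        return false;
    }
    *out = (int64_t)d; /* in range, so the cast is well defined (truncates toward zero) */
    return true;
}
```

With a check like this, the `1e+19` input from the failing test would be rejected before the cast is ever evaluated.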

lucaswerkmeister · 2023-10-13T19:42:52Z

Hm, I think this comment in the test is referring to the same thing, actually:

# Also, casting from double to int64 when the int64 range
# is exceeded results in an implementation-defined value.

Girgias · 2023-10-14T18:36:39Z

ext/intl/formatter/formatter_format.c

+				convert_to_long(number);
+				value = Z_LVAL_P(number);


This will cast non-numeric strings to 0, which is not what should happen.

Same as the other cases using convert_to_long() or convert_to_double()

Okay, what should happen instead?

Or to put the question slightly differently: I’m not very familiar with the php-src codebase, and couldn’t even find documentation on what exactly convert_to_long() does – if there’s another paradigm for what the code should do with non-numeric strings, such as throwing an error, can you point me to another place in PHP that uses this paradigm, so I can see how it’s implemented there?

Please use the zval_try_get_long() function. Convert changes the zval in place and does a forced (int) cast.

This is a problem for non-numeric strings, and floats that have a fractional part.
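The difference between a forced cast and a conversion that can report failure can be illustrated outside php-src with the C standard library. This is only an analogy under stated assumptions: `parse_long_strict` is a hypothetical name, and it does not reproduce the exact semantics of `zval_try_get_long()`; it just shows the "value plus failed flag" shape, as opposed to silently yielding 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper for illustration; not a php-src function.
 * Parses a base-10 integer and reports failure instead of returning a
 * silent 0 for non-numeric or partially numeric input. */
static long long parse_long_strict(const char *s, bool *failed)
{
    char *end = NULL;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    /* Fail when no digits were consumed, when characters are left over
     * (for example a fractional part), or when the value overflowed. */
    *failed = (end == s) || (*end != '\0') || (errno == ERANGE);
    return *failed ? 0 : v;
}
```

A caller would check `failed` and raise an error rather than format the returned 0.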

I switched it to zval_try_get_long() (in a separate commit), but it seems to introduce a lot of deprecation warnings that are currently causing tests to fail. Should I add the deprecations to the expected test output, or should the deprecations not happen after all?

Also added a throw for cases where the INT32 type is used with numbers that exceed 32 bits of precision, but I’m not sure whether that’s what you had in mind or not. Happy to change this part.
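The 32-bit range check described here can be sketched as follows. `fits_in_int32` is a hypothetical helper name, and how the failure is reported (the comment above mentions a throw) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration: true when a 64-bit value can be
 * narrowed to int32_t without changing its value. */
static bool fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Checking before narrowing keeps the conversion lossless; values outside the range would otherwise be silently changed by the cast.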

I did not replace the convert_to_double() calls, because I couldn’t find a zval_try_get_double() or equivalent function.

ext/intl/formatter/formatter_format.c

Girgias

I'm not sure adding a ZPP modifier for one (/two) case is totally worth it, as the logic inside of it doesn't even really utilize the benefit of it as values may still be cast around within the function body.

In any case, Stringable objects need to be handled properly.

Zend/zend_API.h

Girgias · 2023-10-22T14:59:37Z

ext/intl/formatter/formatter_format.c

+				convert_to_long(number);
+				value = Z_LVAL_P(number);


Please use the zval_try_get_long() function. Convert changes the zval in place and does a forced (int) cast.

This is a problem for non-numeric strings, and floats that have a fractional part.

Girgias · 2023-10-22T15:00:48Z

ext/intl/formatter/formatter_format.c

+			}
+			INTL_METHOD_CHECK_STATUS( nfo, "Number formatting failed" );
+			break;
+
 		case FORMAT_TYPE_CURRENCY:
 			if (getThis()) {


Aside Nit: this could now be if (object)

I tried it, but it caused several test failures (seemingly object was still null when I wouldn’t have expected it to be).

Girgias · 2023-10-22T15:05:38Z

ext/intl/formatter/formatter_format.c

+			if (!try_convert_to_string(number)) {
+				RETURN_THROWS();
+			}


This could be simplified as at this point the value should be int|float|string and integers and floats always have string representations.

ext/intl/formatter/formatter_format.c

ext/intl/tests/bug48227.phpt

ext/intl/tests/bug76093.phpt

To improve precision handling in cases where the number doesn’t fit into a 32-bit or 64-bit integer. A float that doesn’t fit into a long will produce a deprecation warning; for NumberFormatter::TYPE_INT32, a number that doesn’t fit into 32 bits will additionally produce a ValueError. (This is inconsistent, but I’m not sure how to do it better.)

lucaswerkmeister · 2023-11-26T15:07:38Z

Rebased and updated – reverted the argument parsing to zend_parse_method_parameters() style (with z instead of n, and manual zend_argument_type_error() for unexpected argument type), and used zval_try_get_long(). Stringable handling still TBD.

ext/intl/tests/bug48227.phpt

ext/intl/formatter/formatter_format.c

lucaswerkmeister · 2023-11-27T20:16:26Z

(I’m pretty sure I force-with-lease-pushed a new version that squashed the WIP into two commits again and also added Stringable support, but it’s not showing up in GitHub yet. Hopefully that’ll resolve itself. It resolved itself.)

Passing the argument to NumberFormatter::format() as a number loses precision if the value cannot be represented exactly as a double or long integer. The ICU library provides a "decimal number" type that avoids the loss of precision when the value is passed as a string. Add a new FORMAT_TYPE_DECIMAL to explicitly request that the argument be converted to a string and then passed to ICU that way.

Co-authored-by: Gina Peter Banyard <[email protected]>
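The precision loss that motivates the new type can be seen without ICU: a double has a 53-bit significand, so not every 64-bit integer survives a round trip through it. A minimal sketch, assuming IEEE 754 doubles and the default rounding mode; `int64_survives_double` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration: true when v is exactly
 * representable as a double, i.e. it survives int64 -> double -> int64. */
static bool int64_survives_double(int64_t v)
{
    double d = (double)v;
    /* Values close to INT64_MAX round up to 2^63, which cannot be cast
     * back to int64_t without undefined behavior, so reject them first. */
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return (int64_t)d == v;
}
```

For example, 9007199254740993 (2^53 + 1) does not survive the round trip, while the same digits passed as a string reach the formatter unchanged.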

lucaswerkmeister requested review from dstogov and kocsismate as code owners October 13, 2023 17:58

github-actions bot added Category: Engine Extension: intl labels Oct 13, 2023

lucaswerkmeister mentioned this pull request Oct 13, 2023

Fix #76093: Format strings w/o loss of precision w/ FORMAT_TYPE_DECIMAL #9909

Closed

Girgias reviewed Oct 14, 2023

mvorisek reviewed Oct 21, 2023

lucaswerkmeister force-pushed the bug76093 branch from f2175dc to 93f26bc Compare October 22, 2023 14:30

Girgias requested changes Oct 22, 2023

lucaswerkmeister force-pushed the bug76093 branch from 93f26bc to 8e6e10f Compare November 26, 2023 15:01

github-actions bot removed the Category: Engine label Nov 26, 2023

Girgias reviewed Nov 27, 2023

lucaswerkmeister force-pushed the bug76093 branch from e12cf31 to 4334911 Compare November 27, 2023 20:26

lucaswerkmeister force-pushed the bug76093 branch from 4334911 to 18b0136 Compare November 28, 2023 20:40

lucaswerkmeister requested a review from devnexen as a code owner April 15, 2024 15:43
