On number parsing #86

Open
@glennsl

This touches a bit on the discussion in #83, but covers more broadly the current options for parsing numbers in JavaScript, to try to inform which semantics we should expose, more so than just how we should type and organize the JS API.

Currently, I believe the only options for parsing numbers are parseInt and parseFloat, either directly through Float.parseFloat and Float.parseInt, or indirectly as the underlying API used in Float.fromString and Int.fromString. While there unfortunately aren't any good options, in my opinion parseInt and parseFloat are the worst of all the bad options. For the sake of easy comparison and to facilitate efficient discussion, I've outlined the pros and cons of the options I know of below.

Note that I've only covered the differences between the options. Peculiarities common to all of them are that:

  • They accept scientific notation, and this can't be turned off.
  • They don't distinguish parse failure from the actual valid value of NaN.
  • They don't support group separators.
  • They don't provide locale-aware parsing.
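To make these shared peculiarities concrete, here is a small sketch using only built-in JavaScript behavior. The variable names are just for illustration, and the float-producing form of each option is used throughout.

```javascript
// Scientific notation is always accepted; there is no flag to turn it off.
const sci = [parseFloat("1e3"), +"1e3", Number("1e3")]; // each is 1000

// A failed parse and the text "NaN" both come back as NaN,
// so the caller can't tell them apart.
const failed = [parseFloat("abc"), +"abc", Number("abc")];
const literal = [parseFloat("NaN"), +"NaN", Number("NaN")];

// Group separators are never understood as part of the number.
// parseFloat stops at the comma (1); coercion and Number reject the string (NaN).
const grouped = [parseFloat("1,234.5"), +"1,234.5", Number("1,234.5")];

// No locale awareness: a comma decimal separator is not recognized either.
// parseFloat reads "1.234" and stops; coercion and Number give NaN.
const localized = [parseFloat("1.234,5"), +"1.234,5", Number("1.234,5")];
```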

parseInt/parseFloat

The core of the problem with these functions is described by the following quote from MDN:

parseFloat() picks the longest substring starting from the beginning that generates a valid number literal. If it encounters an invalid character, it returns the number represented up to that point, ignoring the invalid character and all characters following it.

While this may not seem like such a big issue, since it just ignores certain kinds of "mistakes", keep in mind that there isn't just one number format used across the world. These functions only parse a very simple number format, close to but not entirely the same as JavaScript number literals. If they encounter anything that doesn't fit that format, they just ignore the rest. For example, if any kind of group separator is used, that and everything that follows will be silently ignored. The same goes for using a decimal separator other than a period (.).

In short, these invocations will all return 15:

parseInt("15,123");
parseInt("15 123");
parseInt("15 * 3");
parseInt("15px");

Pros

  • Simple functions that are easy to bind to.
  • Zero-cost.
  • parseInt accepts an optional radix argument.

Cons

  • Will accept any string that starts with something that can be parsed as a number.
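As a rough illustration of the radix argument and of the prefix-only parsing described above, the following sketch uses only the built-in functions. The variable names are illustrative.

```javascript
// parseInt takes an optional radix; parseFloat does not.
const hex = parseInt("ff", 16);    // 255
const binary = parseInt("101", 2); // 5
const prefixed = parseInt("0x1f"); // 31, a 0x prefix is recognized when no radix is given

// Both functions stop at the first character they can't use and keep the prefix.
const truncatedInt = parseInt("12.9", 10);    // 12
const truncatedFloat = parseFloat("3.14abc"); // 3.14
const commaDecimal = parseFloat("3,14");      // 3

// Only input with no usable numeric prefix at all yields NaN.
const noPrefix = parseInt("px15", 10); // NaN
```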

Number coercion

There are a number of ways to trigger number coercion, such as applying a numeric operator to a value. E.g. if value is a string, +value will return a number.

This improves on parseInt and parseFloat most notably by rejecting input that isn't wholly parsable, such as all the examples in the section above. It does come with a few extra cons, but these are for the most part possible to work around.
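For comparison, this sketch runs the same four strings from the parseInt section through both approaches. The variable names are illustrative.

```javascript
const inputs = ["15,123", "15 123", "15 * 3", "15px"];

// parseInt keeps the leading "15" from every input.
const viaParseInt = inputs.map((s) => parseInt(s, 10)); // [15, 15, 15, 15]

// Unary plus rejects each string as a whole and yields NaN.
const viaCoercion = inputs.map((s) => +s);

// Leading and trailing whitespace is still tolerated by coercion.
const padded = +"  15  "; // 15
```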

Pros

  • Only accepts wholly parsable strings.

Cons

  • No explicit radix option, but does parse numbers correctly when prefixed with 0x, 0o or 0b.
  • No int-specific variant. And while it's easy to create a wrapper function to reject floats that are not whole numbers, it's not possible to reject strings such as "123.0" without a pre-processing step.
  • Empty or whitespace-only strings are converted to 0, though this can easily be worked around with a wrapper function.
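The workarounds mentioned in these cons could look something like the sketch below. The helper names toFloatStrict and toIntStrict are hypothetical and not part of any existing API, and returning undefined on failure is just one possible convention. Note that this particular int helper also rejects radix prefixes and scientific notation, which may or may not be what we want.

```javascript
// Hypothetical helper: returns undefined instead of 0 or NaN on bad input.
function toFloatStrict(text) {
  if (text.trim() === "") return undefined; // +"" and +"   " would both be 0
  const value = Number(text);
  return Number.isNaN(value) ? undefined : value;
}

// Hypothetical helper: the regex is the pre-processing step that rejects "123.0".
function toIntStrict(text) {
  const trimmed = text.trim();
  if (!/^[+-]?\d+$/.test(trimmed)) return undefined; // digits only, optional sign
  const value = Number(trimmed);
  return Number.isSafeInteger(value) ? value : undefined;
}

// Plain coercion, for reference.
const emptyCoerced = +"";        // 0
const hexCoerced = +"0x1f";      // 31
const wholeFloat = +"123.0";     // 123, indistinguishable from +"123"
```

With these, toFloatStrict("   ") and toIntStrict("123.0") both return undefined, while toIntStrict("-42") returns -42.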

Number constructor

This works very similarly to number coercion, because that is the conversion mechanism actually used, but it's slightly slower. And when used as a constructor (with new) it will create a Number object rather than a primitive.
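The difference between calling Number as a function and as a constructor can be seen in this short sketch. The variable names are illustrative.

```javascript
// Called as a function, Number returns a primitive, just like unary plus.
const primitive = Number("42");
const sameAsCoercion = Number("15px"); // NaN, the same result as +"15px"
const emptyIsZero = Number("");        // 0, the same quirk as +""

// Called with new, it returns a wrapper object instead.
const boxed = new Number("42");

const primitiveType = typeof primitive; // "number"
const boxedType = typeof boxed;         // "object"

// The wrapper is not strictly equal to the primitive it holds,
// and like every object it is truthy, even when it wraps 0.
const strictlyEqual = boxed === 42;             // false
const unwrapped = boxed.valueOf();              // 42
const zeroIsTruthy = Boolean(new Number("0"));  // true
```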

Pros

  • Same as number coercion
  • Easy to bind to, though doesn't transfer as naturally to ReScript

Cons

  • Same as number coercion
  • Slightly slower
