Description
Regarding numpy floating-point precision, and the fact that PlotlyJSONEncoder
always casts those values to float64 because it uses tolist()
...
This had always bugged me: it results in much larger exports (i.e. html / ipynb file sizes) than necessary when float16 or float32 is sufficient, and it affects not only coordinate data but also marker sizes, meta info, etc.
Just in case the plotly.py devs or others are interested: I found a way to avoid this number inflation by modifying (and monkey patching) the encode_as_list method:
```python
@staticmethod
def encode_as_list_patch(obj):
    """Attempt to use `tolist` method to convert to normal Python list."""
    if hasattr(obj, "tolist"):
        numpy = get_module("numpy")
        try:
            # The dtype checks need parentheses: without them, `and` binds
            # tighter than `or` and the condition is evaluated incorrectly.
            if isinstance(obj, numpy.ndarray) \
                    and (obj.dtype == numpy.float32 or obj.dtype == numpy.float16) \
                    and obj.flags.contiguous:
                # Round-trip each element through its dtype-aware str(), so it
                # keeps the short float32/float16 repr instead of inflating to
                # the full float64 repr that tolist() would produce.
                return [float('%s' % x) for x in obj]
        except AttributeError:
            raise NotEncodable
        return obj.tolist()
    else:
        raise NotEncodable
```
It's about 30-50x slower than .tolist(), but, being on the order of a few μs, still much faster than the json encoding, with the benefit of ~3x smaller exports.
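The size win comes from where the digits are generated: `tolist()` promotes every element to a Python float (i.e. float64), so the JSON encoder emits the long repr of the nearest double, while formatting each element with its own dtype prints only as many digits as a float32/float16 actually carries. A minimal sketch of just that core idea (assuming only numpy, outside any plotly machinery):

```python
import json
import numpy as np

arr = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# Default path: tolist() converts each float32 to a Python float (float64),
# so json.dumps emits the full repr of the nearest double, e.g. 0.10000000149011612.
default = json.dumps(arr.tolist())

# Patched path: format each element via its dtype-aware str(), which numpy
# prints with just enough digits for float32, then parse back to a float.
compact = json.dumps([float('%s' % x) for x in arr])

print(len(default), default)
print(len(compact), compact)  # much shorter: [0.1, 0.2, 0.3]
```

The same digit inflation happens for every float32/float16 array in a figure, which is why the effect compounds into noticeably smaller html/ipynb exports.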
I had always wanted to report this, and this PR revived the topic. Could this be relevant for a new issue (especially since orjson will not become the default)?
For reference, a quick search revealed that a patch of encode_as_list was already suggested in #1842 (comment), in the context of treating inf & NaN, and was brought up again in #2880 (comment).
Originally posted by @mherrmann3 in #2955 (comment)