
[maskedtensor] Add missing nan ops tutorial #2046

Closed · wants to merge 1 commit
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions beginner_source/maskedtensor_missing_nan_ops.rst
@@ -0,0 +1,32 @@
Implementing missing torch.nan* operators
-----------------------------------------

In the above issue, there is a request to add additional operators to cover the various `torch.nan*` applications,
such as ``torch.nanmax``, ``torch.nanmin``, etc.

In general, these problems lend themselves more naturally to masked semantics, so instead of introducing additional
operators, we propose using MaskedTensors. Since
`nanmean has already landed <https://github.com/pytorch/pytorch/issues/21987>`__, we can use it as a comparison point:

>>> import torch
>>> from torch.masked import masked_tensor
>>> x = torch.arange(16).float()
>>> y = x * x.fmod(4)
>>> y = y.masked_fill(y == 0, float('nan'))
>>> y
tensor([nan, 1., 4., 9., nan, 5., 12., 21., nan, 9., 20., 33., nan, 13.,
28., 45.])
>>> y.nanmean()

Reviewer: It might be useful to inline some comments on what you're trying to show here.

Reviewer: Probably outside the scope of this review, but why do we have ``nanmean()`` as an API instead of the pandas-style ``mean(..., skipna=True)``?

Author: Not sure..

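For reference, the pandas behavior the reviewer alludes to: pandas reductions skip NA values by default and can be made to propagate them via ``skipna`` (a minimal sketch for comparison, not part of the tutorial diff):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, 4.0, 9.0])

# pandas reductions skip NA by default (skipna=True) ...
print(s.mean())              # 4.666...
# ... and propagate NaN when skipna=False
print(s.mean(skipna=False))  # nan
```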
tensor(16.6667)
>>> torch.mean(masked_tensor(y, ~torch.isnan(y)))

Reviewer: Is the goal to have sequential tutorials or to keep each self-contained? If the latter, can you add the relevant imports up top?

Author: This tutorial will be merged with overview!

Reviewer: Did you replace masked_tensor with MaskedTensor in pytorch/pytorch@5e9c26c? If so, can you update the tutorial here?

Author: masked_tensor is the preferred function to use!

Reviewer: Have we considered API sugar:

(1) instantiating a MaskedTensor from a Tensor, assuming NaN is the mask:

>>> MaskedTensor(y)
MaskedTensor(
  [      --,   1.0000,   4.0000,   9.0000,       --,   5.0000,  12.0000,  21.0000,       --,   9.0000,  20.0000,  33.0000,       --,  13.0000,  28.0000,  45.0000]
)

(2) instantiating a MaskedTensor where the user just states the mask value instead of passing the mask:

y = MaskedTensor(y, mask_value=float(1))

Author: Not yet! I think an unspecified mask could also be an indication that the user would like all-True values for the mask, so that could be a third option as well.

Another one would be to allow just MaskedTensor(y) if y is a sparse tensor, because then the mask is "implied".

These have all been discussed, and I will take note to add them in :)

Reviewer: Where are you tracking feature requests?
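The first suggestion above could be prototyped today as a thin wrapper over the existing constructor. A minimal sketch, where ``from_nan`` is a hypothetical name and the ``torch.masked.masked_tensor`` import path assumes the prototype API:

```python
import torch
from torch.masked import masked_tensor

def from_nan(t: torch.Tensor):
    # Hypothetical sugar: treat NaN entries as masked out.
    return masked_tensor(t, ~torch.isnan(t))

y = torch.tensor([float('nan'), 1.0, 4.0, 9.0])
mt = from_nan(y)
print(torch.mean(mt))  # mean over the unmasked entries only
```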

MaskedTensor( 16.6667, True)

:class:`MaskedTensor` also supports reductions when the data is fully masked out, which is equivalent
to the case above where the data Tensor is entirely ``nan``. ``nanmean`` would return ``nan``
(an ambiguous return value), while MaskedTensor more accurately indicates a masked-out result.

>>> x = torch.empty(16).fill_(float('nan'))
>>> x
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
>>> torch.nanmean(x)
tensor(nan)
>>> torch.mean(masked_tensor(x, ~torch.isnan(x)))

Reviewer: Same comment as above on MaskedTensor.

MaskedTensor(--, False)
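Unlike ``nan``, the fully masked-out result can be detected programmatically by inspecting the result's mask. A sketch, assuming the prototype's ``get_mask()`` accessor and the ``torch.masked.masked_tensor`` import path:

```python
import torch
from torch.masked import masked_tensor

x = torch.full((4,), float('nan'))
result = torch.mean(masked_tensor(x, ~torch.isnan(x)))

# A fully masked-out reduction carries a False mask, unlike the
# ambiguous nan returned by torch.nanmean.
if not bool(result.get_mask()):
    print("all inputs were masked out")
```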