Refactor stacking functions, add dstack and column_stack #624


Open
wants to merge 15 commits into main

Conversation

HarshvirSandhu (Contributor):

Description

  1. Add dstack and column_stack for stacking tensors.
  2. Refactor function names
    • horizontal_stack to hstack
    • vertical_stack to vstack

Related Issue

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):


codecov-commenter commented Feb 4, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (082081a) 80.80% compared to head (d160f9f) 80.82%.
Report is 13 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #624      +/-   ##
==========================================
+ Coverage   80.80%   80.82%   +0.02%     
==========================================
  Files         162      162              
  Lines       46743    46765      +22     
  Branches    11419    11423       +4     
==========================================
+ Hits        37770    37798      +28     
+ Misses       6731     6721      -10     
- Partials     2242     2246       +4     
Files Coverage Δ
pytensor/tensor/basic.py 88.17% <82.14%> (-0.15%) ⬇️

... and 1 file with indirect coverage changes


if len(args) < 2:
raise ValueError("Too few arguments")

_args = []
for arg in args:
_arg = as_tensor_variable(arg)
if _arg.type.ndim != 2:
Member:

Can you add a test for the special behavior with 1d inputs (and mixed 1d and 2d) to make sure it matches numpy?
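For reference, a minimal numpy-only sketch of the mixed 1-D/2-D behavior such a test would pin down (`np.column_stack` treats 1-D inputs as columns); the pytensor side of the comparison is omitted here:

```python
import numpy as np

# np.column_stack promotes 1-D inputs to columns before stacking,
# so mixing 1-D and 2-D inputs is allowed.
a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[4], [5], [6]])  # shape (3, 1)

out = np.column_stack([a, b])
assert out.shape == (3, 2)
assert (out == np.array([[1, 4], [2, 5], [3, 6]])).all()
```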

@@ -2758,29 +2758,21 @@ def concatenate(tensor_list, axis=0):
return join(axis, *tensor_list)


def horizontal_stack(*args):
def hstack(*args):
Member:

Since we're leaving horizontal_stack with a FutureWarning, I suggest that the signature of hstack should match np.hstack. It takes a single sequence of arrays as input, rather than `*args`.

(ditto vstack and dstack)

HarshvirSandhu (Contributor Author) commented Feb 7, 2024:

@jessegrabowski
The following are input arguments of np.hstack

tup : sequence of ndarrays
        The arrays must have the same shape along all but the second axis,
        except 1-D arrays which can be any length.

    dtype : str or dtype
        If provided, the destination array will have this dtype. Cannot be
        provided together with `out`.

    .. versionadded:: 1.24

    casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
        Controls what kind of data casting may occur. Defaults to 'same_kind'.

Which of these needs to be implemented?
dtype and casting are not present in pt.concatenate

Member:

I don't think we need to worry about dtype or casting, but the function should expect a sequence of tensors, rather than *args

# functions have potentially confusing/incoherent behavior (try them on 1D
# arrays). If this is fixed in a future version of Numpy, it may be worth
# trying to get closer to Numpy's way of doing things. In the meantime,
# better keep different names to emphasize the implementation divergences.

if len(args) < 2:
Member:

Numpy also doesn't raise on a single input. As the (deleted) comment says, it has somewhat unexpected behaviors:

  • np.hstack([flat_array]) == flat_array
  • np.vstack([flat_array]) == flat_array[None]
  • np.dstack([flat_array]) == flat_array[:, None]

If we're going to match the names I think it's important we dig into the internal logic of these three functions a little bit and try to replicate it. I don't think anyone's code depends on these behaviors, but they are idiosyncrasies of how they are doing the insert_dims and concatenate Op.
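These single-input cases can be demonstrated directly against numpy; a minimal check of the shapes involved:

```python
import numpy as np

flat = np.array([1.0, 2.0, 3.0])

# hstack returns a single 1-D input unchanged
assert np.hstack([flat]).shape == (3,)
# vstack promotes it to a row matrix, like flat[None]
assert np.vstack([flat]).shape == (1, 3)
# dstack promotes it to shape (1, n, 1)
assert np.dstack([flat]).shape == (1, 3, 1)
```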

Contributor Author:

@jessegrabowski
np.dstack uses np.atleast_3d to make sure the array is at least 3-dimensional.
However, np.atleast_3d behaves differently from pt.atleast_3d; see the code below for reference:

import numpy as np
import pytensor.tensor as pt

a = np.array([1, 2, 3, 4])

print(a.shape)                        # (4,)
print(pt.atleast_3d(a).eval().shape)  # (1, 1, 4)
print(np.atleast_3d([a]).shape)       # (1, 4, 1)

Should pt.atleast_3d be changed to behave like np.atleast_3d?

ricardoV94 (Member) commented Feb 10, 2024:

Yeah, I have come across this behavior and it's annoying. We should behave like numpy, but we need a transition strategy.

I suggest we add a kwarg `expand_left` that for now is None and behaves like before, but with a warning in the cases where the behavior will change:

def atleast_3d(*xs, expand_left: bool = True):
    if expand_left:
        if any(x.type.ndim < 3 for x in xs):
            warnings.warn(
                "The behavior of atleast_3d will change to match that of numpy. "
                "If you want to keep the old behavior use atleast_nd. "
                "Otherwise set expand_left=False.",
                FutureWarning,
            )

    # Act based on expand_left flag
    ...

Contributor Author:

pt.atleast_3d uses pt.atleast_Nd, and pt.atleast_Nd already has an argument `left`, which is True by default.

An example:

import numpy as np
import pytensor.tensor as pt

a = np.array([1, 2, 3, 4])

print(pt.atleast_3d([a], left=True).eval().shape)   # (1, 1, 4)
print(pt.atleast_3d([a], left=False).eval().shape)  # (1, 4, 1)
print(np.atleast_3d([a]).shape)                     # (1, 4, 1)

To match numpy behaviour, the array must be inside a list.

ricardoV94 (Member) commented Feb 10, 2024:

Nice, so we should add the same warning around the `left` kwarg.

Contributor Author:

I will use this property only for pt.dstack
What warning should be added?

"horizontal_stack was renamed to hstack and will be removed in a future release",
FutureWarning,
)
return hstack(*args)
Member:

Since the function signatures shouldn't be the same, I would just leave horizontal_stack as-is with the deprecation warning rather than aliasing it to hstack.


want = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 8, 7]])
out = self.eval_outputs_and_check_join([s])
assert (out == want).all()

def test_join_matrix1_using_horizontal_stack(self):
def test_join_matrix1_using_dstack(self):
Member:

These tests are a bit too verbose IMO; they can be parameterized by shape (1d, 2d, 3d), function (hstack, vstack, dstack), and n_inputs (1, 2, 3), and just use np.testing.assert_allclose to make sure the pytensor version matches the numpy version.
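A numpy-only sketch of the reference shapes such a parameterized test would compare against (the corresponding pytensor calls are omitted here):

```python
import numpy as np

# Expected numpy output shapes when stacking three arrays
# of shape (2,) * ndim with each helper.
for func in (np.hstack, np.vstack, np.dstack):
    for ndim in (1, 2, 3):
        arrays = [np.zeros((2,) * ndim) for _ in range(3)]
        print(func.__name__, ndim, func(arrays).shape)
```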

HarshvirSandhu (Contributor Author):

@ricardoV94 @jessegrabowski
Is there anything else needed for this PR?

ricardoV94 (Member) left a review comment:

Small suggestion for the tests, otherwise looks good

dtype="float32",
)
out = self.eval_outputs_and_check_join([s])
assert (out == want).all()
Member:

This will also check the shapes match, which is important for these changes

Suggested change
assert (out == want).all()
np.testing.assert_array_equal(out, want, strict=True)

result = func(arrays)
np_result = getattr(np, func.__name__)(arrays)

assert np.array_equal(result.eval(), np_result)
Member:

Suggested change
assert np.array_equal(result.eval(), np_result)
np.testing.assert_array_equal(result.eval(), np_result, strict=True)

@pytest.mark.parametrize("dimension", [1, 2, 3])
def test_stack_helpers(func, dimension):
if dimension == 1:
arrays = [np.arange(i * dimension, (i + 1) * dimension) for i in range(3)]
Member:

May be more readable with np.random.normal(size=...) for the array values?

HarshvirSandhu (Contributor Author):

@ricardoV94
Is there anything else needed for this PR?

jessegrabowski (Member) left a review comment:

@HarshvirSandhu @ricardoV94 is this PR ready to merge? I still have a few nitpicks, but I'm approving so it doesn't have to languish here any longer.

A rebase onto main is also needed to resolve merge conflicts.

return concatenate(arrs, axis=1)


def vstack(tup):
Member:

The name `tup` is not clear; `arrays` or `arrs` should be preferred. A type hint would also be useful here.

@@ -2758,15 +2758,34 @@ def concatenate(tensor_list, axis=0):
return join(axis, *tensor_list)


def horizontal_stack(*args):
def hstack(tup):
Member:

See below



def vstack(tup):
r"""Stack arrays in sequence vertically (row wise)."""
Member:

It would be good to use this opportunity to start putting real docstrings on pytensor functions, by adding (at least) a Parameters and Returns section. A See Also section would also be nice.
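A sketch of what such a docstring could look like, shown here on a numpy-backed stand-in body for illustration (the actual pytensor implementation would dispatch to pt.concatenate):

```python
import numpy as np

def vstack(arrays):
    r"""Stack arrays in sequence vertically (row wise).

    Parameters
    ----------
    arrays : sequence of array_like
        Arrays to stack. 1-D arrays of length ``N`` are promoted to
        shape ``(1, N)`` before stacking.

    Returns
    -------
    ndarray
        The stacked array, at least 2-dimensional.

    See Also
    --------
    hstack : Stack arrays in sequence horizontally (column wise).
    dstack : Stack arrays in sequence depth wise (along the third axis).
    """
    # numpy stand-in body; real version would use pytensor ops
    return np.concatenate([np.atleast_2d(a) for a in arrays], axis=0)
```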


Successfully merging this pull request may close these issues.

Add numpy-like helper hstack
4 participants