Refactor stacking functions, add dstack and column_stack #624


Open
wants to merge 15 commits into main

Conversation

HarshvirSandhu (Contributor):

Description

  1. Add dstack and column_stack for stacking tensors.
  2. Refactor function names
    • horizontal_stack to hstack
    • vertical_stack to vstack

Related Issue

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):


codecov-commenter commented Feb 4, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (082081a) 80.80% compared to head (d160f9f) 80.82%.
Report is 13 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #624      +/-   ##
==========================================
+ Coverage   80.80%   80.82%   +0.02%     
==========================================
  Files         162      162              
  Lines       46743    46765      +22     
  Branches    11419    11423       +4     
==========================================
+ Hits        37770    37798      +28     
+ Misses       6731     6721      -10     
- Partials     2242     2246       +4     
Files Coverage Δ
pytensor/tensor/basic.py 88.17% <82.14%> (-0.15%) ⬇️

... and 1 file with indirect coverage changes


if len(args) < 2:
raise ValueError("Too few arguments")

_args = []
for arg in args:
_arg = as_tensor_variable(arg)
if _arg.type.ndim != 2:
Member:

Can you add a test for the special behavior with 1d inputs (and mixed 1d and 2d) to make sure it matches numpy?
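For reference, a minimal numpy-only sketch of the mixed 1-D/2-D behavior such a test would pin down (`np.column_stack` treats 1-D inputs as columns); the pytensor side of the comparison is omitted here:

```python
import numpy as np

# np.column_stack promotes 1-D inputs to columns before stacking,
# so mixing 1-D and 2-D inputs is allowed.
a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[4], [5], [6]])  # shape (3, 1)

out = np.column_stack([a, b])
assert out.shape == (3, 2)
assert (out == np.array([[1, 4], [2, 5], [3, 6]])).all()
```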

@@ -2758,29 +2758,21 @@ def concatenate(tensor_list, axis=0):
return join(axis, *tensor_list)


def horizontal_stack(*args):
def hstack(*args):
Member:

Since we're leaving horizontal_stack with a FutureWarning, I suggest that the signature of hstack should match np.hstack. It takes a single sequence of arrays as input, rather than `*args`.

(ditto vstack and dstack)

HarshvirSandhu (Contributor Author) commented Feb 7, 2024:

@jessegrabowski
The following are input arguments of np.hstack

tup : sequence of ndarrays
        The arrays must have the same shape along all but the second axis,
        except 1-D arrays which can be any length.

    dtype : str or dtype
        If provided, the destination array will have this dtype. Cannot be
        provided together with `out`.

    .. versionadded:: 1.24

    casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
        Controls what kind of data casting may occur. Defaults to 'same_kind'.

Which of these needs to be implemented?
dtype and casting are not present in pt.concatenate

Member:

I don't think we need to worry about dtype or casting, but the function should expect a sequence of tensors, rather than *args

# functions have potentially confusing/incoherent behavior (try them on 1D
# arrays). If this is fixed in a future version of Numpy, it may be worth
# trying to get closer to Numpy's way of doing things. In the meantime,
# better keep different names to emphasize the implementation divergences.

if len(args) < 2:
Member:

Numpy also doesn't raise on a single input. As the (deleted) comment says, it has somewhat unexpected behaviors:

  • np.hstack([flat_array]) == flat_array
  • np.vstack([flat_array]) == flat_array[None]
  • np.dstack([flat_array]) == flat_array[:, None]

If we're going to match the names I think it's important we dig into the internal logic of these three functions a little bit and try to replicate it. I don't think anyone's code depends on these behaviors, but they are idiosyncrasies of how they are doing the insert_dims and concatenate Op.
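These single-input cases can be demonstrated directly against numpy; a minimal check of the shapes involved:

```python
import numpy as np

flat = np.array([1.0, 2.0, 3.0])

# hstack returns a single 1-D input unchanged
assert np.hstack([flat]).shape == (3,)
# vstack promotes it to a row matrix, like flat[None]
assert np.vstack([flat]).shape == (1, 3)
# dstack promotes it to shape (1, n, 1)
assert np.dstack([flat]).shape == (1, 3, 1)
```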

Contributor Author:

@jessegrabowski
np.dstack uses np.atleast_3d to make sure the array is at least 3-dimensional.
However, np.atleast_3d behaves differently from pt.atleast_3d; see the code below for reference:

import numpy as np
import pytensor.tensor as pt

a = np.array([1, 2, 3, 4])

print(a.shape)                        # (4,)
print(pt.atleast_3d(a).eval().shape)  # (1, 1, 4)
print(np.atleast_3d([a]).shape)       # (1, 4, 1)

Should pt.atleast_3d be changed to behave like np.atleast_3d?

ricardoV94 (Member) commented Feb 10, 2024:

Yeah, I have come across this behavior and it's annoying. We should behave like numpy, but we need a transition strategy.

I suggest we add a kwarg `expand_left` that for now is None and behaves like before, but with a warning in the cases where the behavior will change:

def atleast_3d(*xs, expand_left: bool = True):
    if expand_left:
        if any(x.type.ndim < 3 for x in xs):
            warnings.warn(
                "The behavior of atleast_3d will change to match that of numpy. "
                "If you want to keep the old behavior use atleast_nd. "
                "Otherwise set expand_left=False.",
                FutureWarning,
            )

    # Act based on expand_left flag
    ...

Contributor Author:

pt.atleast_3d uses pt.atleast_Nd, and pt.atleast_Nd already has an argument `left`, which is True by default.

An example:

import numpy as np
import pytensor.tensor as pt

a = np.array([1, 2, 3, 4])

print(pt.atleast_3d([a], left=True).eval().shape)   # (1, 1, 4)
print(pt.atleast_3d([a], left=False).eval().shape)  # (1, 4, 1)
print(np.atleast_3d([a]).shape)                     # (1, 4, 1)

To match numpy behaviour, the array must be inside a list.

ricardoV94 (Member) commented Feb 10, 2024:

Nice, so we should add the same warning around the `left` kwarg.

Contributor Author:

I will use this property only for pt.dstack
What warning should be added?

"horizontal_stack was renamed to hstack and will be removed in a future release",
FutureWarning,
)
return hstack(*args)
Member:

Since the function signatures shouldn't be the same, I would just leave horizontal_stack as-is with the deprecation warning rather than aliasing it to hstack.


want = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 8, 7]])
out = self.eval_outputs_and_check_join([s])
assert (out == want).all()

def test_join_matrix1_using_horizontal_stack(self):
def test_join_matrix1_using_dstack(self):
Member:

These tests are a bit too verbose IMO; they can be parameterized by shape (1d, 2d, 3d), function (hstack, vstack, dstack), and n_inputs (1, 2, 3), and just use np.testing.assert_allclose to make sure the pytensor version matches the numpy version.
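A numpy-only sketch of the reference shapes such a parameterized test would compare against (the corresponding pytensor calls are omitted here):

```python
import numpy as np

# Expected numpy output shapes when stacking three arrays
# of shape (2,) * ndim with each helper.
for func in (np.hstack, np.vstack, np.dstack):
    for ndim in (1, 2, 3):
        arrays = [np.zeros((2,) * ndim) for _ in range(3)]
        print(func.__name__, ndim, func(arrays).shape)
```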

HarshvirSandhu (Contributor Author):

@ricardoV94 @jessegrabowski
Is there anything else needed for this PR?

ricardoV94 (Member) left a review comment:

Small suggestion for the tests, otherwise looks good

dtype="float32",
)
out = self.eval_outputs_and_check_join([s])
assert (out == want).all()
Member:

This will also check the shapes match, which is important for these changes

Suggested change
assert (out == want).all()
np.testing.assert_array_equal(out, want, strict=True)

result = func(arrays)
np_result = getattr(np, func.__name__)(arrays)

assert np.array_equal(result.eval(), np_result)
Member:

Suggested change
assert np.array_equal(result.eval(), np_result)
np.testing.assert_array_equal(result.eval(), np_result, strict=True)

@pytest.mark.parametrize("dimension", [1, 2, 3])
def test_stack_helpers(func, dimension):
if dimension == 1:
arrays = [np.arange(i * dimension, (i + 1) * dimension) for i in range(3)]
Member:

May be more readable with np.random.normal(size=...) for the array values?

HarshvirSandhu (Contributor Author):

@ricardoV94
Is there anything else needed for this PR?

jessegrabowski (Member) left a review comment:

@HarshvirSandhu @ricardoV94 is this PR ready to merge? I still have a few nitpicks, but I'm approving so it doesn't have to languish here any longer.

A rebase onto main is also needed to resolve merge conflicts.

return concatenate(arrs, axis=1)


def vstack(tup):
Member:

The name `tup` is not clear; `arrays` or `arrs` should be preferred. A type hint would also be useful here.

@@ -2758,15 +2758,34 @@ def concatenate(tensor_list, axis=0):
return join(axis, *tensor_list)


def horizontal_stack(*args):
def hstack(tup):
Member:

See below



def vstack(tup):
r"""Stack arrays in sequence vertically (row wise)."""
Member:

It would be good to use this opportunity to start putting real docstrings on pytensor functions, by adding (at least) a Parameters and Returns section. A See Also section would also be nice.
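A sketch of what such a docstring could look like, shown here on a numpy-backed stand-in body for illustration (the actual pytensor implementation would dispatch to pt.concatenate):

```python
import numpy as np

def vstack(arrays):
    r"""Stack arrays in sequence vertically (row wise).

    Parameters
    ----------
    arrays : sequence of array_like
        Arrays to stack. 1-D arrays of length ``N`` are promoted to
        shape ``(1, N)`` before stacking.

    Returns
    -------
    ndarray
        The stacked array, at least 2-dimensional.

    See Also
    --------
    hstack : Stack arrays in sequence horizontally (column wise).
    dstack : Stack arrays in sequence depth wise (along the third axis).
    """
    # numpy stand-in body; real version would use pytensor ops
    return np.concatenate([np.atleast_2d(a) for a in arrays], axis=0)
```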


Successfully merging this pull request may close these issues.

Add numpy-like helper hstack
4 participants