Refactor stacking functions, add dstack and column_stack #624
Conversation
Codecov Report

Additional details and impacted files:

    @@ Coverage Diff @@
    ##             main     #624      +/-   ##
    ==========================================
    + Coverage   80.80%   80.82%   +0.02%
    ==========================================
      Files         162      162
      Lines       46743    46765      +22
      Branches    11419    11423       +4
    ==========================================
    + Hits        37770    37798      +28
    + Misses       6731     6721      -10
    - Partials     2242     2246       +4
pytensor/tensor/basic.py
Outdated

    if len(args) < 2:
        raise ValueError("Too few arguments")

    _args = []
    for arg in args:
        _arg = as_tensor_variable(arg)
        if _arg.type.ndim != 2:
Can you add a test for the special behavior with 1d inputs (and mixed 1d and 2d) to make sure it matches numpy?
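For reference, numpy's column_stack treats 1-d inputs as columns before concatenating, so the requested test would need to cover cases like these (values here are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])          # 1-d input
b = np.array([[4], [5], [6]])    # 2-d column

# 1-d arrays are turned into (3, 1) columns before concatenation along axis 1
assert np.column_stack([a, a]).shape == (3, 2)

# mixed 1-d and 2-d inputs: the 1-d array becomes a column first
mixed = np.column_stack([a, b])
assert mixed.shape == (3, 2)
assert np.array_equal(mixed, np.array([[1, 4], [2, 5], [3, 6]]))
```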
pytensor/tensor/basic.py
Outdated

    @@ -2758,29 +2758,21 @@ def concatenate(tensor_list, axis=0):
         return join(axis, *tensor_list)


    -def horizontal_stack(*args):
    +def hstack(*args):
Since we're leaving horizontal_stack with a FutureWarning, I suggest that the signature of hstack should match np.hstack. It takes a single sequence of matrices as input, rather than *args (ditto vstack and dstack).
@jessegrabowski
The following are the input arguments of np.hstack:

    tup : sequence of ndarrays
        The arrays must have the same shape along all but the second axis,
        except 1-D arrays which can be any length.
    dtype : str or dtype
        If provided, the destination array will have this dtype. Cannot be
        provided together with `out`.
        .. versionadded:: 1.24
    casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
        Controls what kind of data casting may occur. Defaults to 'same_kind'.

Which of these needs to be implemented? dtype and casting are not present in pt.concatenate.
I don't think we need to worry about dtype or casting, but the function should expect a sequence of tensors, rather than *args.
pytensor/tensor/basic.py
Outdated

    # functions have potentially confusing/incoherent behavior (try them on 1D
    # arrays). If this is fixed in a future version of Numpy, it may be worth
    # trying to get closer to Numpy's way of doing things. In the meantime,
    # better keep different names to emphasize the implementation divergences.

    if len(args) < 2:
Numpy also doesn't raise on a single input. As the (deleted) comment says, it has somewhat unexpected behaviors:

    np.hstack([flat_array]) == flat_array
    np.vstack([flat_array]) == flat_array[None]
    np.dstack([flat_array]) == flat_array[:, None]

If we're going to match the names, I think it's important we dig into the internal logic of these three functions a little bit and try to replicate it. I don't think anyone's code depends on these behaviors, but they are idiosyncrasies of how they are doing the insert_dims and concatenate Op.
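The three single-input behaviors above can be checked directly against numpy; note that dstack in particular produces a three-dimensional result (the identities quoted above compare values under broadcasting, not shapes):

```python
import numpy as np

flat = np.array([1.0, 2.0, 3.0])  # shape (3,)

# hstack on a single 1-d input is effectively a no-op
assert np.array_equal(np.hstack([flat]), flat)

# vstack promotes it to a row: shape (1, 3)
assert np.array_equal(np.vstack([flat]), flat[None])

# dstack goes through np.atleast_3d, giving shape (1, 3, 1)
assert np.dstack([flat]).shape == (1, 3, 1)
assert np.array_equal(np.dstack([flat]), flat[None, :, None])
```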
@jessegrabowski
np.dstack uses np.atleast_3d to make sure the array is at least 3-dimensional. However, np.atleast_3d behaves differently from pt.atleast_3d; see the code below for reference:

    a = np.array([1, 2, 3, 4])
    print(a.shape)                        # (4,)
    print(pt.atleast_3d(a).eval().shape)  # (1, 1, 4)
    print(np.atleast_3d([a]).shape)       # (1, 4, 1)

Should pt.atleast_3d be changed to behave like np.atleast_3d?
Yeah, I have come across this behavior and it's annoying. We should behave like numpy, but we need a transition strategy.
I suggest we add a kwarg "expand_left" that for now is None and behaves like before, but with a warning in the cases where the behavior will change:

    def atleast_3d(*xs, expand_left: bool | None = None):
        if expand_left is None:
            if any(x.type.ndim < 3 for x in xs):
                warnings.warn(
                    "The behavior of atleast_3d will change to match that of numpy. "
                    "If you want to keep the old behavior use atleast_Nd. "
                    "Otherwise set expand_left=False.",
                    FutureWarning,
                )
        # Act based on expand_left flag
        ...
pt.atleast_3d uses pt.atleast_Nd, and pt.atleast_Nd already has an arg left, which is True by default.
An example:

    a = np.array([1, 2, 3, 4])
    print(pt.atleast_3d([a], left=True).eval().shape)   # (1, 1, 4)
    print(pt.atleast_3d([a], left=False).eval().shape)  # (1, 4, 1)
    print(np.atleast_3d([a]).shape)                     # (1, 4, 1)

To match numpy behaviour, the array must be wrapped in a list.
Nice, so we should add the same warning around the left kwarg.
I will use this property only for pt.dstack
What warning should be added?
pytensor/tensor/basic.py
Outdated

        "horizontal_stack was renamed to hstack and will be removed in a future release",
        FutureWarning,
    )
    return hstack(*args)
Since the function signatures shouldn't be the same, I would just leave horizontal_stack as-is with the deprecation warning, rather than aliasing it to hstack.
tests/tensor/test_basic.py
Outdated
    want = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 8, 7]])
    out = self.eval_outputs_and_check_join([s])
    assert (out == want).all()

-def test_join_matrix1_using_horizontal_stack(self):
+def test_join_matrix1_using_dstack(self):
These tests are a bit too verbose IMO; they can be parameterized by shape (1d, 2d, 3d), function (hstack, vstack, dstack), and n_inputs (1, 2, 3), and just use np.testing.assert_allclose to make sure the pytensor version matches the numpy version.
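A sketch of what such a parametrized test could look like. The helper and test names here are hypothetical, and the commented-out lines assume pytensor exposes hstack/vstack/dstack under those names; only the numpy reference side is exercised in this sketch:

```python
import numpy as np
import pytest


def make_arrays(ndim, n_inputs, seed=0):
    # 1d -> (4,), 2d -> (4, 3), 3d -> (4, 3, 2)
    rng = np.random.default_rng(seed)
    shape = (4, 3, 2)[:ndim]
    return [rng.normal(size=shape) for _ in range(n_inputs)]


@pytest.mark.parametrize("func_name", ["hstack", "vstack", "dstack"])
@pytest.mark.parametrize("ndim", [1, 2, 3])
@pytest.mark.parametrize("n_inputs", [1, 2, 3])
def test_stack_matches_numpy(func_name, ndim, n_inputs):
    arrays = make_arrays(ndim, n_inputs)
    expected = getattr(np, func_name)(arrays)
    # The pytensor side would be evaluated and compared here, e.g.:
    #   result = getattr(pt, func_name)(arrays).eval()
    #   np.testing.assert_allclose(result, expected)
    assert expected.ndim >= ndim  # sanity check on the numpy reference
```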
@ricardoV94 @jessegrabowski
Small suggestion for the tests, otherwise looks good
tests/tensor/test_basic.py
Outdated

        dtype="float32",
    )
    out = self.eval_outputs_and_check_join([s])
    assert (out == want).all()
This will also check the shapes match, which is important for these changes
-    assert (out == want).all()
+    np.testing.assert_array_equal(out, want, strict=True)
    result = func(arrays)
    np_result = getattr(np, func.__name__)(arrays)

    assert np.array_equal(result.eval(), np_result)
-    assert np.array_equal(result.eval(), np_result)
+    np.testing.assert_array_equal(result.eval(), np_result, strict=True)
@pytest.mark.parametrize("dimension", [1, 2, 3])
def test_stack_helpers(func, dimension):
    if dimension == 1:
        arrays = [np.arange(i * dimension, (i + 1) * dimension) for i in range(3)]
May be more readable with np.random.normal(size=...) for the array values?
@ricardoV94
@HarshvirSandhu @ricardoV94 is this PR ready to merge? I still have a few nitpicks, but I'm approving so it doesn't have to languish here any longer.
A rebase onto main is also needed to resolve merge conflicts.
    return concatenate(arrs, axis=1)


def vstack(tup):
The name tup is not clear; arrays or arrs should be preferred. A type hint would also be useful here.
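A possible shape for the renamed, annotated signature. TensorLike here is a stand-in alias for illustration; the real annotation would come from pytensor's own typing helpers:

```python
from collections.abc import Sequence
from typing import Any

TensorLike = Any  # stand-in for pytensor's tensor-like type alias


def vstack(arrays: Sequence[TensorLike]):
    """Stack tensors in sequence vertically (row wise)."""
    # body unchanged: promote each input to at least 2-d,
    # then concatenate along axis 0
    ...
```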
    @@ -2758,15 +2758,34 @@ def concatenate(tensor_list, axis=0):
         return join(axis, *tensor_list)


    -def horizontal_stack(*args):
    +def hstack(tup):
See below
    def vstack(tup):
        r"""Stack arrays in sequence vertically (row wise)."""
It would be good to use this opportunity to start putting real docstrings on pytensor functions, by adding (at least) a Parameters and Returns section. A See Also section would also be nice.
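For example, a numpy-style docstring for vstack could look like this. The exact wording is a suggestion, not final text:

```python
def vstack(arrays):
    r"""Stack tensors in sequence vertically (row wise).

    Parameters
    ----------
    arrays : sequence of TensorLike
        The tensors to stack. They must have the same shape along all
        but the first axis; 1-d inputs are promoted to rows.

    Returns
    -------
    TensorVariable
        The stacked tensor, with at least two dimensions.

    See Also
    --------
    hstack, dstack, column_stack, concatenate
    """
    ...
```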
Description

- Add dstack and column_stack for stacking tensors.
- Rename horizontal_stack to hstack.
- Rename vertical_stack to vstack.

Related Issue

- hstack #585

Type of change