Skip to content

REF: reshape.concat operate on arrays, not SingleBlockManagers #33125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 10, 2020

Conversation

jbrockmendel
Copy link
Member

cc @TomAugspurger is there a better way to handle the assert_series_equal patch for the PandasArray tests?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the issue your change to assert_series_equal is working around?

@@ -436,6 +436,11 @@ def skew(self, axis=None, dtype=None, out=None, keepdims=False, skipna=True):
# ------------------------------------------------------------------------
# Additional Methods

def astype(self, dtype, copy=True):
if dtype is self.dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are PandasDtype instances singletons? I wouldn't expect this to pass for PandasType('int') is PandasDtype('int')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a singleton, no

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this require action? AFAICT the fact that this is needed is an artifact of the way we are patching PandasArray._typ in these tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Otherwise the implementation is incorrect since PandasArray.astype(PandasDtype("int64")) will raise

In [6]: arr = pd.core.arrays.PandasArray(np.array([1, 2]))

In [7]: arr.astype(pd.core.arrays.numpy_.PandasDtype("float64"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-eb4babcd3e70> in <module>
----> 1 arr.astype(pd.core.arrays.numpy_.PandasDtype("float64"))

~/sandbox/pandas/pandas/core/arrays/numpy_.py in astype(self, dtype, copy)
    440         if dtype is self.dtype:
    441             return self.copy() if copy else self
--> 442         return super().astype(dtype, copy=copy)
    443
    444     def to_numpy(

~/sandbox/pandas/pandas/core/arrays/base.py in astype(self, dtype, copy)
    432             NumPy ndarray with 'dtype' for its dtype.
    433         """
--> 434         return np.array(self, dtype=dtype, copy=copy)
    435
    436     def isna(self) -> ArrayLike:

TypeError: data type not understood

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but that isnt any different from the status quo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the difference is that we're implementing this now, but the implementation isn't correct.

If you don't see a straightforward was to resolve this now, then feel free to open a followup issue and merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is_extension_dtype(other) and other.dtype == self.dtype might suffice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain a bit more (at risk of over-explaining) a future reader of that exception is likely to be more confused about why this doesn't work than we are now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to revert the astype change and xfail the affected tests

@jbrockmendel
Copy link
Member Author

Can you explain the issue your change to assert_series_equal is working around?

Without the patched assert_series_equal, we get

_________________________________________________________________________ TestGetitem.test_loc_iloc_frame_single_dtype[object] _________________________________________________________________________

self = <pandas.tests.extension.test_numpy.TestGetitem object at 0x127f29f90>
data = <PandasArray>
[ (0,),  (1,),  (2,),  (3,),  (4,),  (5,),  (6,),  (7,),  (8,),  (9,), (10,),
 (11,), (12,), (13,), (14,...(87,),
 (88,), (89,), (90,), (91,), (92,), (93,), (94,), (95,), (96,), (97,), (98,),
 (99,)]
Length: 100, dtype: object

    def test_loc_iloc_frame_single_dtype(self, data):
        # GH#27110 bug in ExtensionBlock.iget caused df.iloc[n] to incorrectly
        #  return a scalar
        df = pd.DataFrame({"A": data})
        expected = pd.Series([data[2]], index=["A"], name=2, dtype=data.dtype)
    
        result = df.loc[2]
>       self.assert_series_equal(result, expected)

pandas/tests/extension/base/getitem.py:81: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'pandas.tests.extension.test_numpy.TestGetitem'>, left = A    (2,)
Name: 2, dtype: object, right = A    (2,)
Name: 2, dtype: object, args = (), kwargs = {}

    @classmethod
    def assert_series_equal(cls, left, right, *args, **kwargs):
>       return tm.assert_series_equal(left, right, *args, **kwargs)
E       AssertionError: Attributes of Series are different
E       
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: object

pandas/tests/extension/base/base.py:13: AssertionError

test_loc_iloc_frame_single_dtype is currently xfailed. The edit to PandasArray.astype was because this had caused one of the cases of test_loc_iloc_frame_single_dtype to XPASS. Removing the xfail led to 9 failures, 8 of which were fixed by the edit to PandasArray.astype.

I think the assert_series_equal issue has to do with the way we are patching PandasArray._typ in these tests, but i havent chased it down to be sure.

@jbrockmendel
Copy link
Member Author

Updated to just xfail more selectively and avoid patching assert_series_equal

@jbrockmendel jbrockmendel added the Refactor Internal refactoring of code label Apr 1, 2020
@TomAugspurger
Copy link
Contributor

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I think @TomAugspurger has some remaining comments

@jbrockmendel jbrockmendel added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 9, 2020
@jreback jreback added this to the 1.1 milestone Apr 10, 2020
@jreback jreback merged commit 496c982 into pandas-dev:master Apr 10, 2020
@jbrockmendel jbrockmendel deleted the no-data-concat branch April 10, 2020 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants