Pandas Tests rely on inconsistent array coercion

In https://github.com/numpy/numpy/pull/14995 I have tried to make NumPy consistent with respect to coercing DataFrames (and other array-likes which also implement the sequence protocol) to NumPy arrays.

With the new PR/behaviour, the `__array__` interface is always preferred, so no mixed/inconsistent behaviour occurs for objects that are also sequence-like (with different behaviour).
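
To illustrate the kind of object this is about, here is a minimal sketch using a hypothetical `TableLike` class (not part of pandas or NumPy) that, like a DataFrame, implements `__array__` and also looks like a sequence whose items do not match the array values. The result of the nested call depends on the NumPy version, so it is only printed, not assumed:

```python
import numpy as np


class TableLike:
    """Hypothetical stand-in for a DataFrame: array-like and sequence-like."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Array interface: exposes the full 2-D block of values.
        arr = self._data if dtype is None else self._data.astype(dtype)
        return arr.copy() if copy else arr

    def __len__(self):
        # Sequence protocol: the length is the number of rows ...
        return self._data.shape[0]

    def __getitem__(self, key):
        # ... but integer positions are not valid keys, similar to `df[0]`
        # on a DataFrame whose columns are named "a" and "b".
        raise KeyError(key)


table = TableLike([[1, 3], [2, 4], [3, 5]])

# Coercing a single object goes through `__array__`.
single = np.asarray(table)
print(single.shape)

# Nesting the object in a list is where the two protocols can conflict.
try:
    nested = np.array([table, table])
    print("nested shape:", nested.shape)
except (ValueError, TypeError) as exc:
    print("nested coercion failed:", exc)
```

If `__array__` is fully preferred, the nested call should produce a 3-dimensional array; with the mixed behaviour it may raise instead.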

Unfortunately, pandas DataFrames show exactly this mixed behaviour, since they are also sequence-like. It kicks in during DataFrame coercion, in the following case:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
df2 = pd.DataFrame([df1, df1])
```

Here `df2` is currently coerced to a DataFrame with DataFrames inside. This happens due to the following logic:
```python
        try:
            if is_list_like(values[0]) or hasattr(values[0], 'len'):  # <-- is hit
                # the following convert does nothing; `np.array()` then raises an error...
                values = np.array([convert(v) for v in values])
            elif isinstance(values[0], np.ndarray) and values[0].ndim == 0:
                # GH#21861
                values = np.array([convert(v) for v in values])
            else:
                values = convert(values)
        except (ValueError, TypeError):
            values = convert(values)  # <-- Ends up getting called and forces object array.
```

EDIT: additional code details: `convert` is a thin wrapper around:
```python
def maybe_convert_platform(values):
    """ try to do platform conversion, allow ndarray or list here """

    if isinstance(values, (list, tuple, range)):
        values = construct_1d_object_array_from_listlike(values)
    # more logic
```
This takes the first branch (`values` is a list), which in turn forces a 1-D object array:
```python
def construct_1d_object_array_from_listlike(values):
    # numpy will try to interpret nested lists as further dimensions, hence
    # making a 1D array that contains list-likes is a bit tricky:
    result = np.empty(len(values), dtype='object')
    result[:] = values
    return result
```
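
The difference between the two construction paths can be sketched with plain nested lists instead of DataFrames. This is only an illustration of the preallocate-and-assign idiom used above; the variable names are made up for the example:

```python
import numpy as np

rows = [[1, 2], [3, 4]]

# `np.array` interprets the inner lists as a second dimension.
stacked = np.array(rows)
print(stacked.shape)

# Preallocating a 1-D object array and assigning into it keeps each
# inner list as a single element instead.
boxed = np.empty(len(rows), dtype=object)
boxed[:] = rows
print(boxed.shape, type(boxed[0]))
```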

Because `np.array([df1, df1])` raises an error due to the inconsistencies within NumPy, pandas ends up calling `convert([df1, df1])`, which in turn creates a NumPy `dtype=object` array with two DataFrames inside.
However, the new/correct behaviour for NumPy would be that `np.array([df1, df1])` returns a 3-dimensional array. This ends up raising an error because pandas refuses to coerce a 3D array to a DataFrame.

It seems safest not to try to squeeze this into the upcoming NumPy release (it is planned in a few days). However, I would like to change it in master soon after branching. I am not sure whether you see the current behaviour as important or not, but it would be nice if you could look into what the final intent is here. If we (can) change this in NumPy, I am not sure there is a way for pandas to retain the old behaviour.

Pandas Tests rely on inconsistent array coercion #29978