BUG: DataFrame.describe() breaks with a column index of object type and numeric entries

While preparing a commit for another issue in `.describe()`, I ran into this puzzling bug, which is surprisingly easy to trigger.
#### Symptoms

``` python
import pandas as pd

df = pd.DataFrame({'A': list("BCDE"), 0: [1, 2, 3, 4]})
df.describe()
# Long traceback listing formatting and internal functions...
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
```

However:

``` python
df.describe(include='all')
               0    A
count   4.000000    4
unique       NaN    4
top          NaN    D
freq         NaN    1
mean    2.500000  NaN
std     1.290994  NaN
min     1.000000  NaN
25%     1.750000  NaN
50%     2.500000  NaN
75%     3.250000  NaN
max     4.000000  NaN

# It's OK if we don't print on screen:
x = df.describe()
x.columns
Out[8]: Index([0], dtype='int64')

# Fixing this suspicious index (int works too):
x.columns = x.columns.astype(object)
x
Out[10]: 
              0
count  4.000000
mean   2.500000
std    1.290994
min    1.000000
25%    1.750000
50%    2.500000
75%    3.250000
max    4.000000
```

The same issue occurs with a simpler data frame:

``` python
df0 = pd.DataFrame([1, 2, 3, 4])
# With the default column index, this works:
df0.describe()
Out[28]: 
              0
count  4.000000
mean   2.500000
std    1.290994
min    1.000000
25%    1.750000
50%    2.500000
75%    3.250000
max    4.000000

# Modify column index:
df0.columns = pd.Index([0], dtype=object)
df0.describe()
# ...
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
```
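
To check whether a given pandas installation is affected, the reproduction above can be wrapped in a small helper. This is only a sketch: `describe_repr_fails` is a hypothetical name, not part of pandas, and it assumes the failure surfaces as a `ValueError` while the repr is built, as in the traceback above.

```python
import pandas as pd


def describe_repr_fails(frame):
    """Return True if building the repr of frame.describe() raises ValueError.

    Hypothetical helper for this report; it is not a pandas function.
    """
    try:
        repr(frame.describe())
    except ValueError:
        return True
    return False


# Same frame as above: one numeric column whose label 0 sits in an object index.
df0 = pd.DataFrame([1, 2, 3, 4])
df0.columns = pd.Index([0], dtype=object)
print(describe_repr_fails(df0))
```

On an affected build this should print `True`; on a build without the bug it prints `False`.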

Current version (but the bug is also present in pandas release 0.18.1):

```
pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1+64.g7ed22fe.dirty
nose: 1.3.7
pip: 8.1.2
setuptools: 21.0.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0.dev0+3f3c371
IPython: 4.0.1
...
```
#### Reason

My guess is that some internal function used when formatting the output gets confused by the dtype of the column index. The faulty index itself, however, is created in `.describe()`.

``` python
# Output from %debug df.describe()
# NDFrame.describe() in pandas/core/generic.py:
#
   4943             data = self
   4944         else:
   4945             data = self.select_dtypes(include=include, exclude=exclude)
   4946 
   4947         ldesc = [describe_1d(s, percentiles) for _, s in data.iteritems()]
   4948         # set a convenient order for rows
   4949         names = []
   4950         ldesc_indexes = sorted([x.index for x in ldesc], key=len)
   4951         for idxnames in ldesc_indexes:
   4952             for name in idxnames:
   4953                 if name not in names:
   4954                     names.append(name)
   4955 
   4956         d = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
1> 4957         d.columns = self.columns._shallow_copy(values=d.columns.values)
   4958         d.columns.names = data.columns.names
   4959         return d
```

`_shallow_copy()` in the marked line changes `d.columns` from an `Int64Index` into a plain `Index` that still reports `dtype='int64'`:

``` python
ipdb> p d.columns
Int64Index([0], dtype='int64')
ipdb> n
> /home/users/piotr/workspace/pandas-pijucha/pandas/core/generic.py(4958)describe()
1  4957         d.columns = self.columns._shallow_copy(values=d.columns.values)
-> 4958         d.columns.names = data.columns.names
   4959         return d
ipdb> p d.columns
Index([0], dtype='int64')
```
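
Outside the debugger, the class and dtype of an index can be compared directly. The helper below is a hypothetical convenience for this report (`index_summary` is not a pandas function). It only reads public attributes, so it does not depend on `_shallow_copy()` or on the `Int64Index` class being available in the installed version.

```python
import pandas as pd


def index_summary(idx):
    """Return (class name, dtype) of an index as strings. Hypothetical helper."""
    return type(idx).__name__, str(idx.dtype)


plain = pd.Index([0])                    # integer dtype inferred from the data
as_object = pd.Index([0], dtype=object)  # same label, explicit object dtype

print(index_summary(plain))
print(index_summary(as_object))
```

A mismatch between what the class expects and what the dtype reports, as in the `ipdb` session above, is the kind of inconsistency this is meant to make visible.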
#### Possible solutions

Lines 4957-4958 exist to work around issues introduced by `pd.concat`: they try to carry the column structure over from `self` to `d`.
I think a simpler solution is to replace these lines with:

``` python
d = pd.concat(ldesc, join_axes=pd.Index([names]), axis=1)
d.columns = data.columns
return d
```

or

``` python
d = pd.DataFrame(pd.concat(ldesc, axis=1),
                 index=pd.Index(names),
                 columns=data.columns)
return d
```

`data` is a subframe of `self` and retains the same column structure.
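
The first variant can be tried outside of pandas internals with a standalone sketch. `describe_numeric` is a hypothetical stand-in for the numeric branch of `describe()`, not the actual implementation: it uses the public `Series.describe()` instead of `describe_1d`, iterates with `.items()`, and skips the row reordering and `join_axes` handling.

```python
import pandas as pd


def describe_numeric(frame):
    """Describe numeric columns and reuse the column index of the selection.

    Hypothetical stand-in for the numeric branch of describe(); assumes the
    frame has at least one numeric column.
    """
    data = frame.select_dtypes(include="number")
    ldesc = [s.describe() for _, s in data.items()]
    d = pd.concat(ldesc, axis=1)
    # Take the column index straight from the selected subframe instead of
    # rebuilding it from the concatenated values.
    d.columns = data.columns
    return d


df = pd.DataFrame({"A": list("BCDE"), 0: [1, 2, 3, 4]})
summary = describe_numeric(df)
print(summary.columns.dtype)
print(summary)
```

Because the column index is copied from `data`, its dtype matches that of the original frame's columns rather than being inferred again from the labels.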

`pd.concat` has parameters that help carry over a hierarchical index, but on its own it can't preserve a categorical one.

I'm going to submit a pull request with this fix, together with some others related to `describe()`. I hope I haven't overlooked anything obvious; if I have, comments are very welcome.

