
PERF: Significant speed difference between arr.mean() and arr.values.mean() for common dtype columns #34773

Closed
@ianozsvald

Description
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


I'm seeing a significant variance in timings for common math operations (e.g. mean, std, max) on a large Pandas Series vs the underlying NumPy array. A code example is shown below with 1 million elements and a roughly 10x speed difference. The screenshot below uses 10 million elements.

I've generated a testing module (https://github.com/ianozsvald/dtype_pandas_numpy_speed_test) which several people have tried on Intel & AMD hardware: ianozsvald/dtype_pandas_numpy_speed_test#1

This module confirms the general trend that all of these operations are faster on the underlying NumPy array (not surprising, as it avoids the dispatch machinery), but for float operations the speed hit when using Pandas seems to be extreme:

[screenshot: timing graphs for Series vs NumPy array operations, 10 million elements]
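
For reference, here is a minimal standalone sketch of the kind of comparison the linked module performs. It is illustrative only: the operation list, array size, and loop structure are my own choices, not the repository's code.

# Illustrative benchmark sketch (not the repository's module): times
# mean/std/max on a pandas Series vs its underlying NumPy array for
# float64 and int64 data of the same size.
import timeit

import numpy as np
import pandas as pd

N = 1_000_000
data = {
    "float64": pd.Series(np.ones(shape=N, dtype="float64")),
    "int64": pd.Series(np.ones(shape=N, dtype="int64")),
}

for dtype_name, ser in data.items():
    arr = ser.values
    for op in ("mean", "std", "max"):
        # total seconds for 100 calls; *10 converts to ms per call
        t_pd = timeit.timeit(lambda: getattr(ser, op)(), number=100)
        t_np = timeit.timeit(lambda: getattr(arr, op)(), number=100)
        print(f"{dtype_name:8s} {op:4s}  Series: {t_pd * 10:.3f} ms/call  "
              f"ndarray: {t_np * 10:.3f} ms/call  ratio: {t_pd / t_np:.1f}x")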

Code Sample, a copy-pastable example

A Python module exists in this repo, along with reports from several other users and screenshots of their graphs; the same general behaviour is seen across different machines: https://github.com/ianozsvald/dtype_pandas_numpy_speed_test

# note this is copied from my README linked above.
# paste into IPython or a Notebook
import pandas as pd
import numpy as np
arr = pd.Series(np.ones(shape=1_000_000))
arr.values.dtype                                                                                                                                                         
Out[]: dtype('float64')

arr.values.mean() == arr.mean()                                                                                                                                           
Out[]: True

# call arr.mean() vs arr.values.mean(); note the circa 10x speed difference
# (roughly 4.6 ms vs 0.5 ms)
%timeit arr.mean()
4.59 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit arr.values.mean()
485 µs ± 5.73 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# note that the arr.values dereference itself is very cheap (nanoseconds)
%timeit arr.values 
456 ns ± 0.828 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
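
As a rough point of comparison (my own addition, not part of the original report): pandas reductions skip NaN by default (skipna=True), so np.nanmean may be a closer NumPy analogue of Series.mean() than ndarray.mean(). The snippet below, pasted into IPython like the example above, shows that the NaN-aware path alone carries measurable overhead, though it may not account for the full gap.

# NaN-aware vs plain reductions on the same data (illustrative comparison)
vals = arr.values

%timeit vals.mean()       # plain reduction, no NaN handling
%timeit np.nanmean(vals)  # NaN-aware reduction, typically noticeably slower
%timeit arr.mean()        # pandas Series reduction (skipna=True by default)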

Problem description

Is this slow-down expected? The slowdown feels extreme, but perhaps my testing methodology is flawed? I would expect the float and integer math to run at approximately the same speed, but instead we see a significant slow-down for Pandas float operations vs their NumPy counterparts.
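
One environment variable worth noting (my suggestion, not part of the original report): the pd.show_versions() output below shows bottleneck is not installed, and pandas can route these reductions through bottleneck when it is available and enabled, so timings may differ between environments. A quick way to check the relevant options:

# Check whether pandas is configured to use the optional accelerators
# (useful for narrowing down differences between machines).
import pandas as pd

print(pd.get_option("compute.use_bottleneck"))  # only matters if bottleneck is installed
print(pd.get_option("compute.use_numexpr"))     # numexpr mainly affects expression evaluation

# whether bottleneck / numexpr are installed is also reported by pd.show_versions()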

I've added some extra graphs:

[screenshots: additional timing graphs omitted]

Expected Output

Output of pd.show_versions()

In [2]: pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.6.7-050607-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.4
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.1.post20200529
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Labels: Performance (Memory or execution speed performance)
