Extreme performance difference between int and float factorization

#### Code Sample, a copy-pastable example if possible

```python
import pandas as pd
import numpy as np
from time import time


df = pd.date_range(start="1/1/2018", end="1/2/2018", periods=1e6).to_frame()

start = time()
dfr = df.resample("1s").last()
print(time() - start)
print("Length:", len(dfr))
print()

group_index = np.round(df.index.astype(int) / 1e9)
start = time()
dfr = df.groupby(group_index).last()
print(time() - start)
print("Length:", len(dfr))
```
#### Problem description

In my current project, I use groupby as well as resample for the same data frames with the same aggregations. I have noticed that resample is way quicker than groupby. While I understand that groupby is more flexible, it would still be nice if the performance was comparable. In the example above, resample is more than 50 times faster:

```
Length: 86401
0.023558616638183594

Length: 86401
1.264981746673584
```

I am aware that they don't result in the exact same data frames, but this does not matter for this discussion.

#### Expected Output

Better performance for groupby.

I haven't looked at the groupby implementation and therefore I don't know if there is a good reason for the difference. If there is a good reason, some common cases could still be improved a lot. For example, in this case, we could just check first if the by-argument is monotonic increasing or decreasing. In this case, the operation can be implemented even without a hash map.

#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.4.0-17134-Microsoft
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.1
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.8.0
pandas_datareader: 0.7.4
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extreme performance difference between int and float factorization #28303

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Extreme performance difference between int and float factorization #28303

Description

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Output of `pd.show_versions()`