Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
print(pd.date_range('2020-02-01 15:00', '2020-05-12 13:00', freq='MS')) # wrong output
print(pd.date_range('2020-02-01 15:00', '2020-05-12 15:00', freq='MS')) # correct output
print(pd.date_range('2020-02-01 15:00', '2020-05-12 16:00', freq='MS')) # correct output
print(pd.date_range('2020-02-01 15:00', '2020-05-01 01:00', freq='MS')) # wrong output
print(pd.date_range('2020-02-01 15:00', '2020-05-01 23:00', freq='MS')) # correct output
print(pd.date_range('2020-01-21 15:00', '2020-05-12 13:00', freq='MS')) # correct output
Currently outputs:
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00', '2020-05-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00', '2020-05-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00', '2020-05-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00', '2020-05-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
Problem description
It seems that if freq='MS', the date of start is the first day of a month, and the time of start is after the time of end, then the last month is missing from the output.
The same problem also seems to occur if freq='M' and the date of start is the last day of a month (and the time of start is after the time of end), as demonstrated below.
(Perhaps the problem also occurs with other values of the freq parameter. However, I only tested with 'MS' and 'M'.)
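To make the suspected trigger condition concrete, here is a minimal sketch that encodes it as a predicate and applies it to the six freq='MS' calls from the first code sample. It uses only the standard library and does not call pandas; the helper name `matches_reported_condition` is made up for illustration, and the condition itself is only my guess from the observed outputs, not a confirmed root cause.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

def matches_reported_condition(start: str, end: str) -> bool:
    """Return True when the inputs fit the pattern suspected for freq='MS'."""
    s = datetime.strptime(start, FMT)
    e = datetime.strptime(end, FMT)
    starts_on_anchor = s.day == 1            # start date is the first day of a month
    start_time_later = s.time() > e.time()   # start's time of day is after end's
    return starts_on_anchor and start_time_later

# The six (start, end) pairs from the first code sample, in order.
cases = [
    ("2020-02-01 15:00", "2020-05-12 13:00"),
    ("2020-02-01 15:00", "2020-05-12 15:00"),
    ("2020-02-01 15:00", "2020-05-12 16:00"),
    ("2020-02-01 15:00", "2020-05-01 01:00"),
    ("2020-02-01 15:00", "2020-05-01 23:00"),
    ("2020-01-21 15:00", "2020-05-12 13:00"),
]

flags = [matches_reported_condition(s, e) for s, e in cases]
print(flags)  # prints [True, False, False, True, False, False]
```

The `True` entries line up with the two calls marked "wrong output" above, which is why I suspect this combination of conditions.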
Expected Output
I would expect every code line above to produce the following output:
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
'2020-04-01 15:00:00', '2020-05-01 15:00:00'],
dtype='datetime64[ns]', freq='MS')
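The rule behind this expectation can be sketched without pandas. The snippet below is a standard-library reference, not pandas's implementation: it assumes that every first-of-month date from start up to end's calendar date is kept (comparing dates only, ignoring the time of day) and that each entry carries start's time of day. The function name `expected_month_starts` is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

def next_month(year: int, month: int) -> tuple:
    """Return (year, month) of the following calendar month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)

def expected_month_starts(start: str, end: str) -> list:
    """Month-start timestamps I would expect, under the assumptions above."""
    s = datetime.strptime(start, FMT)
    e = datetime.strptime(end, FMT)
    year, month = s.year, s.month
    if s.day != 1:
        # start is not on an anchor, so roll forward to the next month start
        year, month = next_month(year, month)
    out = []
    # Assumption: compare calendar months only, so end's time of day is ignored.
    while (year, month) <= (e.year, e.month):
        out.append(datetime(year, month, 1, s.hour, s.minute))
        year, month = next_month(year, month)
    return out

cases = [
    ("2020-02-01 15:00", "2020-05-12 13:00"),
    ("2020-02-01 15:00", "2020-05-12 15:00"),
    ("2020-02-01 15:00", "2020-05-12 16:00"),
    ("2020-02-01 15:00", "2020-05-01 01:00"),
    ("2020-02-01 15:00", "2020-05-01 23:00"),
    ("2020-01-21 15:00", "2020-05-12 13:00"),
]

for s, e in cases:
    print([d.strftime(FMT) for d in expected_month_starts(s, e)])
```

Under these assumptions, all six calls yield the same four timestamps, from 2020-02-01 15:00 through 2020-05-01 15:00.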
Another code Sample, a copy-pastable example
Here is an example with freq='M'.
import pandas as pd
print(pd.date_range('2020-01-31 15:00', '2020-05-12 13:00', freq='M')) # wrong output
print(pd.date_range('2020-01-31 17:00', '2020-05-12 17:00', freq='M')) # correct output
print(pd.date_range('2020-01-30 15:00', '2020-05-12 13:00', freq='M')) # correct output
Expected Output
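Here the expected output has four entries for each code line above. For the second line (the first and third lines would show 15:00 instead of 17:00) it is: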
DatetimeIndex(['2020-01-31 17:00:00', '2020-02-29 17:00:00',
'2020-03-31 17:00:00', '2020-04-30 17:00:00'],
dtype='datetime64[ns]', freq='M')
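For the month-end case, the same expectation can be written as a small standard-library sketch. It is not pandas code: it assumes that each month-end date up to end's calendar date is kept (time of day ignored in that comparison) and that entries carry start's time of day. `expected_month_ends` is a hypothetical name, and `calendar.monthrange` supplies the last day of each month.

```python
import calendar
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

def expected_month_ends(start: str, end: str) -> list:
    """Month-end timestamps I would expect, under the assumptions above."""
    s = datetime.strptime(start, FMT)
    e = datetime.strptime(end, FMT)
    out = []
    year, month = s.year, s.month
    while True:
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        stamp = datetime(year, month, last_day, s.hour, s.minute)
        # Assumption: only the calendar date is compared against end.
        if stamp.date() > e.date():
            break
        out.append(stamp)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out

m_cases = [
    ("2020-01-31 15:00", "2020-05-12 13:00"),
    ("2020-01-31 17:00", "2020-05-12 17:00"),
    ("2020-01-30 15:00", "2020-05-12 13:00"),
]

for s, e in m_cases:
    print([d.strftime(FMT) for d in expected_month_ends(s, e)])
```

Each of the three calls gives four entries (January 31 through April 30), with the time of day taken from start.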
Output of pd.show_versions()
(While the output below reports pandas version 1.0.3, I have also tested with version 1.0.5, with the same results.)
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fi_FI.UTF-8
LOCALE : fi_FI.UTF-8
pandas : 1.0.3
numpy : 1.19.0
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.20
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 4.2.1
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : 0.9.6
numba : None