
BUG: date_range returns wrong output in some cases #35342

Open
@tnnmk

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
print(pd.date_range('2020-02-01 15:00', '2020-05-12 13:00', freq='MS')) # wrong output
print(pd.date_range('2020-02-01 15:00', '2020-05-12 15:00', freq='MS')) # correct output
print(pd.date_range('2020-02-01 15:00', '2020-05-12 16:00', freq='MS')) # correct output
print(pd.date_range('2020-02-01 15:00', '2020-05-01 01:00', freq='MS')) # wrong output
print(pd.date_range('2020-02-01 15:00', '2020-05-01 23:00', freq='MS')) # correct output
print(pd.date_range('2020-01-21 15:00', '2020-05-12 13:00', freq='MS')) # correct output

Currently outputs:

DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00', '2020-05-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00', '2020-05-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00', '2020-05-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')
DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00', '2020-05-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')

Problem description

It seems that if freq='MS', the start date is the first day of a month, and the start's time of day is later than the end's time of day, then the last month is missing from the output.

The same problem also seems to occur if freq='M' and the start date is the last day of a month (and the start's time of day is later than the end's), as demonstrated below.

(The problem might also occur with other values of the freq parameter; however, I only tested 'MS' and 'M'.)
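One hypothesis that is consistent with all six outputs above: the start is rolled forward to the next month start only when it is not already on one, and the end is rolled back to its month start only in the other case, i.e. only when the start is already on a month start (an if/elif rather than two independent checks). The stdlib sketch below is a simplified model written for illustration; the helper names are hypothetical, and it is not pandas' actual implementation.

```python
from datetime import datetime


def is_month_start(ts: datetime) -> bool:
    # Assumption: only the day is checked, not the time of day.
    return ts.day == 1


def add_months(ts: datetime, n: int) -> datetime:
    # Shift by n months, landing on day 1 and keeping the time of day.
    month_index = ts.year * 12 + (ts.month - 1) + n
    return ts.replace(year=month_index // 12, month=month_index % 12 + 1, day=1)


def roll_forward(ts: datetime) -> datetime:
    # Move to day 1 of the next month (time of day kept) unless already there.
    return ts if is_month_start(ts) else add_months(ts, 1)


def roll_back(ts: datetime) -> datetime:
    # Move to day 1 of the same month (time of day kept) unless already there.
    return ts if is_month_start(ts) else ts.replace(day=1)


def model_date_range_ms(start: datetime, end: datetime) -> list:
    # Hypothesized asymmetry: the end is only rolled back when the start
    # was already on a month start.
    if not is_month_start(start):
        start = roll_forward(start)
    elif not is_month_start(end):
        end = roll_back(end)
    out, cur = [], start
    while cur <= end:
        out.append(cur)
        cur = add_months(cur, 1)
    return out
```

Under this model, the first line loses 2020-05-01 15:00 because its end is rolled back to 2020-05-01 13:00, while the last line keeps it because its start is rolled forward and its end stays at 2020-05-12 13:00.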

Expected Output

I would expect every code line above to produce the following output:

DatetimeIndex(['2020-02-01 15:00:00', '2020-03-01 15:00:00',
               '2020-04-01 15:00:00', '2020-05-01 15:00:00'],
              dtype='datetime64[ns]', freq='MS')

Another code Sample, a copy-pastable example

Here is an example with freq='M'.

import pandas as pd
print(pd.date_range('2020-01-31 15:00', '2020-05-12 13:00', freq='M')) # wrong output
print(pd.date_range('2020-01-31 17:00', '2020-05-12 17:00', freq='M')) # correct output
print(pd.date_range('2020-01-30 15:00', '2020-05-12 13:00', freq='M')) # correct output

Expected Output

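Here the expected output of the second code line above is shown below; the first and third lines should produce the same dates, but at 15:00:00, since their start time is 15:00: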

DatetimeIndex(['2020-01-31 17:00:00', '2020-02-29 17:00:00',
               '2020-03-31 17:00:00', '2020-04-30 17:00:00'],
              dtype='datetime64[ns]', freq='M')
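For comparison, here is a minimal stdlib sketch of the behavior expected for freq='M' (the function names are hypothetical, not pandas API): begin at the first month end on or after start, keep the start's time of day, and keep every month end that does not exceed end.

```python
import calendar
from datetime import datetime


def month_end(year: int, month: int, like: datetime) -> datetime:
    # Last calendar day of (year, month), with the time of day taken from `like`.
    last_day = calendar.monthrange(year, month)[1]
    return like.replace(year=year, month=month, day=last_day)


def expected_month_ends(start: datetime, end: datetime) -> list:
    # The month end of start's own month is never earlier than start,
    # because its day is >= start.day and the time of day is the same.
    y, m = start.year, start.month
    cur = month_end(y, m, start)
    out = []
    while cur <= end:
        out.append(cur)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
        cur = month_end(y, m, start)
    return out
```

With the second line's arguments this yields 2020-01-31, 2020-02-29, 2020-03-31 and 2020-04-30, all at 17:00, matching the expected output above; the first and third lines yield the same four dates at 15:00.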

Output of pd.show_versions()

(Although the output below reports pandas version 1.0.3, I have also tested with version 1.0.5 and got the same results.)

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-62-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fi_FI.UTF-8
LOCALE : fi_FI.UTF-8

pandas : 1.0.3
numpy : 1.19.0
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.20
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 4.2.1
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.2.1
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.5.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : None
xlsxwriter : 0.9.6
numba : None
