Description
Code Sample
SQL = """
SELECT TOP 10
submitted_date_id
FROM my_table
WHERE submitted_date_id BETWEEN %(mindate)s AND %(maxdate)s
"""
df = pd.read_sql(
SQL,
con,
params={'mindate': 20171008, 'maxdate': '20171010'},
parse_dates={'submitted_date_id': '%Y%m%d'})
df
The underlaying database is MSSQL and I'm using pymssql
to build the connection con
.
Problem description
The column submitted_date_id
has integers representing the date YYYYMMDD
. The code above is expected to parse the column and populate the resulting column in the dataframe with Timestamp
objects. However, this code yields an error:
ValueError: cannot cast unit %Y%m%d
The current behavior is counter intuitive and seems to assume that the date column contains stings. A work around is to cast the column in the query to VARCHAR
, but this is cumbersome. Moreover, it adds unneeded complexity to the query.
Expected Output
When providing the format of the dates to parse_dates
, it is expected that the column will hold Timestamp objects.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.3.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.2.0
sphinx: 1.6.4
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None