Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
df = pd.read_csv("pdbug.csv", comment="#")
df.info()
print(df)
with pdbug.csv being this:
#,Thermostat,identifier,(REDACTED)
#,Thermostat,name,Downstairs
#,Start,date,2020-12-22
#,End,date,2020-12-30
Date,Time,System Setting,System Mode,Calendar Event,Program Mode,Cool Set Temp (F),Heat Set Temp (F),Current Temp (F),Current Humidity (%RH),Outdoor Temp (F),Wind Speed (km/h),Cool Stage 1 (sec),Heat Stage 1 (sec),Fan (sec),DM Offset,Thermostat Temperature (F),Thermostat Humidity (%RH),Study (F),Study2,Breakfast Nook (F),Breakfast Nook2
2020-12-22,02:55:00,heat,heatOff,,Sleep,85,61,67.4,40,30.8,0,0,0,0,,67.4,40,,,,,
2020-12-22,03:00:00,heat,heatOff,,Sleep,85,61,67.3,40,29.6,0,0,0,0,,67.3,40,,,,,
2020-12-22,03:05:00,heat,heatOff,,Sleep,85,61,67.2,40,29.6,0,0,0,180,,67.2,40,,,,,
2020-12-22,03:10:00,heat,heatOff,,Sleep,85,61,67.1,42,29.6,0,0,0,120,,67.1,42,,,,,
2020-12-22,03:15:00,heat,heatOff,,Sleep,85,61,67.1,40,29.6,0,0,0,0,,67.1,40,,,,,
2020-12-22,03:20:00,heat,heatOff,,Sleep,85,61,67,40,29.6,0,0,0,0,,67,40,,,,,
2020-12-22,03:25:00,heat,heatOff,,Sleep,85,61,67,40,29.6,0,0,0,0,,67,40,,,,,
2020-12-22,03:30:00,heat,heatOff,,Sleep,85,61,66.9,40,29,0,0,0,0,,66.9,40,,,,,
(a slight edit of a CSV file of HVAC system performance data downloaded from https://ecobee.com)
gives
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, 2020-12-22 to 2020-12-22
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 8 non-null object
1 Time 8 non-null object
2 System Setting 8 non-null object
3 System Mode 0 non-null float64
4 Calendar Event 8 non-null object
5 Program Mode 8 non-null int64
6 Cool Set Temp (F) 8 non-null int64
7 Heat Set Temp (F) 8 non-null float64
8 Current Temp (F) 8 non-null int64
9 Current Humidity (%RH) 8 non-null float64
10 Outdoor Temp (F) 8 non-null int64
11 Wind Speed (km/h) 8 non-null int64
12 Cool Stage 1 (sec) 8 non-null int64
13 Heat Stage 1 (sec) 8 non-null int64
14 Fan (sec) 0 non-null float64
15 DM Offset 8 non-null float64
16 Thermostat Temperature (F) 8 non-null int64
17 Thermostat Humidity (%RH) 0 non-null float64
18 Study (F) 0 non-null float64
19 Study2 0 non-null float64
20 Breakfast Nook (F) 0 non-null float64
21 Breakfast Nook2 0 non-null float64
dtypes: float64(10), int64(8), object(4)
memory usage: 1.4+ KB
Date Time System Setting System Mode Calendar Event Program Mode Cool Set Temp (F) Heat Set Temp (F) Current Temp (F) Current Humidity (%RH) Outdoor Temp (F) Wind Speed (km/h) Cool Stage 1 (sec) Heat Stage 1 (sec) Fan (sec) DM Offset Thermostat Temperature (F) Thermostat Humidity (%RH) Study (F) Study2 Breakfast Nook (F) Breakfast Nook2
2020-12-22 02:55:00 heat heatOff NaN Sleep 85 61 67.4 40 30.8 0 0 0 0 NaN 67.4 40 NaN NaN NaN NaN NaN
2020-12-22 03:00:00 heat heatOff NaN Sleep 85 61 67.3 40 29.6 0 0 0 0 NaN 67.3 40 NaN NaN NaN NaN NaN
2020-12-22 03:05:00 heat heatOff NaN Sleep 85 61 67.2 40 29.6 0 0 0 180 NaN 67.2 40 NaN NaN NaN NaN NaN
2020-12-22 03:10:00 heat heatOff NaN Sleep 85 61 67.1 42 29.6 0 0 0 120 NaN 67.1 42 NaN NaN NaN NaN NaN
2020-12-22 03:15:00 heat heatOff NaN Sleep 85 61 67.1 40 29.6 0 0 0 0 NaN 67.1 40 NaN NaN NaN NaN NaN
2020-12-22 03:20:00 heat heatOff NaN Sleep 85 61 67.0 40 29.6 0 0 0 0 NaN 67.0 40 NaN NaN NaN NaN NaN
2020-12-22 03:25:00 heat heatOff NaN Sleep 85 61 67.0 40 29.6 0 0 0 0 NaN 67.0 40 NaN NaN NaN NaN NaN
2020-12-22 03:30:00 heat heatOff NaN Sleep 85 61 66.9 40 29.0 0 0 0 0 NaN 66.9 40 NaN NaN NaN NaN NaN
Problem description
Observe that the "Date" column absorbs both the date and time values provided, and that the "Time" column returned by read_csv contains data intended for the next column over ("System Setting"). The source CSV seems sane, but the read_csv output does not.
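One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: the header row names 22 columns, while each data row ends in a trailing comma and so carries 23 fields. When data rows have one more field than the header, read_csv uses the first field as the index, which would match the "Index: 8 entries, 2020-12-22 to 2020-12-22" line in the df.info() output. The example below uses a shortened, made-up CSV (three columns and two rows, not the original pdbug.csv) to reproduce the shift, and shows that passing index_col=False keeps the first field as an ordinary column.

```python
import io

import pandas as pd

# Hypothetical miniature of pdbug.csv: a commented preamble line, a 3-name
# header, and data rows that end with a trailing comma (4 fields each).
csv_text = (
    "#,Thermostat,name,Downstairs\n"
    "Date,Time,Mode\n"
    "2020-12-22,02:55:00,heat,\n"
    "2020-12-22,03:00:00,heat,\n"
)

# Default behaviour: with one more field per row than header names, the
# first field becomes the index and every label shifts one place left.
shifted = pd.read_csv(io.StringIO(csv_text), comment="#")

# index_col=False tells read_csv not to use the first column as the index,
# so the labels line up with the data and the empty trailing field is dropped.
aligned = pd.read_csv(io.StringIO(csv_text), comment="#", index_col=False)

print(shifted)
print(aligned)
```

If this reading is right, the same index_col=False argument applied to the original call, pd.read_csv("pdbug.csv", comment="#", index_col=False), should yield a "Date" column holding dates and a "Time" column holding times.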
Output of pd.show_versions()
pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.1
setuptools : 44.0.0
Cython : 0.29.21
pytest : 6.2.0
hypothesis : None
sphinx : 3.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None