Closed
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Create a local file named example.json with the following contents:
{"Date":{"0":1703653200000,"1":1703566800000,"2":1703221200000,"3":1703134800000,"4":1703048400000,"5":1702962000000,"6":1702875600000,"7":1702616400000,"8":1702530000000,"9":1702443600000},"Revenue":{"0":3880359,"1":3139100,"2":2849700,"3":4884800,"4":4032200,"5":4979100,"6":6314700,"7":11503000,"8":8033300,"9":7727900}}
Create a local file named example.py with the following contents:
import pandas as pd
# Assuming "example.json" is in the same directory as your Python script or notebook
file_path = 'example.json'
# Read DataFrame from JSON file
df = pd.read_json(file_path)
# Display the DataFrame
print(df)
# Recreate the problem
df['Date'].apply(lambda x: print(x, type(x)))
- Unfortunately, I cannot replicate this error without revealing more sensitive sections of the codebase.
Issue Description
I currently have the following pandas dataframe object:
>>> print(my_df)
                       Date   Revenue
0 2023-12-27 00:00:00-05:00   3880359
1 2023-12-26 00:00:00-05:00   3139100
2 2023-12-22 00:00:00-05:00   2849700
3 2023-12-21 00:00:00-05:00   4884800
4 2023-12-20 00:00:00-05:00   4032200
5 2023-12-19 00:00:00-05:00   4979100
6 2023-12-18 00:00:00-05:00   6314700
7 2023-12-15 00:00:00-05:00  11503000
8 2023-12-14 00:00:00-05:00   8033300
9 2023-12-13 00:00:00-05:00   7727900
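For readers who want a frame like this without the JSON file, here is a minimal sketch. It assumes the Date column was converted to America/New_York somewhere in the original pipeline. That step is not shown in this report, and the epoch-millisecond values in example.json carry no time zone of their own. Only the first three rows are used.

```python
import pandas as pd

# Epoch-millisecond values copied from the first three rows of example.json.
raw = pd.DataFrame(
    {
        "Date": [1703653200000, 1703566800000, 1703221200000],
        "Revenue": [3880359, 3139100, 2849700],
    }
)

# Assumption: the real code converts the column to America/New_York.
# The report does not show this step, so it is only a guess at the setup.
my_df = raw.assign(
    Date=pd.to_datetime(raw["Date"], unit="ms", utc=True).dt.tz_convert(
        "America/New_York"
    )
)

print(my_df)
print(my_df.dtypes)
```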
I get the expected results when I apply a function to the Revenue column:
>>> my_df['Revenue'].apply(lambda x: print(x, type(x)))
3880359 <class 'int'>
3139100 <class 'int'>
2849700 <class 'int'>
4884800 <class 'int'>
4032200 <class 'int'>
4979100 <class 'int'>
6314700 <class 'int'>
11503000 <class 'int'>
8033300 <class 'int'>
7727900 <class 'int'>
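I get unexpected results when I apply the same function to the Date column. The first call receives the whole column as a DatetimeIndex, and only the later calls receive individual Timestamp objects: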
>>> my_df['Date'].apply(lambda x: print(x, type(x)))
DatetimeIndex(['2023-12-27 00:00:00-05:00', '2023-12-26 00:00:00-05:00', '2023-12-22 00:00:00-05:00', '2023-12-21 00:00:00-05:00', '2023-12-20 00:00:00-05:00', '2023-12-19 00:00:00-05:00', '2023-12-18 00:00:00-05:00', '2023-12-15 00:00:00-05:00', '2023-12-14 00:00:00-05:00', '2023-12-13 00:00:00-05:00'], dtype='datetime64[ns, America/New_York]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
2023-12-27 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-26 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-22 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-21 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-20 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-19 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-18 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-15 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-14 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-13 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
Why is this happening? Why do I get an index object at first?
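One plausible explanation, not confirmed anywhere in this report, is that for timezone-aware data the pandas 2.0.x code path first tries the function on the whole column and falls back to element-wise calls when the result is not usable. The sketch below records what the callback actually receives, so the behavior can be compared across pandas versions. The names `dates`, `seen` and `record_type` are made up for illustration.

```python
import pandas as pd

# A small timezone-aware column similar to the Date column in this report.
dates = pd.Series(
    pd.to_datetime(["2023-12-27", "2023-12-26", "2023-12-22"]).tz_localize(
        "America/New_York"
    ),
    name="Date",
)

seen = []  # type name of every object the callback receives, in call order


def record_type(value):
    """Hypothetical probe: remember the type of each argument."""
    seen.append(type(value).__name__)
    return None  # like print(), the callback returns nothing


dates.apply(record_type)

# With the pandas 2.0.3 install listed in this report, an extra leading
# 'DatetimeIndex' entry would match the output shown above.
print(pd.__version__, seen)
```

If the list starts with an entry other than Timestamp, the callback was invoked once on the whole column before the per-element calls.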
Expected Behavior
I expect the function to be called only with Timestamp objects, once per element:
>>> my_df['Date'].apply(lambda x: print(x, type(x)))
2023-12-27 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-26 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-22 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-21 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-20 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-19 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-18 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-15 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-14 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
2023-12-13 00:00:00-05:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
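Until the cause is understood, a possible workaround is to iterate over the column directly instead of going through `apply`. This is a sketch only, built on a stand-in column with `pd.date_range` rather than the real data. Iterating a datetime Series yields one Timestamp per element.

```python
import pandas as pd

# Stand-in for the Date column: three timezone-aware daily timestamps.
dates = pd.Series(
    pd.date_range("2023-12-13", periods=3, freq="D", tz="America/New_York"),
    name="Date",
)

# Plain iteration hands back individual Timestamp objects,
# so the function body never sees the whole column.
for ts in dates:
    print(ts, type(ts))

# The same idea as a list, in case the per-element results are needed.
element_types = [type(ts) for ts in dates]
```

This sidesteps `apply` entirely, which suits side-effect callbacks such as `print`, where the returned Series is not needed.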
Installed Versions
INSTALLED VERSIONS
------------------
commit : 0f437949513225922d851e9581723d82120684a6
python : 3.8.8.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:10 PST 2023; root:xnu-10002.61.3~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.3
numpy : 1.22.4
pytz : 2022.7
dateutil : 2.8.2
setuptools : 52.0.0.post20210125
pip : 21.0.1
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.2
brotli :
fastparquet : None
fsspec : 2023.1.0
gcsfs : None
matplotlib : 3.3.4
numba : 0.53.1
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
snappy : None
sqlalchemy : 1.4.7
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : 1.9.0
pyqt5 : None