While working with dataframes in Python via PythonCall, I noticed that `DataFrame(PyTable(p))`, where `p` is a pandas DataFrame, converts date and datetime columns into byte vectors. Is this related to issue #265 (milliseconds vs. microseconds), or is it due to a missing part of the `DataFrame(::PyPandasDataFrame)` implementation?
Here are a few minimal examples, in a conda environment with pandas.
using PythonCall
using Dates
using DataFrames
a = DataFrame(x = [now()]) # julia dataframe
b = pytable(a) # pandas dataframe
c = PyTable(b) # PyPandasDataFrame
d = DataFrame(c) # julia dataframe again
This results in:
julia> c
1×1 PyPandasDataFrame
x
0 2023-04-13 14:36:13.939
julia> d
1×1 DataFrame
 Row │ x
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc0, 0x62, 0x31, 0x8c, 0x…
The same thing happens when `b` is created directly as a pandas DataFrame (here `pd` and `dt` are the Python `pandas` and `datetime` modules, loaded with `pyimport`), so the microsecond issue in #265 does not seem to be the cause:
julia> b = pd.DataFrame([[dt.datetime.now()]])
Python DataFrame:
0
0 2023-04-13 14:46:57.940077
julia> c = PyTable(b)
1×1 PyPandasDataFrame
0
0 2023-04-13 14:46:57.940077
julia> d = DataFrame(c)
1×1 DataFrame
 Row │ 0
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc8, 0xf9, 0xa5, 0x7d, 0x…
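One possibly useful observation: the bytes do not look random. Assuming the pandas column is stored as NumPy `datetime64[ns]` (I have not checked the dtype here), each value would be a little-endian 64-bit count of nanoseconds since the Unix epoch. The sketch below uses only the Python standard library to rebuild that representation for the timestamp from the first example and compare it with the four bytes visible in the truncated output. The helper name `datetime64_ns_bytes` is made up for illustration and is not part of pandas or PythonCall.

```python
import struct
from datetime import datetime, timezone


def datetime64_ns_bytes(ts: datetime) -> bytes:
    """Hypothetical helper: encode a naive timestamp (read as UTC) as a
    little-endian int64 count of nanoseconds since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = ts.replace(tzinfo=timezone.utc) - epoch
    # Integer arithmetic throughout, to avoid float rounding on large values.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("<q", micros * 1_000)


# Timestamp shown for `c` in the first example: 2023-04-13 14:36:13.939
ts = datetime(2023, 4, 13, 14, 36, 13, 939_000)
raw = datetime64_ns_bytes(ts)

# Only the first four bytes are visible in the truncated DataFrame output.
print([hex(b) for b in raw[:4]])  # ['0xc0', '0x62', '0x31', '0x8c']
```

The first four bytes match `UInt8[0xc0, 0x62, 0x31, 0x8c, 0x…` above, which suggests the values arrive intact and each element is being exposed as its raw 8 bytes rather than converted to a `DateTime`. That would point at the conversion step rather than at the precision question in #265, but this is a guess from the output, not something I have confirmed in the PythonCall source.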