DataFrame(::PyPandasDataFrame) converts date & datetime to bytes #293

Open
@tomdstone

Description

While working with pandas DataFrames from Julia via PythonCall, I found that `DataFrame(PyTable(p))`, where `p` is a pandas DataFrame, converts date and datetime columns into vectors of bytes. Is this related to #265 (milliseconds vs. microseconds), or is it a missing piece of the `DataFrame(::PyPandasDataFrame)` implementation?

Here are two minimal examples, run in a conda environment with pandas installed.

```julia
using PythonCall
using Dates
using DataFrames

a = DataFrame(x = [now()])  # Julia DataFrame
b = pytable(a)              # pandas DataFrame
c = PyTable(b)              # PyPandasDataFrame wrapper
d = DataFrame(c)            # back to a Julia DataFrame
```

This results in:

```
julia> c
1×1 PyPandasDataFrame
                        x
0 2023-04-13 14:36:13.939

julia> d
1×1 DataFrame
 Row │ x
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc0, 0x62, 0x31, 0x8c, 0x…
```

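The byte vectors look like the raw storage of pandas' `datetime64[ns]` values. Assuming (not confirmed) that each 8-byte vector is a little-endian `Int64` count of nanoseconds since the Unix epoch, the timestamps could be decoded by hand as a stopgap. This is a minimal sketch under that assumption; `decode_datetime64ns` is a hypothetical helper, not part of PythonCall or DataFrames.

```julia
using Dates

# Hypothetical helper, assuming the 8 bytes are pandas' datetime64[ns] storage:
# a little-endian Int64 count of nanoseconds since 1970-01-01T00:00:00.
# Julia's DateTime has millisecond precision, so sub-millisecond digits are floored.
function decode_datetime64ns(bytes::AbstractVector{UInt8})
    length(bytes) == 8 || throw(ArgumentError("expected 8 bytes, got $(length(bytes))"))
    ns = ltoh(reinterpret(Int64, collect(bytes))[1])  # little-endian -> host byte order
    return DateTime(1970) + Millisecond(fld(ns, 1_000_000))
end

# Round-trip check: encode a known timestamp the assumed way, then decode it.
dt = DateTime(2023, 4, 13, 14, 36, 13, 939)
ns = Dates.value(dt - DateTime(1970)) * 1_000_000     # milliseconds -> nanoseconds
bytes = collect(reinterpret(UInt8, [htol(ns)]))        # 8 little-endian bytes
@assert decode_datetime64ns(bytes) == dt
```

If the assumption holds, something like `decode_datetime64ns.(d.x)` would recover the column as a `Vector{DateTime}`; a proper fix would still belong in the `DataFrame(::PyPandasDataFrame)` conversion itself.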
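The same thing happens when `b` is created directly as a pandas DataFrame holding a microsecond-precision timestamp (here `pd` and `dt` are the Python `pandas` and `datetime` modules, loaded with `pyimport`), so the microsecond issue in #265 does not seem to be the cause: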

```
julia> b = pd.DataFrame([[dt.datetime.now()]])
Python DataFrame:
                           0
0 2023-04-13 14:46:57.940077

julia> c = PyTable(b)
1×1 PyPandasDataFrame
                           0
0 2023-04-13 14:46:57.940077

julia> d = DataFrame(c)
1×1 DataFrame
 Row │ 0
     │ PyArray…
─────┼───────────────────────────────────
   1 │ UInt8[0xc8, 0xf9, 0xa5, 0x7d, 0x…
```
