Pandas compatibility #501

Open
@MilesCranmer

Description

Affects: PythonCall

Describe the bug

I have been trying to use pandas from PythonCall.jl and want to document a few calls that do not translate directly from Python to Julia. This might just mean we need a PythonPandas package to translate calls, but I wonder whether there are any missing methods that could be implemented to fix things automatically.

First, the preamble for this:

using PythonCall

pd = pyimport("pandas")

1. Constructing pandas.DataFrame:

Using a similar syntax to Python:

df = pd.DataFrame(Dict([
    "a" => [1, 2, 3],
    "b" => [4, 5, 6]
]))

which results in the following dataframe:

julia> df
Python:
   0
0  b
1  a

i.e., it seems to have a single column named "0" whose rows are the keys "b" and "a"; the values do not appear at all.
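
A possible explanation, sketched in plain Python without pandas or PythonCall: if the Julia Dict reaches pandas as a wrapper object that behaves like a mapping but is not a built-in `dict`, and if the DataFrame constructor only special-cases `dict`, then the wrapper would be iterated like any other list-like, which yields just the keys. Both of those are assumptions, not something confirmed in this issue, and `WrappedDict` below is a hypothetical stand-in rather than a PythonCall class.

```python
from collections.abc import Mapping


class WrappedDict(Mapping):
    """Hypothetical stand-in for a Julia Dict seen from Python:
    it behaves like a mapping but is NOT a built-in dict."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


wrapped = WrappedDict({"b": [4, 5, 6], "a": [1, 2, 3]})

# A constructor that only special-cases the built-in dict
# will not recognise the wrapper as column-wise data ...
is_plain_dict = isinstance(wrapped, dict)

# ... and iterating it as a generic list-like yields only the keys.
rows = list(wrapped)

# Copying into a real dict recovers the key -> values structure.
plain = dict(wrapped)

print(is_plain_dict, rows, plain)
```

If this is what is happening, converting the Dict to a real Python dict before the call (for example with PythonCall's `pydict`) might avoid the problem, though I have not verified that here.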

If I instead write this as a vector of pairs, I get:

julia> pd.DataFrame([
           "a" => [1, 2, 3],
           "b" => [4, 5, 6]
       ])
Python:
   0          1
0  a  [1, 2, 3]
1  b  [4, 5, 6]

I suppose this one makes sense.

I was able to get it working with the following syntax instead:

julia> df = pd.DataFrame([
            1   4
            2   5
            3   6
       ], columns=["a", "b"])
Python:
   a  b
0  1  4
1  2  5
2  3  6

2. Selecting multiple columns:

So, selecting a single column works:

julia> df["a"]
Python:
0    1
1    2
2    3
Name: a, dtype: int64

but selecting multiple columns does not:

julia> df[["a", "b"]]
ERROR: Python: TypeError: Julia: MethodError: objects of type Vector{String} are not callable
Use square brackets [] for indexing an Array.
Python stacktrace:
 [1] __call__
   @ ~/.julia/packages/PythonCall/S5MOg/src/JlWrap/any.jl:223
 [2] apply_if_callable
   @ pandas.core.common ~/Documents/pysr_projects/arya/bigbench/.CondaPkg/env/lib/python3.12/site-packages/pandas/core/common.py:384
 [3] __getitem__
   @ pandas.core.frame ~/Documents/pysr_projects/arya/bigbench/.CondaPkg/env/lib/python3.12/site-packages/pandas/core/frame.py:4065
Stacktrace:
 [1] pythrow()
   @ PythonCall.Core ~/.julia/packages/PythonCall/S5MOg/src/Core/err.jl:92
 [2] errcheck
   @ ~/.julia/packages/PythonCall/S5MOg/src/Core/err.jl:10 [inlined]
 [3] pygetitem(x::Py, k::Vector{String})
   @ PythonCall.Core ~/.julia/packages/PythonCall/S5MOg/src/Core/builtins.jl:171
 [4] getindex(x::Py, i::Vector{String})
   @ PythonCall.Core ~/.julia/packages/PythonCall/S5MOg/src/Core/Py.jl:292
 [5] top-level scope
   @ REPL[18]:1

I got around this by inserting a pylist call:

julia> df[pylist(["a", "b"])]
Python:
   a  b
0  1  4
1  2  5
2  3  6
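
Reading the traceback above, the failure seems to come from `apply_if_callable` in `pandas.core.common`: the key is called if Python considers it callable, and the wrapped Julia vector apparently exposes `__call__` even though calling it fails. Below is a minimal pure-Python sketch of that mechanism. `apply_if_callable` is a simplified reimplementation for illustration, not the pandas source, and `WrappedJuliaVector` is a hypothetical stand-in for the PythonCall wrapper.

```python
def apply_if_callable(maybe_callable, obj):
    """Simplified stand-in for the helper named in the traceback:
    call the key with the frame if it looks callable, else return it."""
    if callable(maybe_callable):
        return maybe_callable(obj)
    return maybe_callable


class WrappedJuliaVector:
    """Hypothetical stand-in for a wrapped Julia Vector{String}.
    Assumption: the wrapper type defines __call__ for every wrapped value,
    so callable() is True even when the underlying object cannot be called."""

    def __init__(self, items):
        self.items = items

    def __call__(self, *args):
        raise TypeError("objects of type Vector{String} are not callable")


frame = object()  # placeholder for the DataFrame being indexed

# The wrapped key looks callable, so it gets called and raises.
try:
    apply_if_callable(WrappedJuliaVector(["a", "b"]), frame)
    outcome = "ok"
except TypeError as err:
    outcome = f"TypeError: {err}"

# A plain Python list is not callable, so it is passed through unchanged.
plain_key = apply_if_callable(["a", "b"], frame)

print(outcome)
print(plain_key)
```

That would also be consistent with the `pylist` workaround: a real Python list is not callable, so it is used directly as the column selection.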

Labels: bug
