
ENH: DataFrame.to_expr() method #41837

Open
@kerrickstaley

Description


Is your feature request related to a problem?

I would like to write some unit tests for my pandas code. I want to test that some DataFrame is equal to an expected value. The expected value is complicated, and I would like an easy way to get the Python code that constructs it. My program already computes the expected DataFrame, but I need a way to serialize it and deserialize it in my test code.

Here is a Stack Overflow question with more detail.

Describe the solution you'd like

I would like there to be a DataFrame.to_expr() method. It should return a str containing valid Python code that can be used to re-construct the DataFrame.

To the greatest extent possible, pd.testing.assert_frame_equal(df1, eval(df2.to_expr())) should raise an AssertionError if and only if pd.testing.assert_frame_equal(df1, df2) raises one. I am using assert_frame_equal rather than DataFrame.equals() because it is stricter: both require corresponding columns to have the same dtype, but DataFrame.equals() does not require the row and column labels to have the same type.
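The difference between the two checks can be seen with the example from the pandas DataFrame.equals() documentation: two frames with the same values and column dtypes, whose column labels are ints in one frame and floats in the other. As a small sketch, DataFrame.equals() treats them as equal, while assert_frame_equal (with its default check_column_type='equiv') reports a mismatch.

```python
import pandas as pd

# Example from the DataFrame.equals() docs: same values and column dtypes,
# but the column labels are ints in one frame and floats in the other.
df = pd.DataFrame({1: [10], 2: [20]})
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})

# DataFrame.equals() does not compare the types of the labels.
print(df.equals(different_column_type))

# assert_frame_equal also checks the column index type and raises.
try:
    pd.testing.assert_frame_equal(df, different_column_type)
except AssertionError as exc:
    print("assert_frame_equal failed:", exc)
```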

Concretely, I think the return value of .to_expr() should be something like

pandas.DataFrame({'column_1': pandas.Series([1, 2, 3], dtype='int64'), 'column_2': pandas.Series([1.0, 2.0, 3.0], dtype='float64')})
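A minimal sketch of how such a method could produce the string above, written as a hypothetical free function to_expr(df) (not an existing pandas API). It assumes a default RangeIndex, no missing values, and columns whose elements round-trip through repr() (for example int64, float64, bool, or str values).

```python
import pandas as pd


def to_expr(df: pd.DataFrame) -> str:
    """Hypothetical helper (not part of pandas): return Python source that
    rebuilds ``df`` when passed to eval() with ``pandas`` in scope.

    Assumes a default RangeIndex, no missing values, and columns whose
    elements round-trip through repr() (e.g. int64, float64, bool, str).
    """
    parts = []
    for name, col in df.items():
        # tolist() turns NumPy scalars into plain Python ints/floats/strs,
        # whose repr() is valid Python source.
        values = col.tolist()
        parts.append(f"{name!r}: pandas.Series({values!r}, dtype={str(col.dtype)!r})")
    return "pandas.DataFrame({" + ", ".join(parts) + "})"


df = pd.DataFrame({"column_1": [1, 2, 3], "column_2": [1.0, 2.0, 3.0]})
expr = to_expr(df)
print(expr)

# eval() needs the name `pandas` bound, hence the explicit namespace.
pd.testing.assert_frame_equal(df, eval(expr, {"pandas": pd}))
```

Under these assumptions the printed expression matches the proposed format exactly. Missing values (repr() gives nan), datetime columns (Timestamp(...)), categorical columns with unused or ordered categories, and non-default indexes would each need extra handling, such as emitting the categories or an index argument for every Series.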

Note that for many Python types, the equivalent of this .to_expr() method is __repr__(). The Python docs state:

For many types, [__repr__] makes an attempt to return a string that would yield an object with the same value when passed to eval()...

However, DataFrame.__repr__ is already defined to print a different representation (which is arguably more useful in an interactive environment).
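To illustrate the contrast: for built-in containers, eval(repr(x)) rebuilds x, while repr() of a DataFrame is the console table, which is not valid Python source.

```python
import pandas as pd

# Built-in containers follow the repr() convention: eval(repr(x)) rebuilds x.
value = {"column_1": [1, 2, 3], "column_2": [1.0, 2.0, 3.0]}
assert eval(repr(value)) == value

# DataFrame.__repr__ returns the tabular text shown in a console instead.
df = pd.DataFrame(value)
print(repr(df))

try:
    eval(repr(df))
except SyntaxError:
    print("repr(df) is not valid Python source")
```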

API breaking implications

This is a backwards-compatible change.

Describe alternatives you've considered

I've used DataFrame.to_dict() and DataFrame.from_dict() for this purpose in the past. However, this round trip doesn't preserve column dtypes, so it doesn't work for an empty DataFrame. I also worry that from_dict will sometimes fail to infer the original dtype even for non-empty DataFrames.
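A short sketch of the dtype loss in the to_dict()/from_dict() round trip. The helper dict_round_trip is a hypothetical name used only for illustration. The exact dtype that comes back (for example object or float64 for the empty column) can vary between pandas versions, so the point is only that it no longer matches the original.

```python
import pandas as pd


def dict_round_trip(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: serialize with to_dict(), rebuild with from_dict().
    return pd.DataFrame.from_dict(df.to_dict())


# Empty frame: an empty dict carries no values to infer int64 from.
empty = pd.DataFrame({"a": pd.Series([], dtype="int64")})
print(empty.dtypes["a"], "->", dict_round_trip(empty).dtypes["a"])

# Non-empty frames can lose dtypes too, e.g. category and float32 columns,
# because to_dict() stores plain Python values without dtype information.
non_empty = pd.DataFrame({
    "c": pd.Series(["x", "y"], dtype="category"),
    "f": pd.Series([1.5, 2.5], dtype="float32"),
})
rebuilt = dict_round_trip(non_empty)
print(non_empty.dtypes.to_dict(), "->", rebuilt.dtypes.to_dict())
```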

Metadata

Assignees: No one assigned
Labels: DataFrame, Enhancement, IO Data, Needs Discussion
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
