Description
Is your feature request related to a problem?
I would like to write some unit tests for my Pandas code. I want to test that some DataFrame is equal to an expected value. The expected value is complicated and I would like an easy way to get the Python code to construct it. My program already computes the expected DataFrame value but I need a way to serialize/deserialize it for use in my test code.
Here is a StackOverflow question with more detail.
Describe the solution you'd like
I would like there to be a DataFrame.to_expr() method. It should return a str containing valid Python code that can be used to re-construct the DataFrame.
To the greatest extent possible, it should be true that pd.testing.assert_frame_equal(df1, eval(df2.to_expr())) raises an AssertionError if and only if pd.testing.assert_frame_equal(df1, df2) raises an AssertionError. I am using assert_frame_equal because it checks column dtypes, whereas DataFrame.equals() does not.
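To illustrate the dtype check this property relies on, here is a minimal sketch. The frames `ints` and `floats` are made-up examples; the only behavior assumed is that assert_frame_equal compares dtypes by default (check_dtype=True).

```python
import pandas as pd

# Two frames with the same numeric values but different column dtypes.
ints = pd.DataFrame({"a": pd.Series([1, 2, 3], dtype="int64")})
floats = pd.DataFrame({"a": pd.Series([1.0, 2.0, 3.0], dtype="float64")})

try:
    pd.testing.assert_frame_equal(ints, floats)
    raised = False
except AssertionError:
    # The dtype mismatch (int64 vs float64) is reported as an AssertionError.
    raised = True
```

A to_expr() that dropped the dtype information would therefore break the "if and only if" requirement for frames like these.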
Concretely, I think the return value of .to_expr() should be something like:
pandas.DataFrame({'column_1': pandas.Series([1, 2, 3], dtype='int64'), 'column_2': pandas.Series([1.0, 2.0, 3.0], dtype='float64')})
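A rough sketch of how such output could be produced today as a standalone function. The name `to_expr` here is a hypothetical helper, not an existing pandas API, and it only handles the easy case: a default RangeIndex, and values whose repr() is valid Python (for example ints, strings, bools, and floats without NaN).

```python
import pandas  # the generated expression refers to the name `pandas`
import pandas as pd


def to_expr(df: pd.DataFrame) -> str:
    """Return Python source that rebuilds `df` column by column.

    Hypothetical helper, not part of pandas. The index is ignored, so this
    assumes a default RangeIndex and values with an eval()-able repr().
    """
    columns = []
    for name, series in df.items():
        values = series.tolist()  # converts NumPy scalars to plain Python values
        columns.append(
            f"{name!r}: pandas.Series({values!r}, dtype={str(series.dtype)!r})"
        )
    return "pandas.DataFrame({" + ", ".join(columns) + "})"


df = pd.DataFrame({
    "column_1": pd.Series([1, 2, 3], dtype="int64"),
    "column_2": pd.Series([1.0, 2.0, 3.0], dtype="float64"),
})
expr = to_expr(df)
rebuilt = eval(expr)  # requires `pandas` to be importable under that name
pd.testing.assert_frame_equal(df, rebuilt)  # passes: values and dtypes match
```

Because the dtype is written out explicitly for every column, the same approach also round-trips an empty DataFrame. A real implementation would need to cover the index, NaN/NaT, categoricals, and other extension dtypes.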
Note that on many Python objects, this .to_expr() method is called __repr__(). The Python docs state:
For many types, [__repr__] makes an attempt to return a string that would yield an object with the same value when passed to eval() ...
However, DataFrame.__repr__ is already defined to print a different representation (which is arguably more useful in an interactive environment).
API breaking implications
This is a backwards-compatible change.
Describe alternatives you've considered
I've used DataFrame.to_dict() and DataFrame.from_dict() for this purpose in the past. However, this round trip doesn't preserve the column dtypes, and so it doesn't work if you're working with an empty DataFrame. I also worry that from_dict will sometimes fail to infer the original dtype even for non-empty DataFrames.
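The empty-DataFrame problem can be reproduced with a small sketch like the one below. The frame `empty` is a made-up example, and the exact dtype that from_dict infers may vary between pandas versions, so the code only checks that it differs from the original.

```python
import pandas as pd

# An empty frame whose single column is explicitly int64.
empty = pd.DataFrame({"a": pd.Series([], dtype="int64")})

# Round trip through a plain dict: the dtype is not stored anywhere,
# so from_dict has nothing to infer int64 from.
round_tripped = pd.DataFrame.from_dict(empty.to_dict())

dtype_changed = round_tripped["a"].dtype != empty["a"].dtype

try:
    pd.testing.assert_frame_equal(empty, round_tripped)
    frames_match = True
except AssertionError:
    frames_match = False
```

Here `dtype_changed` ends up True and `frames_match` ends up False, which is why the dict round trip is not usable as an expected value in a test.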