ENH: In pandas.testing.assert_frame_equal, support per-column configuration #59548

Open
@adrian17

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The tolerance our internal validation tool applies needs to depend on the metric being compared. For example, when fetching results from an analytical database with a query like

SELECT count(distinct device_id) as device_count, avg(score) as score GROUP BY ...

We expect device_count to always be exact, but score is expected to carry small, nondeterministic floating-point inaccuracies.

My old code ran assert_frame_equal several times on different subsets of columns, which is cumbersome and doesn't express the intent well.
I recently refactored it by extracting assert_frame_equal's implementation and adding extra arguments that support per-column rtol and atol.
It would be nice if this ability were built into pandas.

Note that this overlaps a bit with feature request #54861.

Feature Description

One way is to add extra arguments to assert_frame_equal, usable like so:

assert_frame_equal(
    left,
    right,
    rtol=1e-5,
    atol=1e-8,
    rtols={'device_count': 0, 'score': 1e-6},
    atols={'device_count': 0}, # for unspecified columns, the rtol/atol argument is used as default
)
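
For illustration, here is a rough sketch of how the rtols/atols variant could behave, built only on the public assert_index_equal and assert_series_equal. The helper name assert_frame_equal_per_column is hypothetical, and the sketch skips the frame-level checks that assert_frame_equal does (check_like, check_flags and so on); it is not the extracted implementation mentioned above.

```python
import pandas as pd
from pandas.testing import assert_index_equal, assert_series_equal


def assert_frame_equal_per_column(left, right, rtol=1e-5, atol=1e-8,
                                  rtols=None, atols=None):
    """Hypothetical helper: compare two DataFrames with per-column tolerances."""
    rtols = rtols or {}
    atols = atols or {}
    # Column labels and row index have to match before values are compared.
    assert_index_equal(left.columns, right.columns, obj="DataFrame.columns")
    assert_index_equal(left.index, right.index, obj="DataFrame.index")
    for i, col in enumerate(left.columns):
        assert_series_equal(
            left.iloc[:, i],
            right.iloc[:, i],
            # Columns without an explicit entry fall back to the frame-wide value.
            rtol=rtols.get(col, rtol),
            atol=atols.get(col, atol),
            obj=f'DataFrame column "{col}"',
        )


left = pd.DataFrame({"device_count": [10, 20], "score": [0.5, 0.25]})
right = pd.DataFrame({"device_count": [10, 20], "score": [0.5000001, 0.25]})

# Passes: device_count is identical, score differs by less than the tolerance.
assert_frame_equal_per_column(
    left,
    right,
    rtols={"device_count": 0, "score": 1e-6},
    atols={"device_count": 0},
)
```

With both tolerances set to 0 for device_count, any difference in that column raises an AssertionError, while score is still compared approximately.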

Or the entire comparison configuration (check_exact, check_datetimelike_compat, etc.) could be overridden per-series, for example

assert_frame_equal(
    left,
    right,
    overrides={
        'device_count': {'check_exact': True},
        'score': {'rtol': 1e-6},
    }
)
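
A similar sketch for the overrides variant, again using only public functions. The name assert_frame_equal_with_overrides is hypothetical, and the sketch assumes every default and override is a keyword argument that assert_series_equal itself accepts.

```python
import pandas as pd
from pandas.testing import assert_index_equal, assert_series_equal


def assert_frame_equal_with_overrides(left, right, overrides=None, **defaults):
    """Hypothetical helper: per-column keyword overrides for the comparison."""
    overrides = overrides or {}
    # Column labels and row index have to match before values are compared.
    assert_index_equal(left.columns, right.columns, obj="DataFrame.columns")
    assert_index_equal(left.index, right.index, obj="DataFrame.index")
    for i, col in enumerate(left.columns):
        # Per-column settings take precedence over the frame-wide defaults.
        kwargs = {**defaults, **overrides.get(col, {})}
        assert_series_equal(
            left.iloc[:, i],
            right.iloc[:, i],
            obj=f'DataFrame column "{col}"',
            **kwargs,
        )


left = pd.DataFrame({"device_count": [10, 20], "score": [0.5, 0.25]})
right = pd.DataFrame({"device_count": [10, 20], "score": [0.5000001, 0.25]})

# Passes: device_count matches exactly, score is within rtol=1e-6.
assert_frame_equal_with_overrides(
    left,
    right,
    overrides={
        "device_count": {"check_exact": True},
        "score": {"rtol": 1e-6},
    },
)
```

This version is more flexible than separate rtols/atols dictionaries, at the cost of a looser signature: a misspelled option only fails once it reaches assert_series_equal.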

Alternative Solutions

The current way to do it with public APIs is to do something like

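for column_names, rtol in [(["device_count", ...], 0.0), (["score", ...], 1e-6), ...]:
    # Select into new names so left and right are not overwritten between iterations;
    # column selection keeps the original index.
    left_subset = left[column_names]
    right_subset = right[column_names]
    assert_frame_equal(left_subset, right_subset, rtol=rtol)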

Metadata

Labels

  • Closing Candidate: May be closeable, needs more eyeballs

  • Enhancement

  • Needs Discussion: Requires discussion from core team before further action

  • Testing: pandas testing functions or related to the test suite
