
DEPR,VIS: remove custom matplotlib code #50059

Open
@MarcoGorelli


The good - definitely keep

Before talking about what I'd like to remove, here's what I'd like to keep.

I love the plotting backend. Good one, @datapythonista! What I'm suggesting is to keep only that and get rid of everything else.

The not so good - deprecate?

Visualisation code is hard to maintain and hard to test. It regularly causes issues due to insufficient testing. The worst example I've come across is #39522, in which the lines on the plot don't correspond with the legend. If someone had trusted pandas and had made a business decision using such a plot, they'd have made the wrong decision!

On the maintenance side, PR #29944 took nearly 3 years (!) to get reviewed.

I'm not blaming anyone, I'm just pointing out that visualisation is really hard to maintain and test. How about we reduce the maintenance burden and get rid of most of it?

What to do instead

seaborn is a high-level wrapper around matplotlib (a bit like what plotly.express is to plotly). Instead of maintaining all this buggy custom matplotlib code ourselves, let's leave it to the experts.

Concretely, this would mean adding a seaborn plotting backend and making it the default backend. So df.plot.scatter(x=x, y=y) would defer to seaborn.scatterplot(data=df, x=x, y=y), and similarly for other common methods.
This could live in seaborn itself, just like how the plotly backend code lives in the plotly repo:

https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/__init__.py#L99

If the seaborn folks weren't up for that, then it could still live in pandas. It'd be way less to maintain than all the custom unmaintained matplotlib code we currently have.
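To make the proposal concrete, here's a rough sketch of what such a backend module could look like. It assumes the backend mechanism hands the call to a module-level `plot(data, kind, **kwargs)` function, with `x` and `y` arriving as keyword arguments. The `KIND_TO_SEABORN` table and its contents are hypothetical and only cover a few kinds; this is not an existing seaborn or pandas API.

```python
import importlib

# Hypothetical mapping from pandas plot kinds to seaborn function names.
# Only a handful of kinds are covered; the rest are left unsupported here.
KIND_TO_SEABORN = {
    "scatter": "scatterplot",
    "line": "lineplot",
    "hist": "histplot",
    "box": "boxplot",
    "kde": "kdeplot",
}


def plot(data, kind, **kwargs):
    """Assumed backend entry point: forward the call to seaborn.

    `kwargs` is expected to carry `x`, `y` and any other options
    the user passed to `df.plot(...)`.
    """
    try:
        func_name = KIND_TO_SEABORN[kind]
    except KeyError:
        raise NotImplementedError(
            f"kind={kind!r} is not supported by this sketch"
        ) from None
    # Import lazily so the module can be imported without seaborn installed.
    seaborn = importlib.import_module("seaborn")
    return getattr(seaborn, func_name)(data=data, **kwargs)
```

If this were saved as an importable module (say `seaborn_backend`, a made-up name), then `df.plot.scatter(x="a", y="b", backend="seaborn_backend")` should end up calling `seaborn.scatterplot(data=df, x="a", y="b")`.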

Impact on users

For the vast, vast majority, it should be practically nothing. If before they called df.plot.scatter(x=x, y=y), then they will still get a matplotlib scatter plot with x on the x-axis and y on the y-axis, but it'll have been produced by seaborn instead of with the custom matplotlib code we currently have in pandas.

There may be some unusual plots which pandas currently produces but which seaborn doesn't - I don't know, I only tried this as a POC and it seemed to work well - and to be honest I'd be fine with deprecating them completely. If they're too unusual to be in a plotting library, they're too unusual to be in pandas.

Alternatives

Split the custom matplotlib code into a separate repo. If someone wants to step up and volunteer to maintain it, then sure, no objections. If nobody does, my suggestion is to defer to seaborn for matplotlib plots and stop hacking around with custom matplotlib code in pandas.

Metadata

Labels: Needs Discussion (requires discussion from core team before further action), Visualization (plotting)
