Description
The good - definitely keep
Before talking about what I'd like to remove, here's what I'd like to keep.
I love the plotting backend. Good one @datapythonista ! What I'm suggesting is to only keep that and get rid of everything else.
The not so good - deprecate?
Visualisation code is hard to maintain and hard to test. It regularly causes issues due to insufficient testing. The worst example I've come across is #39522, in which the lines on the plot don't correspond with the legend. If someone had trusted pandas and had made a business decision using such a plot, they'd have made the wrong decision!
On the maintenance side, PR #29944 took nearly 3 years (!) to get reviewed.
I'm not blaming anyone, I'm just pointing out that visualisation is really hard to maintain and test. How about we reduce the maintenance burden and get rid of most of it?
What to do instead
seaborn is a high-level wrapper around matplotlib (a bit like what plotly.express is to plotly). Instead of maintaining all this buggy custom matplotlib code ourselves, let's leave it to the experts.
Concretely, this would mean adding a seaborn plotting backend, and making that the default backend. So df.plot.scatter(x=x, y=y) would defer to seaborn.scatterplot(data=df, x=x, y=y), and similarly for other common methods.
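To make the idea concrete, here's a minimal sketch of what such a backend module could look like. It relies on the existing backend mechanism, where pandas calls a top-level plot(data, kind, **kwargs) function in the backend module. The mapping of kinds to seaborn functions is my own assumption for illustration, not an agreed design, and a real backend would need to handle more kinds and translate pandas-specific keyword arguments.

```python
import importlib

# Assumed mapping from pandas plot kinds to seaborn function names.
# Illustrative only: a real backend would need to cover more kinds.
_KIND_TO_SEABORN = {
    "scatter": "scatterplot",
    "line": "lineplot",
    "hist": "histplot",
    "kde": "kdeplot",
    "box": "boxplot",
}


def plot(data, kind, **kwargs):
    """Top-level function that pandas calls on a plotting backend module.

    For df.plot.scatter(x="a", y="b"), pandas passes the DataFrame as
    ``data``, kind="scatter", and x/y as keyword arguments.
    """
    try:
        func_name = _KIND_TO_SEABORN[kind]
    except KeyError:
        raise NotImplementedError(
            f"kind={kind!r} is not supported by this sketch"
        ) from None
    # Import lazily so the module can be imported without seaborn installed.
    seaborn = importlib.import_module("seaborn")
    return getattr(seaborn, func_name)(data=data, **kwargs)
```

If this lived in a module called, say, pandas_seaborn_backend (a made-up name), users could opt in with pd.set_option("plotting.backend", "pandas_seaborn_backend") and keep calling df.plot.scatter(x=x, y=y) as before.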
This could live in seaborn itself, just like how the plotly backend code lives in the plotly repo:
https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/__init__.py#L99
If the seaborn folks weren't up for that, then it could still live in pandas. It'd be way less to maintain than all the custom unmaintained matplotlib code we currently have.
Impact on users
For the vast, vast majority, it should be practically nothing. If before they called df.plot.scatter(x=x, y=y), then they will still get a matplotlib scatter plot with x on the x-axis and y on the y-axis, but it'll have been produced by seaborn instead of by the custom matplotlib code we currently have in pandas.
There may be some unusual plots which pandas currently produces but which seaborn doesn't - I don't know, I only tried this as a POC and it seemed to work well - and to be honest I'd be fine with deprecating them completely. If they're too unusual to be in a plotting library, they're too unusual to be in pandas.
Alternatives
Split the custom matplotlib code into a separate repo. If someone wants to step up and volunteer to maintain it, then sure, no objections. If nobody does, my suggestion is to defer to seaborn for matplotlib plots and stop hacking around with custom matplotlib code in pandas.