Closed
Description
The default log-scaled axes, activated by the logx, logy, and loglog options of the Pandas plotting API, do the straightforward thing and take the log of 0 values. Plotting then proceeds with these infinite logs, which makes the entire plot unusable, without any warning, whenever the data contains 0s.
For example:
import numpy as np
import pandas as pd

draws = pd.DataFrame({'freq': np.random.zipf(1.7, 1000) - 1})
draws['rank'] = (-draws['freq']).rank()
draws.plot(x='rank', y='freq', kind='scatter', loglog=True)
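The arithmetic behind the problem can be sketched with NumPy alone. This is only an illustration of why zeros have no finite position on a log axis, not a trace of what the plotting code does internally; the sample values are made up for the example.

```python
import numpy as np

# Made-up sample containing a zero, like the shifted Zipf draws above.
freq = np.array([0, 1, 10, 100])

# log10(0) is -inf; silence the divide-by-zero warning for this demo only.
with np.errstate(divide="ignore"):
    logs = np.log10(freq)

# True when at least one value has no finite position on a log axis.
has_inf = bool(np.isinf(logs).any())
print(logs, has_inf)
```

Any data set with even a single 0 hits this case.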
Matplotlib provides another scale, the symlog scale, that makes a small region near 0 linear to avoid these problems. For quick-and-dirty 'look at my data on a log axis' plotting, symlog is significantly more useful.
I can access it like this:
draws = pd.DataFrame({'freq': np.random.zipf(1.7, 1000) - 1})
draws['rank'] = (-draws['freq']).rank()
p = draws.plot(x='rank', y='freq', kind='scatter', loglog=True)
p.set_xscale('symlog')
p.set_yscale('symlog')
p
Either making the symlog scale the default log scale for plotting, or supporting a loglog='sym' option, would make it significantly easier to do quick data inspection with Pandas' convenience plotting.
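In the meantime, the workaround can be wrapped in a small helper. This is a rough sketch only: plot_symlog is a hypothetical name of my own, not part of pandas or Matplotlib, and it just applies the two set_xscale / set_yscale calls shown earlier to the axes that DataFrame.plot returns.

```python
import numpy as np
import pandas as pd


def plot_symlog(df, x, y, **kwargs):
    """Hypothetical helper: scatter plot with symlog scaling on both axes."""
    ax = df.plot(x=x, y=y, kind="scatter", **kwargs)
    # symlog keeps a small linear region around 0, so zeros stay plottable.
    ax.set_xscale("symlog")
    ax.set_yscale("symlog")
    return ax


draws = pd.DataFrame({"freq": np.random.zipf(1.7, 1000) - 1})
draws["rank"] = (-draws["freq"]).rank()
ax = plot_symlog(draws, x="rank", y="freq")
```

A built-in option would still be preferable, since a helper like this has to be copied into every notebook.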