Description
I often need to produce histograms where the x axis uses a date scale, typically binned by day, week, or month. The only sensible result is one where the scale's breaks align with the bin boundaries, but the existing methods I'm aware of for getting there are a bit fragile:
library(ggplot2)
set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))
Use geom_bar() and a binned scale
ggplot(df, aes(date)) +
  geom_bar() +
  scale_x_binned(
    transform = scales::transform_date(),
    breaks = scales::breaks_width("1 week")
  )
#> Warning in scale_x_binned(transform = scales::transform_date(), breaks = scales::breaks_width("1 week")): Ignoring `n.breaks`. Use a breaks function that supports setting number of
#> breaks.
This is nice because the binning is specified only once, but now the whole scale is binned, so I can't, for example, add a geom_vline() to mark a specific date on the axis: the vertical line would be snapped into a bin by the scale transform.
Use stat_bin()
ggplot(df, aes(date)) +
  geom_histogram(binwidth = 7, closed = "right") +
  scale_x_date(date_breaks = "1 week")
The naive approach leaves the scale breaks and the bins unaligned (offset by 0.5 days here). Of course this can be improved by specifying a bin boundary or manually passing breaks, but this gets a bit fiddly and fragile.
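For illustration, the "manually passing breaks" variant might look like the sketch below. It assumes the weekly breaks are computed once from the data with scales::breaks_width() and that the same Date vector is then handed to both the stat and the scale; the name week_breaks is only illustrative.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))

# Compute weekly breaks once, covering the full range of the data.
# `week_breaks` is an illustrative name, not part of any API.
week_breaks <- scales::breaks_width("1 week")(range(df$date))

# Pass the same Date vector to the stat (as bin edges) and to the
# scale (as axis breaks), so the two cannot drift apart.
p <- ggplot(df, aes(date)) +
  geom_histogram(breaks = week_breaks, closed = "right") +
  scale_x_date(breaks = week_breaks)

p
```

This keeps bins and axis breaks aligned, but the break vector has to be precomputed from the data outside the plot, which is the fiddly part.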
Since #5963 there's a better workaround:
ggplot(df, aes(date)) +
  geom_histogram(
    breaks = function(x) {
      scales::breaks_width("1 week")(as.Date(range(x)))
    }
  ) +
  scale_x_date(date_breaks = "1 week")
Created on 2024-10-25 with reprex v2.1.1
This is the result I want. However, the breaks and the transform are duplicated between the scale and the stat. Ideally I'd like a way to ask stat_bin() to just use the scale's breaks.
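As a partial mitigation in user code, the breaks function can at least be defined once and reused. The sketch below assumes a ggplot2 version that includes #5963 (function-valued breaks in stat_bin()); the name weekly is only illustrative, and the explicit origin in as.Date() is there so the conversion also works on older R versions.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))

# One shared breaks function (illustrative name).
weekly <- scales::breaks_width("1 week")

p <- ggplot(df, aes(date)) +
  # The stat sees numeric (already transformed) x values, so they are
  # converted back to Date before the breaks function is applied.
  geom_histogram(
    breaks = function(x) weekly(as.Date(range(x), origin = "1970-01-01"))
  ) +
  # The scale calls the same function with its Date limits.
  scale_x_date(breaks = weekly)

p
```

The Date round trip inside the stat still has to be written by hand, so the transform duplication remains.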
Having the stat use the scale's breaks is technically possible, since StatBin$compute_group() (where the bins are computed) already has access to the scale object, but I'm not sure whether it violates any ggplot API encapsulation principles to have the scale directly affect the stat's output in the way I'm proposing.
The same situation applies for stat_bin_2d() and stat_summary_bin(). I'd be happy to open a PR if there's agreement about the idea. I'm imagining either new value/s for breaks or a new param mutually exclusive with breaks that lets users choose to use the corresponding scale's major or minor breaks for the stat's binning breaks.