Way to request stat_bin() to inherit breaks from the scale #6159

Open
@arcresu

I often need to produce histograms where the x axis uses a date scale, typically binned by day, week, or month. The only sensible result is one where the scale's breaks align with the bin edges, but the existing ways I know of to get there are a bit fragile:

library(ggplot2)

set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))

Use geom_bar() and a binned scale

ggplot(df, aes(date)) +
  geom_bar() +
  scale_x_binned(
    transform = scales::transform_date(),
    breaks = scales::breaks_width("1 week")
  )
#> Warning in scale_x_binned(transform = scales::transform_date(), breaks = scales::breaks_width("1 week")): Ignoring `n.breaks`. Use a breaks function that supports setting number of
#> breaks.

This is nice because the binning is specified only once, but now the whole scale is binned. That means I can't, for example, add a geom_vline() to mark a specific date on the axis, since the scale transform would snap the vertical line into a bin.

Use stat_bin()

ggplot(df, aes(date)) +
  geom_histogram(binwidth = 7, closed = "right") +
  scale_x_date(date_breaks = "1 week")

The naive approach leaves the scale breaks and the bins unaligned (offset by 0.5 days here). Of course this can be improved by specifying a bin boundary or by manually passing breaks, but that gets fiddly and fragile.
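As a rough illustration of the bin-boundary route, the sketch below pins one bin edge to a Monday. It assumes the same `df` as in the reprex, that the weekly breaks chosen by `scale_x_date()` fall on Mondays, and that `boundary` is given in the scale's numeric units (days since 1970-01-01). The name `monday` is only illustrative.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))

# 2024-01-01 is a Monday. The stat sees dates as days since 1970-01-01,
# so the boundary is passed as a plain number (assumption: Monday-aligned breaks).
monday <- as.numeric(as.Date("2024-01-01"))

p <- ggplot(df, aes(date)) +
  geom_histogram(binwidth = 7, boundary = monday, closed = "right") +
  scale_x_date(date_breaks = "1 week")

# print(p) renders the plot
```

The fragility shows up right away: the anchor date is picked by hand and has to change whenever the break width or the week start changes.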

Since #5963 there's a better workaround:

ggplot(df, aes(date)) +
  geom_histogram(breaks = function(x) { scales::breaks_width("1 week")(as.Date(range(x))) }) +
  scale_x_date(date_breaks = "1 week")

Created on 2024-10-25 with reprex v2.1.1

This gives the result I want. However, the breaks and transforms are duplicated between the scale and the stat. Ideally I'd like a way to request that stat_bin() just use the scale's breaks.
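Short of the feature requested here, one way to trim the duplication is to build the breaks function once and hand it to both the stat and the scale. This is only a sketch: it assumes a ggplot2 version that includes #5963 (function-valued `breaks` in `stat_bin()`) and that the stat passes numeric days-since-epoch values to that function, as in the workaround. `week_breaks` is a name made up for illustration.

```r
library(ggplot2)

set.seed(2024)
df <- data.frame(date = as.Date("2024-01-01") + rnorm(100, 0, 5))

# Single definition of the breaks, shared by the stat and the scale.
week_breaks <- scales::breaks_width("1 week")

p <- ggplot(df, aes(date)) +
  # Assumption: x arrives as numeric days since 1970-01-01, so convert back to Date.
  geom_histogram(
    breaks = function(x) week_breaks(as.Date(range(x), origin = "1970-01-01"))
  ) +
  scale_x_date(breaks = week_breaks)

# print(p) renders the plot
```

The date conversion still has to be undone by hand inside the stat's function, which is the part that inheriting breaks from the scale would remove.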

It's technically possible, since StatBin::compute_group (where the bins are computed) already has access to the scale object, but I'm not sure if it violates any sort of ggplot API encapsulation principles to have the scale directly affecting the stat's output in the way I'm proposing.

The same situation applies to stat_bin_2d() and stat_summary_bin(). I'd be happy to open a PR if there's agreement about the idea. I'm imagining either new value(s) for breaks, or a new param mutually exclusive with breaks, that lets users choose the corresponding scale's major or minor breaks as the stat's binning breaks.
