QST: Consistently apply Welford Method and Kahan Summation in roll_xxx functions #59715

Open
@kaixiongg

Description

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/78951576/pandas-consistently-apply-welford-method-and-kahan-summation-in-roll-xxx-functi

Question about pandas

For the following functions:

1. def nancorr
2. cdef void add_var
3. cdef void add_skew
4. cdef void add_mean

Across these functions, both the Welford method and Kahan summation appear, but not consistently. The second-order functions (correlation and variance) appear to use only the Welford method, with no Kahan summation applied to the running means (meanx or meany). The third-order function (skewness) appears to use only Kahan summation on top of the naive one-pass algorithm, without Welford.
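To make the two techniques concrete, here is a minimal pure-Python sketch of each in isolation. It is not the pandas Cython code; the function names `kahan_sum` and `welford_variance` are hypothetical and chosen only for illustration.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # apply the correction carried from the last step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


def welford_variance(values, ddof=1):
    """One-pass variance using Welford's update of the mean and M2."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses both the old and the new mean
    if n <= ddof:
        return math.nan
    return m2 / (n - ddof)
```

Kahan summation targets the rounding error that accumulates in a long running sum, while Welford avoids the cancellation that comes from subtracting two large, nearly equal sums of squares. They address different error sources, which is why combining them is a reasonable thing to ask about.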

My question is: how does the pandas community decide which method to use for numerical stability? If the goal is the highest possible stability, it seems that all of these functions should combine the Welford and Kahan methods.
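For reference, one way the combination I have in mind could look is sketched below in pure Python: a Welford-style variance where both running accumulators (the mean and the sum of squared deviations) are updated with Kahan compensation. This is only an assumption about what "combining" would mean, not what pandas does; the name `welford_kahan_variance` and its `ddof` parameter are hypothetical.

```python
import math


def welford_kahan_variance(values, ddof=1):
    """Welford variance with Kahan-compensated accumulator updates (sketch)."""
    n = 0
    mean = 0.0
    mean_comp = 0.0  # Kahan compensation term for the running mean
    m2 = 0.0
    m2_comp = 0.0    # Kahan compensation term for the squared deviations
    for x in values:
        n += 1
        delta = x - mean

        # Compensated form of: mean += delta / n
        y = delta / n - mean_comp
        t = mean + y
        mean_comp = (t - mean) - y
        mean = t

        # Compensated form of: m2 += delta * (x - mean)
        y = delta * (x - mean) - m2_comp
        t = m2 + y
        m2_comp = (t - m2) - y
        m2 = t
    if n <= ddof:
        return math.nan
    return m2 / (n - ddof)
```

The extra compensation costs a few additional floating-point operations and two more state variables per accumulator, so part of the question is whether the stability gain is worth that cost in the rolling kernels.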

Could you please clarify the rationale behind these choices?

Metadata

Labels

  • API Design
  • Needs Discussion: Requires discussion from core team before further action
  • Performance: Memory or execution speed performance
  • Reduction Operations: sum, mean, min, max, etc.
