API: update pd.cut with the new categorical integration #8077

Closed
@jorisvandenbossche

xref #8074. Some issues about pd.cut/qcut (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.cut.html#pandas.cut):

  • If the input is a Series, should the result also be a Series?
  • The produced categorical is not ordered, whereas the Categorical constructor produces an ordered one by default. For cut, it also seems logical that the result would be ordered.
  • The docstring of cut uses the term 'labels', but I think it means the individual categories here, not the numerical representation (codes). (Yet another use of 'labels'; good that we changed that name! :-)) I am not fully sure about the explanation, though:
    • "Labels to use for bin edges" -> why the 'edges'? Aren't these just the labels for the bins themselves? (This sets what we now call the 'levels'):

      In [23]: pd.cut([1,2,3,4], bins=3, labels=['a', 'b', 'c'])
      Out[23]:
      a
      a
      b
      c
      Levels (3, object): [a, b, c]
      
    • "or False to return integer bin labels" -> is this what is now called 'codes'?

    • Should we rename the labels keyword to levels/categories?
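
The three questions above can be checked directly against an installed pandas. The following is a minimal sketch, assuming a recent pandas release; the version discussed in this issue may behave differently, and the variable names are illustrative only.

```python
import pandas as pd

values = [1, 2, 3, 4]

# Explicit labels become the categories of the result,
# one label per bin (not per bin edge).
named = pd.cut(values, bins=3, labels=["a", "b", "c"])
print(list(named.categories))  # the category names
print(named.ordered)           # whether the categorical is ordered

# labels=False returns the integer bin indicators (the codes)
# instead of a categorical.
codes = pd.cut(values, bins=3, labels=False)
print(list(codes))

# Passing a Series shows which container type comes back.
from_series = pd.cut(pd.Series(values), bins=3, labels=["a", "b", "c"])
print(type(from_series).__name__)
```

Printing the values rather than hard-coding them keeps the check honest across versions: the same script shows whatever the installed pandas actually returns for ordering and container type.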
