API/BUG: freq retention in value_counts #33830

Open
@jbrockmendel

dti = pd.date_range('2016-01-01', periods=5)

dti.value_counts().index.freq    # <-- None
dti.factorize()[1].freq   # <-- None

mi = pd.MultiIndex.from_arrays([dti, dti])

mi.levels[0].freq   # <-- None
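Until this is settled, one possible workaround is to re-infer the frequency from the values themselves. The sketch below assumes the result is still regularly spaced once sorted; `freq="infer"` is the documented constructor option, and the variable names are illustrative only.

```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=5)

# Sort by timestamp so the values are in regular order before inferring.
idx = dti.value_counts().sort_index().index

# freq="infer" asks the constructor to recompute freq from the values.
# This raises or yields None if the values are not regularly spaced.
restored = pd.DatetimeIndex(idx, freq="infer")
print(restored.freq)
```

This only recovers a freq that the values can support; it does not change what value_counts or factorize return.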

A comment in test_factorize (tests.indexes.datetimes.test_datetime) says that freq should be preserved by factorize, but the test never checks freq, and it would fail if it did:

        # freq must be preserved
        idx3 = date_range("2000-01", periods=4, freq="M", tz="Asia/Tokyo")
        exp_arr = np.array([0, 1, 2, 3], dtype=np.intp)
        arr, idx = idx3.factorize()
        tm.assert_numpy_array_equal(arr, exp_arr)
        tm.assert_index_equal(idx, idx3)
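To illustrate why the test above passes without checking freq: assert_index_equal compares values and dtype but not the freq attribute. A minimal sketch, using a daily range instead of the test's monthly one and a hypothetical helper `assert_freq_preserved` that is not part of pandas:

```python
import pandas as pd

idx3 = pd.date_range("2000-01-01", periods=4, freq="D")

# Same values, but built from a raw array so freq is not carried over.
no_freq = pd.DatetimeIndex(idx3.to_numpy())

# Passes even though the freq attributes differ.
pd.testing.assert_index_equal(no_freq, idx3)


def assert_freq_preserved(result, expected):
    # Hypothetical helper: the explicit check the comment implies.
    assert result.freq == expected.freq, (
        f"freq not preserved: {result.freq!r} != {expected.freq!r}"
    )


try:
    assert_freq_preserved(no_freq, idx3)
    freq_check_failed = False
except AssertionError:
    freq_check_failed = True
```

An explicit freq assertion like this would be needed for the test to enforce what its comment claims.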

So the question: do we want to try to preserve freq in factorize?

xref #33677 for the MultiIndex case

Update: one more case, Categorical:

dti = pd.date_range('2016-01-01', periods=5)
cat = pd.Categorical(dti)
cat.categories.freq   # <-- None
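Whatever the freq attribute on the categories reports, the spacing information is still present in the values. A short sketch using pd.infer_freq, which reads the values rather than the attribute (variable names are illustrative):

```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=5)
cat = pd.Categorical(dti)

# infer_freq inspects the timestamps themselves, not the freq attribute.
inferred = pd.infer_freq(cat.categories)

# Rebuild the categories index with the inferred frequency attached.
categories_with_freq = pd.DatetimeIndex(cat.categories, freq=inferred)
```

This assumes the categories are unique, sorted, and evenly spaced, which holds here because they come straight from date_range.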

Labels

Algos: Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Bug
Frequency: DateOffsets
freq retention: User expects "freq" attribute to be preserved