
ENH: Support returning the same dtype as the caller for window ops (including extension dtypes) #11446

Open
@sandhujasmine

Description


In the sample code below, rolling_apply takes an argument ix, a NumPy array of dtype int64, but by the time this array reaches the get_type() function its dtype has changed to float64. I can convert it back explicitly inside get_type() with ix = ix.astype('int64'), but I was curious why the dtype changes in the first place.

Example below (I'm on pandas 0.17.0):

```python
import numpy as np
import pandas as pd


def get_type(ix, df, hours):
    # invoked by rolling_apply to illustrate the problem
    # of rolling_apply changing the dtype of the 'ix' array from
    # int64 to float64

    print(ix.dtype)

    # need to convert the index dtype back to int64
    # ix = ix.astype('int64')

    ixv = ix[ix > -1]
    print(ixv.dtype)

    # the data in ix must be int64, else the following fails with
    # IndexError: arrays used as indices must be of integer (or boolean) type
    h = hours[ixv] - hours[ixv[0]]
    df.iloc[ix[-1]] = h[0]
    return 0.0


# we start out with ix.dtype = int64, but rolling_apply changes it to float64
ix = np.arange(0, 10)
hours = np.random.randint(0, 10, len(ix))
df = pd.DataFrame(np.random.randn(10, 1), columns=['h'])

# pd.rolling_apply is the pandas 0.17 API used in this report
pd.rolling_apply(ix, window=3, func=get_type, args=(df, hours))
```
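pd.rolling_apply was later deprecated in favour of the method-based window API. The following is a minimal sketch of the same check against a recent pandas, assuming the `.rolling(...).apply(..., raw=True)` interface, where `raw=True` hands each window to the callback as a plain NumPy array. The callback `record_dtype` and the list `seen_dtypes` are hypothetical names used only for illustration; the callback records the dtype of every window it receives, so the upcast from int64 can be observed without relying on an IndexError.

```python
import numpy as np
import pandas as pd

seen_dtypes = []  # dtypes of the windows handed to the callback


def record_dtype(window):
    # With raw=True, `window` is a plain NumPy ndarray, not a Series
    seen_dtypes.append(window.dtype)
    return 0.0  # rolling apply expects a scalar (float-convertible) result


ix = pd.Series(np.arange(0, 10), dtype="int64")
result = ix.rolling(window=3).apply(record_dtype, raw=True)

print(ix.dtype)          # int64
print(set(seen_dtypes))  # dtype(s) the callback actually saw
print(result.dtype)      # the first window - 1 results are NaN
```

If the windows print as float64 here as well, the behaviour described in this issue is still present in the newer API.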

I also stepped through the code and believe I've identified the source of the problem. I wanted to report it and see whether others consider this an issue before trying to fix it. In the meantime, an explicit type conversion inside the get_type function (the commented-out line in the example) works around it.
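A minimal sketch of that workaround in plain NumPy. The helper `as_int_index` is hypothetical (not part of pandas or NumPy): it converts a float64 window back to int64 before the window is used for fancy indexing. Rejecting NaN or fractional values is an added assumption, since `astype` would otherwise truncate them silently and index the wrong rows.

```python
import numpy as np


def as_int_index(window):
    """Hypothetical helper: cast a float64 window back to int64 for indexing.

    Refuses NaN/inf or fractional values instead of silently truncating them.
    """
    if not np.all(np.isfinite(window)) or not np.array_equal(window, np.trunc(window)):
        raise ValueError("window contains values that are not whole numbers")
    return window.astype(np.int64)


hours = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
window = np.array([2.0, 3.0, 4.0])  # what the callback receives after the upcast

try:
    hours[window]  # float arrays cannot be used as indices
except IndexError as exc:
    print("IndexError:", exc)

ixv = as_int_index(window)
print(ixv.dtype, hours[ixv] - hours[ixv[0]])
```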

The _process_data_structure() function converts the input to float.

This is the first place where the dtype is explicitly changed to float. That cast could be omitted, with the check below updated so that it only applies to float data:

```python
if kill_inf and values.dtype == float:
    values = values.copy()
    values[np.isinf(values)] = np.nan
```
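A sketch of the proposed change, written as a hypothetical standalone function `process_values` rather than the actual pandas internal: integer input is no longer cast to float, and the inf-to-NaN replacement runs only for floating-point data, the only case in which inf can occur. Using `np.issubdtype(..., np.floating)` instead of `dtype == float` is a choice made here so that float32 input is covered too.

```python
import numpy as np


def process_values(values, kill_inf=True):
    """Hypothetical dtype-preserving variant of the preprocessing step.

    No unconditional cast to float: integer input keeps its dtype.
    """
    values = np.asarray(values)
    if kill_inf and np.issubdtype(values.dtype, np.floating):
        values = values.copy()  # don't mutate the caller's array
        values[np.isinf(values)] = np.nan
    return values


ints = process_values(np.arange(0, 10, dtype=np.int64))
src = np.array([1.0, np.inf, -np.inf, 2.0])
floats = process_values(src)
print(ints.dtype, floats)
```

Whatever consumes these values downstream must then accept integer input, which is the concern raised next.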

However, the Cython code that (I assume) performs the rolling-window computation also expects float64 input. In that case, one option may be to restore the original dtype after the call_cython function returns.
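A sketch of the "restore the dtype afterwards" option. The function `rolling_sum_float64` is a hypothetical NumPy stand-in for a float64-only rolling kernel, and `restore_dtype` is a hypothetical post-processing step; neither is pandas code. The sketch also shows a limitation: the first window - 1 results are NaN, which int64 cannot hold, so casting back is only lossless when the output has no NaN and only whole numbers. Otherwise the result stays float64, and a nullable or extension dtype would be needed to keep integers alongside missing values.

```python
import numpy as np


def rolling_sum_float64(values, window):
    """Hypothetical float64-only kernel: trailing-window sum via cumulative sums."""
    x = np.asarray(values, dtype=np.float64)  # the kernel only accepts float64
    out = np.full(x.shape, np.nan)  # incomplete leading windows stay NaN
    if 0 < window <= len(x):
        csum = np.concatenate(([0.0], np.cumsum(x)))
        out[window - 1:] = csum[window:] - csum[:-window]
    return out


def restore_dtype(result, original_dtype):
    """Cast back to the caller's integer dtype only when it is lossless."""
    if (
        np.issubdtype(original_dtype, np.integer)
        and np.all(np.isfinite(result))
        and np.array_equal(result, np.trunc(result))
    ):
        return result.astype(original_dtype)
    return result  # NaN or fractional values: keep float64


ix = np.arange(0, 10, dtype=np.int64)
raw = rolling_sum_float64(ix, window=3)

full = restore_dtype(raw, ix.dtype)       # stays float64 because of leading NaNs
valid = restore_dtype(raw[2:], ix.dtype)  # all whole numbers, so cast back to int64
print(full.dtype, valid.dtype, valid)
```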

Metadata


- Assignees: no one assigned
- Labels: Apply (Apply, Aggregate, Transform, Map); Dtype Conversions (Unexpected or buggy dtype conversions); Enhancement; Window (rolling, ewma, expanding)
- Type: none
- Projects: none
- Milestone: none
- Relationships: none yet
- Development: no branches or pull requests
