pd.concat loses frequency attribute for 'continuous' DataFrame appends #3232

Closed
@nehalecky

Hey all,

I have a DataFrame (df) that stores live sensor data captured at a fixed frequency. New raw data from the sensor arrives at a set interval (an attempt at bandwidth conservation) and is parsed into a new DataFrame.

These update DataFrames have the same frequency and contain data that is 'continuous' in time (i.e., they pick up right where the last timestamp left off). Ultimately, I would like to append the new data to the existing DataFrame while preserving the main DataFrame's frequency attribute. I tried concatenating the old and new DataFrames, but concat doesn't seem to check for this continuous time series case, and the result loses its frequency attribute. This can be reproduced with the code below:

import pandas as pd
import numpy as np
dr = pd.date_range('01-Jan-2013', periods=100, freq='50L', tz='UTC')
df = pd.DataFrame(np.random.randn(100, 2), index=dr)
df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:04.950000]
Length: 100, Freq: 50L, Timezone: UTC

These guys look good:

# Preserves frequency
print(df[:50].index)
print(df[50:].index)
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:02.450000]
Length: 50, Freq: 50L, Timezone: UTC
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:02.500000, ..., 2013-01-01 00:00:04.950000]
Length: 50, Freq: 50L, Timezone: UTC

However, these guys, together, forget where they came from:

# Loses frequency
pd.concat([df[:50], df[50:]]).index
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-01 00:00:04.950000]
Length: 100, Freq: None, Timezone: UTC
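
The 'continuous' condition described above can be stated precisely: both pieces share the same frequency, and the second piece starts exactly one step after the first one ends. A minimal sketch of such a check follows; `is_contiguous` is a hypothetical helper name used only for illustration and is not part of pandas, and the `'50ms'` alias is assumed to be equivalent to the `'50L'` used above.

```python
import numpy as np
import pandas as pd


def is_contiguous(left, right):
    """Return True if `right` starts one freq step after `left` ends.

    Hypothetical helper for illustration; both arguments are
    non-empty DatetimeIndex objects.
    """
    # Without a shared, known frequency there is no step size to compare.
    if left.freq is None or left.freq != right.freq:
        return False
    return bool(left[-1] + left.freq == right[0])


dr = pd.date_range("2013-01-01", periods=100, freq="50ms", tz="UTC")
df = pd.DataFrame(np.random.randn(100, 2), index=dr)

print(is_contiguous(df[:50].index, df[50:].index))  # adjacent halves
print(is_contiguous(df[:50].index, df[60:].index))  # ten samples missing
```

A check like this is the kind of test concat would need to run before it could safely carry the frequency over to the combined index.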

I currently work around this by resampling the resulting df to set the frequency, which isn't a big deal, but I thought I'd mention it so that a more elegant behavior could be implemented. I'll try to take a look when I have time, but I know that all of you here are far more familiar with pandas internals. Any pointers?
And, as always, thank you! :)
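
For reference, the workaround mentioned above might look roughly like the sketch below. It uses `pd.infer_freq` and `DataFrame.asfreq` rather than a full resample, which is an assumption about what fits this case, not necessarily what the original code did. `concat_keep_freq` is a hypothetical helper name. Depending on the pandas version, concat may already keep the frequency for contiguous pieces, so the helper only acts when the frequency is missing.

```python
import numpy as np
import pandas as pd


def concat_keep_freq(frames):
    """Concatenate frames and restore the index frequency if it was dropped.

    Hypothetical helper for illustration; assumes each frame has a
    DatetimeIndex and the combined index has at least three entries
    (pd.infer_freq raises ValueError for fewer).
    """
    out = pd.concat(frames)
    if out.index.freq is None:
        # infer_freq returns None when the spacing is irregular,
        # in which case the result is returned without a frequency.
        inferred = pd.infer_freq(out.index)
        if inferred is not None:
            out = out.asfreq(inferred)
    return out


dr = pd.date_range("2013-01-01", periods=100, freq="50ms", tz="UTC")
df = pd.DataFrame(np.random.randn(100, 2), index=dr)

combined = concat_keep_freq([df[:50], df[50:]])
print(combined.index.freq)
```

Because `asfreq` is only called with a frequency inferred from the data itself, it should not introduce new rows here; it simply rebuilds the index with the frequency attached.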

Labels: Bug; Datetime (Datetime data dtype); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
