
Unexpected behaviour / potential bug in Dataframe.shift() with duplicated column titles #9092

Closed
@koschink

Hi,

I have recently come across an unexpected behaviour of the DataFrame "shift" function.
When a column title occurs more than once, a column-specific shift that uses position-based indexing (via "iloc") shifts not only the specified column but every column sharing the same name. From this behaviour I guess that the column names, and not the positions, are used internally, which causes all identically named columns to shift.
This behaviour did not occur (at least to my knowledge) with v. 0.13, but is present in both 0.15.1 and 0.15.2.
Please find below a small code snippet which demonstrates the behaviour, followed by the system information.

Thanks a lot for any help

Kay

Begin example code

import numpy as np
import pandas as pd


data = pd.DataFrame(np.random.randn(20, 5))  # Generate a random data frame
data2 = data.copy()
data3 = data.copy()
shifting_matrix = [1, 2, 3, 4, 5]  # Shift amounts: the column at position i is shifted by shifting_matrix[i]


# Shifting the columns: expected behaviour. The first column is shifted by 1, the second by 2, and so on.

for position, movement in enumerate(shifting_matrix):
    print(position)
    print(movement)
    data.iloc[:, position] = data.iloc[:, position].shift(movement)



# Changing the titles of the data frame so that all columns share the same name
titles = [1, 1, 1, 1, 1]
data2.columns = titles

# Shifting the columns again. Since we are using position-based indexing, the shifting behaviour should be the same as above.
for position, movement in enumerate(shifting_matrix):
    print(position)
    print(movement)
    data2.iloc[:, position] = data2.iloc[:, position].shift(movement)



titles2 = [1, 1, 2, 2, 1]  # Mixed column titles, some of them duplicated
data3.columns = titles2
# Shifting the columns again. Since we are using position-based indexing, the shifting behaviour should be the same as above.
for position, movement in enumerate(shifting_matrix):
    print(position)
    print(movement)
    data3.iloc[:, position] = data3.iloc[:, position].shift(movement)

print(data)
print(data2)
print(data3)

End example code
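
A possible workaround, sketched here under the assumption that the problem lies only in assigning through "iloc" when labels are duplicated: give the columns temporary unique labels, shift each column by its label, and restore the original titles afterwards. The helper name shift_columns_by_position is hypothetical and not part of pandas; this is an illustration, not a confirmed fix for the underlying behaviour.

```python
import numpy as np
import pandas as pd


def shift_columns_by_position(frame, shifts):
    """Shift the column at position i by shifts[i], ignoring column names.

    Hypothetical helper for illustration only; it is not a pandas function.
    """
    if len(shifts) != frame.shape[1]:
        raise ValueError("need exactly one shift value per column")

    result = frame.copy()
    original_columns = result.columns
    # Temporary unique labels, so no assignment can hit a duplicated name.
    result.columns = range(result.shape[1])
    for position, movement in enumerate(shifts):
        result[position] = result[position].shift(movement)
    # Restore the original (possibly duplicated) titles.
    result.columns = original_columns
    return result


data3 = pd.DataFrame(np.random.randn(20, 5), columns=[1, 1, 2, 2, 1])
shifted = shift_columns_by_position(data3, [1, 2, 3, 4, 5])

# Count the missing values per column position; each column should show
# as many NaN values as the amount it was shifted by.
print(shifted.isnull().values.sum(axis=0))
```

Counting the NaN values per column position is also a quick way to check whether the "iloc" loops in the example above shifted each column by the intended amount.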

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: nb_NO

pandas: 0.15.2
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: None
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 2.1.0
xlrd: None
xlwt: 0.7.5
xlsxwriter: 0.5.8
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

Labels: Bug, Indexing, Reshaping
