The apply function of a DataFrame is called twice on the first row #6753

Closed
@anton-d

When calling the apply function of a DataFrame with axis=1, the applied function is called twice for the first row. The following code

from pandas import DataFrame

df = DataFrame({"a": ["x", "y"], "b": [1, 2]})
def identity(row):
    # Print each row as it is passed in, to make every call visible
    print(tuple(row))
    return row
df2 = df.apply(identity, axis=1)

prints

('x', 1)
('x', 1)
('y', 2)

The result df2 is not affected by this behavior. However, when the function passed to apply (identity in the example above) has side effects, the extra call repeats the side effect for the first row, which can lead to very surprising results.
It would be good to have at least a note on this behavior in the documentation of the apply function.
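
Until this is documented, one possible workaround is to make the side effect idempotent, so that a repeated call for the same row changes nothing. The sketch below is only an illustration: the names calls and seen are hypothetical, and it assumes each row's index label (row.name) is unique.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

calls = []  # naive side effect: one entry per invocation
seen = {}   # idempotent side effect: one entry per row label

def identity(row):
    # Appending records every call, so a repeated first row shows up twice.
    calls.append(row.name)
    # Keying by the row label means a repeated call overwrites the same entry.
    seen[row.name] = tuple(row)
    return row

df2 = df.apply(identity, axis=1)

# The number of calls depends on the pandas version; the number of keys does not.
print(len(calls))
print(sorted(seen))
```

Here len(calls) exceeds the number of rows whenever the first row is evaluated more than once, while seen holds exactly one entry per row either way.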

This is related to #2656 and #2936, where this behaviour was reported for calling apply after groupby. Here it appears for a DataFrame.

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8

pandas: 0.13.1
Cython: 0.15.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None
