Skip to content

BUG: memory leak in find_stack_level #54628

Closed
@lmmx

Description

@lmmx

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas.util._exceptions import find_stack_level
find_stack_level()

Issue Description

The source for find_stack_level does not delete the frame reference

def find_stack_level() -> int:
"""
Find the first place in the stack that is not inside pandas
(tests notwithstanding).
"""
import pandas as pd
pkg_dir = os.path.dirname(pd.__file__)
test_dir = os.path.join(pkg_dir, "tests")
# https://stackoverflow.com/questions/17407119/python-inspect-stack-is-slow
frame = inspect.currentframe()
n = 0
while frame:
fname = inspect.getfile(frame)
if fname.startswith(pkg_dir) and not fname.startswith(test_dir):
frame = frame.f_back
n += 1
else:
break
return n

From the inspect docs:

Note

Keeping references to frame objects, as found in the first element of the frame records these functions return, can cause your program to create reference cycles. Once a reference cycle has been created, the lifespan of all objects which can be accessed from the objects which form the cycle can become much longer even if Python’s optional cycle detector is enabled. If such cycles must be created, it is important to ensure they are explicitly broken to avoid the delayed destruction of objects and increased memory consumption which occurs.

Though the cycle detector will catch these, destruction of the frames (and local variables) can be made deterministic by removing the cycle in a finally clause. This is also important if the cycle detector was disabled when Python was compiled or using gc.disable(). For example:

def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        # do something with the frame
    finally:
        del frame

If you want to keep the frame around (for example to print a traceback later), you can also break reference cycles by using the frame.clear() method.

Expected Behavior

def find_stack_level() -> int:
    """
    Find the first place in the stack that is not inside pandas
    (tests notwithstanding).
    """

    import pandas as pd

    pkg_dir = os.path.dirname(pd.__file__)
    test_dir = os.path.join(pkg_dir, "tests")
    # https://stackoverflow.com/questions/17407119/python-inspect-stack-is-slow
    frame = inspect.currentframe()
    try:
        n = 0
        while frame:
            fname = inspect.getfile(frame)
            if fname.startswith(pkg_dir) and not fname.startswith(test_dir):
                frame = frame.f_back
                n += 1
            else:
                break
    finally:
        del frame
    return n
  • I would also replace the while loop with for frame in iter(lambda: frame.f_back, None)

Installed Versions

N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugPerformanceMemory or execution speed performance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions