ENH: make pandas.DataFrame.info() method able to display memory usage of each column #59690

Open
@Gregory108

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The .info() method describes a DataFrame by each column's dtype and count of non-null values but, IMO, misses an opportunity to be more valuable by also displaying the memory usage of each column.

Feature Description

I think thousands of hours of human time would be saved if this were a built-in feature, with "memory_usage='by_column'" and "memory_usage='by_column_deep'" as argument options.
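To make the two proposed options concrete, here is a minimal sketch of the numbers they might report. The helper name column_memory is hypothetical, and mapping 'by_column' to shallow sizes and 'by_column_deep' to deep sizes is an assumption about the intended semantics, not existing pandas behaviour. It only relies on the existing DataFrame.memory_usage method.

```python
import pandas as pd


def column_memory(df: pd.DataFrame, mode: str = "by_column") -> pd.Series:
    """Hypothetical helper: per-column memory in bytes for the two proposed modes."""
    if mode not in ("by_column", "by_column_deep"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: 'by_column_deep' corresponds to deep=True, which also
    # measures the Python objects held in object-dtype columns.
    # index=False leaves the index out, so only the columns are reported.
    return df.memory_usage(index=False, deep=(mode == "by_column_deep"))


df = pd.DataFrame({"n": [1, 2, 3], "s": ["a", "bb", "ccc"]})
print(column_memory(df, "by_column"))
print(column_memory(df, "by_column_deep"))
```

The request in this issue is for .info() to show these per-column values alongside the dtype and non-null count it already prints.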

Alternative Solutions

An alternative way to see all the "technical" information per column in one table is to create the following "Frankenstein":

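import io
import sys

import pandas as pd


def better_info(df: pd.DataFrame) -> None:
  # Total size of the DataFrame object, in KB
  print(f"{sys.getsizeof(df) / 1024} KB")

  # Capture the text that df.info() would normally print
  buffer = io.StringIO()
  df.info(buf=buffer)
  lines = buffer.getvalue().splitlines()

  # lines[3] is the header row and lines[5:-2] are the per-column rows.
  # Splitting on whitespace assumes column names are strings without spaces.
  info = (pd.DataFrame([x.split() for x in lines[5:-2]], columns=lines[3].split())
          .drop('Count', axis=1)
          .rename(columns={'Non-Null': 'Non-Null Count'}))

  # Per-column memory in bytes; Series.memory_usage includes the index by default
  memory = pd.DataFrame(
      [(col, df[col].memory_usage(deep=True)) for col in df.columns],
      columns=['Column', 'Memory Usage (bytes)']
  ).set_index('Column')

  print(info.join(memory, on='Column').drop(columns=["#"]))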

This results in output that looks like this:
[image: output of better_info]
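The same kind of table can also be assembled without parsing the text output of .info(). The sketch below is one possible approach, not the proposed implementation: info_table is a hypothetical name, it assumes unique column names, and it excludes the index from each column's size.

```python
import pandas as pd


def info_table(df: pd.DataFrame, deep: bool = True) -> pd.DataFrame:
    """Hypothetical helper: one row per column with count, dtype and memory."""
    return pd.DataFrame(
        {
            # Number of non-null values in each column
            "Non-Null Count": df.count(),
            # Dtype names as plain strings for display
            "Dtype": df.dtypes.astype(str),
            # Bytes per column; index=False keeps the index out of the result
            "Memory Usage (bytes)": df.memory_usage(index=False, deep=deep),
        }
    ).rename_axis("Column")


df = pd.DataFrame({"a": [1.0, None, 3.0], "name": ["x", "yy", None]})
print(info_table(df))
```

Because all three Series are indexed by column label, they align automatically, so column names containing spaces do not need special handling.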

Additional Context

I searched for similar suggestions in repo issues and have not found a duplicate.

Labels

  • Enhancement

  • Needs Discussion (Requires discussion from core team before further action)

  • Output-Formatting (__repr__ of pandas objects, to_string)
