Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
The .info() method describes a DataFrame by each column's dtype and count of non-null values but, IMO, misses an opportunity to be more valuable by also displaying the memory usage of each column.
Feature Description
I think thousands of hours of human time would be saved if this were a built-in feature with "memory_usage='by_column'" and "memory_usage='by_column_deep'" argument options.
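To make the request concrete, here is a minimal sketch of the per-column table such an option might produce, built only from existing calls (DataFrame.count, DataFrame.dtypes, DataFrame.memory_usage). The helper name column_summary and the output column labels are hypothetical and not part of pandas. The "deep" flag stands in for the proposed 'by_column_deep' variant.

```python
import pandas as pd


def column_summary(df: pd.DataFrame, deep: bool = False) -> pd.DataFrame:
    """Hypothetical helper: per-column non-null count, dtype, and memory usage."""
    return pd.DataFrame(
        {
            "Non-Null Count": df.count(),
            "Dtype": df.dtypes,
            # index=False leaves the index out, so only column data is counted.
            "Memory Usage (bytes)": df.memory_usage(index=False, deep=deep),
        }
    )


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", None, "zzz"]})
print(column_summary(df, deep=True))
```

All three Series are indexed by column name, so they align into one row per column without parsing any printed text.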
Alternative Solutions
The alternative way to see all "technical" information in by-column form in one table is to create the following "Frankenstein":
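import io
import sys

import pandas as pd


def better_info(df: pd.DataFrame) -> None:
    # Total size of the DataFrame object as reported by sys.getsizeof.
    print(f"{sys.getsizeof(df) / 1024} KB")

    # Capture the text that df.info() would normally print.
    buffer = io.StringIO()
    df.info(buf=buffer)
    lines = buffer.getvalue().splitlines()

    # lines[3] is the header row and lines[5:-2] are the per-column rows.
    # Splitting on whitespace assumes column names contain no spaces.
    info_table = (
        pd.DataFrame([x.split() for x in lines[5:-2]], columns=lines[3].split())
        .drop('Count', axis=1)
        .rename(columns={'Non-Null': 'Non-Null Count'})
    )

    # Deep memory usage of each column, in bytes.
    memory = pd.DataFrame(
        [(col, df[col].memory_usage(deep=True)) for col in df.columns],
        columns=['Column', 'Memory Usage (bytes)'],
    ).set_index('Column')

    print(info_table.join(memory, on='Column').drop(columns=["#"]))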
Resulting in some output looking like this:
Additional Context
I searched for similar suggestions in repo issues and have not found a duplicate.