DOC: update the Series.memory_usage() docstring #20086


Merged
Changes from 4 commits
53 changes: 40 additions & 13 deletions pandas/core/series.py
@@ -2696,28 +2696,55 @@ def reindex_axis(self, labels, axis=0, **kwargs):
         return self.reindex(index=labels, **kwargs)

     def memory_usage(self, index=True, deep=False):
-        """Memory usage of the Series
+        """
+        Return the memory usage of the Series.
+
+        The memory usage can optionally include the contribution of
+        the index and of elements of `object` dtype.

         Parameters
         ----------
-        index : bool
-            Specifies whether to include memory usage of Series index
-        deep : bool
-            Introspect the data deeply, interrogate
-            `object` dtypes for system-level memory consumption
+        index : bool, default True
+            Specifies whether to include the memory usage of the Series index.
+        deep : bool, default False
+            If True, introspect the data deeply by interrogating
+            `object` dtypes for system-level memory consumption, and include
+            it in the returned value.

         Returns
         -------
-        scalar bytes of memory consumed
-
-        Notes
-        -----
-        Memory usage does not include memory consumed by elements that
-        are not components of the array if deep=False
+        int
+            Bytes of memory consumed.

         See Also
         --------
-        numpy.ndarray.nbytes
+        numpy.ndarray.nbytes : Total bytes consumed by the elements of the
+            array.
Member:

Maybe also DataFrame.memory_usage ?

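The reviewer's See Also suggestion highlights a real asymmetry between the two methods: `DataFrame.memory_usage` reports one value per column while `Series.memory_usage` returns a single total. A quick sketch of the difference (byte counts assume a 64-bit platform and default int64 dtype):

```python
import pandas as pd

# DataFrame.memory_usage returns a Series with one entry per column
# (plus the index), while Series.memory_usage returns a single number.
df = pd.DataFrame({"a": range(3), "b": range(3)})

per_column = df.memory_usage()               # entries: Index, a, b
single_total = df["a"].memory_usage()        # one number: index + column "a"

print(per_column)
print(single_total)
```

Each int64 column of three rows contributes 24 bytes in the per-column report.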
+        Examples
+        --------
+
+        >>> s = pd.Series(range(3))
+        >>> s.memory_usage()
+        104
+
+        Not including the index gives the size of the rest of the data, which
+        is necessarily smaller:
+
+        >>> s.memory_usage(index=False)
+        24
+
+        The memory footprint of `object` values is ignored by default:
+
+        >>> class MyClass: pass
+        >>> s = pd.Series(MyClass())
Member:

I would maybe use an existing python object (something from the standard library, e.g. decimal.Decimal, an IPAddress object, ... or your other favorite python object)

Member:

Or, maybe also useful: just use some strings. Because they are stored as object (a typical gotcha in pandas), their memory is not reflected well by default here.

Contributor:

Yes, I like the idea of using strings.

You'll need to make sure the string is not interned by the interpreter, so don't use just pd.Series(['a', 'b']).
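The interning concern behind this advice can be demonstrated directly. A minimal sketch (this assumes CPython; string interning is an implementation detail, not a language guarantee):

```python
# Short, identifier-like string literals are interned by CPython:
# repeated literals all refer to one shared object.
x = "a"
y = "a"
print(x is y)        # the two names point at the same interned object

# Strings built at run time are not interned, so equal values
# can still be distinct objects.
n = 5000
big1 = "a" * n
big2 = "a" * n
print(big1 is big2)  # two separate objects with the same value
```

This is why `['a', 'b']` is a slightly misleading docstring example: every `'a'` in a program may share a single underlying object, so per-element size sums can overstate the real footprint.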

Contributor Author (@lebigot, Mar 10, 2018):

Why not use interned strings? They are longer than their pointer, so they also make a good example, no?

>>> s = pd.Series(["a", "b"])
>>> s.memory_usage()
96
>>> s.memory_usage(deep=True)
212

Member:

So apparently it is giving the memory of that object, so for this example that looks fine (but I would expect that if you have repeated interned strings, it might then not give the correct result).

Contributor Author (@lebigot, Mar 11, 2018):

A repeated interned string indeed does not give the real memory footprint:

>>> id("a")
4397773352
>>> id("a")
4397773352
>>> s = pd.Series(["a"]*1000)
>>> s.memory_usage()
8080
>>> s.memory_usage(deep=True)
66080
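The accounting behind these numbers can be checked: with deep=True, pandas adds the size of each element's Python object on top of the raw pointer array, without deduplicating shared (interned) objects. A sketch under that assumption (exact byte counts vary by Python version and platform):

```python
import sys
import pandas as pd

# 1000 references to the single interned string "a".
s = pd.Series(["a"] * 1000)

shallow = s.memory_usage(index=False)           # 1000 8-byte pointers
deep = s.memory_usage(index=False, deep=True)   # pointers + string objects

# Although only one string object exists in memory, deep mode counts
# its size once per element, overstating the real footprint.
print(shallow, deep, sys.getsizeof("a"))
```

This matches the thread's observation: the deep figure is correct per element but wrong as a measure of total process memory when elements are shared.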

+        >>> s
+        0    <__main__.MyClass object at ...>
+        dtype: object
+        >>> s.memory_usage()
+        88
+        >>> s.memory_usage(deep=True)
+        120
         """
v = super(Series, self).memory_usage(deep=deep)
if index:
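The behavior documented by the revised docstring can be exercised end to end. A small sketch; absolute byte counts depend on platform, pandas version, and index type, so only the stable relationships are worth relying on:

```python
import pandas as pd

s = pd.Series(range(3))                    # default int64 dtype

total = s.memory_usage()                   # values + index
values_only = s.memory_usage(index=False)  # values alone

print(values_only)       # 3 * 8 = 24 bytes for three int64 values
print(total > values_only)

# For object dtype, the shallow count covers only the pointers;
# deep=True also follows each pointer to the underlying Python object.
t = pd.Series(["some", "strings"])
print(t.memory_usage(deep=True) > t.memory_usage())
```

The index contribution explains why `memory_usage()` is always at least as large as `memory_usage(index=False)`, which is exactly the "necessarily smaller" claim in the docstring example.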