DOC: update the Series.memory_usage() docstring #20086
Changes from 4 commits
ad7f06f
601ffd7
da08897
6c0205d
6626d67
cea218f
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2696,28 +2696,55 @@ def reindex_axis(self, labels, axis=0, **kwargs): | |
return self.reindex(index=labels, **kwargs) | ||
|
||
def memory_usage(self, index=True, deep=False): | ||
"""Memory usage of the Series | ||
""" | ||
Return the memory usage of the Series. | ||
|
||
The memory usage can optionally include the contribution of | ||
the index and of elements of `object` dtype. | ||
|
||
Parameters | ||
---------- | ||
index : bool | ||
Specifies whether to include memory usage of Series index | ||
deep : bool | ||
Introspect the data deeply, interrogate | ||
`object` dtypes for system-level memory consumption | ||
index : bool, default True | ||
Specifies whether to include the memory usage of the Series index. | ||
deep : bool, default False | ||
If True, introspect the data deeply by interrogating | ||
`object` dtypes for system-level memory consumption, and include | ||
it in the returned value. | ||
|
||
Returns | ||
------- | ||
scalar bytes of memory consumed | ||
|
||
Notes | ||
----- | ||
Memory usage does not include memory consumed by elements that | ||
are not components of the array if deep=False | ||
int | ||
Bytes of memory consumed. | ||
|
||
See Also | ||
-------- | ||
numpy.ndarray.nbytes | ||
numpy.ndarray.nbytes : Total bytes consumed by the elements of the | ||
array. | ||
|
||
Examples | ||
-------- | ||
|
||
>>> s = pd.Series(range(3)) | ||
>>> s.memory_usage() | ||
104 | ||
|
||
Not including the index gives the size of the rest of the data, which | ||
is necessarily smaller: | ||
|
||
>>> s.memory_usage(index=False) | ||
24 | ||
|
||
The memory footprint of `object` values is ignored by default: | ||
|
||
>>> class MyClass: pass | ||
>>> s = pd.Series(MyClass()) | ||
Review comment: I would maybe use an existing Python object (something from the standard library, e.g. decimal.Decimal, an IPAddress object, ... or your other favorite Python object).

Review comment: Or, maybe also useful, to just use some strings. Because they are stored as object (a typical gotcha in pandas), their memory is not reflected well by default in this

Review comment: Yes, I like the idea of using strings. You'll need to make sure the string is not interned by the interpreter, so don't use just

Review comment: Why not use interned strings? They are longer than their pointer, so they are also a good example, no?

Review comment: So apparently it is giving the memory of that object, so for the example that looks fine (but I would expect that if you have repeated interned strings, it might then not give the correct result).

Review comment: Indeed, a repeated interned string does not give the real memory footprint:

>>> id("a")
4397773352
>>> id("a")
4397773352
>>> s = pd.Series(["a"]*1000)
>>> s.memory_usage()
8080
>>> s.memory_usage(deep=True)
66080
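The figures in the thread above (66080 - 8080 = 58000 extra bytes for 1000 references to one string) are consistent with each slot being counted separately. The following is a minimal standard-library sketch of that effect, not pandas' actual implementation: the helper names `per_slot_bytes` and `per_object_bytes` and the 8-byte pointer size are assumptions made for illustration.

```python
import sys


def per_slot_bytes(values, pointer_size=8):
    """Pointer array plus the size of the object in every slot,
    counting a shared object once per slot that references it."""
    return pointer_size * len(values) + sum(sys.getsizeof(v) for v in values)


def per_object_bytes(values, pointer_size=8):
    """Pointer array plus the size of each distinct object, counted once."""
    distinct = {id(v): v for v in values}
    return pointer_size * len(values) + sum(
        sys.getsizeof(v) for v in distinct.values()
    )


# List multiplication repeats one reference, so all 1000 slots share an object.
values = ["a"] * 1000

overcounted = per_slot_bytes(values)
actual = per_object_bytes(values)

print(overcounted, actual)
```

Under this model the per-slot total exceeds the per-object total by 999 times the size of the shared string, which is the kind of overestimate the last comment describes.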
||
>>> s | ||
0 <__main__.MyClass object at ...> | ||
dtype: object | ||
>>> s.memory_usage() | ||
88 | ||
>>> s.memory_usage(deep=True) | ||
120 | ||
""" | ||
v = super(Series, self).memory_usage(deep=deep) | ||
if index: | ||
|
Review comment: Maybe also DataFrame.memory_usage?
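For context on the method suggested in that comment, here is a small sketch of how `DataFrame.memory_usage` reports usage per column, with the same `index` and `deep` parameters as the Series method. The frame and its column names are hypothetical, and `dtype=object` is set explicitly so the strings are stored as Python objects whatever the installed pandas version's default string dtype is. Exact byte counts vary by platform and version, so none are shown.

```python
import pandas as pd

# Hypothetical frame: one numeric column and one object-dtype string column.
df = pd.DataFrame(
    {
        "numbers": pd.Series([1, 2, 3], dtype="int64"),
        "labels": pd.Series(["alpha", "beta", "gamma"], dtype=object),
    }
)

shallow = df.memory_usage()              # one entry per column, plus "Index"
deep = df.memory_usage(deep=True)        # also counts the Python string objects
no_index = df.memory_usage(index=False)  # drops the "Index" entry

print(shallow)
print(deep)
```

The numeric column reports the same value in both calls, while the object column grows under `deep=True`, mirroring the behavior the Series docstring describes.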