
PERF: Unnecessary hash table with RangeIndex #16685

Closed
@chris-b1

Description


Example

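import gc
import os

import psutil


def log_memory():
    """Print the resident set size (RSS) of the current process in MB."""
    # Run a full collection (all generations) so uncollected garbage
    # does not inflate the reading.
    gc.collect()
    process = psutil.Process(os.getpid())
    mem_usage = process.memory_info().rss / float(2 ** 20)  # bytes -> MB
    print("[Memory usage] {:12.1f} MB".format(mem_usage))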

In [20]: df = pd.DataFrame({'a': np.arange(1000000)})

In [23]: log_memory()
[Memory usage]        132.4 MB

In [24]: df.loc[5, :]
Out[24]: 
a    5
Name: 5, dtype: int32

In [25]: log_memory()
[Memory usage]        172.2 MB

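Rather than materializing a hash table, a RangeIndex lookup should convert labels directly into positions, since every position follows arithmetically from start, stop and step. Here a single .loc lookup raised memory usage by about 40 MB (132.4 MB to 172.2 MB). Low priority in my opinion, since it is atypical to use loc with a RangeIndex.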
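A minimal sketch of the direct label-to-position conversion, assuming the index is fully described by integer start, stop and step. The helper names range_get_loc and range_get_indexer are hypothetical and not part of the pandas API; the vectorized variant returns -1 for missing labels, following the get_indexer convention. No per-label table is allocated, so memory stays constant regardless of index length.

```python
import numpy as np


def range_get_loc(start: int, stop: int, step: int, label: int) -> int:
    """Hypothetical helper: map one label to its position in
    range(start, stop, step) in O(1), without building a hash table."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Position is (label - start) / step, valid only if it divides exactly
    # and lands inside [0, length).
    pos, rem = divmod(label - start, step)
    length = len(range(start, stop, step))
    if rem != 0 or not 0 <= pos < length:
        raise KeyError(label)
    return pos


def range_get_indexer(start: int, stop: int, step: int, labels) -> np.ndarray:
    """Hypothetical vectorized variant: positions for many labels at once,
    with -1 for labels that are not in the range."""
    if step == 0:
        raise ValueError("step must not be zero")
    labels = np.asarray(labels, dtype=np.int64)
    length = len(range(start, stop, step))
    # np.divmod uses floor division, matching Python's divmod for ints,
    # so negative steps are handled the same way as in range_get_loc.
    pos, rem = np.divmod(labels - start, step)
    valid = (rem == 0) & (pos >= 0) & (pos < length)
    return np.where(valid, pos, -1)


if __name__ == "__main__":
    print(range_get_loc(0, 1_000_000, 1, 5))  # 5
    # positions 2, -1, -1, 4 for labels 6, 7, 12, 2 in range(10, 0, -2)
    print(range_get_indexer(10, 0, -2, [6, 7, 12, 2]))
```

Python's own range.index performs the same constant-time arithmetic for integer labels, which makes it a convenient cross-check for a sketch like this.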

pandas 0.20.2

Metadata

Assignees: No one assigned

Labels: Indexing (related to indexing on series/frames, not to indexes themselves); Performance (memory or execution speed performance)

Type: No type

Projects: No projects

Relationships: None yet

Development: No branches or pull requests