Skip to content

PERF: Remove unnecessary copies in sorting functions (#33917) #34192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 3, 2020

Conversation

mproszewska
Copy link
Contributor

This PR fixes pereformance regression after commit dec736f3f by removing unnecessary copies if key=None in sorting functions.

setup = """
import pandas as pd
import numpy as np
N = 10 ** 5
idx = pd.date_range(start="1/1/2000", periods=N, freq="s")
s = pd.Series(np.random.randn(N), index=idx)
"""

import timeit
timeit.timeit("s.sort_index()",setup=setup, number=100000)

# master
# 41.245621485984884
# now
# 14.826273362035863

@pep8speaks
Copy link

pep8speaks commented May 16, 2020

Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-03 01:25:38 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an asv which hits this. I think this regression is in 1.1 so no whatsnew is needed.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in tests where this is hit, can you assert that values is not modified

@jreback jreback added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 17, 2020
@mproszewska
Copy link
Contributor Author

also in tests where this is hit, can you assert that values is not modified

This is used in function that sorts series by index, so I believe that values=index will be modified

@TomAugspurger TomAugspurger added this to the 1.1 milestone May 22, 2020
Comment on lines 177 to 186
class SortIndexSeries:
def setup(self):
N = 10 ** 5
idx = pd.date_range(start="1/1/2000", periods=N, freq="s")
self.s = pd.Series(np.random.randn(N), index=idx)

def time_sort_index(self):
self.s.sort_index()


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this. SortIndex.time_sort_index already covers this code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I haven't checked timeseries before adding asv

@TomAugspurger
Copy link
Contributor

Can you add a release note stating that we fixed the performance regression in sort_index?

@mproszewska
Copy link
Contributor Author

Can you add a release note stating that we fixed the performance regression in sort_index?

Should I add note even though regression was in 1.1?

@jreback
Copy link
Contributor

jreback commented May 22, 2020

Can you add a release note stating that we fixed the performance regression in sort_index?

Should I add note even though regression was in 1.1?

no this is fine (just add the asvs)

@mproszewska
Copy link
Contributor Author

Can you add a release note stating that we fixed the performance regression in sort_index?

Should I add note even though regression was in 1.1?

no this is fine (just add the asvs)

Isn't timeseries.SortIndex.time_sort_index enough?

@jreback
Copy link
Contributor

jreback commented May 23, 2020

what does it show pls post a run

@mproszewska
Copy link
Contributor Author

     <perf>           <master>  
+        158±50μs         367±30μs     2.32  timeseries.SortIndex.time_sort_index(True)
+      3.84±0.4ms       5.94±0.5ms     1.55  timeseries.DatetimeAccessor.time_dt_accessor_normalize(None)
-      3.31±0.1ms       2.47±0.3ms     0.75  timeseries.ResampleDataFrame.time_method('max')

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine

can u merge master and ping on green

@mproszewska
Copy link
Contributor Author

looks fine

can u merge master and ping on green

ping

@jreback jreback merged commit fb7bf94 into pandas-dev:master Jun 3, 2020
@jreback
Copy link
Contributor

jreback commented Jun 3, 2020

thanks @mproszewska

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance regression in timeseries.SortIndex.time_sort_index
4 participants