Skip to content

PERF: Improve efficiency of BlockValuesRefs._clear_dead_references(...) #59643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

Tolker-KU
Copy link
Contributor

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Improves efficiency of BlockValuesRefs._clear_dead_references(...) by allowing for more efficient C code-gen as a function lookup and FastCall is eliminated from a tight loop.

Before

image

After

image

@Tolker-KU Tolker-KU requested a review from WillAyd as a code owner August 28, 2024 18:01
@@ -26,6 +30,10 @@ from pandas._libs.util cimport (
)


cdef extern from "Python.h":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised that this also makes a difference - Cython doesn't translate that is None call for a cdef object into the C-equivalent expression already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does! But PyWeakref_GetObject returns a PyObject* which is not comparable to a PyObject (not-ponter) and None is the latter type. This is the reason for this quite ugly hack. If you know a better way I would really like to hear about it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did Cython give you a compilation error? I would expect None to be a pointer to a PyObject as well and for Cython to handle this within its native syntax.

Of course I am wrong a lot! So I might have the wrong mental model for what Cython is doing, but worth double checking this can't be expressed more naturally first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't want to go down a rabbit hole on this - just double check real quick. If you spend anything more than 2 minutes trying to find an answer, let's just roll with what we've got

Copy link
Contributor Author

@Tolker-KU Tolker-KU Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all. Cython gives this error when using None

Error compiling Cython file:
------------------------------------------------------------
...
        # all these objects stay alive, e.g. df.items() for wide DataFrames
        # see GH#55245 and GH#55008
        if force or len(self.referenced_blocks) > self.clear_counter:
            self.referenced_blocks = [
                ref for ref in self.referenced_blocks
                if PyWeakref_GetObject(ref) is not None
                                            ^
------------------------------------------------------------

pandas/_libs/internals.pyx:914:44: Invalid types for 'is_not' (PyObject *, Python object)

@mroeschke mroeschke added Performance Memory or execution speed performance Copy / view semantics labels Aug 28, 2024
@mroeschke mroeschke added this to the 3.0 milestone Aug 28, 2024
@mroeschke mroeschke merged commit ad077aa into pandas-dev:main Aug 28, 2024
53 checks passed
@mroeschke
Copy link
Member

Thanks @Tolker-KU

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants