POC for New GroupBy Dispatching Module #20485

Closed
wants to merge 1 commit into from

Conversation

WillAyd
Member

@WillAyd WillAyd commented Mar 26, 2018

This is nowhere near completion but looking for feedback on the direction.

@jreback

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

}
"""
return {
'any': {
Member Author

Thought here is that a dict of dicts can most cleanly describe the metadata that each function may have to pass to Cython
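For readers skimming the thread, a minimal sketch of what such a dict of dicts might look like. Only the `'any'` entry and the `'application'` key are visible in the diff; every other key and value below is a hypothetical placeholder, not what the PR defines.

```python
# Hypothetical metadata table: the outer key is the groupby function name,
# the inner dict holds whatever that function would need to pass to Cython.
# Only 'any' and 'application' appear in the PR; everything else is assumed.
FUNC_METADATA = {
    'any': {
        'application': 'series',            # assumed value
        'result_dtype': bool,               # assumed key
        'cython_kwargs': {'val_test': 'any'},  # assumed key
    },
    'all': {
        'application': 'series',
        'result_dtype': bool,
        'cython_kwargs': {'val_test': 'all'},
    },
}


def application_type(func_nm, metadata=FUNC_METADATA):
    """Look up how a function is applied, mirroring the property in the diff."""
    return metadata[func_nm]['application']
```

Adding a new function under this layout would mean adding one more inner dict rather than touching the lookup code.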

def application_type(self):
    return self.func_metadata[self.func_nm]['application']

def _any_all_convertor(self, vals):
Member Author

Hoping we don't have a ton of these, but any custom conversion routines can be defined in this module (maybe not even within this class) and simply referenced in the metadata dict.
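A rough sketch of that idea, assuming a `'convertor'` key in the metadata (the key name, the `'sum'` entry, and the conversion logic are all hypothetical and only illustrate the referencing pattern):

```python
def any_all_convertor(vals):
    """Hypothetical stand-in: coerce raw values to booleans before dispatch."""
    return [bool(v) for v in vals]


# The metadata entry points at the routine rather than embedding the logic.
FUNC_METADATA = {
    'any': {'convertor': any_all_convertor},  # 'convertor' key is assumed
    'sum': {},                                # no custom conversion registered
}


def convert(func_nm, vals, metadata=FUNC_METADATA):
    """Apply the registered convertor if there is one, else pass through."""
    convertor = metadata[func_nm].get('convertor')
    return convertor(vals) if convertor is not None else vals
```

Functions without a registered routine fall through unchanged, so the special cases stay confined to the few entries that need them.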

raise TypeError("'{}' cannot be applied to a dtype of {}".format(
    self.func_nm, self.obj.values.dtype))

def _get_result(self, **kwargs):
Member Author

This func does all the heavy lifting; mostly matches the implementation of _get_cythonized_result currently in groupby.py

@@ -0,0 +1,186 @@
import collections
import inspect
Contributor

Rather than putting this here and trying to make it too generic, let's do the following:

  • move pandas.core.groupby to pandas.core.groupby.groupby (and add a shim to deprecate pandas.core.groupby, similar to what we do for pandas.core.categorical); do this as a precursor PR that contains only the move
  • put this module in pandas.core.groupby, call it dispatch.py
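For illustration only, one way a deprecation shim for a relocated module could be sketched. This is not how the pandas.core.categorical shim is implemented; the `mypkg` names and the `make_deprecated_shim` helper are hypothetical, and the approach relies on module-level `__getattr__` (PEP 562, Python 3.7+).

```python
import types
import warnings


def make_deprecated_shim(old_name, new_module):
    """Build a module object that forwards attribute access to new_module,
    emitting a FutureWarning on each forwarded lookup."""
    shim = types.ModuleType(old_name)

    def __getattr__(name):
        warnings.warn(
            "{} is deprecated; use {} instead".format(
                old_name, new_module.__name__),
            FutureWarning,
            stacklevel=2,
        )
        return getattr(new_module, name)

    # Module-level __getattr__ is only consulted when normal lookup fails.
    shim.__getattr__ = __getattr__
    return shim


# Stand-in for the relocated module; a real one would be imported normally.
new_mod = types.ModuleType("mypkg.groupby.groupby")
new_mod.GroupBy = type("GroupBy", (), {})

old_mod = make_deprecated_shim("mypkg.groupby", new_mod)
```

Existing imports through the old path keep working but surface a warning, which is what makes a move-only precursor PR low risk.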

# Since this func is called in a loop, the below might be better
# served outside of the loop and passed in?
labels, _, ngroups = self.groupby.grouper.group_info

Contributor

See above; these should be subclasses.

Contributor

I would make the things to dispatch just methods. Otherwise this has to have a lot of magic. It's much simpler to just call a method.

Member Author

Struggling to conceptualize this - so do you think it would be best to have a BaseDispatcher and then have the various groupings of applications as the subclasses (so here AnyAllDispatcher)?
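To make the question concrete, a minimal sketch of that layout, assuming the dispatched functions become plain methods on a subclass. `BaseDispatcher` and `AnyAllDispatcher` are the names floated in this comment; the constructor arguments, the `dispatch` method, and the pure-Python reductions are hypothetical stand-ins for the Cython calls.

```python
class BaseDispatcher:
    """Hypothetical base class: shared state plus the dispatch entry point."""

    def __init__(self, values, labels, ngroups):
        self.values = values
        self.labels = labels      # group id for each value
        self.ngroups = ngroups

    def dispatch(self, func_nm):
        # Plain method lookup instead of consulting a metadata dict.
        method = getattr(self, func_nm, None)
        if method is None:
            raise TypeError("'{}' is not supported by {}".format(
                func_nm, type(self).__name__))
        return method()


class AnyAllDispatcher(BaseDispatcher):
    """Groups the boolean reductions together in one subclass."""

    def _grouped(self):
        # Bucket the truthiness of each value by its group label.
        groups = [[] for _ in range(self.ngroups)]
        for val, lab in zip(self.values, self.labels):
            groups[lab].append(bool(val))
        return groups

    def any(self):
        return [any(g) for g in self._grouped()]

    def all(self):
        return [all(g) for g in self._grouped()]
```

With this shape, supporting another family of functions would mean adding another subclass rather than extending a shared metadata table.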

Contributor

see my comments above

The first order of business is to move groupby to a submodule so we can easily create more modules.

@jreback
Contributor

jreback commented Jul 8, 2018

If you can move this underneath pandas/core/groupby, then I can have a look.

@WillAyd
Member Author

WillAyd commented Jul 8, 2018

Closing for now - going to take an additional pass at a later date

@WillAyd WillAyd closed this Jul 8, 2018
@WillAyd WillAyd deleted the grpby-dispatch branch December 25, 2018 06:12