-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
POC for New GroupBy Dispatching Module #20485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
""" | ||
return { | ||
'any': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought here is that a dict of dicts can most cleanly describe the metadata that each function may have to pass to Cython
def application_type(self): | ||
return self.func_metadata[self.func_nm]['application'] | ||
|
||
def _any_all_convertor(self, vals): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hoping we don't have a ton of these, but any custom conversion routines can be defined in this module (maybe not even within this class) and simply reference in the metadata dict
raise TypeError("'{}' cannot be applied to a dtype of {}".format( | ||
self.func_nm, self.obj.values.dtype)) | ||
|
||
def _get_result(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This func does all the heavy lifting; mostly matches the implementation of _get_cythonized_result
currently in groupby.py
@@ -0,0 +1,186 @@ | |||
import collections | |||
import inspect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than put this here and trying to make this too generic. let's do:
- move
pandas.core.groupby
topandas.core.groupby.groupby
(and add a shim to deprecatepandas.core.groupby
, simliar to what we do forpandas.core.categorical
), do as a pre-cursor PR, only doing this move - put this module in
pandas.core.groupby
, call itdispatch.py
# Since this func is called in a loop, the below might be better | ||
# served outside of the loop and passed in? | ||
labels, _, ngroups = self.groupby.grouper.group_info | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above these should be subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the things to dispatch just methods. Otherwise this has to have a lot of magic. Its much simpler to just call a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Struggling to conceptualize this - so do you think it would be best to have a BaseDispatcher
and then have the various groupings of applications as the subclasses (so here AnyAllDispatcher
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments above
first order of business is to move groupby to a sub module
so can easily create more modules
if you can move underneath pandas/core/groupby then can have a look |
Closing for now - going to take an additional pass at a later date |
This is nowhere near completion but looking for feedback on the direction.
@jreback
git diff upstream/master -u -- "*.py" | flake8 --diff