-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: preserve dtypes in interpolate #6291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: preserve dtypes in interpolate #6291
Conversation
@TomAugspurger no....this is where it gets tricky you basically split the block's values then create and return 2 blocks from the Block routine (the apply at the BlockManager level willcombine them properly). downcast, putmask, eval, convert all do this; they do it item by item (which you don't want to do) give a shot and I can help |
you want to do this check at a much lower level, in Block.interpolate; e.g. instead of passing every array to the interpolate routines, you just selectively pass, then reassemble |
@jreback two things
|
@@ -823,6 +823,12 @@ def interpolate(self, method='pad', axis=0, index=None, | |||
m = None | |||
|
|||
if m is not None: | |||
# Skip interpolating this block if no NaNs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine; but you of course have the problem if say part of block is interpolated and part is not. but better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'm still looking into how to handle blocks with some good cols, and some that need interpolating.
@TomAugspurger you could do this on
|
you could try to a-prior split blocks (or post-process the changed ones)....can do later PR |
this looks good...you punted the splitting blocks? |
The perf isn't looking good (assuming I'm running it correctly). There's an 80x slowdown on one of my new vbenches. I'll look at it closer tomorrow and ping you. On Feb 13, 2014, at 9:35 PM, "jreback" <[email protected]mailto:[email protected]> wrote: this looks good...you punted the splitting blocks? — |
Am I doing this correctly? I'm running
The script runs and here are the interpolate related ones:
But when I checkout those commits manually, I get:
So it's faster when I actually check it out, not 80x slower.
|
no...those are the actual running times in ms, not the ratio...
I make sure its checked in and pushed because it pulls from the remote repo |
Thanks. This makes more sense.
|
@@ -823,6 +823,11 @@ def interpolate(self, method='pad', axis=0, index=None, | |||
m = None | |||
|
|||
if m is not None: | |||
if isinstance(self, (IntBlock, BoolBlock)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want me to do this check in on the self._interpolate_with_fill
part a few lines up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Here's the perf for the fillna parts:
I fixed that typo on vbench/merge_join also. |
@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None, | |||
values=None, inplace=False, limit=None, | |||
fill_value=None, coerce=False, downcast=None, **kwargs): | |||
|
|||
def check_int_bool(self, inplace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also do this by self.is_integer, self.is_bool, self.is_timedelta....
these are defined for all blocks (but not a biggie)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, are you able to view this PR on github? I’m getting an error every
time I try to view it. Not any other pages though.
Anyway I think it’s ready to go.
On Feb 14, 2014, at 11:22 AM, jreback <[email protected]mailto:[email protected]> wrote:
In pandas/core/internals.py:
@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None,
values=None, inplace=False, limit=None,
fill_value=None, coerce=False, downcast=None, **kwargs):
def check_int_bool(self, inplace):
you can also do this by self.ishttp://self.is_integer, self.ishttp://self.is_bool, self.ishttp://self.is_timedelta.... these are defined for all blocks (but not a biggie)
—
Reply to this email directly or view it on GitHubhttps://github.com//pull/6291/files#r9756105.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github seems kind of flaky now
will try later
On Feb 14, 2014, at 4:39 PM, Tom Augspurger [email protected] wrote:
In pandas/core/internals.py:
@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None,
values=None, inplace=False, limit=None,
fill_value=None, coerce=False, downcast=None, **kwargs):
@jreback, are you able to view this PR on github? I’m getting an error every time I try to view it. Not any other pages though. Anyway I think it’s ready to go. On Feb 14, 2014, at 11:22 AM, jreback <[email protected]mailto:[email protected]> wrote: In pandas/core/internals.py:def check_int_bool(self, inplace):
@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None, values=None, inplace=False, limit=None, fill_value=None, coerce=False, downcast=None, **kwargs): + def check_int_bool(self, inplace):
you can also do this by self.ishttp://self.is_integer, self.ishttp://self.is_bool, self.ishttp://self.is_timedelta.... these are defined for all blocks (but not a biggie) — Reply to this email directly or view it on GitHubhttps://github.com/BUG: preserve dtypes in interpolate #6291/files#r9756105.
—
Reply to this email directly or view it on GitHub.
@TomAugspurger looks like github finally works again...pls rebase and can get this in (and if you want to change to |
github emailed me back - they had a bug that was preventing things from working today fixed now |
Closing without merging in favor of #6378. |
Closes #6290
@jreback is there a good way to update part of a BlockManager inplace? I have a new BlockManager from
interpolate
and the original BlockManager (or DataFrame) which contains some "all good" columns that didn't get interpolate on, and some bad columns that did get interpolated on.For the non inplace version I'm calling
self._constructor
on the BlockManager returned from interpolating and concating that to the all good columns.