BUG: preserve dtypes in interpolate #6291

TomAugspurger · 2014-02-06T21:45:18Z

@jreback is there a good way to update part of a BlockManager inplace? I have a new BlockManager from interpolate and the original BlockManager (or DataFrame) which contains some "all good" columns that didn't get interpolate on, and some bad columns that did get interpolated on.

For the non inplace version I'm calling self._constructor on the BlockManager returned from interpolating and concating that to the all good columns.

jreback · 2014-02-06T21:48:59Z

@TomAugspurger no....this is where it gets tricky

you basically split the block's values then create and return 2 blocks from the Block routine (the apply at the BlockManager level willcombine them properly).

downcast, putmask, eval, convert all do this; they do it item by item (which you don't want to do)

give a shot and I can help

jreback · 2014-02-06T21:50:26Z

you want to do this check at a much lower level, in Block.interpolate; e.g. instead of passing every array to the interpolate routines, you just selectively pass, then reassemble

TomAugspurger · 2014-02-07T18:33:44Z

@jreback two things

We raise a ValueError if you interpolate with method='time' on an object that doesn't have a time series index. With this PR if you don't have any NaNs in the Series/Frame you just get it back, no ValueError. If there are any NaNs it still raises. I think this is actually an improvement, but I could be persuaded otherwise.
I'm just doing this on the self._interpolate method in core/internals (the part I added). Would you like to do a similar check for the self._interpolate_with_fill side also?

jreback · 2014-02-07T19:08:42Z

pandas/core/internals.py

@@ -823,6 +823,12 @@ def interpolate(self, method='pad', axis=0, index=None,
            m = None

        if m is not None:
+            # Skip interpolating this block if no NaNs.


this is fine; but you of course have the problem if say part of block is interpolated and part is not. but better.

Yep. I'm still looking into how to handle blocks with some good cols, and some that need interpolating.

jreback · 2014-02-07T19:10:21Z

@TomAugspurger you could do this on _interpolate_with_fill as well. keeping in mind things:

this may break some tests (which expect the float to come back always)
this may introduce a performance issue (isnull is not free; though reasonably cheap)...maybe just give an overall vbench to see if anything pops up

jreback · 2014-02-07T19:25:33Z

you could try to a-prior split blocks (or post-process the changed ones)....can do later PR

jreback · 2014-02-14T03:35:41Z

this looks good...you punted the splitting blocks?

TomAugspurger · 2014-02-14T03:43:33Z

The perf isn't looking good (assuming I'm running it correctly). There's an 80x slowdown on one of my new vbenches. I'll look at it closer tomorrow and ping you.

On Feb 13, 2014, at 9:35 PM, "jreback" <[email protected]mailto:[email protected]> wrote:

this looks good...you punted the splitting blocks?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/6291#issuecomment-35052784.

TomAugspurger · 2014-02-14T16:00:55Z

Am I doing this correctly? I'm running python test_perf.py -b cab2a93 -H,

cab2a93 is just a merge commit from the hexbin plots earlier today.

The script runs and here are the interpolate related ones:

$cat ../vb_suite.log | grep interpolate
frame_interpolate_some_good                  |   1.8273 |
frame_interpolate_some_good_infer            |   3.2444 |
frame_interpolate                            |  79.5443 |

But when I checkout those commits manually, I get:

cab2a93

In [11]: %timeit df.interpolate()
10 loops, best of 3: 110 ms per loop

master  # should be the same as cab2a93, and it is.

In [5]: %timeit df.interpolate()
10 loops, best of 3: 110 ms per loop

my branch

In [5]: %timeit df.interpolate()
10 loops, best of 3: 78.4 ms per loop

So it's faster when I actually check it out, not 80x slower.

df is the same as the one from the vbench:

df = DataFrame(randn(10000, 100))
df.values[::2] = np.nan

jreback · 2014-02-14T16:03:36Z

no...those are the actual running times in ms, not the ratio...

./test_perf.sh -b <1 prior commit to yours> -t <last commit of yours> -r interpolate

I make sure its checked in and pushed because it pulls from the remote repo

TomAugspurger · 2014-02-14T16:14:38Z

Thanks. This makes more sense.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_interpolate_some_good                  |   1.8317 |   3.9976 |   0.4582 |
frame_interpolate                            |  79.5573 | 109.9803 |   0.7234 |
frame_interpolate_some_good_infer            |   3.3453 |   3.9697 |   0.8427 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

TomAugspurger · 2014-02-14T16:28:41Z

pandas/core/internals.py

@@ -823,6 +823,11 @@ def interpolate(self, method='pad', axis=0, index=None,
            m = None

        if m is not None:
+            if isinstance(self, (IntBlock, BoolBlock)):


Did you want me to do this check in on the self._interpolate_with_fill part a few lines up?

TomAugspurger · 2014-02-14T17:10:25Z

Here's the perf for the fillna parts:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_fillna_inplace                         |   9.4923 |   9.4840 |   1.0009 |
replace_fillna                               |   1.5057 |   1.4993 |   1.0042 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |

I fixed that typo on vbench/merge_join also.

jreback · 2014-02-14T17:22:07Z

pandas/core/internals.py

@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None,
                    values=None, inplace=False, limit=None,
                    fill_value=None, coerce=False, downcast=None, **kwargs):

+        def check_int_bool(self, inplace):


you can also do this by self.is_integer, self.is_bool, self.is_timedelta.... these are defined for all blocks (but not a biggie)

@jreback, are you able to view this PR on github? I’m getting an error every
time I try to view it. Not any other pages though.

Anyway I think it’s ready to go.

On Feb 14, 2014, at 11:22 AM, jreback <[email protected]mailto:[email protected]> wrote:

In pandas/core/internals.py:

@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None,
values=None, inplace=False, limit=None,
fill_value=None, coerce=False, downcast=None, **kwargs):

def check_int_bool(self, inplace):

you can also do this by self.ishttp://self.is_integer, self.ishttp://self.is_bool, self.ishttp://self.is_timedelta.... these are defined for all blocks (but not a biggie)

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/6291/files#r9756105.

github seems kind of flaky now

will try later

On Feb 14, 2014, at 4:39 PM, Tom Augspurger [email protected] wrote:

In pandas/core/internals.py:

@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None,
values=None, inplace=False, limit=None,
fill_value=None, coerce=False, downcast=None, **kwargs):

def check_int_bool(self, inplace):
@jreback, are you able to view this PR on github? I’m getting an error every time I try to view it. Not any other pages though. Anyway I think it’s ready to go. On Feb 14, 2014, at 11:22 AM, jreback <[email protected]mailto:[email protected]> wrote: In pandas/core/internals.py:
@@ -802,6 +802,16 @@ def interpolate(self, method='pad', axis=0, index=None, values=None, inplace=False, limit=None, fill_value=None, coerce=False, downcast=None, **kwargs): + def check_int_bool(self, inplace):
you can also do this by self.ishttp://self.is_integer, self.ishttp://self.is_bool, self.ishttp://self.is_timedelta.... these are defined for all blocks (but not a biggie) — Reply to this email directly or view it on GitHubhttps://github.com/BUG: preserve dtypes in interpolate #6291/files#r9756105.
—
Reply to this email directly or view it on GitHub.

jreback · 2014-02-15T22:21:07Z

@TomAugspurger looks like github finally works again...pls rebase and can get this in (and if you want to change to is_bool.....etc cool

jreback · 2014-02-16T00:57:44Z

github emailed me back - they had a bug that was preventing things from working today

fixed now

TomAugspurger · 2014-02-17T17:06:56Z

Closing without merging in favor of #6378.

jreback added this to the 0.14.0 milestone Feb 7, 2014

jreback added Dtypes labels Feb 7, 2014

jreback reviewed Feb 7, 2014
View reviewed changes

TomAugspurger reviewed Feb 14, 2014
View reviewed changes

jreback reviewed Feb 14, 2014
View reviewed changes

jreback mentioned this pull request Feb 15, 2014

BUG: Interpolate should be more careful with dtypes #6290

Closed

BUG: preserve dtypes in interpolate

08b2984

TomAugspurger mentioned this pull request Feb 17, 2014

BUG: interpolate should preserve dtypes #6378

Merged

TomAugspurger closed this Feb 17, 2014

TomAugspurger deleted the interpolate-ignore-good branch November 3, 2016 12:37

Uh oh!

BUG: preserve dtypes in interpolate #6291

BUG: preserve dtypes in interpolate #6291

Uh oh!

Conversation

TomAugspurger commented Feb 6, 2014

Uh oh!

jreback commented Feb 6, 2014

Uh oh!

jreback commented Feb 6, 2014

Uh oh!

TomAugspurger commented Feb 7, 2014

Uh oh!

jreback Feb 7, 2014

Choose a reason for hiding this comment

Uh oh!

TomAugspurger Feb 7, 2014

Choose a reason for hiding this comment

Uh oh!

jreback commented Feb 7, 2014

Uh oh!

jreback commented Feb 7, 2014

Uh oh!

jreback commented Feb 14, 2014

Uh oh!

TomAugspurger commented Feb 14, 2014

Uh oh!

TomAugspurger commented Feb 14, 2014

Uh oh!

jreback commented Feb 14, 2014

Uh oh!

TomAugspurger commented Feb 14, 2014

Uh oh!

TomAugspurger Feb 14, 2014

Choose a reason for hiding this comment

Uh oh!

jreback Feb 14, 2014

Choose a reason for hiding this comment

Uh oh!

TomAugspurger commented Feb 14, 2014

Uh oh!

jreback Feb 14, 2014

Choose a reason for hiding this comment

Uh oh!

TomAugspurger Feb 14, 2014

Choose a reason for hiding this comment

Uh oh!

jreback Feb 15, 2014

Choose a reason for hiding this comment

Uh oh!

jreback commented Feb 15, 2014

Uh oh!

jreback commented Feb 16, 2014

Uh oh!

TomAugspurger commented Feb 17, 2014

Uh oh!

Uh oh!