Skip to content

issue#27482 - added a check for if obj is instance of type in _isna-new #27664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 20, 2019
Merged

issue#27482 - added a check for if obj is instance of type in _isna-new #27664

merged 5 commits into from
Aug 20, 2019

Conversation

joy-rosie
Copy link
Contributor

@joy-rosie joy-rosie commented Jul 30, 2019

@joy-rosie
Copy link
Contributor Author

Can someone help me; when I run git diff upstream/master -u -- "*.py" | flake8 --diff
I get: fatal: bad revision 'upstream/master'

@jbrockmendel
Copy link
Member

Needs a test, don’t worry too much about the git thing

@joy-rosie
Copy link
Contributor Author

could you point me to the file where the test would go for this?

@joy-rosie
Copy link
Contributor Author

I had a quick look, would it be okay to go in this function: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/dtypes/test_missing.py#L81

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic needs to be replicated in _isnew_old as well. tests are needed.

may want to move this to the last clause of the if/else. performance tests need to be run for this.

@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jul 31, 2019
@joy-rosie
Copy link
Contributor Author

joy-rosie commented Jul 31, 2019

this logic needs to be replicated in _isnew_old as well. tests are needed.

may want to move this to the last clause of the if/else. performance tests need to be run for this.

I added it above this elif https://github.com/joy-rosie/pandas/blob/issue-27482-isnull/pandas/core/dtypes/missing.py#L138 because that is causing the error. I will add to _isna_old and also add tests, not sure how to add perfomance tests.

@joy-rosie
Copy link
Contributor Author

I have added the tests and the statement to _isna_old, please let me know how I can run the performance tests

@jbrockmendel
Copy link
Member

please let me know how I can run the performance tests

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 3, 2019

Hello, I ran the performance tests as shown in the link and this is the final output:

. before after ratio
[143bc34] [7aef72c]

  •  21.7±0.5ms       29.6±0.6ms     1.37  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
    
  •     409±5μs         498±10μs     1.22  indexing.NumericSeriesIndexing.time_ix_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
    
  • 12.4±0.09ms       15.0±0.1ms     1.21  index_object.SetOperations.time_operation('int', 'symmetric_difference')
    
  •  6.12±0.2ms       6.99±0.3ms     1.14  stat_ops.SeriesMultiIndexOps.time_op(0, 'median')
    
  • 1.67±0.06μs       1.87±0.1μs     1.12  index_cached_properties.IndexCache.time_values('PeriodIndex')
    
  •   332±0.6ns         369±10ns     1.11  timestamp.TimestampOps.time_tz_localize('US/Eastern')
    
  • 1.01±0.02μs      1.12±0.02μs     1.11  index_cached_properties.IndexCache.time_inferred_type('RangeIndex')
    
  • 2.14±0.05ms      2.37±0.04ms     1.11  reindex.LevelAlign.time_align_level
    
  •     329±2ns          362±9ns     1.10  timestamp.TimestampOps.time_tz_localize(tzutc())
    
  •   331±0.5ns          365±8ns     1.10  timestamp.TimestampOps.time_tz_localize(<UTC>)
    
  •  4.64±0.2μs      4.21±0.05μs     0.91  timedelta.TimedeltaConstructor.time_from_missing
    
  • 1.71±0.02μs      1.55±0.05μs     0.91  period.PeriodProperties.time_property('M', 'year')
    
  • 1.42±0.08ms      1.29±0.01ms     0.90  ctors.SeriesConstructors.time_series_constructor(<function list_of_lists>, False, 'int')
    
  •    567±50μs        511±0.9μs     0.90  groupby.GroupByMethods.time_dtype_as_field('float', 'head', 'transformation')
    
  •  1.04±0.4ms          936±3μs     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'cumsum', 'direct')
    
  • 1.68±0.01μs       1.51±0.1μs     0.90  period.PeriodProperties.time_property('min', 'second')
    
  • 1.48±0.08ms      1.34±0.02ms     0.90  ctors.SeriesConstructors.time_series_constructor(<function list_of_lists>, True, 'int')
    
  • 2.41±0.05ms      2.16±0.09ms     0.90  reindex.LevelAlign.time_reindex_level
    
  • 1.69±0.04μs      1.52±0.06μs     0.90  period.PeriodProperties.time_property('M', 'month')
    
  • 1.69±0.02μs      1.51±0.06μs     0.89  period.PeriodProperties.time_property('M', 'day')
    
  •    586±60μs        523±0.7μs     0.89  groupby.GroupByMethods.time_dtype_as_field('datetime', 'tail', 'transformation')
    
  • 1.69±0.01μs      1.49±0.06μs     0.88  period.PeriodProperties.time_property('min', 'day')
    
  •  1.43±0.2ms         1.24±0ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'transformation')
    
  •    609±90μs        528±0.9μs     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'quantile', 'transformation')
    
  •  1.46±0.2ms         1.24±0ms     0.85  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'direct')
    
  •  7.26±0.1ms       6.01±0.3ms     0.83  algorithms.Duplicated.time_duplicated('first', 'int')
    
  •    348±80μs          287±1μs     0.83  groupby.GroupByMethods.time_dtype_as_field('float', 'last', 'transformation')
    
  •  1.69±0.3ms      1.36±0.01ms     0.81  groupby.GroupByMethods.time_dtype_as_field('int', 'value_counts', 'direct')
    
  •   90.4±20μs       65.4±0.1μs     0.72  algorithms.SortIntegerArray.time_argsort(1000)
    
  •   422±200μs        289±0.9μs     0.68  groupby.GroupByMethods.time_dtype_as_field('float', 'last', 'direct')
    
  •   887±300μs          553±1μs     0.62  groupby.GroupByMethods.time_dtype_as_field('object', 'head', 'transformation')
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Machine information:
{
"arch": "x86_64",
"cpu": "Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz",
"machine": "josie-lenovo-ideapad",
"num_cpu": "4",
"os": "Linux 4.18.0-25-generic",
"ram": "3897484",
"version": 1
}

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 3, 2019

I feel like it is a bit random where the performance has gone down and where its gone up

@jbrockmendel
Copy link
Member

I feel like it is a bit random where the performance has gone down and where its gone up

Yah, this is a common issue with asv (especially in cases where there are no non-noise changes). The usual solution is to re-run it a couple of times and pick out the changes that are consistent across runs.

@jreback
Copy link
Contributor

jreback commented Aug 4, 2019

this also needs a whatsnew note, put in 1.0 in Missing bug fix section

@joy-rosie
Copy link
Contributor Author

sorry to be a pain but where do i add a whatsnew note?

I am also finding it difficult to get any reliable results for the performance tests, I have ran it a few times and there are different results each time

Is there anything i can do there?

@jbrockmendel
Copy link
Member

sorry to be a pain but where do i add a whatsnew note?

Don't worry about it, we were all new here at one point. whatsnew note goes in doc/source/whatsnew/0.25.1.rst

I am also finding it difficult to get any reliable results for the performance tests, I have ran it a few times and there are different results each time

That usually indicates that there isn't any real impact. The non-asv alternative is to use Ipython %timeit directly on the affected function(s)

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 13, 2019

That usually indicates that there isn't any real impact. The non-asv alternative is to use Ipython %timeit directly on the affected function(s)
for benchmarking I am running the test function:

import pandas as pd
from pandas.util import testing as tm
isna_f = pd.isna
pd.__version__
%%timeit -n 10000

assert not isna_f(1.0)
assert isna_f(None)
assert isna_f(pd.np.nan)
assert not isna_f(pd.np.inf)
assert not isna_f(-pd.np.inf)

# # type
# assert not isna_f(type(pd.Series()))
# assert not isna_f(type(pd.DataFrame()))

# series
for s in [
    tm.makeFloatSeries(),
    tm.makeStringSeries(),
    tm.makeObjectSeries(),
    tm.makeTimeSeries(),
    tm.makePeriodSeries(),
]:
    assert isinstance(isna_f(s), pd.Series)

# frame
for df in [
    tm.makeTimeDataFrame(),
    tm.makePeriodFrame(),
    tm.makeMixedDataFrame(),
]:
    result = isna_f(df)
    expected = df.apply(isna_f)
    tm.assert_frame_equal(result, expected)

Outputs:

old:

'0.25.0'
33 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10000 loops each)

new:

'0.25.0+71.g7aef72ced'
33 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please let me know if you think I should run the benchmark on anything else

Don't worry about it, we were all new here at one point. whatsnew note goes in doc/source/whatsnew/0.25.1.rst

Will this go into bugfixes or missing?

@jbrockmendel
Copy link
Member

Will this go into bugfixes or missing?

Yes. Probably in the "Missing" section

@joy-rosie
Copy link
Contributor Author

I added into whatsnew and it seems like there are some things failing but I am not sure why a change in whatsnew could cause these particular failures?

@TomAugspurger
Copy link
Contributor

Restarted the build. That test is randomly failing and we haven't diagnosed the cause yet.

@joy-rosie
Copy link
Contributor Author

Is this ready to go now? Let me know if there are any remaining bits.

@jbrockmendel
Copy link
Member

LGTM

@TomAugspurger TomAugspurger merged commit e118b1d into pandas-dev:master Aug 20, 2019
@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 20, 2019
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 20, 2019
…s-dev#27664)

* added a check for if obj is instance of type in _isna-new
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 20, 2019
…e of type in _isna-new (pandas-dev#27664)

* added a check for if obj is instance of type in _isna-new
TomAugspurger added a commit that referenced this pull request Aug 20, 2019
… in _isna-new (#27664) (#28041)

* added a check for if obj is instance of type in _isna-new
galuhsahid pushed a commit to galuhsahid/pandas that referenced this pull request Aug 25, 2019
…s-dev#27664)

* added a check for if obj is instance of type in _isna-new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.isnull() raises AttributeError on Pandas type objects
4 participants