Skip to content

[26660] compare int with double by subtracting during describe percen… #26768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jkovacevic
Copy link
Contributor

I am not sure whether I had put my issue in whatsnew entry in appropriate place.

@gfyoung gfyoung added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jun 10, 2019
@codecov
Copy link

codecov bot commented Jun 10, 2019

Codecov Report

Merging #26768 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26768      +/-   ##
==========================================
- Coverage   91.71%   91.71%   -0.01%     
==========================================
  Files         178      178              
  Lines       50755    50755              
==========================================
- Hits        46552    46548       -4     
- Misses       4203     4207       +4
Flag Coverage Δ
#multiple 90.3% <100%> (ø) ⬆️
#single 41.22% <0%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 97.9% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5574a9f...0946ae6. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 10, 2019

Codecov Report

Merging #26768 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26768      +/-   ##
==========================================
- Coverage   91.71%   91.71%   -0.01%     
==========================================
  Files         178      178              
  Lines       50771    50776       +5     
==========================================
+ Hits        46567    46570       +3     
- Misses       4204     4206       +2
Flag Coverage Δ
#multiple 90.31% <100%> (ø) ⬆️
#single 41.19% <0%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 97.9% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/indexes/interval.py 96.11% <0%> (-0.32%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️
pandas/io/excel/_base.py 91.92% <0%> (+0.11%) ⬆️
pandas/core/generic.py 94.1% <0%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d47fc0c...293da55. Read the comment docs.

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

cc @jreback

@jkovacevic jkovacevic force-pushed the issue26660_describe-percentiles-formatting branch from a868714 to 894c73e Compare June 10, 2019 18:29
@jkovacevic jkovacevic force-pushed the issue26660_describe-percentiles-formatting branch from 0d5450e to 5a73741 Compare June 10, 2019 18:40
@@ -1246,7 +1246,7 @@ def format_percentiles(percentiles):
raise ValueError("percentiles should all be in the interval [0,1]")

percentiles = 100 * percentiles
int_idx = (percentiles.astype(int) == percentiles)
int_idx = (abs(percentiles.astype(int) - percentiles) < 1e-4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for using 1e-4? May make more sense just to leverage np.allclose

Copy link
Contributor Author

@jkovacevic jkovacevic Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good insight - it makes code more readable. Instead of np.allclose, I've opted out for np.isclose, which returns array of booleans. Reasoning for that is that further down in the function that mask is used when determining how to format integers (where mask is True) and decimal numbers (where mask is False).

Another approach would be that if at least one percentile is decimal, all percentiles should be represented as decimals. This is currently not the case, for instance if percentiles to describe are: [0, 0.5, 0.334] the method will return ["0%", "50%", "33.4%"]. This approach would yield: ["0.0%", "50.0%", "33.4%"].

I wanted to change as little as possible, so I returned mask of booleans from this expression, but if it fits your business logic to change to second approach, I can modify the code to that behavior also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This is fine as is - really just wanted to circumvent hard coding thresholds like before. Thanks for reviewing in detail!

@jreback jreback added this to the 0.25.0 milestone Jun 10, 2019
@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Jun 10, 2019
@@ -2740,6 +2740,14 @@ def test_format_percentiles():
fmt.format_percentiles([0.1, 0.5, 'a'])


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an additional test that uses .describe, like you did in the OP: #26660 (comment), add in pandas/tests/frame/test_analytics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the test you mentioned. Please comment if it is not sufficient.

@jreback jreback merged commit 16e8ea8 into pandas-dev:master Jun 11, 2019
@jreback
Copy link
Contributor

jreback commented Jun 11, 2019

thanks @jkovacevic

@jkovacevic jkovacevic deleted the issue26660_describe-percentiles-formatting branch June 11, 2019 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indices of df.describe(percentiles=np.linspace(...)) are mix of integers and decimal numbers
4 participants