Skip to content

Remove clutter from ext_data #17555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

jbrockmendel
Copy link
Member

I'm quite bothered by the lack of clarity in setup.py as to what is and is not required for the build. To demonstrate the issue, this PR removes a bunch of gee-I-thought-that-was-necessary clutter from the Extension kwargs.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Sep 16, 2017
@codecov
Copy link

codecov bot commented Sep 16, 2017

Codecov Report

Merging #17555 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17555      +/-   ##
==========================================
- Coverage   91.25%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       49606    49617      +11     
==========================================
+ Hits        45266    45268       +2     
- Misses       4340     4349       +9
Flag Coverage Δ
#multiple 89.02% <ø> (ø) ⬆️
#single 40.19% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.53% <0%> (ø) ⬆️
pandas/core/indexes/interval.py 93.57% <0%> (ø) ⬆️
pandas/core/api.py 100% <0%> (ø) ⬆️
pandas/plotting/_core.py 82.73% <0%> (+0.03%) ⬆️
pandas/io/stata.py 93.71% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 328c7e1...bb26908. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

so try changing something in say util.pxd and see if this still builds. It should rebuild the correct files.

@chris-b1
Copy link
Contributor

@jbrockmendel - I'm pretty these depends are necessary for rebuilds to work correctly. As @jreback said, try editing something in util.pxd - things cimporting from it (e.g. tslib) should rebuild too.

I'm guessing part of #17478 may need reverted.

@jbrockmendel
Copy link
Member Author

Small change to util; does not appear to have broken anything.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

yeah we used to have this problem where a clean build would work, but changing only a small set of deps would fail. you can see this when you change branches and rebuild, way more things that are necessary are rebuilt.

So we need some more testing to prove that this doesn't have bad side effects.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

Small change to util; does not appear to have broken anything.

how can you tell?

this is NOT tested anywhere, rather is a development issue.

@jbrockmendel
Copy link
Member Author

how can you tell?

You asked me to change something in util.pxd and see if the build breaks. The "all green" is the measure I'm using for build-doesn't-appear-broken. What measure would you prefer I use for this?

yeah we used to have this problem where a clean build would work, but changing only a small set of deps would fail. you can see this when you change branches and rebuild, way more things that are necessary are rebuilt.

If I understand what you're saying correctly, then it seems like we're on the same page vis-a-vis the unreliability of the build being related to the under-specification of dependencies.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

You asked me to change something in util.pxd and see if the build breaks. The "all green" is the measure I'm using for build-doesn't-appear-broken. What measure would you prefer I use for this?

no, this has to be done locally. The point is the dev tool chain breaks. The CI by-definition is a completely clean build, not a build, change, build one.

@chris-b1
Copy link
Contributor

chris-b1 commented Sep 17, 2017

The CI won't pick this up, the depends is used on local re-builds, i.e., will the correct things be marked as 'changed' if something they depend on is changed.

You will need to test locally - e.g. make an incompatible change to util.pxd and see if the deps correctly rebuild.

(edit: @jreback and I commented simultaneously, we're saying the exact the same thing)

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

if I understand what you're saying correctly, then it seems like we're on the same page vis-a-vis the unreliability of the build being related to the under-specification of dependencies.

not sure what you are saying.

I think changing these dependency chain actually DOES break things locally. We don't have a test for this on the CI, nor do I think we should actually do that.

You need to assure that the development chain still works correctly. The point is to prevent over-compiling things; under specifying deps tends to do this.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

you can test each dep by changing what was there. e.g. if you change something in an algos pxi.in file, then algos.so should be recompiled, that is the case now. however a change in hashtable should also trigger the recompile. I don't think that is the case now (after #17478)

@jbrockmendel
Copy link
Member Author

not sure what you are saying.

Exactly what I was saying is now irrelevant, since the "if i understand correctly" clause is now clearly False.

@jbrockmendel
Copy link
Member Author

I think changing these dependency chain actually DOES break things locally. We don't have a test for this on the CI, nor do I think we should actually do that.

So... the CI doesn't test whether the build works, and you don't think that should be changed?

@jbrockmendel
Copy link
Member Author

no, this has to be done locally. The point is the dev tool chain breaks. The CI by-definition is a completely clean build, not a build, change, build one.

So when you say "done locally", what does this entail? What I have been doing is roughly

rm -rf build
rm pandas/_libs/*.c
rm pandas/_libs/*.so
rm pandas/_libs/tslibs/*.c
rm pandas/_libs/tslibs/*.so
python setup.py build_ext --inplace
python
>>> import pandas as pd
>>> pd.test()

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

So... the CI doesn't test whether the build works, and you don't think that should be changed?

of course it checks whether the build works. But its a clean build, which the entire point of CI. What I am talking about are incremental changes in development. Imagine you are in a branch and you make a change and you recompile, great. Then you switch to another branch, since the deps are not setup correctly, you have to either: a) recompile everything slow, or b) the recompile is not correct and you get weird errors.

With incremental deps things just work. My point is that you can't simply change deps w/o testing them like this. Since we rarely change them this is generally not an issue. If you are going to try to change them you need a metric for whether your change is good or not, so that it works in dev & production (which is what we are basically testing on CI).

not averse to adding some kind of incremental build for dev on CI, but I think its quite hard to do right.

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

@jbrockmendel you are essentially running

make clean
make

(this runs setup.py)

which is a clean build test. But not good enough for dependency checking.

@chris-b1
Copy link
Contributor

chris-b1 commented Sep 17, 2017

@jbrockmendel local builds should work without deleting anything (generally). So to test

python setup.py build_ext --inplace
<make edits to cython>
python setup.py build_ext --inplace
# did correct extensions get rebuilt?
<test>

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

I reverted #17478, so happy to re-incorporate useful bits here

@jbrockmendel
Copy link
Member Author

This may be approaching an answer to the question I asked the other day in #17519. So the depends argument is just there for it to figure out whether to invalidate a cache?

So what about, say, tslib which cimports from khash. khash is not mentioned in tslib's ext_data entry. Does tslib not need to be re-compiled when khash changes?

Or how about _libs.window which has both from skiplist cimport [...] and `include "skiplist.pyx". Why are those not redundant?

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

Or how about _libs.window which has both from skiplist cimport [...] and `include "skiplist.pyx". Why are those not redundant?

so this is slightly different; include is a source file include, so whenever window.pyx needs to be recompiled, the skiplist.pyx is added to the source file and recompiled. It should have a dependency on this.

This is a similar pattern to lib.pyx which includes inference, properties and such.

So what about, say, tslib which cimports from khash. khash is not mentioned in tslib's ext_data entry. Does tslib not need to be re-compiled when khash changes?

khash is a dep of almost everything that is a dep of hashtable (a couple of cases where it is used directly). So in that case if hashtable will change if khash changes, and deps of hashtable will also change.. So nothing actually needs to depend on khash directly, except for hashtable itself.

@jbrockmendel
Copy link
Member Author

The two of you have gone above-and-beyond today in explaining this, thanks. I'll go back to focusing on tslibs and not worry about this for a bit.

@jbrockmendel jbrockmendel deleted the remove_deps branch October 30, 2017 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants