[FIX] load_results in nodes.py now uses node output directory as wd #2694
Conversation
@achetverikov - would it be possible to add a simple test where the original code fails and this mod succeeds?
Thanks for this. I have a couple of ideas for how we can narrow this change to just the potentially erroring sections.
nipype/pipeline/engine/nodes.py (outdated diff):

```python
if not isinstance(self, MapNode):
    self._copyfiles_to_wd(linksonly=True)
    aggouts = self._interface.aggregate_outputs(
        needed_outputs=self.needed_outputs)
```
Looking at this now, I think we want to try narrowing the "critical section" where we modify the working directory. I think it might actually just be here:

```python
with indirectory(cwd):
    aggouts = self._interface.aggregate_outputs(
        needed_outputs=self.needed_outputs)
```

Almost all other function calls use `cwd` directly. By keeping it narrow, we effectively mark places where we're working-directory dependent, which may be useful in future refactors.
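(For context: `indirectory` acts as a chdir context manager. A minimal sketch of the idea, assuming it simply saves, changes, and restores the working directory; the actual nipype implementation may differ.)

```python
import os
from contextlib import contextmanager

@contextmanager
def indirectory(path):
    """Run the enclosed block with `path` as the working directory."""
    orig = os.getcwd()      # remember where we started
    os.chdir(str(path))     # enter the target directory
    try:
        yield
    finally:
        os.chdir(orig)      # always restore, even if the block raises
```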
nipype/pipeline/engine/nodes.py (outdated diff):

```python
    _save_resultfile(result, cwd, self.name)
else:
    logger.debug('aggregating mapnode results')
    result = self._run_interface()
```
I think here's another potential place we might want to wrap in `indirectory(cwd)`. I would test without it first.
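For reference, a sketch of what that wrapping could look like in the MapNode branch (a fragment of the surrounding method, untested, per the suggestion to try without it first):

```python
# Sketch only: the MapNode branch of Node._load_results, with the
# re-run wrapped so it executes inside the node's output directory.
else:
    logger.debug('aggregating mapnode results')
    with indirectory(cwd):
        result = self._run_interface()
```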
I had difficulties reducing a failing case to a test. But now that we've identified the likely section of code (and I have a couple ideas of the relevant lines), perhaps it will be easier to come up with a failing test.
Codecov Report

```diff
@@            Coverage Diff            @@
##           master    #2694     +/-   ##
=========================================
- Coverage    67.61%    67.6%   -0.01%
=========================================
  Files          340      340
  Lines        43143    43144        +1
  Branches      5349     5349
=========================================
- Hits         29170    29168        -2
- Misses       13271    13274        +3
  Partials       702      702
```

Continue to review full report at Codecov.
I have moved `indirectory` as suggested by @effigies, but I have no experience with writing tests, so I'm not sure I can do this. I guess the test should be related to loading a result with an autogenerated filename from an interface that produced an error before, like fsl.utils.MathCommand.
I can't figure out why we sometimes end up in the wrong directory. I believe it has something to do with

@achetverikov Can you verify that this resolves your error, at least?
@effigies - that's why I was hoping that we could create a test. Do the MapNodes that crash have Function nodes as interfaces that change directory? (I think we protect against that, but I'm just throwing out ideas.) Is it always the same MapNode that crashes? And does the problem happen with all plugins, or just the MultiProc plugin?
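For illustration, a hypothetical interface of the kind asked about here: a Function node whose function changes directory as a side effect (the function name and node are invented; nipype is believed to guard against this):

```python
import os
from nipype.interfaces.utility import Function
import nipype.pipeline.engine as pe

def misbehave():
    """Hypothetical function with a working-directory side effect."""
    import os
    os.chdir(os.path.expanduser('~'))  # leaves the process in another cwd
    return 1

node = pe.Node(Function(output_names=['out'], function=misbehave),
               name='misbehave')
```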
I spent a while trying to reproduce the condition, but cannot. Here's the test I wrote:

```python
import os

from nipype.interfaces.base import (BaseInterface, TraitedSpec, File,
                                    traits)
import nipype.pipeline.engine as pe


class TrivialInterface(BaseInterface):
    class input_spec(TraitedSpec):
        inputs = traits.Any()

    class output_spec(TraitedSpec):
        out = File(exists=True)

    def _run_interface(self, runtime):
        open(self._list_outputs()['out'], 'w').close()
        return runtime

    def _list_outputs(self):
        outputs = {}
        outputs['out'] = os.path.abspath('out')
        return outputs


def test_mapnode_aggregation(tmpdir):
    wf = pe.Workflow(name='wf', base_dir=str(tmpdir))
    trivial = pe.MapNode(TrivialInterface(), name='trivial',
                         iterfield=['inputs'])
    trivial.inputs.inputs = list(range(5))
    consume = pe.Node(TrivialInterface(), name='consume')
    wf.connect(trivial, 'out', consume, 'inputs')

    res = wf.run(plugin='MultiProc')

    assert list(res.nodes)[0].result.outputs.out == [
        os.path.join(tmpdir, 'wf', 'trivial', 'mapflow',
                     '_trivial%d' % i, 'out')
        for i in range(5)]
```

I've tried in and out of workflows, with
I don't think it's a MultiProc problem - I got the same with PBS, and I also tried setting `run_without_submitting` for the problem nodes; it did not matter. For me, two nodes were giving errors, one with

Both of the nodes that were producing errors generate absolute paths for the expected outputs, and that was a problem when `_list_outputs` was run outside of the node folder. I think one difference with the test above is that the problem occurred when the nodes' results were loaded from cache and then aggregated. But doesn't the error make sense when you look at the
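A minimal sketch of that cwd dependence (directory names invented): `os.path.abspath` resolves against the current working directory, so an interface that builds output paths this way only reports correct paths when it runs inside the node's output folder.

```python
import os
import tempfile

node_dir = tempfile.mkdtemp()      # stands in for a node's output folder
os.chdir(node_dir)
print(os.path.abspath('out'))      # <node_dir>/out -- the intended path

os.chdir(tempfile.gettempdir())    # aggregation happens from elsewhere
print(os.path.abspath('out'))      # <tmp>/out -- a bogus path
```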
I agree it makes sense. I just can't figure out how to get into that code path from another working directory. (I can call
I think I can try rerunning the old code to see if I can reproduce the problem, and then maybe share the Python script that is created for each node for batch jobs, along with the pickled results, to see if that helps reproduce what's going on exactly.
If you could, that would be great. And if you could try removing pieces, one at a time, until you get the smallest possible workflow that still exhibits the error, that's really what we need to test.
Sorry for the delays in further testing; lots of stuff on my plate this week. I'll get back to it, hopefully before the weekend.
Hi @achetverikov, this would be great to have in for 1.1.3, which we plan to release next Monday. Please let me know if you need any help here.
Sorry, it seems that I am now unable to replicate the error myself. I've tried several times to rerun the whole preprocessing workflow without luck (in the sense that there were no errors). Possible reasons why it appeared in the first place: an old cache remaining from before I last upgraded nipype, or some weird combination of circumstances on the cluster. If I see it again, I'll try to track it down.
Yeah, it may be a problem that arises from an old cache... I can try having another go at a test, at some point, but definitely not this week. If this is a problem that can be resolved with a new cache, I'm okay pushing it off to the next release to make sure we can get a test and verify it's fixed. (If it goes in now, it'll be a pain to verify.)
I think it can safely be postponed; maybe somebody else will also encounter similar problems.
Hi @achetverikov, have you had any more chance to look at this? I'll admit that I haven't.
I tried a couple of times to reproduce it by rerunning the pipeline from scratch, but I wasn't able to see this error again.
Thanks for the update. If I remember correctly, we may be able to test this as follows:

That said, I haven't looked seriously at this again, so I may be misremembering my thought process from before. If you have some time, could you give that a shot? Otherwise I will try to find a chance.
Sorry, I don't have much time now, unfortunately.
Thanks. I'll have a look if I have time; otherwise I'll shift it to next month.
As an update: The suggestion doesn't work. There's a lot of weird machinery around aggregating outputs, and it's not really clear how to enter any given code path at will. That entire block is not covered by tests. I'm pushing this off to future, as I don't see any real prospect of reproducing the issue.
I think this has been superseded by recent changes. Closing for now. If #2689 re-emerges, we can look at this again.
Summary
Fixes #2689. Ensures that when results are loaded, the current working directory is the node's output directory, so that autogenerated filenames get the right paths.