
[FIX] load_results in nodes.py now uses node output directory as wd #2694


Closed
achetverikov wants to merge 4 commits

Conversation

achetverikov
Contributor

Summary

Fixes #2689. Ensures that when results are loaded, the current working directory is the node's output directory, so that autogenerated filenames resolve to the correct paths.

List of changes proposed in this PR (pull-request)

  • add indirectory to Node._load_results (see the sketch below)
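
For reference, here is a minimal sketch of what an indirectory-style context manager does. nipype ships its own implementation (in nipype.utils.filemanip, if I recall correctly), so this is only an illustration of the idea, not the code this PR adds:

import os
from contextlib import contextmanager

@contextmanager
def indirectory(path):
    # Temporarily change into `path`, restoring the original working
    # directory even if the body of the with-block raises.
    orig = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(orig)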

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

@satra
Member

satra commented Sep 5, 2018

@achetverikov - would it be possible to add a simple test where the original code fails and this mod succeeds?


@effigies effigies left a comment


Thanks for this. I have a couple of ideas for narrowing this change to only the sections that might actually raise errors.

if not isinstance(self, MapNode):
    self._copyfiles_to_wd(linksonly=True)
    aggouts = self._interface.aggregate_outputs(
        needed_outputs=self.needed_outputs)

Looking at this now, I think we want to try narrowing the "critical section" where we modify the working directory. I think it might actually just be here:

with indirectory(cwd):
    aggouts = self._interface.aggregate_outputs(
        needed_outputs=self.needed_outputs)

Almost all other function calls use the cwd directly.

By keeping it narrow, we effectively mark places where we're working-directory dependent, which may be useful in future refactors.
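
Putting that together with the hunk quoted above, the narrowed section might read roughly like this (a sketch based on the snippets in this review, not the actual diff):

if not isinstance(self, MapNode):
    self._copyfiles_to_wd(linksonly=True)
    with indirectory(cwd):
        # only aggregate_outputs runs with the node's output directory
        # as the working directory
        aggouts = self._interface.aggregate_outputs(
            needed_outputs=self.needed_outputs)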

    _save_resultfile(result, cwd, self.name)
else:
    logger.debug('aggregating mapnode results')
    result = self._run_interface()

I think this is another potential place we might want to wrap in indirectory(cwd). I would test without it first.
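
If it does turn out to be needed, the wrapping would presumably look something like this (a sketch based on the context above, not a tested change):

else:
    logger.debug('aggregating mapnode results')
    with indirectory(cwd):
        result = self._run_interface()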

@effigies
Member

effigies commented Sep 5, 2018

I had difficulty reducing a failing case to a test. But now that we've identified the likely section of code (and I have a couple of ideas about the relevant lines), perhaps it will be easier to come up with a failing test.

@effigies effigies added this to the 1.1.3 milestone Sep 5, 2018
@codecov-io

codecov-io commented Sep 5, 2018

Codecov Report

Merging #2694 into master will decrease coverage by <.01%.
The diff coverage is 0%.


@@            Coverage Diff            @@
##           master   #2694      +/-   ##
=========================================
- Coverage   67.61%   67.6%   -0.01%     
=========================================
  Files         340     340              
  Lines       43143   43144       +1     
  Branches     5349    5349              
=========================================
- Hits        29170   29168       -2     
- Misses      13271   13274       +3     
  Partials      702     702
Flag           Coverage Δ
#smoketests    50.57% <0%> (-0.01%) ⬇️
#unittests     65.1% <0%> (+0.01%) ⬆️

Impacted Files                    Coverage Δ
nipype/pipeline/engine/nodes.py   84.16% <0%> (-0.15%) ⬇️
nipype/utils/profiler.py          43.1% <0%> (-1.15%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37f3781...e4894a9.

@achetverikov
Contributor Author

I have moved indirectory as suggested by @effigies, but I have no experience with writing tests, so I'm not sure I can do this. I guess the test should involve loading a result with an autogenerated filename from an interface that produced an error before, like fsl.utils.MathCommand.

@effigies
Member

effigies commented Sep 6, 2018

I can't figure out why we sometimes end up in the wrong directory. I believe it has something to do with _list_outputs calling os.path.abspath() or os.getcwd() when loading results, but I cannot reproduce the condition either within a workflow or outside of one. @satra Any ideas?
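
To illustrate why the working directory matters here: an interface whose _list_outputs() builds paths with os.path.abspath() will report different filenames depending on where it is called from. A standalone example (the temporary directories stand in for nipype paths; nothing here is nipype code):

import os
import tempfile

a = tempfile.mkdtemp()  # stands in for the submission directory
b = tempfile.mkdtemp()  # stands in for the node's output directory

os.chdir(a)
print(os.path.abspath('out'))  # -> <a>/out

os.chdir(b)
print(os.path.abspath('out'))  # -> <b>/out: same relative name, different path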

@achetverikov Can you verify that this resolves your error, at least?

@satra
Member

satra commented Sep 6, 2018

@effigies - that's why i was hoping that we could create a test? do the mapnodes that crash have function nodes as interfaces that change directory? (i think we protect against that, but i'm just throwing out ideas).

is it always the same mapnode that crashes? and does the problem happen with all plugins, or just the multiproc plugin?

@effigies
Member

effigies commented Sep 6, 2018

I spent a while trying to reproduce the condition, but couldn't. Here's the test I wrote:

import os

import nipype.pipeline.engine as pe
from nipype.interfaces.base import BaseInterface, TraitedSpec, File, traits


class TrivialInterface(BaseInterface):
    class input_spec(TraitedSpec):
        inputs = traits.Any()

    class output_spec(TraitedSpec):
        out = File(exists=True)

    def _run_interface(self, runtime):
        open(self._list_outputs()['out'], 'w').close()
        return runtime

    def _list_outputs(self):
        outputs = {}
        outputs['out'] = os.path.abspath('out')
        return outputs


def test_mapnode_aggregation(tmpdir):
    wf = pe.Workflow(name='wf', base_dir=str(tmpdir))
    trivial = pe.MapNode(TrivialInterface(), name='trivial',
                         iterfield=['inputs'])
    trivial.inputs.inputs = list(range(5))
    consume = pe.Node(TrivialInterface(), name='consume')
    wf.connect(trivial, 'out', consume, 'inputs')

    res = wf.run(plugin='MultiProc')
    assert list(res.nodes)[0].result.outputs.out == [
        os.path.join(tmpdir, 'wf', 'trivial', 'mapflow',
                     '_trivial%d' % i, 'out')
        for i in range(5)]

I've tried in and out of workflows, with MultiProc and without.

@achetverikov
Contributor Author

achetverikov commented Sep 6, 2018

I don't think it's a MultiProc problem: I got the same with PBS, and I also tried setting run_without_submitting for the problem nodes; it did not matter. For me, two nodes were giving errors, one with fsl.maths.MeanImage and another with fsl.MCFLIRT, both subnodes in a MapNode. I can confirm that with the modified code I have not yet seen this problem, although I had to delete the previous cache before testing the new code.

Both of the nodes that were producing errors generate absolute paths for the expected outputs, and that was a problem when _list_outputs was run outside of the node folder. I think one difference from the test above is that the problem occurred when the nodes' results were loaded from cache and then aggregated.

But doesn't the error make sense when you look at the _load_results code? I'm not very familiar with nipype's inner workings, but the line cwd = self.output_dir() suggests that it is not intended to be executed inside the node's output directory. Then, when the filenames are generated, they won't have the same paths as when the node runs. Sorry if this does not make much sense; as I said, I'm not very familiar with the nipype architecture.

@effigies
Member

effigies commented Sep 6, 2018

I think one difference from the test above is that the problem occurred when the nodes' results were loaded from cache and then aggregated.

  1. I believe MapNodes do that even on the first run.
  2. Adding a second wf.run() does not change anything. The test always passes.

But doesn't the error make sense when you look at the _load_results code?

I agree it makes sense. I just can't figure out how to get into that code path from another working directory. (I can call _load_results() directly from another path, but within a workflow I can't see how.)

@achetverikov
Contributor Author

I think I can try to rerun the old code to see if I can reproduce the problem, and then maybe share the Python script that is created for each node for batch jobs, along with the pickled results, to help figure out what exactly is going on.

@effigies
Member

effigies commented Sep 6, 2018

If you could, that would be great. And if you could try removing pieces, one at a time, until you get the smallest possible workflow that still exhibits the error, that's really what we need to test.

@achetverikov
Contributor Author

Sorry for the delay in further testing; lots of stuff on my plate this week. I'll get back to it, hopefully before the weekend.

@effigies
Member

Hi @achetverikov, this would be great to have in for 1.1.3, which we plan to release next Monday. Please let me know if you need any help here.

@achetverikov
Contributor Author

Sorry, it seems that I am now unable to replicate the error myself. I've tried several times to rerun the whole preprocessing workflow without luck (in the sense that there were no errors). Possible reasons why it appeared in the first place: an old cache remaining from before I last upgraded nipype, or some weird combination of circumstances on the cluster. If I see it again, I'll try to track it down.

@effigies
Member

Yeah, it may be a problem that arises from an old cache... I can try having another go at a test, at some point, but definitely not this week.

If this is a problem that can be resolved with a new cache, I'm okay pushing it off to the next release to make sure we can get a test and verify it's fixed. (If it goes in now, it'll be a pain to verify.)

@achetverikov
Contributor Author

I think it can be safely postponed; maybe somebody else will also encounter similar problems.

@effigies effigies modified the milestones: 1.1.3, 1.1.4 Sep 19, 2018
@effigies
Member

Hi @achetverikov, have you had any more chance to look at this? I'll admit that I haven't.

@achetverikov
Contributor Author

achetverikov commented Oct 24, 2018 via email

@effigies
Member

Thanks for the update. If I remember correctly, we may be able to test this as follows:

  1. Create a workflow where _list_outputs() will be run from the CWD without this patch, and run it.
  2. Recreate the exact same workflow from scratch, which will prevent any cached results objects from coming into play, and run that.

That said, I haven't looked seriously at this again, so I may be misremembering my thought process before. If you have some time, could you give that a shot? Otherwise I will try to find a chance.
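
A rough pytest-style sketch of those two steps might look like this; build_workflow here is a hypothetical helper that constructs whatever minimal workflow exercises the bug, so this is an outline of the idea rather than a working test:

import os


def test_reload_from_other_cwd(tmpdir):
    # 1. First run populates the per-node result caches on disk.
    wf = build_workflow(base_dir=str(tmpdir))  # build_workflow is hypothetical
    wf.run()

    # 2. Rebuild the identical workflow from scratch (no in-memory results),
    #    move to an unrelated working directory, and run again so that the
    #    cached results are loaded and aggregated from there.
    os.chdir(str(tmpdir.mkdir('elsewhere')))
    wf2 = build_workflow(base_dir=str(tmpdir))
    wf2.run()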

@effigies effigies mentioned this pull request Oct 26, 2018
@achetverikov
Contributor Author

Sorry, I don't have much time now, unfortunately.

@effigies
Member

Thanks. I'll have a look if I have time; otherwise I'll shift it to next month.

@effigies
Member

As an update: The suggestion doesn't work. There's a lot of weird machinery around aggregating outputs, and it's not really clear how to enter any given code path at will. That entire block is not covered by tests.

I'm pushing this off to future, as I don't see any real prospect of reproducing the issue.

@effigies
Member

I think this has been superseded by recent changes. Closing for now. If #2689 re-emerges, we can look at this again.

@effigies effigies closed this Feb 24, 2020
Successfully merging this pull request may close these issues:

  • Nipype looks for node results in wrong folders