[ENH] New ResourceMonitor (replaces resource profiler) #2200

oesteban · 2017-09-27T21:09:25Z

Supersedes #2193 (I removed the specs to make code review easier)

This PR revises the resource profiler (with a new name: resource monitor):

Created a new module, nipype.utils.profiler, where all the code related to resource monitoring is gathered. The corresponding tests have also been moved into the nipype.utils module.
Moved the monitoring responsibility from the runner to the interfaces. Consequences:
1. Resources can be monitored when running bare interfaces
2. All kinds of interfaces can be monitored (previously, monitoring was limited to CommandLine-based and Function interfaces).
3. Resource management and resource monitoring were somewhat intertwined in the node execution code. That has been cleaned up.
4. Interfaces save the traces directly to a file, so the log callback is no longer needed (it has been kept in case users want to generate the old Gantt chart).
5. Interfaces place peak usage in their runtime object.
Previously, even though resource consumption was sampled at a certain frequency, only peak usage was reported in the Gantt chart. Now the traces are kept in the proc_dict member of the runtime object.
When the workflow finishes, the execution graph is post-processed and a new runtime_monitor.json file is generated, containing all the traces and the information necessary to reconstruct the performance profile at the resolution set by the new runtime_monitor_frequency option (default: every 1s).
The MultiProc resource manager used to block if a node was defined with an n_procs greater than MultiProc's own n_procs setting (and similarly for the estimated memory). Now, either a warning or an error is raised, depending on the new plugin argument raise_insufficient.
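
To illustrate the difference between keeping traces and keeping only peaks, here is a minimal sketch of a sampler running in a background thread. It is not nipype's implementation: the class name ResourceMonitorSketch and the injected sample_fn probe are hypothetical (a real monitor would query the process, e.g. through psutil), and the returned keys simply borrow the runtime attribute names (mem_peak_gb, cpu_percent) that appear in the test later in this thread.

```python
import threading
import time


class ResourceMonitorSketch(threading.Thread):
    """Hypothetical sampler that polls a probe at a fixed interval.

    ``sample_fn`` is an assumption standing in for a real probe (for example,
    one built on psutil); it must return a ``(mem_gb, cpu_percent)`` tuple.
    """

    def __init__(self, sample_fn, freq=1.0):
        super().__init__(daemon=True)
        self._sample_fn = sample_fn
        self._freq = freq  # seconds between samples
        self._stop_event = threading.Event()
        self.traces = []  # (timestamp, mem_gb, cpu_percent) tuples

    def run(self):
        # Event.wait() acts as an interruptible sleep, so stop() returns quickly.
        while not self._stop_event.is_set():
            mem_gb, cpu_percent = self._sample_fn()
            self.traces.append((time.time(), mem_gb, cpu_percent))
            self._stop_event.wait(self._freq)

    def stop(self):
        """Stop sampling and return the peaks, keeping the full traces."""
        self._stop_event.set()
        self.join()
        if not self.traces:
            return {'mem_peak_gb': None, 'cpu_percent': None}
        return {
            'mem_peak_gb': max(t[1] for t in self.traces),
            'cpu_percent': max(t[2] for t in self.traces),
        }
```

The caller would start() the monitor before running an interface and call stop() afterwards; the peaks go into the runtime object while self.traces preserves the whole time series instead of discarding it.
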
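
A sketch of what "reconstructing the performance profile" from per-node traces could look like, assuming each node contributes a list of (timestamp, mem_gb, cpu_percent) samples. The function names and the JSON layout are hypothetical and do not describe the actual runtime_monitor.json schema; the frequency argument plays the role of the sampling-resolution option.

```python
import json
import math
from collections import defaultdict


def total_memory_profile(nodes, frequency=1.0):
    """Rebuild a workflow-wide memory profile from per-node traces.

    ``nodes`` maps a node name to a list of (timestamp, mem_gb, cpu_percent)
    samples. Time is split into bins of ``frequency`` seconds; within a bin
    each node contributes its largest sample, and contributions are summed.
    """
    if not any(nodes.values()):
        return []
    t0 = min(s[0] for traces in nodes.values() for s in traces)
    per_bin = defaultdict(dict)  # bin index -> {node name: peak mem_gb}
    for name, traces in nodes.items():
        for ts, mem_gb, _cpu in traces:
            idx = int(math.floor((ts - t0) / frequency))
            per_bin[idx][name] = max(mem_gb, per_bin[idx].get(name, 0.0))
    return [(t0 + idx * frequency, sum(per_bin[idx].values()))
            for idx in sorted(per_bin)]


def dump_report(nodes, frequency=1.0):
    """Serialize raw traces plus the rebuilt profile (hypothetical schema)."""
    return json.dumps({
        'frequency': frequency,
        'traces': nodes,
        'total_mem_gb': total_memory_profile(nodes, frequency),
    })
```

Keeping the raw traces in the report means the profile can be rebuilt later at any coarser resolution, rather than only at the one chosen during execution.
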
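
The raise_insufficient behavior can be sketched as a pre-flight check that compares a node's request with the plugin's limits. The helper check_resources and its arguments are hypothetical; only the warn-or-raise choice comes from the description above, and returning False in the warning case is an assumption of this sketch.

```python
import warnings


def check_resources(node_name, n_procs, mem_gb, max_procs, max_mem_gb,
                    raise_insufficient=True):
    """Hypothetical check: does a node fit within the plugin's limits?

    Returns True if it fits. Otherwise raises RuntimeError when
    ``raise_insufficient`` is set, or emits a warning and returns False.
    """
    problems = []
    if n_procs > max_procs:
        problems.append('%d procs requested, %d available' % (n_procs, max_procs))
    if mem_gb > max_mem_gb:
        problems.append('%.2f GB requested, %.2f GB available' % (mem_gb, max_mem_gb))
    if not problems:
        return True
    msg = 'Insufficient resources for node %s: %s' % (node_name, '; '.join(problems))
    if raise_insufficient:
        raise RuntimeError(msg)
    warnings.warn(msg)
    return False
```

Failing (or warning) up front replaces the old behavior, where such a node could never be scheduled and the manager simply blocked.
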

Important: this PR removes the old filemanip logger and replaces it with a more generic utils logger. The documentation has been updated accordingly, and deprecation warnings are emitted when the old logger is requested.
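
A minimal sketch of redirecting a deprecated logger name while emitting a DeprecationWarning, using only the standard logging and warnings modules. The get_logger helper and the mapping table are hypothetical, not nipype's API; only the filemanip to utils rename comes from the text above.

```python
import logging
import warnings

# Hypothetical table of deprecated logger names and their replacements.
_DEPRECATED_LOGGERS = {'filemanip': 'utils'}


def get_logger(name):
    """Return the 'nipype.<name>' logger, redirecting deprecated names."""
    if name in _DEPRECATED_LOGGERS:
        new = _DEPRECATED_LOGGERS[name]
        warnings.warn('The "%s" logger is deprecated; use "%s" instead.' % (name, new),
                      DeprecationWarning, stacklevel=2)
        name = new
    return logging.getLogger('nipype.' + name)
```
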

This PR revises the resource profiler with the following objectives:
- Increase robustness (and make sure it does not crash nipype)
- Extend profiling to all interfaces (including pure Python)

The increase in robustness is expected to come from:
1. Reducing (or removing altogether, if possible) the logger callback that registers the estimations of memory and CPUs. This could be achieved by making interfaces responsible for keeping track of their resources, and then collecting all results after the node has executed.
2. Centralizing profiler imports, like the config or logger object, so that the applicability of the profiler is checked only once.

This first commit just creates one new module, nipype.utils.profiler, and moves the related functions there.
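
Objective 2 (checking the profiler's applicability only once) can be sketched with a memoized import probe. The function name is hypothetical, and treating psutil as the backend to test for is an assumption of this sketch.

```python
import functools
import importlib


@functools.lru_cache(maxsize=None)
def resource_monitor_available(module_name='psutil'):
    """Check once whether the probing backend can be imported.

    lru_cache memoizes the result per module name, so later calls
    skip the import attempt entirely.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

Every caller (config, node runner, interface) can then ask the same question cheaply instead of repeating its own import guard.
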


oesteban · 2017-10-03T18:46:14Z

@satra Am I correct thinking that this: https://github.com/oesteban/nipype/blob/d2599e2f86df754bfa0fa41b8f6789e2408fb7bd/nipype/pipeline/engine/nodes.py#L1150-L1151 would ensure that MapNodes actually inherit the memory/cpu constraints?

effigies · 2017-10-03T18:48:32Z

@oesteban I think you actually want to pass n_procs=self._n_procs, so that when it's None, it attempts to read self._interface.inputs.num_threads.
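
The resolution order suggested here can be sketched as follows. The helper resolve_n_procs is hypothetical; it only encodes "use the explicit value, else fall back to the interface's num_threads input, else a default". Checking for a positive int (rather than just not None) is an assumption meant to also skip unset trait values.

```python
def resolve_n_procs(n_procs, interface_inputs, default=1):
    """Hypothetical resolution: explicit node setting, then the
    interface's num_threads input, then a default."""
    if n_procs is not None:
        return n_procs
    num_threads = getattr(interface_inputs, 'num_threads', None)
    # Unset inputs may not be None (e.g. a sentinel), so require a positive int.
    if isinstance(num_threads, int) and num_threads > 0:
        return num_threads
    return default
```
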

oesteban · 2017-10-03T18:50:03Z

Oh, good catch! - well, that would only make sense if the node will have different num_threads for each iterable... right?

effigies

I think I'd be comfortable merging this, assuming tests pass. @satra any further comments?

satra · 2017-10-04T02:53:55Z

@oesteban - correct, mapnodes inherit constraints. solid work - really appreciate the refactoring that was much needed.

assuming tests pass, this looks good to me.

oesteban · 2017-10-04T04:32:04Z

We'll need to revisit the estimation of resources, but what we have with this PR is one step further from where we were before and it is enough to get going.


mgxd · 2017-10-24T20:46:23Z

@oesteban what's up with test_resource_monitor.py? I see we're skipping it in CI tests, but running it locally also fails:

    @pytest.mark.skipif(os.getenv('CI_SKIP_TEST', False), reason='disabled in CI tests')
    @pytest.mark.parametrize("mem_gb,n_procs", [(0.5, 3), (2.2, 8), (0.8, 4), (1.5, 1)])
    def test_cmdline_profiling(tmpdir, mem_gb, n_procs):
        """
        Test runtime profiler correctly records workflow RAM/CPUs consumption
        of a CommandLine-derived interface
        """
        from nipype import config
        config.set('execution', 'resource_monitor_frequency', '0.2')  # Force sampling fast
    
        tmpdir.chdir()
        iface = UseResources(mem_gb=mem_gb, n_procs=n_procs)
        result = iface.run()
    
        assert abs(mem_gb - result.runtime.mem_peak_gb) < 0.3, 'estimated memory error above .3GB'
>       assert int(result.runtime.cpu_percent / 100 + 0.2) == n_procs, 'wrong number of threads estimated'
E       AssertionError: wrong number of threads estimated
E       assert 3 == 1
E        +  where 3 = int(((366.80000000000001 / 100) + 0.2))
E        +    where 366.80000000000001 = Bunch(cmdline='/code/nipype/nipype/utils/tests/use_resources -g 1.500000 -p 1', command_path='/code/nipype/nipype/util...5.92338, 1508877125.970061]}, returncode=0, startTime='2017-10-24T20:32:03.693559', stderr='', stdout='', version=None).cpu_percent
E        +      where Bunch(cmdline='/code/nipype/nipype/utils/tests/use_resources -g 1.500000 -p 1', command_path='/code/nipype/nipype/util...5.92338, 1508877125.970061]}, returncode=0, startTime='2017-10-24T20:32:03.693559', stderr='', stdout='', version=None) = <nipype.interfaces.base.InterfaceResult object at 0x7f25b8754278>.runtime
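
The failing assertion reduces to simple arithmetic. The function below just restates the test's formula (the name estimated_threads is ours): it assumes one thread per 100% CPU, adds 0.2 and truncates with int(). With the values from the run above, 366.8 / 100 + 0.2 = 3.868, which truncates to 3 instead of the expected 1.

```python
def estimated_threads(cpu_percent):
    """Restate the test's estimate: one thread per 100% CPU, +0.2, truncated."""
    return int(cpu_percent / 100 + 0.2)


# Values from the failing run: n_procs=1, but cpu_percent=366.8
print(estimated_threads(366.8))  # 3
```

Truncation also makes the check fragile in the other direction: a single-threaded process that only reaches 79% CPU gives int(0.99) == 0 threads.
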

oesteban · 2017-10-24T22:57:22Z

Hi, as I mentioned before, it seems that the measurements we get are not very accurate and those tests fail, especially when running on CircleCI (I was more successful on my desktop).

Since those tests were already disabled before the refactoring, that code is left there for when we have the time and will to make sure that the CPU and memory estimates are accurate. I noted that we should revisit these tests in my last comment.

mgxd · 2017-10-27T15:05:30Z

@oesteban sounds good. Perhaps we should just always skip it until we can ensure the estimates are accurate, WDYT?

oesteban added 30 commits September 21, 2017 16:47

fix tests

32c2f39

Python 2 compatibility

0e2c581

add nipype_mprof

5a8e7fe

implement monitor in a parallel process

7d953cc

set profiling outputs to runtime object, read it from node execution

306c4ec

revise profiler callback

8a903f0

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

02fdbda

robuster constructor

e3982d7

remove unused import

48f87af

various fixes

46dde32

cleaning up code

9d70a2f

remove comment

1fabd25

interface.base cleanup

ecedfcf

update new config settings

2d35959

make naming consistent across tests

3f34711

implement raise_insufficient

99ded42

fix test

b0d25bd

fix test (amend previous commit)

2a37693

address review comments

10d0f39

fix typo

62a6593

fixes to the tear-up section of interfaces

d6401f3

fix NoSuchProcess exception

ce3f08a

making monitor robuster

ffb7509

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

7b7846b

first functional prototype

c9b474b

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

117924c

add warning to old filemanip logger

cf1f15b

do not search for filemanip_level in config

4b7ab93

fix CommandLine interface doctest

c7a1992

disable resource_monitor tests when running tests in Circle and Travis

d2599e2

let the inner interface set _n_procs and _mem_gb

678bb1a

effigies approved these changes Oct 3, 2017


oesteban merged commit 6e95b3c into nipy:master Oct 4, 2017

oesteban deleted the enh/ReviseResourceProfiler branch October 4, 2017 04:36

effigies added a commit to effigies/niworkflows that referenced this pull request Oct 4, 2017

PIN: nipy/nipype master branch

e445713

Includes nipy/nipype#2198, nipy/nipype#2200

This was referenced Oct 4, 2017

PIN: Update nipype nipreps/niworkflows#197

Merged

[RTM] PIN: Update niworkflows nipreps/mriqc#645

Merged

[WIP] PIN: Update niworkflows nipreps/fmriprep#737

Merged

ENH: Reduce verbosity of distributed plugin #2208

Merged

This was referenced Oct 4, 2017

Add profile_runtime to list of settings in documentation #2003

Closed

Resource profiler callback seemingly not working #1998

Closed

effigies added a commit to effigies/niworkflows that referenced this pull request Oct 5, 2017

PIN: nipy/nipype master branch

dafbfda

Includes - nipy/nipype#2198 - nipy/nipype#2200 - nipy/nipype#2203 - nipy/nipype#2205 - nipy/nipype#2208

effigies mentioned this pull request Oct 5, 2017

[ENH] Minor Xvfb and MultiProcPlugin cleanups #2211

Merged

effigies added a commit to effigies/nipype that referenced this pull request Oct 5, 2017

TEST: Update tests post-nipy#2200

e594185

effigies added a commit to effigies/nipype that referenced this pull request Oct 5, 2017

TEST: Update tests post-nipy#2200

386162b

effigies added a commit to effigies/nipype that referenced this pull request Oct 6, 2017

TEST: Update tests post-nipy#2200

83e3d16

effigies added a commit to effigies/niworkflows that referenced this pull request Oct 6, 2017

PIN: nipy/nipype master branch

335e878

Includes - nipy/nipype#2198 - nipy/nipype#2200 - nipy/nipype#2203 - nipy/nipype#2205 - nipy/nipype#2208 - nipy/nipype#2211 - nipy/nipype#2212

effigies mentioned this pull request Oct 9, 2017

[RTM] Resource annotation nipreps/fmriprep#746

Merged

satra added this to the 0.14.0 milestone Oct 20, 2017

divetea mentioned this pull request Jan 12, 2018

Multiproc plugin with numpy 1.9.0 #2372

Closed

oesteban mentioned this pull request Jan 30, 2018

Version of nipype found in ... does not contain the MultiProc plugin. FCP-INDI/C-PAC#749

Closed
