Skip to content

[ENH] New ResourceMonitor (replaces resource profiler) #2200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 88 commits into from
Oct 4, 2017

Conversation

oesteban
Copy link
Contributor

@oesteban oesteban commented Sep 27, 2017

Supersedes #2193 (I removed the specs to make code review easier)

This PR revises the resource profiler (with a new name: resource monitor):

  • Created new nipype.utils.profiler where all the code related to resource monitoring is gathered together. Tests are also brought into the nipype.utils module.
  • Removed monitoring responsibility from the runner into the interfaces. Consequences:
    1. Resources can be monitored when running bare interfaces
    2. All kinds of interfaces can be monitored (it was reduced to only CommandLine-based and Function interfaces before).
    3. There was a bit of a mixture of resource-management and resource-monitoring in the node execution code. That has been cleaned up.
    4. Interfaces save the traces directly to a file, so the log callback is not needed anymore (it has been left here just in case users want to generate the old gantt chart).
    5. Interfaces place peak usage in their runtime object.
  • Previously, even though resource consumption with a certain frequency, only peak usage was finally reported in the Gantt chart. Now the traces are kept in the proc_dict member of the runtime object.
  • When the workflow is finished, the execution graph is post-processed and a new runtime_monitor.json file is generated, with all the traces and necessary information to reconstruct the performance profile with the resolution of the new runtime_monitor_frequency option (default: every 1s).
  • The resource manager of MultiProc blocked if a node was defined with n_procs > MultiProc's setting n_procs (similarly with the estimated memory). Now, either a warning or an error are raised (depending on the new plugin arg raise_insufficient).

Important: this PRs removes the old filemanip logger and replaces it by a more generic utils. Documentation has been updated accordingly and Deprecation warnings are generated consequently.

This PR revises the resource profiler with the following objectives:

  - Increase robustness (and making sure it does not crash nipype)
  - Extend profiling to all interfaces (including pure python)

The increase of robustness will be expected from:

  1. Trying to reduce (or remove at all if possible) the logger callback
     to register the estimations of memory and cpus. This could be achieved
     by making interfaces responsible or keeping track of their resources
     to then collect all results after execution of the node.

  2. Centralize profiler imports, like the config or logger object
     so that the applicability of the profiler is checked only once.

This first commit just creates one new module nipype.utils.profiler, and
moves the related functions in there.
@oesteban
Copy link
Contributor Author

oesteban commented Oct 3, 2017

@satra Am I correct thinking that this: https://github.com/oesteban/nipype/blob/d2599e2f86df754bfa0fa41b8f6789e2408fb7bd/nipype/pipeline/engine/nodes.py#L1150-L1151 would ensure that MapNodes actually inherit the memory/cpu constraints?

@effigies
Copy link
Member

effigies commented Oct 3, 2017

@oesteban I think you actually want to pass n_procs=self._n_procs, so that when it's None, it attempts to read self._interface.inputs.num_threads.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 3, 2017

Oh, good catch! - well, that would only make sense if the node will have different num_threads for each iterable... right?

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd be comfortable merging this, assuming tests pass. @satra any further comments?

@satra
Copy link
Member

satra commented Oct 4, 2017

@oesteban - correct, mapnodes inherit constraints. solid work - really appreciate the refactoring that was much needed.

assuming tests pass, this looks good to me.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

We'll need to revisit the estimation of resources, but what we have with this PR is one step further from where we were before and it is enough to get going.

@oesteban oesteban merged commit 6e95b3c into nipy:master Oct 4, 2017
@oesteban oesteban deleted the enh/ReviseResourceProfiler branch October 4, 2017 04:36
effigies added a commit to effigies/niworkflows that referenced this pull request Oct 4, 2017
effigies added a commit to effigies/niworkflows that referenced this pull request Oct 5, 2017
effigies added a commit to effigies/nipype that referenced this pull request Oct 5, 2017
effigies added a commit to effigies/nipype that referenced this pull request Oct 5, 2017
effigies added a commit to effigies/nipype that referenced this pull request Oct 6, 2017
@satra satra added this to the 0.14.0 milestone Oct 20, 2017
@mgxd
Copy link
Member

mgxd commented Oct 24, 2017

@oesteban what's up with test_resource_monitor.py? I see we're skipping it in CI tests - but running it locally will also lead to a fail

    @pytest.mark.skipif(os.getenv('CI_SKIP_TEST', False), reason='disabled in CI tests')
    @pytest.mark.parametrize("mem_gb,n_procs", [(0.5, 3), (2.2, 8), (0.8, 4), (1.5, 1)])
    def test_cmdline_profiling(tmpdir, mem_gb, n_procs):
        """
        Test runtime profiler correctly records workflow RAM/CPUs consumption
        of a CommandLine-derived interface
        """
        from nipype import config
        config.set('execution', 'resource_monitor_frequency', '0.2')  # Force sampling fast
    
        tmpdir.chdir()
        iface = UseResources(mem_gb=mem_gb, n_procs=n_procs)
        result = iface.run()
    
        assert abs(mem_gb - result.runtime.mem_peak_gb) < 0.3, 'estimated memory error above .3GB'
>       assert int(result.runtime.cpu_percent / 100 + 0.2) == n_procs, 'wrong number of threads estimated'
E       AssertionError: wrong number of threads estimated
E       assert 3 == 1
E        +  where 3 = int(((366.80000000000001 / 100) + 0.2))
E        +    where 366.80000000000001 = Bunch(cmdline='/code/nipype/nipype/utils/tests/use_resources -g 1.500000 -p 1', command_path='/code/nipype/nipype/util...5.92338, 1508877125.970061]}, returncode=0, startTime='2017-10-24T20:32:03.693559', stderr='', stdout='', version=None).cpu_percent
E        +      where Bunch(cmdline='/code/nipype/nipype/utils/tests/use_resources -g 1.500000 -p 1', command_path='/code/nipype/nipype/util...5.92338, 1508877125.970061]}, returncode=0, startTime='2017-10-24T20:32:03.693559', stderr='', stdout='', version=None) = <nipype.interfaces.base.InterfaceResult object at 0x7f25b8754278>.runtime

@oesteban
Copy link
Contributor Author

Hi, as I mentioned before it seems that the measurements we get are not very accurate and those tests fail, especially when running in circleci (I was more successful on my desktop).

Since those tests were already disabled before refactoring, that code is left there for when we have the time and will of making sure that the cpu and memory estimates are accurate. I noted that we should revisit these tests in my last comment

@mgxd
Copy link
Member

mgxd commented Oct 27, 2017

@oesteban sounds good - perhaps we should just always skip it until we can assure the estimates are accurate, WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants