Skip to content

ENH: Centralize virtual/physical $DISPLAYs #2203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Oct 5, 2017

Conversation

oesteban
Copy link
Contributor

@oesteban oesteban commented Oct 3, 2017

This PR addresses several (related) problems:

This PR adds a config.get_display() which identifies what display should be used, and creates a virtual one if necessary. Also adds a few unit tests to the config object to make sure precedence is fulfilled. If the virtual framebuffer is created, it is created only once for the whole workflow (solves the problem of many virtual fbs on a mapnode with _redirect_x = True.

One open question is: do we want to double check that an X server is running if $DISPLAY or display_variable are :0?
https://github.com/oesteban/nipype/blob/e8e18f039ce6ab7c7bd19c045e083bf02275c7c0/nipype/utils/config.py#L180-L185

This PR addresses several (related) problems:

  - Makes `$DISPLAY` and the config `display_variable` optional (close nipy#2055)
  - Should fix nipy#1403 since xvfb-run is not used anymore.
  - Relates to nipy#1400:
    - Will reduce its impact because now Xvfb is called only once and only if it is absolutely necessary
    - May make it worse in some cases when Xvfb fails to create a listener (root needed?).

This PR adds a `config.get_display()` which identifies what display should be used, and creates a virtual one if necessary. Also adds a few unit tests to the config object to make sure precedence is fulfilled.
@effigies
Copy link
Member

effigies commented Oct 3, 2017

One open question is: do we want to double check that an X server is running if $DISPLAY or display_variable are :0?

This strikes me as being outside the scope of nipype. If I set DISPLAY=":0" and don't have X running, programs that need X will crash. If an interface crashes and doesn't give a "No display found on $DISPLAY" message, I think it's within scope for that interface to give a better message.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 3, 2017

Thanks a lot for the comment.

This strikes me as being outside the scope of nipype. If I set DISPLAY=":0" and don't have X running, programs that need X will crash. If an interface crashes and doesn't give a "No display found on $DISPLAY" message, I think it's within scope for that interface to give a better message.

Totally agree, and that's the reason why I commented that part out, just to have the code handy if needed. It was just a suggestion that could make nipype easier for the user with better error messages (instead of relying on the third party libraries for these infrastructure things).

However, there is a hidden complexity: if you set the $DISPLAY=':1' because you have a VNC server running, then the way to identify the process is different. And even more complicated if the display is remote ($DISPLAY='<ip-of-different-host>:6001').

I'll leave it there for the future (commented out), if we want to check for it only when $DISPLAY='localhost:0'.

@@ -172,3 +173,52 @@ def update_matplotlib(self):
def enable_provenance(self):
self._config.set('execution', 'write_provenance', 'true')
self._config.set('execution', 'hash_method', 'content')

def get_display(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it, if this has not been called before workflow runtime, then each node will get a copy of config and thus start up its own Xvfb object.

And I don't see anywhere where it's stopped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time you call it, one display will be locked. As a matter of fact, when I first wrote the tests (https://github.com/oesteban/nipype/blob/fix/VirtualDisplay/nipype/utils/tests/test_config.py#L9-L49) I realized I had to manally reset the config object all the times because once it is called it is fixed to that display. Therefore, all those config._display = None and those re-setting/removing the nipype config setting.

Does this reply to your question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I meant to put at the end of the first sentence: "Is this the desired behavior?" Sounds like it is.

Similarly, the fact that self._display.stop() is no longer called anywhere is intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was actually the problem: nipype would spawn an endless number of displays.

Well, it is probably nice to add a self._display.stop() at the end when running a workflow. I'm not sure though about how to do that when running bare interfaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added a callback when python exists the session :D.

@effigies
Copy link
Member

effigies commented Oct 3, 2017 via email

@codecov-io
Copy link

codecov-io commented Oct 3, 2017

Codecov Report

Merging #2203 into master will increase coverage by 0.04%.
The diff coverage is 83.21%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2203      +/-   ##
==========================================
+ Coverage   72.24%   72.28%   +0.04%     
==========================================
  Files        1177     1178       +1     
  Lines       58639    58733      +94     
  Branches     8447     8457      +10     
==========================================
+ Hits        42366    42458      +92     
+ Misses      14906    14900       -6     
- Partials     1367     1375       +8
Flag Coverage Δ
#smoketests 72.28% <83.21%> (+0.04%) ⬆️
#unittests 69.84% <83.21%> (+0.05%) ⬆️
Impacted Files Coverage Δ
nipype/interfaces/afni/preprocess.py 81.17% <0%> (-0.13%) ⬇️
nipype/interfaces/tests/test_base.py 96.79% <100%> (+0.04%) ⬆️
nipype/interfaces/base.py 84.82% <100%> (+1.42%) ⬆️
nipype/utils/config.py 67.25% <80.48%> (+3.89%) ⬆️
nipype/utils/tests/test_config.py 81.7% <81.7%> (ø)
nipype/pipeline/plugins/multiproc.py 77.94% <0%> (+1.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fde2fb...82a2c6a. Read the comment docs.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

This PR fixed the problem in MRIQC, and the whole workflow run correctly: https://circleci.com/gh/oesteban/mriqc/1368

# As of AFNI 16.0.00, redirect_x is not needed
if v[0] > 2015:
# Between AFNI 16.0.00 and 16.2.07, redirect_x is not needed
if v >= (2016, 0, 0) and v < (2016, 2, 7):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah - is this really true - crazy!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, tell me about it. I'll report this to AFNI, to make sure this is not a mistake linking.

@oesteban oesteban changed the title [ENH] Centralize virtual/physical $DISPLAYs ENH: Centralize virtual/physical $DISPLAYs Oct 4, 2017
@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

@satra: added the error plus a unit test for it

@effigies and Satra, this is ready for a final review.

and you would like to redirect all spawned windows to
it. If not set, nipype will try to configure a new virtual server using
Xvfb. (possible values: any X server address; default value: not
set)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little unclear. If I understand it correctly, we use the input $DISPLAY variable unless this is set? Under that assumption, here's a proposed rewrite:

Override the ``$DISPLAY`` environment variable for interfaces that require an X server.
This option is useful if there is a running X server, but ``$DISPLAY`` was not defined
in nipype's environment. For example, if an X server is listening on the default port of
6000, set ``display_variable = :0`` to enable nipype interfaces to use it. It may also
point to displays provided by VNC, `xnest
<http://www.x.org/archive/X11R7.5/doc/man/man1/Xnest.1.html>`_ or `Xvfb
<http://www.x.org/archive/X11R6.8.1/doc/Xvfb.1.html>`_. If neither ``display_variable``
nor the ``$DISPLAY`` environment variable are set, nipype will try to configure a new
virtual server using Xvfb. (possible values: ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. display_variable should override $DISPLAY. I'll change the code and use your proposed text for the documentation.

config._display = None
dispstr = ':10'
config.set('execution', 'display_variable', dispstr)
monkeypatch.setitem(os.environ, 'DISPLAY', dispstr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be something other than dispstr?



def test_display_empty_patched(monkeypatch):
"""Check that when no display is specified, a virtual Xvfb is used"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings after this one are all the same.

from nipype import config
from nipype import logging
config.stop_display()
logging.getLogger('interface').info('Closing display (if virtual)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nipype is going to spawn many Xvfbs, shouldn't the stop be associated with the interface, rather than just closing one when Python quits?

I think I must be missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the opposite. You want nipype to run only one Xvfb if possible. Let's say you start an ipython session, run an interface with _redirect_x = True and then another interface than again requires X. With the current implementation only one Xvfb is started. Only when you exit ipython the Xvfb service is killed.

The rationale is precisely stop spawning many Xvfbs which was the case before

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. And you're testing to make sure you're not producing multiple by running multiple interfaces? Because my first comment noted that it looks like each Node makes a copy of config before spawning an xvfb, so it's not clear they would actually share it.

Again, this is from a very cursory look, so it's possible I'm missing things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@effigies - config makes copies of sections, not the entire config object. so as long as the config.get_display is called, nipype maintains a global config/state.

@satra
Copy link
Member

satra commented Oct 4, 2017

@oesteban - a couple of follow up questions to @effigies' comments:

i'm enumerating some use cases:

  1. one machine, one user - single multiproc process
  2. one machine, one user - multiple processes from different scripts/slurm/etc
  3. one machine, many users

in my head, 1 and 2 would reuse the same xvfb, however this will not work unless there is some reference counting in place and the one that shuts it down is the last one to finish. this could also result in race conditions of some kind.

3 would launch different xvfb's per user. but within each user's space considerations for 1 and 2 would apply.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

  1. one machine, one user - single multiproc process

Will reuse same Xvfb.

  1. one machine, one user - multiple processes from different scripts/slurm/etc

Will open one new Xvfb per process. This is not ideal, but after all we are reducing the overload of Xvfb by a lot.

  1. one machine, many users

Will open as many Xvfbs as users x processes.

For use cases 2 and 3 we should generally recommend setting up VNC (for clusters) or a unique Xvfb, and setting display_variable in all cases.

There is a compromise here between how much we want to make it seamless to the user and basics of infrastructure. The former implementation was extremely inefficient (spawning literally hundreds of Xvfb, we've seen that in the early days of MRIQC) so this is a great advance. We also get rid of xvfb-run for CommandLine interfaces, also very advisable and not always installed in clusters.

@satra
Copy link
Member

satra commented Oct 4, 2017

IGNORE THIS COMMENT

i think we still need to support 2. imagine submitting slurm jobs on a cluster - it gets assigned to random nodes. if every job has to start a vnc server i don't think that's a good outcome. i'd rather have every job starting an xvfb session.

@satra
Copy link
Member

satra commented Oct 4, 2017

@oesteban - nevermind my previous comment on case 2 - i think we are still ok - case 2 will still launch a different xvfb per process.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

@satra correct :). The vnc idea is when you have one node providing that, otherwise it's better to fall back to the default (ergo Xvfb).

@oesteban
Copy link
Contributor Author

oesteban commented Oct 4, 2017

An important addendum to #2203 (comment) is that Xvfb().stop() ensures that the $DISPLAY variable is set to the original value. The only side effect that I could see if that $DISPLAY == '', for which we unset $DISPLAY and it is not recovered. I thought this is not extremely relevant, though.

@oesteban
Copy link
Contributor Author

oesteban commented Oct 5, 2017

@satra, @effigies, this is ready for final review

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. @satra addressed my one confusion point. I think this all looks reasonable.

@satra
Copy link
Member

satra commented Oct 5, 2017

@oesteban - lgtm

@satra satra merged commit 4801a4c into nipy:master Oct 5, 2017
@oesteban oesteban deleted the fix/VirtualDisplay branch October 5, 2017 15:30
effigies added a commit to effigies/niworkflows that referenced this pull request Oct 5, 2017
@satra satra added this to the 0.14.0 milestone Oct 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants