Description
Overview
This is a design proposal for the integration of Orca into plotly.py to support the programmatic export of high-quality static images.
Related issues:
- Directly save image (without opening in browser) #880
- Bug: Offline Static Image Export #596
- Is it possible to save graphs as images in offline mode without Ipython notebook? #564
Background
The programmatic export of static raster and vector images from JavaScript-based data visualization libraries is a notoriously complicated problem. One common solution is to combine selenium with a driver for a headless web browser like phantomjs or headless Firefox/Chrome; this approach is used by Bokeh and Altair, for example. One challenge with this approach is that it requires the installation of dependencies that are not managed by a Python-environment-friendly package manager like conda (although phantomjs is available through conda, its development has been suspended and it does not support WebGL). This presents challenges in terms of portability and reproducibility.
The plotly.js team has taken a different approach with the Orca project. Orca is a standalone Electron application that can run as a command line image export tool, or it can run in a server mode and respond to image export requests interactively. Orca is the backbone of the plot.ly image export service, and it was open sourced earlier this year.
Because Orca can be built into a standalone executable that does not depend on a system web browser, it is possible to package Orca as a conda package, and we've had recent success towards this goal.
This issue is for the discussion of how to build the best plotly.py image export experience on top of Orca.
Goals
- Users shouldn't need to be aware of how complicated static image export is. At most it should require a single additional conda installation command.
- It should be as easy and reliable to use as matplotlib's image export.
- Nothing should flash on the screen or dock or taskbar during export.
- For raster formats, it should support png, jpg, and webp with configurable resolution.
- For vector formats it should support svg, pdf, and eps.
- It should be possible to save images directly to the local filesystem, or to a writable file object.
- It should be possible to return a byte string containing the image data without specifying filenames (and ideally without actually writing anything to temp files).
- It should be fast enough to support use as an interactive plotting backend (See New module proposal: plotly.io #1098)
- It should provide really helpful error messages if the orca executable isn't found.
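Taken together, these goals imply roughly the following user-facing shape. This is only a sketch: the names `write_image`, `to_image`, and the format-inference helper below are illustrative, not a settled API.

```python
import os

def infer_format(file, fmt=None):
    """Resolve the image format from an explicit argument or, failing
    that, from a filename extension."""
    if fmt is not None:
        return fmt
    if isinstance(file, str):
        ext = os.path.splitext(file)[1].lstrip(".").lower()
        if ext:
            return ext
    raise ValueError(
        "Cannot infer image format; pass format= explicitly "
        "(one of png, jpg, webp, svg, pdf, eps)")

def write_image(fig, file, fmt=None):
    """Export `fig` to a local path or to a writable file object
    (goal: a single call, no browser, nothing flashing on screen)."""
    fmt = infer_format(file, fmt)
    image_bytes = to_image(fig, fmt)  # hypothetical bytes-returning export
    if hasattr(file, "write"):
        file.write(image_bytes)       # writable file object
    else:
        with open(file, "wb") as f:   # local filesystem path
            f.write(image_bytes)
```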
Potential Approaches
1. Use command-line interface with figure as arg
The current Python instructions in the Orca README suggest the following usage:
```python
from subprocess import call
import json
import plotly

fig = {"data": [{"y": [1, 2, 1]}]}

# Serialize the figure and pass it to orca as a command-line argument
call(['orca', 'graph', json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)])
```
Here the figure is serialized to a JSON string and passed as a command line argument to orca. This is nice because it avoids the need to create a temporary file. Unfortunately, there's a limit to how large the command line arguments can be, and large figures cross that boundary, resulting in an exception.
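The limitation can be made concrete with a rough size check. The 100 KB threshold below is purely illustrative; the real limit depends on the OS (ARG_MAX on POSIX, roughly 32 KB on Windows).

```python
import json

def figure_fits_on_command_line(fig, limit=100_000):
    """Rough check of whether a serialized figure is small enough to be
    passed as a single command-line argument. `limit` is illustrative."""
    return len(json.dumps(fig)) <= limit

small_fig = {"data": [{"y": [1, 2, 1]}]}
large_fig = {"data": [{"y": list(range(100_000))}]}
```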
2. Use command-line interface with figure as tmp file
An alternative that doesn't run into this scaling problem is to first write the figure to a temporary file and then call orca with the path to that file. Furthermore, if a collection of figures needs to be converted at once, the paths can all be passed to orca together and orca will convert them in batch mode. This is much faster on average because the orca executable only has to start up and shut down once per batch, rather than once per figure.
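The temp-file batch approach could look something like the sketch below. I'm assuming the `orca graph` command accepts multiple JSON file paths plus `--format` and `--output-dir` flags, as in the Orca CLI help; the exact flag names should be double-checked against the installed version.

```python
import json
import os
import subprocess
import tempfile

def build_batch_command(json_paths, fmt="png", output_dir="."):
    """Build the single orca invocation that converts a whole batch."""
    return ["orca", "graph", *json_paths,
            "--format", fmt, "--output-dir", output_dir]

def export_figures(figures, fmt="png", output_dir="."):
    """Write each figure to a temp JSON file, then convert them in one
    call, paying the Electron startup cost once per batch rather than
    once per figure."""
    paths = []
    try:
        for fig in figures:
            with tempfile.NamedTemporaryFile(
                    mode="w", suffix=".json", delete=False) as f:
                json.dump(fig, f)
                paths.append(f.name)
        subprocess.run(build_batch_command(paths, fmt, output_dir),
                       check=True)
    finally:
        for path in paths:
            os.remove(path)
```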
3. Use orca in server mode
Another approach would be to launch orca as a subprocess in server mode. The Python library would send individual image export requests to the server on an agreed upon port. The server would respond with the byte string of the converted image. This approach has several advantages, but also some increased complexity.
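An individual export request against an already-running server could be as simple as the sketch below. The endpoint, port, and request-body shape here are illustrative assumptions, not the final protocol.

```python
import json
from urllib import request

def build_export_request(fig, fmt="png", scale=1):
    """Assemble the JSON payload describing one export request."""
    return {"figure": fig, "format": fmt, "scale": scale}

def request_image(fig, fmt="png", port=8042):
    """POST the figure to a local orca server (started with something
    like `orca serve -p 8042`) and return the raw image bytes."""
    body = json.dumps(build_export_request(fig, fmt)).encode("utf-8")
    req = request.Request(
        "http://localhost:%d/" % port,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()  # image bytes; no temp files involved
```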
3.1 Advantages
Response time: Launching orca, whether as a command-line program or as a server process, takes roughly 2 seconds to complete. However, requests to an already-running server process are much faster; I've seen round-trip request-to-response times of under 50 ms. 2 seconds is acceptable in the context of exporting figures to images on the filesystem, but it is not acceptable for interactive use as a static backend. 50 ms feels as fast as matplotlib.
No temp files: This approach doesn't involve the use of any temporary files, and it makes it much simpler to support the non-file image use cases, like returning a byte string or a PIL.Image.Image object to the user.
3.2 Complications
There are some additional complications to this approach. First, the long-running server process would need to be managed by the Python library. It's too resource intensive to run all the time by default, so the user would need to start it explicitly, or we would need to start it automatically the first time an export is requested.
Then there's the question of whether we leave the server process running indefinitely, or whether we implement some kind of timeout that shuts the process down after a (configurable) period of inactivity.
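One straightforward way to implement the configurable inactivity timeout is to reset a timer on every export request and shut the server down when it fires. The shutdown callback below is a stand-in for whatever actually terminates the orca subprocess.

```python
import threading

class InactivityTimer:
    """Shut something down after a configurable period of inactivity."""

    def __init__(self, timeout_seconds, on_timeout):
        self.timeout = timeout_seconds
        self.on_timeout = on_timeout  # e.g. a server-shutdown callback
        self._timer = None

    def reset(self):
        """Call on every export request to push back the shutdown."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout, self.on_timeout)
        self._timer.daemon = True  # don't block interpreter exit
        self._timer.start()

    def cancel(self):
        """Stop the pending shutdown entirely."""
        if self._timer is not None:
            self._timer.cancel()
```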
Finally, the communication between the Python process and the server requires an open local port, so there's the potential for restrictive firewalls to be a problem. (But, on the other hand, this is also true of the Jupyter Notebook and most applications that interact with an ipython kernel.)
What's next
Next we're going to work on testing and releasing conda packages for orca version 1.1.0.
Method 2 above (temp files) is probably the least risky approach, but I really want the advantages that come with Method 3 (server process), so I'd like to give this a shot first. I've already developed a prototype of the server mode approach, with automatic startup and timeout shutdown, and I have it working on OS X, Linux, and Windows. So far I've found it to be very reliable, and the responsiveness is really exciting.
So, I'm quite hopeful that we'll be able to build a solid user experience on top of the server mode. But I would like to hear some other perspectives here.
@chriddyp @jackparmer @cldougl @nicolaskruchten @etpinard @Kully