Summary: An early report on getting the scientific Python stack compiled to WebAssembly.
Data Science in the Browser
Shortly after starting at Mozilla in January, I became aware of Hamilton Ulmer and Brendan Colloran's Iodide project, an experiment to build a data science notebook based on web technologies. Unlike in Jupyter notebooks, the computation happens in the browser, with direct access to Web API technologies like the DOM. Sharing a notebook is as simple as passing around a single HTML file, since there's no server side to worry about. It's not out to replace Jupyter notebooks, but rather to exist in a different design tradeoff space that makes it more suitable for sharing and collaboration.
JavaScript globals can be used from Python with the from js import ... syntax. For example, to decode a string named secret that was defined in JavaScript:
    # python
    from js import secret
    decoded = ''.join(chr(ord(x) - 3) for x in secret)
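Because secret lives on the JavaScript side, the snippet above only runs inside Pyodide. The decoding itself is plain Python, though: each character is shifted back by three code points (a Caesar shift). The minimal sketch below runs in native CPython. It assumes a hypothetical encode helper and uses a literal string as a stand-in for the JavaScript global.

```python
def encode(message: str, shift: int = 3) -> str:
    """Shift every character forward by `shift` code points (hypothetical helper)."""
    return ''.join(chr(ord(ch) + shift) for ch in message)


def decode(secret: str, shift: int = 3) -> str:
    """Undo encode(): the same expression as in the Pyodide snippet."""
    return ''.join(chr(ord(ch) - shift) for ch in secret)


# Stand-in for the JavaScript global `secret` (assumption: any shift-3 encoded string).
secret = encode("This is a secret message")
print(secret)          # Wklv#lv#d#vhfuhw#phvvdjh
print(decode(secret))  # This is a secret message
```

Note that spaces become "#" because the shift applies to every code point, not just letters.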
As another example, changing the browser tab's title is as simple as importing document and setting an attribute:
    from js import document
    document.title = "My mind is blown"
Most of the Python standard library works. The most notable exceptions are:
- subprocess: since the browser isn't an OS, it can't spawn new processes.
- socket: access to raw network sockets would break the browser security model. Many of the networking-related modules in the standard library are built on socket and therefore don't work either.
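Code meant to run both natively and in the browser can guard its process-spawning calls and fall back to pure Python when they fail. The sketch below is minimal and built around a hypothetical run_or_fallback helper. Which exception the browser build actually raises is an assumption, so it catches both ImportError and OSError.

```python
def run_or_fallback(args, fallback):
    """Run an external command if possible; otherwise return fallback().

    Assumption: on a platform without process support, either importing
    subprocess or starting the process fails with ImportError/OSError.
    A command that starts but exits non-zero still raises CalledProcessError.
    """
    try:
        import subprocess
        result = subprocess.run(args, capture_output=True, text=True, check=True)
    except (ImportError, OSError):
        # No process support, or the executable could not be started.
        return fallback()
    return result.stdout
```

A missing executable raises FileNotFoundError, a subclass of OSError, so the fallback path can be exercised natively too.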
Within Numpy, all of the core functionality works, though there is no support for long double (which is fairly niche). Some low-level compiler bugs still prevent the FFT routines from compiling, but those should eventually be resolved.
How fast is it?
To answer this question, I reached for a few existing Python and Numpy benchmarks:
- The venerable pystone, which ships with CPython.
- Serge Guelton's set of numpy benchmarks.
These benchmarks probably fall into the trap of being a little too "synthetic". I would have preferred to also use the Python Performance Benchmark Suite, which aims to be a little closer to "real world", but it has a significant number of dependencies and would need to be adapted to work on a platform without subprocess before it could be used in this context. Nonetheless, I think these benchmarks offer a useful approximation for now.
The benchmarks were run on the same machine, both in the native CPython implementation and in Firefox Nightly driven by Selenium. The following figure shows how many times slower the WebAssembly implementation is.
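The "how many times slower" figure is a ratio of wall-clock times for the same benchmark on the two platforms. The sketch below is minimal. It assumes per-benchmark timings in seconds are already collected into dictionaries, and the slowdown_factors helper is hypothetical.

```python
from __future__ import annotations


def slowdown_factors(native: dict[str, float], wasm: dict[str, float]) -> dict[str, float]:
    """Return wasm_time / native_time for every benchmark present in both runs.

    1.0 means parity with native; larger values mean the WebAssembly
    build is that many times slower.
    """
    common = native.keys() & wasm.keys()
    return {name: wasm[name] / native[name] for name in sorted(common)}
```

Benchmarks missing from either run are skipped rather than guessed at.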
EDIT 2018-04-10: The original results posted here inadvertently included Numpy import time in the WebAssembly times (but not in the native times). These have now been corrected above. The results improve somewhat, though the best and worst cases did not change. You can see the original results here.
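The correction in this edit note comes down to keeping one-off setup, such as importing Numpy, out of the timed region on both platforms. The sketch below is minimal and uses the standard timeit module, whose setup string runs before timing starts. The best_time helper is hypothetical, and json stands in for a heavier import.

```python
import timeit


def best_time(stmt: str, setup: str = "pass", number: int = 1000, repeat: int = 5) -> float:
    """Return the best per-call time in seconds for `stmt`.

    `setup` (e.g. module imports) runs before each repetition but is not
    part of the measured time, so import cost doesn't skew the result.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


per_call = best_time("json.dumps(data)", setup="import json; data = list(range(100))")
print(f"{per_call * 1e6:.2f} µs per call")
```

Taking the minimum of several repetitions is a common way to reduce noise from other activity on the machine.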
The results are interesting. For benchmarks that spend most of their time in Numpy routines, such as harris or rosen, runtime is on par with the natively compiled Python. When WebAssembly rocks, it really, really rocks. Unfortunately, for other benchmarks that spend a lot of time looping or making function calls in Python, runtimes can be as much as 35 times slower. I have an unsubstantiated hunch that this is due to Emscripten's EMULATE_FUNCTION_POINTER_CASTS option, which is required to make all of CPython's function pointer calls work correctly.
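The split above is between time spent executing Python bytecode and time spent inside compiled routines. The sketch below is minimal and contrasts the two on one task. It uses the built-in sum over a range (which CPython iterates entirely in C) as a stand-in for a Numpy routine, and the function names are hypothetical. The printed ratio depends on the machine and is not a measurement from this post.

```python
import timeit


def total_loop(n: int) -> int:
    """Interpreter-bound: every iteration executes Python bytecode."""
    total = 0
    for i in range(n):
        total += i
    return total


def total_builtin(n: int) -> int:
    """Compiled-code-bound: a single call; the iteration happens in C."""
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    loop_t = min(timeit.repeat(lambda: total_loop(n), number=1, repeat=3))
    builtin_t = min(timeit.repeat(lambda: total_builtin(n), number=1, repeat=3))
    print(f"loop / builtin time ratio: {loop_t / builtin_t:.1f}")
```

Workloads shaped like total_loop are the ones most exposed to any per-bytecode or per-call overhead in the WebAssembly build.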
UPDATE 2018-04-11: My hunch was wrong, and I was able to find the root cause and significantly speed up these benchmarks. See my post Profiling WebAssembly for more info.
I'd love to see improvements to the toolchain that close the performance gap. At this point, I don't personally know enough to anticipate how much work is involved.
Another current limitation is that all of the packages you might need must be compiled and bundled into a single large data file, which is downloaded to your browser in its entirety before anything can start. It would be great to modularize that, so that packages are downloaded on demand. Relatedly, it would also help to modularize the build system so that individual packages can be added more independently. conda-build could potentially serve as a basis for that.
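One way to picture on-demand loading is to keep a small manifest of each package's dependencies. When a package is requested, it is fetched together with any not-yet-loaded dependencies, in dependency order. The sketch below is minimal and assumes a hypothetical manifest dict and a caller-supplied fetch function. No real Pyodide API is implied.

```python
from typing import Callable, Dict, List, Set


class LazyPackageLoader:
    """Fetch packages only when first requested, dependencies first (hypothetical)."""

    def __init__(self, manifest: Dict[str, List[str]], fetch: Callable[[str], None]):
        self.manifest = manifest  # package name -> names of its direct dependencies
        self.fetch = fetch        # downloads/installs one package (assumed callable)
        self.loaded: Set[str] = set()

    def load(self, name: str) -> List[str]:
        """Load `name` plus missing dependencies; return what was fetched, in order."""
        fetched: List[str] = []
        visiting: Set[str] = set()

        def visit(pkg: str) -> None:
            if pkg in self.loaded:
                return  # already downloaded earlier: nothing to do
            if pkg in visiting:
                raise ValueError(f"dependency cycle involving {pkg!r}")
            visiting.add(pkg)
            for dep in self.manifest[pkg]:  # KeyError for unknown packages
                visit(dep)
            visiting.discard(pkg)
            self.fetch(pkg)
            self.loaded.add(pkg)
            fetched.append(pkg)

        visit(name)
        return fetched


# Simplified, illustrative manifest (not the real dependency lists).
manifest = {"numpy": [], "pandas": ["numpy"], "matplotlib": ["numpy"]}
loader = LazyPackageLoader(manifest, fetch=lambda name: print("fetching", name))
loader.load("pandas")      # fetches numpy, then pandas
loader.load("matplotlib")  # numpy is already loaded, so only matplotlib is fetched
```

The depth-first walk yields a topological order, so a package is never fetched before the packages it depends on.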
Check it out
The easiest way to play with this is to visit the example Pyodide notebook (EDIT: this link has been updated to point to a working version). Note that this only works in Firefox right now; Chrome support is pending.
You can also get involved at the pyodide GitHub repository. Note that while Pyodide grew out of the needs of Iodide, there's nothing Iodide-specific about it, and it should be useful in other contexts where you want to embed a scientific Python stack in the browser. I'm pretty new to WebAssembly and I'd love any help, advice or comments to make this better.