30 May 2009

Python MPI, with C++

The front end to my experimental code is in Python (with C++ running all the important stuff, and SWIG coupling them), which is how I want it to stay. Sooner or later my problems are going to outgrow my computer, and I'll need to run them on a cluster in parallel using MPI. It wasn't clear whether this was possible, or how it was to be done. But Python at its heart is C, and the shared objects it runs with are just code. I knew that, for example, Trilinos was somehow able to use both MPI and Python; so it's certainly possible.

The first thing I did was to search for implementations where people just used MPI with Python (without worrying about the C++ side). There are several such implementations, including the apparently stillborn pyMPI, the numpy-dependent Pypar, the basic mpi4py, and Python bindings to MPI through Boost::Python and Boost.MPI. Boost.MPI looked like the best implementation of them all, but unfortunately I could not for the life of me figure out where it was in the distribution or how to install it. (It turns out the version I was using, 1.36, didn't have it, but even in the latest 1.39 I can't see how to install the Python module.)

So the first task I did was to install mpi4py, which was trivial, and to write a quick program. mpi4py is a laughably thin front-end, so the code is not at all Pythonic.
from mpi4py import MPI
print "Hi, I am proc %d of %d" % (mpi.rank + 1, mpi.size)

Then, to run it, mpirun -np 2 python run.py

That worked fine. Surprising to me was how easily the MPI calls in C++ worked with this. I created a small class, Thing, with a member function Thing::sendRecv() that did some MPI_Send and MPI_Recv calls (depending on whether the processor was rank 0 or not), and return. I put a quick SWIG wrapper around it, compiled it to generate _thing.so and thing.py, and changed my Python code to:
from mpi4py import MPI
import thing
print "Hi, I am proc %d of %d" % (comm.Get_rank(), comm.Get_size())
t = thing.Thing()

Again, mpirun -np 2 python run.py, and it worked! The subroutine did its thing and returned with no problem. It looks like using both MPI and Python with scientific computing is good to go.

After this was working, I decided to try to use the "standalone" boost.mpi, but that was opening up a whole other can of worms. My excursion into the land of otool and bjam will be detailed soon. Suffice it to say that, once I got boost.mpi compiled and linked against the right version of Python, the same code (except the MPI in Python looked much better).

I surmise that the use of any Python MPI library can be rendered unnecessary as long as your C++ code calls MPI_Init and MPI_Finalize in the right places.


Post a Comment