CFD Online Discussion Forums - OpenCFD enhances OpenFOAM's parallel communications

OpenCFD enhances OpenFOAM's parallel communications.

OpenFOAM, the open source CFD toolbox, is renowned for its robust parallel communication using domain decomposition. In this article we review the parallel communication in OpenFOAM and describe developments by OpenCFD that will be available in its next release.

- LAM and OpenMPI
OpenFOAM is currently shipped with one public domain MPI implementation, LAM, which has proven to give good performance and to be extremely stable. The next release of OpenFOAM will additionally be shipped with another public domain MPI implementation, OpenMPI. OpenMPI is the amalgamation of three separate MPI projects, including LAM. Compared to LAM, OpenMPI is more configurable, supports more interconnects and automatically uses all available TCP networks.

- GAMMA
In addition, the next release of OpenFOAM will include a completely new Pstream implementation which uses the Genoa Active Message MAchine (GAMMA) communication library. GAMMA is a low-latency replacement for TCP/IP on Gigabit Ethernet and is supported for Intel platforms on modern Linux kernels (both 32 and 64 bit). It completely bypasses the Linux network stack to produce record-breaking latency figures.

The current OpenFOAM release can be configured to run with the MPI compatibility layer on top of GAMMA (MPI-GAMMA), and has been extensively tested on OpenCFD's 16 processor cluster. The lower latency particularly benefits cases where the number of processors is large and the number of cells and processor faces is low. In real terms, using GAMMA instead of LAM or OpenMPI will improve run times by anything from a few percent, for large cases running on a small number of processors, to a few hundred percent, for small cases on a large number of processors (where the other MPI libraries can actually make the parallel run slower than a non-parallel run).
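
The dependence on case size can be illustrated with a simple latency-bandwidth cost model, in which each message costs a fixed latency plus its size divided by the bandwidth. The sketch below is only an illustration: the latency values, the 8 bytes per face and the cubic subdomain shape are assumptions, not measurements of LAM, OpenMPI or GAMMA, and the function names are hypothetical.

```python
# Illustrative latency-bandwidth model of one processor-boundary exchange.
# All numeric values are assumptions chosen for illustration only; they are
# not measurements of LAM, OpenMPI or GAMMA.


def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: fixed latency plus size over bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def latency_fraction(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the message time spent on latency rather than payload."""
    return latency_s / message_time(n_bytes, latency_s, bandwidth_bytes_per_s)


def halo_bytes(cells_per_processor, bytes_per_face=8):
    """Rough payload for one scalar field on one side of a subdomain.

    Assumes a cubic block of cells, so one side has about
    cells_per_processor ** (2/3) processor faces.
    """
    faces = round(cells_per_processor ** (2.0 / 3.0))
    return faces * bytes_per_face


if __name__ == "__main__":
    bandwidth = 125e6  # bytes/s, the nominal line rate of gigabit Ethernet
    # Hypothetical per-message latencies for two communication layers.
    layers = (("higher-latency layer", 50e-6), ("lower-latency layer", 10e-6))
    for cells in (1_000, 1_000_000):
        size = halo_bytes(cells)
        for name, latency in layers:
            frac = latency_fraction(size, latency, bandwidth)
            print(f"{cells:>9} cells/processor, {name}: "
                  f"{frac:.0%} of message time is latency")
```

Under these assumptions the small case spends most of each message waiting on latency, while the large case is dominated by payload transfer, which is consistent with the trend described above.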

For the next release of OpenFOAM, OpenCFD have been working closely with GAMMA developer Giuseppe Ciaccio to implement a direct GAMMA driver that bypasses the MPI layer. Because of the nature of the GAMMA protocol, the MPI layer causes a small overhead, and bypassing it gives some speed-up, especially in the transmission of small messages. Taking the example of an unrealistically small test case (1000 cells per processor, 16 processors), we have seen a 30% improvement in run time. For more realistic test cases the benefits will be smaller. Apart from the potential performance improvement, the direct GAMMA driver also sizes its receive buffers dynamically, removing the need to adjust the $MPI_BUFFER_SIZE environment variable for cases with an extreme number of processor faces. We have found both the direct GAMMA driver and MPI-GAMMA to be completely stable while running.
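
To show why dynamic receive-buffer sizing removes a tuning step, the following sketch contrasts a buffer whose capacity is fixed up front with one that grows on demand. It is a conceptual illustration only: the class names are hypothetical, the doubling growth policy is an assumption, and it does not reproduce how the GAMMA driver or any MPI library actually manages its buffers.

```python
# Conceptual contrast between a fixed-size and a dynamically sized receive
# buffer. Class names and the doubling policy are assumptions made for
# illustration; this is not the GAMMA driver's implementation.


class FixedReceiveBuffer:
    """Capacity is chosen once, e.g. from a user-set environment variable."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def receive(self, message):
        # A message larger than the preset capacity cannot be stored.
        if len(message) > self.capacity:
            raise OverflowError(
                f"message of {len(message)} bytes exceeds "
                f"buffer capacity of {self.capacity} bytes"
            )
        self.data = bytearray(message)


class DynamicReceiveBuffer:
    """Capacity grows as needed, so no user tuning is required."""

    def __init__(self, initial_capacity=1024):
        self.capacity = initial_capacity
        self.data = bytearray()

    def receive(self, message):
        # Double the capacity until the incoming message fits.
        while self.capacity < len(message):
            self.capacity *= 2
        self.data = bytearray(message)


if __name__ == "__main__":
    big_message = b"x" * 100

    fixed = FixedReceiveBuffer(capacity=16)
    try:
        fixed.receive(big_message)
    except OverflowError as err:
        print("fixed buffer:", err)

    dynamic = DynamicReceiveBuffer(initial_capacity=16)
    dynamic.receive(big_message)
    print("dynamic buffer grew to", dynamic.capacity, "bytes")
```

With a fixed buffer the user has to anticipate the largest message in advance, which is the role $MPI_BUFFER_SIZE plays in the description above; a buffer that resizes itself avoids that step.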

The improvement in communication speed of GAMMA over other MPI libraries is currently offset somewhat by: (1) problems relating to the startup and shutdown of jobs; and (2) the difficulty of installing the GAMMA library, which requires additional hardware in the form of a dedicated gigabit connection and switch, and requires patching the Linux kernel. However, GAMMA is actively maintained and documentation is available for all steps of the installation.

Overall, working with GAMMA has been a very interesting experience for the OpenFOAM developers at OpenCFD. We feel there is a definite need for a public domain low-latency protocol on commodity hardware, and GAMMA is by far the best candidate. It has proven very well suited to the typical communication pattern of a domain-decomposed CFD code.

We support the project and are reaping the benefit of faster communications on our own Linux cluster. We invite other OpenFOAM users running on a cluster to try GAMMA themselves and to provide feedback on their experience to the GAMMA project.

http://www.openfoam.org/parallel1.4.html
Copyright (c) OpenCFD Ltd. 2006