MPI Implementation Status

($Date: 2007/11/12 06:11:59 $)

This document describes the implementation status of GridMPI/YAMPI and gives notes on the use of MPI-2 features.


1. Known Problems and Compatibility Issues

Known Problems

threads support: Communicator creation functions are not thread-safe in GridMPI, although the other functions are. An extension of the IMPI protocol for thread safety is still under design. In contrast, communicator creation functions in YAMPI are thread-safe.

pack_external, unpack_external: long double is not properly packed/unpacked in the architecture-neutral format, because long double formats vary: Intel uses 96 bits, IBM/Power uses a pair of doubles, and so on.

Compatibility Issues

pack_external, unpack_external: long data are packed into 64 bits, whereas MPI-2 specifies 32 bits. This non-standard behavior is intended for 64-bit machines (most programs would fail if long data were packed into 32 bits). Setting the environment variable _YAMPI_COMMON_PACK_SIZE selects the standard behavior.

MPI_Cancel: Cancellation is a non-local operation, contrary to the MPI specification, in which MPI_Wait after MPI_Cancel is a local operation (MPI-1 §3.8). GridMPI/YAMPI may block in MPI_Wait until the peer receiver responds, in order to determine the status of a send request. Note that this behavior is compatible with many other MPI implementations.

MPI_Comm_free: Communicators should not be freed before send/recv operations complete, because they are not reference-counted in GridMPI/YAMPI. However, proper (non-erroneous) send/recv operations will complete even without their communicators, because CIDs (communicator IDs) remain valid after MPI_Comm_free (i.e., their validity is not checked). Errors may not be handled properly after communicators are freed. In contrast, datatypes are reference-counted.

mpi.h (headers): The header file definitions of GridMPI/YAMPI are almost compatible with those of MPICH-1 (but not identical).

See MX Implementation Status for support levels of Myrinet/MX.

See Checkpoint/Restart Implementation Status for support levels of the checkpointing feature.


2. Implementation Status Summary

This part of the document roughly follows the format of the memo Status of the MPICH implementation of MPI-1 and MPI-2. It is organized by sections of the MPI-2 Standard.

Section: Support

MPI-1.2
All: Yes

MPI-2: Miscellany
4.1 Portable MPI Process Startup (mpiexec): No
4.2 Passing NULL to MPI_Init: Yes
4.4 MPI_TYPE_CREATE_INDEXED_BLOCK: Yes
4.5 MPI_STATUS_IGNORE, MPI_STATUSES_IGNORE: Yes
4.6 Error Class for Invalid Keyval: No. GridMPI/YAMPI does not distinguish target objects of keyvals.
4.7 Committing a Committed Datatype: Yes. GridMPI/YAMPI does nothing on commit.
4.8 Allowing User Functions at Process Termination: Yes
4.9 Determining Whether MPI Has Finished: Yes
4.10 The Info Object: Yes
4.11 Memory Allocation: Yes. GridMPI/YAMPI implements memory allocators with malloc/free.
4.12 Language Interoperability: No. Fortran KIND is not yet implemented.
4.13 Error Handlers (on communicators, windows, and datatypes): Yes. Note that GridMPI/YAMPI does not distinguish target objects of error handlers, and no error is signaled when an error handler is set on an inappropriate object.
4.14 New Datatype Manipulation Functions: Yes
4.15 New Predefined Datatypes: Yes
4.16 Canonical MPI_PACK and MPI_UNPACK: Yes
4.17 Functions and Macros: Yes. GridMPI/YAMPI implements all interfaces as functions.
4.18 Profiling Interface: Yes

MPI-2: Process Creation and Management
5.3 Process Manager Interface: Yes (lightly tested). (*NOTE1)
5.4 Establishing Communication: Yes (lightly tested). (*NOTE1)
5.5 Other Functionality: Yes (lightly tested). (*NOTE1)

MPI-2: One-Sided Communications
6.2 Initialization: Yes (lightly tested). (*NOTE2)
6.3 Communication Calls: Yes (lightly tested). (*NOTE2)
6.4 Synchronization Calls: Yes (lightly tested). (*NOTE2)
6.6 Error Handling: Yes (lightly tested). (*NOTE2)

MPI-2: Extended Collective Operations
7.2 Intercommunicator Constructors: Yes
7.3 Extended Collective Operations: Yes

MPI-2: External Interfaces
8.2 Generalized Requests: Yes
8.3 Associating Information with Status: Yes
8.4 Naming Objects: Yes
8.5 Error Classes, Codes, and Handlers: Yes
8.6 Decoding a Datatype: Yes
8.7 MPI and Threads: Yes (lightly tested). (*NOTE2)
8.8 New Attribute Caching Functions: Yes
8.9 Duplicating a Datatype: Yes

MPI-2: MPI-I/O
9.2 File Manipulation: Yes (lightly tested). (*NOTE2, *NOTE3)
9.3 File Views: Yes (lightly tested). (*NOTE2, *NOTE3)
9.4 Data Access: Yes (lightly tested). (*NOTE2, *NOTE3)
9.5 File Interoperability: Yes (lightly tested). (*NOTE2, *NOTE3)
9.6 Consistency and Semantics: Yes (lightly tested). (*NOTE2, *NOTE3)
9.7 I/O Error Handling: Yes (lightly tested). (*NOTE2, *NOTE3)
9.8 I/O Error Classes: Yes (lightly tested). (*NOTE2, *NOTE3)

MPI-2: Language Bindings
10.1 C++: Yes
10.2 Fortran Support: No. The MPI module is not supported.

NOTES:


3. Threads

GridMPI/YAMPI supports multiple threads (mostly). Thread support is a configuration option and is enabled by default. The thread model is MPI_THREAD_MULTIPLE (in MPI-2 terminology). There are no implicit progress threads in GridMPI/YAMPI. Thus, at least one thread must be in the polling routine (it may be in the wait state) to ensure progress and to consume incoming messages.

Thread support is enabled when a program calls MPI_Init_thread for initialization, as defined by MPI-2. Alternatively, GridMPI/YAMPI provides the environment variable _YAMPI_THREADS; setting it to 1 enables thread support even when a program calls MPI_Init for initialization.

Put the setting of the environment variable in ".profile", ".bashrc", or ".cshrc" when a program uses MPI_Init.

(For sh/ksh/bash)
_YAMPI_THREADS=1; export _YAMPI_THREADS
(For csh/tcsh)
setenv _YAMPI_THREADS 1
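
Before launching a program that initializes with MPI_Init, it may help to confirm that the setting actually reached the shell in question. The following is a minimal sketch for sh-compatible shells; the function name yampi_threads_status is hypothetical and not part of GridMPI/YAMPI, and it only inspects the current shell's environment, not that of remote nodes.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of GridMPI/YAMPI):
# report whether _YAMPI_THREADS is set to 1 in the current shell.
yampi_threads_status() {
    if [ "${_YAMPI_THREADS:-}" = "1" ]; then
        echo "requested"
    else
        echo "not requested"
    fi
}

# Print the status for the current environment.
yampi_threads_status
```

A result of "not requested" only says that the variable is not set to 1; a program that calls MPI_Init_thread enables threads regardless.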

Communicator creation operations are not thread-safe in GridMPI (a bug remains). Those in YAMPI are thread-safe. Note also that, by definition of the MPI standard, collectives are not thread-safe when the same communicator is used.


4. Onesided

One-sided communication in GridMPI/YAMPI (GridMPI-2.0) is implemented on top of MPI and threads. It is not optimized for any particular communication medium.

It requires thread support in MPI, enabled either by calling MPI_Init_thread or by setting the environment variable _YAMPI_THREADS=1 when the application uses MPI_Init. Put the setting of the environment variable in ".profile", ".bashrc", or ".cshrc" when a program uses MPI_Init.

(For sh/ksh/bash)
_YAMPI_THREADS=1; export _YAMPI_THREADS
(For csh/tcsh)
setenv _YAMPI_THREADS 1

NOTE: The implementation of MPI_Win_fence in GridMPI/YAMPI (GridMPI-2.0) is slow because it uses three barriers.


5. Spawning

Spawning in GridMPI/YAMPI (GridMPI-2.0) requires passing the number of reserved processes to mpirun with the -nr argument. The number passed to -nr includes the statically started processes, so it must not be less than -np.

$ mpirun -np 4 -nr 8 ./a.out
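
The relation between -np and -nr can be checked before calling mpirun. The sketch below assumes that the difference between the two values is what remains available for spawned processes, which follows from -nr including the statically started ones; the function name spawn_headroom is hypothetical and not part of GridMPI/YAMPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of GridMPI/YAMPI): print how many
# processes remain for spawning, given the -np and -nr values.
# Assumption: -nr counts the statically started processes too,
# so the remaining number is nr - np.
spawn_headroom() {
    np=$1
    nr=$2
    if [ "$nr" -lt "$np" ]; then
        echo "error: -nr ($nr) must not be less than -np ($np)" >&2
        return 1
    fi
    echo $((nr - np))
}

# The example above: 4 static processes, 8 reserved in total.
spawn_headroom 4 8
```

For the command line shown above, this prints 4.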

Spawning in GridMPI/YAMPI (GridMPI-2.0) is currently supported only for the TCP p2p-transport layer. The other p2p layers, such as Myrinet/MX, PM/SCore, and Vendor-MPI, do not support spawning. For Vendor-MPI, this is because not all commercial MPI implementations support spawning. SCore does not support spawning either. The Myrinet/MX p2p layer could support spawning, but it is simply not implemented yet.


6. Connect/Accept

Connecting/accepting in GridMPI/YAMPI (GridMPI-2.0) requires passing an ID that distinguishes the worlds to mpirun with the -wid argument. The WID (World ID) is any distinct integer other than -1 (-1 indicates that the WID is disabled and connecting is not used). Each invocation of mpirun must be given a distinct value when the processes are to be connected. Otherwise, GridMPI/YAMPI cannot tell whether processes are local or remote. Take care, because this error is hard to diagnose.

$ mpirun -wid 10 -np 4 ./a.out &
$ mpirun -wid 11 -np 4 ./a.out
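
Because a repeated WID is hard to diagnose afterwards, the values can be checked before the worlds are started. The following is a minimal sketch; the function name check_wids is hypothetical and not part of GridMPI/YAMPI, and it compares the values as strings, so 10 and 010 would count as different.

```shell
#!/bin/sh
# Hypothetical helper (not part of GridMPI/YAMPI): succeed only if
# every WID given is an integer other than -1 and none is repeated.
# WIDs are compared as strings.
check_wids() {
    seen=" "
    for wid in "$@"; do
        case $wid in
            -1)
                echo "error: -1 means the WID is disabled" >&2
                return 1 ;;
            ''|-|*[!0-9-]*|?*-*)
                echo "error: '$wid' is not an integer" >&2
                return 1 ;;
        esac
        case $seen in
            *" $wid "*)
                echo "error: WID $wid is repeated" >&2
                return 1 ;;
        esac
        seen="$seen$wid "
    done
    return 0
}

# The two invocations above use WIDs 10 and 11.
check_wids 10 11 && echo "WIDs ok"
```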

A name server is needed to serve the functions MPI_Publish_name, MPI_Unpublish_name, and MPI_Lookup_name. GridMPI/YAMPI includes a very rudimentary name server, mpinamed. It is run with the environment variable _YAMPI_NAMESERVER set, which specifies the hostname:port to listen on. mpinamed ignores the hostname part and uses only the port number.

(For sh/ksh/bash)
_YAMPI_NAMESERVER=hostname:port; export _YAMPI_NAMESERVER
$ mpinamed &
(For csh/tcsh)
setenv _YAMPI_NAMESERVER hostname:port
% mpinamed &
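
To make the hostname:port form concrete, the sketch below splits such a value into its two parts in sh. It is an illustration only and not mpinamed's own parsing; the function names and the value node0:5000 are placeholders.

```shell
#!/bin/sh
# Illustration only (not mpinamed's code): split a hostname:port
# value into its two parts with POSIX parameter expansion.
nameserver_host() { echo "${1%:*}"; }    # text before the last colon
nameserver_port() { echo "${1##*:}"; }   # text after the last colon

# Placeholder value; node0 and 5000 are not defaults of GridMPI/YAMPI.
_YAMPI_NAMESERVER=node0:5000
echo "host part (ignored by mpinamed): $(nameserver_host "$_YAMPI_NAMESERVER")"
echo "port part (used by mpinamed): $(nameserver_port "$_YAMPI_NAMESERVER")"
```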

Connecting in GridMPI/YAMPI (GridMPI-2.0) works only through the IMPI p2p-transport layer. When IMPI is used, each world is given a WID, which is its client ID. Since client IDs cannot distinguish multiple instances of IMPI invocations, -wid should be given to override the WID taken from the client ID.

NOTE: The world of GridMPI (using the IMPI protocol) is created as one that is considered to have been connected and merged, with its ranks reordered, during MPI_Init. Each run of mpirun should be given a distinct WID.


7. IO

MPI-IO in GridMPI/YAMPI (GridMPI-2.0) is implemented on top of MPI and threads.

It requires thread support in MPI, enabled either by calling MPI_Init_thread or by setting the environment variable _YAMPI_THREADS=1 when the application uses MPI_Init. Put the setting of the environment variable in ".profile", ".bashrc", or ".cshrc" when a program uses MPI_Init.

(For sh/ksh/bash)
_YAMPI_THREADS=1; export _YAMPI_THREADS
(For csh/tcsh)
setenv _YAMPI_THREADS 1

Currently, it supports only shared file systems, such as NFS. It assumes POSIX file-locking semantics to make accesses mutually exclusive, and normally works with NFS4. Note that file caching is not strictly consistent in NFS3.

