Internals of GridMPI and YAMPI

($Date: 2007/05/22 14:51:56 $)

This document describes the basic software organization needed for developers to modify GridMPI and YAMPI.



Overview

(in preparation)


Software Layers and Source Files

(MPI API)
fortran.c mpi1funcs.c mpi2funcs.c
(API written in MPI)
coll.c onesided.c fileio1.c fileio2.c fileio3.c
(MPI Core Functions)
communicator.c group.c keyval.c operator.c topology.c errmsg.c misc.c
(Basic Communication)
sendrecv.c datatype.c packunpack.c
(Request Management)
request.c
(Initialization)
environment.c rpim.c process.c
(Utilities)
commops.c commdebugops.c prof.c ckpt.c sockutil.c util-backtrace.c utility.c

P2P Layer Interface

(The following are extracts from the source. See the source comments for up-to-date descriptions.)

Interfaces to P2P Layer

Operations in the p2p-layer are invoked through a table in YamConHandle (Connection Handler).

int (*topsend)(struct YamConHandle*, YamReq*, int)
schedules a send of a YamReq. Typically, it enqueues the YamReq in the send queue. The third argument of {topsend} indicates whether it is called from {_YampiPostSend}. If so, it is allowed to call {_YampiPollP2P}; otherwise, it has already been called through {_YampiPollP2P}. It may be called by any thread.
int (*toprecv)(struct YamConHandle*, YamReq*, YamReq*, int)
is called when a pulled YamReq and a posted YamReq are matched. It may be null, if the p2p-layer does not need it. It is called by any thread.
int (*netpush)(struct YamConHandle*)
is regularly called to push messages from the send queue to the network, when the select system call hits. It may be null, if the p2p-layer redefines the polling routine. It is called by any thread.
int (*netpull)(struct YamConHandle*)
is also regularly called to pull messages from the network, when the select system call hits. It may be null, if the p2p-layer redefines the polling routine. It is called only through a monitor in the polling routine.
int (*netcancel)(struct YamConHandle*, YamReq*)
is called by the request-layer to cancel YamReq. It is called only for send YamReqs. Typically, it removes a YamReq from the send queue.
int (*netabort)(struct YamConHandle*)
shuts down the peer and makes it abort.
int (*netclose)(struct YamConHandle*)
is called on a closure event or at finalization.
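
As a rough illustration of the table-driven dispatch described above, the following sketch declares a struct with the listed entries and shows a caller that checks the optional ones for null. The struct layout, the `queued` counter, the contents of `YamReq`, and the functions `demo_topsend`, `demo_netpush`, `poll_one`, and `demo_dispatch` are assumptions made for illustration; the actual definitions in the source may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; the real YamReq is defined in the source. */
typedef struct YamReq { int id; } YamReq;

/* Assumed layout: only the entry names and signatures come from the text. */
struct YamConHandle {
    int (*topsend)(struct YamConHandle *, YamReq *, int);
    int (*toprecv)(struct YamConHandle *, YamReq *, YamReq *, int);
    int (*netpush)(struct YamConHandle *);
    int (*netpull)(struct YamConHandle *);
    int (*netcancel)(struct YamConHandle *, YamReq *);
    int (*netabort)(struct YamConHandle *);
    int (*netclose)(struct YamConHandle *);
    int queued;  /* illustration only: length of the send queue */
};

/* Example topsend: "enqueue" the request by counting it. */
int demo_topsend(struct YamConHandle *h, YamReq *req, int frompostsend)
{
    (void)req;
    (void)frompostsend;
    h->queued++;
    return 0;
}

/* Example netpush: drain the send queue and report how many were pushed. */
int demo_netpush(struct YamConHandle *h)
{
    int n = h->queued;
    h->queued = 0;
    return n;
}

/* Caller side: netpush and netpull may be null and must be checked. */
int poll_one(struct YamConHandle *h)
{
    int pushed = 0;
    assert(h != NULL);
    if (h->netpush != NULL)
        pushed = h->netpush(h);
    if (h->netpull != NULL)
        (void)h->netpull(h);
    return pushed;
}

/* Builds a handle with netpull left null, sends twice, then polls once. */
int demo_dispatch(void)
{
    struct YamConHandle h = {0};
    YamReq r1 = {1}, r2 = {2};
    h.topsend = demo_topsend;
    h.netpush = demo_netpush;
    h.topsend(&h, &r1, 1);
    h.topsend(&h, &r2, 1);
    return poll_one(&h);  /* number of requests pushed */
}
```

Keeping the entries in a per-connection table lets each transport supply only the operations it needs, which is why the caller tests the optional entries before invoking them.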

Upcall Interfaces from P2P Layer

Upcall interfaces from the p2p-layer are in request.c.

YamReq * _YampiCheckRecv(YamReq *req, void *data, int datasize)
Checks a received header for a matching recv. It searches the entries in the expected queue and replaces the request with the one found. It returns either the request passed in or the one found in the expected queue. It returns null when the request is PULL_RDYREQ. MEMO: For the TCP and Vendor-MPI transports, {data} is {req->yreq_anexdata}, and {datasize} is {req->yreq_realsize}.
int _YampiFinishRecv(YamReq *req)
Finishes a receive on receiving a whole message. It is invoked by the P2P-layer and the request-layer. An error condition is returned (and recorded in the YamReq if it is not freed). The YamReq (not PULL) is normally inaccessible after {_YampiFinishRecv}. By convention, the P2P-layer never touches the YamReq after {_YampiFinishRecv}. MEMO: Unpacking is performed for a recv request (not for a pull request).
YamReqState _YampiFinishSend(YamReq *req, int syncacked)
Finishes a send request on sending a whole message, or marks the {ssend_acked} flag. The state of a send YamReq changes to SENT/DONE only via this function, except in cancel cases. The YamReq is normally inaccessible after {_YampiFinishSend}. MEMO: Normally the sender calls this, but the receiver can call it on receiving a sync-ack or cancel-ack. Both move the state from SENT to DONE, and this is safe in that it does not interfere with the sender. NOTE: {syncacked} can be true even when the mode is not SSEND, in IMPI for long messages.
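
The matching step attributed to {_YampiCheckRecv} above can be pictured with the following minimal sketch. It is not the actual implementation: the `Req` type, its `source` and `tag` matching key, the singly linked `expected_queue`, and the choice to unlink the matched entry are all assumptions made for illustration, and the PULL_RDYREQ case is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request; the real YamReq has more fields. */
typedef struct Req {
    int source, tag;   /* assumed matching key */
    struct Req *next;  /* link in the expected queue */
} Req;

Req *expected_queue;   /* posted receives waiting for a message */

/* Plays the role described for _YampiCheckRecv: return the posted request
   that matches the pulled header (unlinking it, by assumption), or the
   pulled request itself when nothing matches. */
Req *check_recv_sketch(Req *pulled)
{
    assert(pulled != NULL);
    for (Req **p = &expected_queue; *p != NULL; p = &(*p)->next) {
        if ((*p)->source == pulled->source && (*p)->tag == pulled->tag) {
            Req *found = *p;
            *p = found->next;
            found->next = NULL;
            return found;
        }
    }
    return pulled;
}

/* Returns 1 if a posted request is matched once and only once. */
int demo_match(void)
{
    Req posted = {3, 7, NULL};
    Req pulled = {3, 7, NULL};
    expected_queue = &posted;
    Req *first = check_recv_sketch(&pulled);   /* matches the posted one */
    Req *second = check_recv_sketch(&pulled);  /* queue is now empty */
    return first == &posted && second == &pulled;
}
```

In this picture, a returned request that differs from the one passed in tells the caller that the message belongs to an already posted receive.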

Thread Mutexing

Thread mutexing is coarse-grained (giant locking). Normally the request-layer and the p2p-layer access requests (YamReq) exclusively, so they do not need to lock against each other. Only the DONE-transition needs mutexing, and it is performed in {_YampiFinishSend} and {_YampiFinishRecv} in normal operation.

Exclusion Monitors

There are two kinds of monitors. {_yampiPollMonitor} allows only one thread at a time to enter the p2p-poll routine (which performs the select/poll system calls). {_yampiP2PMonitor} protects recv YamReqs between the pull in the p2p-layer and the takeover in the request-layer.

The p2p-poll routine is mutexed by the request-layer, so only one thread can enter it. However, p2p-send (NetPush) must provide its own mutual exclusion. Normally, the first thread to enter p2p-send handles all requests, and later sends return immediately.
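
One way to obtain the "first entrant handles everything, later entrants return immediately" behavior is a try-lock, sketched below with C11 atomics. This is an assumption about a possible technique, not a description of the GridMPI code; `push_busy`, `pending`, `enqueue_sketch`, `net_push_sketch`, and the -1 return convention are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_flag push_busy = ATOMIC_FLAG_INIT;  /* set while some thread is pushing */
atomic_int pending = 0;                    /* stand-in for the send queue length */

/* Stand-in for enqueuing a request in the send queue. */
void enqueue_sketch(void)
{
    atomic_fetch_add(&pending, 1);
}

/* Returns the number of requests handled, or -1 if another thread is
   already pushing (hypothetical convention). */
int net_push_sketch(void)
{
    if (atomic_flag_test_and_set(&push_busy))
        return -1;  /* a pusher is active: return immediately */
    int handled = 0;
    while (atomic_load(&pending) > 0) {
        /* Only the flag holder decrements, so the count was positive. */
        int before = atomic_fetch_sub(&pending, 1);
        assert(before > 0);
        handled++;
    }
    /* A request enqueued after the loop but before the clear is left for
       the next call; this sketch relies on pushes being invoked regularly. */
    atomic_flag_clear(&push_busy);
    return handled;
}
```

The flag never blocks a caller, which matches the description that later sends return at once instead of waiting for the active pusher.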

{_yampiPollMonitor} also tracks the state of the p2p-poll routine, which may enter the sleep state in select/poll. The p2p-layer should call {_YampiPollSleepIn} and {_YampiPollSleepOut} around select/poll.

P2P Layer Interface for Exclusion

(The following are extracts from the source. See the source comments for up-to-date descriptions.)

int _YampiPollSleepIn(void)
Tells the request-layer that poll is going to sleep. It returns 1 if sleeping is allowed. The caller should call {_YampiPollSleepOut} whenever {_YampiPollSleepIn} has been called.
void _YampiPollSleepOut(void)
Tells the request-layer that poll has come out of sleep or never went to sleep. It must be called whenever {_YampiPollSleepIn} has been called. It is permissible to call neither of them.
void (*_yampiP2PPollWake)(void)
Wakes up a polling thread. It should be able to wake up a thread even if it is not in the sleep state yet.
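
The calling convention above can be sketched as follows. The stubs `sleep_in_stub` and `sleep_out_stub` stand in for {_YampiPollSleepIn} and {_YampiPollSleepOut}, and the self-pipe used for the wake-up is only one common way to wake a thread that has not yet gone to sleep; whether GridMPI uses it is not stated here. All names in the block are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

int wake_pipe[2] = {-1, -1};  /* hypothetical self-pipe for wake-ups */
int sleeping;                 /* illustration only: state the stubs track */

/* Stubs standing in for _YampiPollSleepIn / _YampiPollSleepOut. */
int sleep_in_stub(void)   { sleeping = 1; return 1; }
void sleep_out_stub(void) { sleeping = 0; }

int wake_init(void) { return pipe(wake_pipe); }

/* Plays the role of the wake entry: a byte written to the pipe stays
   readable, so it also wakes a poll that starts later. */
void wake_sketch(void)
{
    char c = 0;
    ssize_t r = write(wake_pipe[1], &c, 1);
    (void)r;
}

/* Returns 1 if woken through the pipe, 0 on timeout, -1 on error. */
int poll_sketch(int timeout_ms)
{
    assert(wake_pipe[0] >= 0);
    struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };
    int may_sleep = sleep_in_stub();
    /* If sleeping is not allowed, poll without blocking. */
    int n = poll(&pfd, 1, may_sleep ? timeout_ms : 0);
    sleep_out_stub();  /* always paired with the sleep-in call */
    if (n > 0 && (pfd.revents & POLLIN)) {
        char c;
        ssize_t r = read(wake_pipe[0], &c, 1);
        (void)r;
        return 1;
    }
    return n < 0 ? -1 : 0;
}
```

Calling `wake_sketch()` before `poll_sketch()` makes the poll return at once, which is the property required of the wake entry.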

Extension Mechanism (Library Loading)

GridMPI/YAMPI provides dynamic loading of library modules.


IMPI Issues

CID Assignment under Threads

The IMPI-specified CID (Communicator ID) assignment can fail due to races in a multithreaded environment.

Two ALLREDUCEs are used in the algorithm. One reduces the CID value. The other checks for interference with other communicators. The communicator with the lower CID has precedence, which avoids locking indefinitely. Note that our algorithm assumes mutexing within a communicator: there are no concurrent operations (dup/create/etc.) on the same communicator.


Appendix