Internals of GridMPI and YAMPI
($Date: 2007/05/22 14:51:56 $)
This document describes the basic software organization needed for
developers to modify GridMPI and YAMPI.
Table of Contents:
(in preparation)
Software Layers and Source Files
(MPI API)
fortran.c mpi1funcs.c mpi2funcs.c
|
(API written in MPI)
coll.c onesided.c fileio1.c fileio2.c fileio3.c
|
(MPI Core Functions)
communicator.c group.c keyval.c
operator.c topology.c errmsg.c misc.c
|
(Basic Communication)
sendrecv.c datatype.c packunpack.c
|
(Request Management)
request.c
|
(Initialization)
environment.c rpim.c process.c
|
(Utilities)
commops.c commdebugops.c prof.c ckpt.c
sockutil.c util-backtrace.c utility.c
|
(The following are extracts from the source. See the source comments
for up-to-date descriptions.)
Interfaces to P2P Layer
Operations in the p2p-layer are invoked through a table in
YamConHandle (Connection Handler).
- int (*topsend)(struct YamConHandle*, YamReq*, int)
- schedules a send of a YamReq. Typically, it enqueues a YamReq in
the send queue. The third argument of {topsend} indicates whether it
is called from {_YampiPostSend}. If so, it is allowed to call
{_YampiPollP2P}; otherwise, it is already being called through
{_YampiPollP2P}. It may be called by any thread.
- int (*toprecv)(struct YamConHandle*, YamReq*, YamReq*, int)
- is called when a pulled YamReq and a posted YamReq are matched.
It may be null if the p2p-layer does not need it. It may be called by
any thread.
- int (*netpush)(struct YamConHandle*)
- is called regularly to push messages from the send queue to the
network when the select system call reports readiness. It may be null
if the p2p-layer redefines the polling routine. It may be called by
any thread.
- int (*netpull)(struct YamConHandle*)
- is likewise called regularly to pull messages from the network when
the select system call reports readiness. It may be null if the
p2p-layer redefines the polling routine. It is called only through a
monitor in the polling routine.
- int (*netcancel)(struct YamConHandle*, YamReq*)
- is called by the request-layer to cancel a YamReq. It is called
only for send YamReqs. Typically, it removes a YamReq from the send
queue.
- int (*netabort)(struct YamConHandle*)
- shuts down the peer and makes it abort.
- int (*netclose)(struct YamConHandle*)
- is called on a closure event or at finalization.
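The table-driven dispatch above can be sketched as follows. This is a toy illustration, not the GridMPI code: the struct layouts, the {toy_*} functions, {call_netpush}, and the return-value meanings are all assumptions made for the example. Only the function-pointer signatures are taken from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real YamReq and YamConHandle differ. */
typedef struct YamReq {
    struct YamReq *next;   /* assumed send-queue link */
    int sent;              /* assumed flag, set once pushed */
} YamReq;

struct YamConHandle {
    /* Signatures as listed in the text. */
    int (*topsend)(struct YamConHandle*, YamReq*, int);
    int (*toprecv)(struct YamConHandle*, YamReq*, YamReq*, int);
    int (*netpush)(struct YamConHandle*);
    int (*netpull)(struct YamConHandle*);
    int (*netcancel)(struct YamConHandle*, YamReq*);
    int (*netabort)(struct YamConHandle*);
    int (*netclose)(struct YamConHandle*);
    YamReq *sendq_head, *sendq_tail;   /* assumed send queue */
};

/* topsend: enqueue the request at the tail of the send queue. */
static int toy_topsend(struct YamConHandle *h, YamReq *r, int frompost)
{
    (void)frompost;        /* a real transport might poll here */
    r->next = NULL;
    r->sent = 0;
    if (h->sendq_tail) h->sendq_tail->next = r; else h->sendq_head = r;
    h->sendq_tail = r;
    return 0;
}

/* netpush: "send" everything queued; returns the number pushed. */
static int toy_netpush(struct YamConHandle *h)
{
    int n = 0;
    while (h->sendq_head) {
        YamReq *r = h->sendq_head;
        h->sendq_head = r->next;
        r->next = NULL;
        r->sent = 1;
        n++;
    }
    h->sendq_tail = NULL;
    return n;
}

/* netcancel: unlink a send request; returns 1 if it was still queued. */
static int toy_netcancel(struct YamConHandle *h, YamReq *r)
{
    YamReq **pp = &h->sendq_head, *prev = NULL;
    for (; *pp; prev = *pp, pp = &(*pp)->next) {
        if (*pp == r) {
            *pp = r->next;
            if (h->sendq_tail == r) h->sendq_tail = prev;
            r->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Fill the table; entries the toy transport does not need stay null. */
static void toy_init(struct YamConHandle *h)
{
    assert(h != NULL);
    h->topsend = toy_topsend;
    h->toprecv = NULL;
    h->netpush = toy_netpush;
    h->netpull = NULL;
    h->netcancel = toy_netcancel;
    h->netabort = NULL;
    h->netclose = NULL;
    h->sendq_head = h->sendq_tail = NULL;
}

/* Caller side: optional entries are checked for null before the call. */
static int call_netpush(struct YamConHandle *h)
{
    return h->netpush ? h->netpush(h) : 0;
}
```

A caller would run {toy_init} on a handle, post requests through {h.topsend}, and drain them through {call_netpush}.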
Upcall Interfaces from P2P Layer
Upcall interfaces from the p2p-layer are in request.c.
- YamReq *
_YampiCheckRecv(YamReq *req, void *data, int datasize)
- Checks a received header for a matching recv. It searches the
entries in the expected queue, and replaces the request with the found
one. It returns either the request passed in or the one found in the
expected queue. It returns null when the request is PULL_RDYREQ.
MEMO: For the TCP and Vendor-MPI transports, {data} is
{req->yreq_anexdata}, and {datasize} is {req->yreq_realsize}.
- int
_YampiFinishRecv(YamReq *req)
- Finishes a receive on receiving a whole message. It is invoked by
the P2P-layer and the request-layer. An error condition is returned
(and recorded in the YamReq if it is not freed). A YamReq (not PULL)
is normally inaccessible after {_YampiFinishRecv}. By convention, the
P2P-layer never touches the YamReq after {_YampiFinishRecv}. MEMO:
Unpacking is performed for a recv request (not for a pull request).
- YamReqState
_YampiFinishSend(YamReq *req, int syncacked)
- Finishes a send request on sending a whole message, or marks the
{ssend_acked} flag. The state of a send YamReq changes to SENT/DONE
only via this function, except in cancel cases. The YamReq is normally
inaccessible after {_YampiFinishSend}. MEMO: Normally the sender
calls this, but the receiver can call it on receiving a sync-ack or
cancel-ack. Both move the state from SENT to DONE, which is safe
because it does not interfere with the sender. NOTE: In IMPI,
{syncacked} can be true for long messages even when the mode is not
SSEND.
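The "return the passed request or the one from the expected queue" behavior described for {_YampiCheckRecv} can be illustrated with a minimal sketch. The {Req} and {ExpectedQueue} types, the (source, tag) matching keys, and {check_recv} are hypothetical. Real matching also involves the communicator and wildcards, and the PULL_RDYREQ case is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: only the assumed matching keys and a link. */
typedef struct Req {
    int source, tag;
    struct Req *next;
} Req;

typedef struct { Req *head; } ExpectedQueue;

/* Search the expected queue for a posted request matching the pulled
   header. On a match, unlink the posted request and return it;
   otherwise return the pulled request itself. */
static Req *check_recv(ExpectedQueue *q, Req *pulled)
{
    assert(q != NULL && pulled != NULL);
    for (Req **pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        Req *r = *pp;
        if (r->source == pulled->source && r->tag == pulled->tag) {
            *pp = r->next;      /* remove from the expected queue */
            r->next = NULL;
            return r;
        }
    }
    return pulled;              /* no posted recv matched */
}
```

In this sketch the caller can tell the two outcomes apart by comparing the returned pointer with the one it passed in.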
Thread Mutexing
Thread mutexing is coarse-grained (giant). Normally the request-layer
and the p2p-layer access requests (YamReq) exclusively, so no mutual
exclusion between them is needed. Only the DONE-transition needs
mutexing, and it is performed in {_YampiFinishSend} and
{_YampiFinishRecv} in normal operation.
Exclusion Monitors
There are two kinds of monitors. {_yampiPollMonitor} allows only one
thread at a time to enter the p2p-poll routine (which does select/poll
system calls). {_yampiP2PMonitor} protects recv YamReqs between the
pull in the p2p-layer and the takeover in the request-layer.
The p2p-poll routine is mutexed by the request-layer and only one
thread can enter it. However, p2p-send (NetPush) must do its own
mutexing. Normally, the first-entered p2p-send handles all requests,
and later sends return immediately.
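The "first-entered sender handles everything, later senders return" pattern can be sketched with a try-lock. This sketch uses a C11 {atomic_flag} as the try-lock; the actual p2p-layer may use a different mechanism. The {queued} counter and {net_push_all} are hypothetical stand-ins for the send queue and the NetPush routine.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag pushing = ATOMIC_FLAG_INIT;  /* try-lock for the pusher */
static atomic_int queued;                       /* assumed: pending sends */

/* Returns the number of requests pushed, or -1 if another thread is
   already pushing (in which case this call returns immediately). */
static int net_push_all(void)
{
    if (atomic_flag_test_and_set(&pushing))
        return -1;                      /* a first-entered pusher exists */
    int n = atomic_exchange(&queued, 0);  /* take all queued work */
    assert(n >= 0);
    /* ... a real transport would write the n messages here ... */
    atomic_flag_clear(&pushing);
    return n;
}
```

A request queued just after the pusher has emptied the queue stays pending until the next push. That is acceptable only if the push routine is called again regularly, as the text says of {netpush}.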
{_yampiPollMonitor} also tracks the state of the p2p-poll routine,
which may enter a sleep state in select/poll. The p2p-layer should
call {_YampiPollSleepIn} and {_YampiPollSleepOut} around select/poll.
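Bracketing select with the sleep-in/sleep-out calls might look like the sketch below. {stub_PollSleepIn} and {stub_PollSleepOut} are local stubs standing in for {_YampiPollSleepIn} and {_YampiPollSleepOut}, and {poll_once} is a hypothetical polling step. Using a zero timeout when sleeping is refused is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>   /* pipe(), write() for trying the sketch out */

/* Stubs for the request-layer calls; only the call order matters here. */
static int sleep_allowed = 1;
static int in_calls, out_calls;

static int stub_PollSleepIn(void) { in_calls++; return sleep_allowed; }
static void stub_PollSleepOut(void) { out_calls++; }

/* One polling step on a single descriptor. Blocks for at most
   maxwait_ms if sleeping is allowed, otherwise only checks readiness.
   Returns what select() returns. */
static int poll_once(int fd, int maxwait_ms)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };       /* default: do not block */
    assert(fd >= 0 && fd < FD_SETSIZE);
    if (stub_PollSleepIn()) {
        tv.tv_sec = maxwait_ms / 1000;
        tv.tv_usec = (maxwait_ms % 1000) * 1000;
    }
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    int n = select(fd + 1, &rfds, NULL, NULL, &tv);
    stub_PollSleepOut();                /* always paired with sleep-in */
    return n;
}
```

The sketch can be tried on the read end of a pipe: {poll_once} returns 0 while the pipe is empty and 1 once a byte has been written to the write end.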
P2P Layer Interface for Exclusion
(The following are extracts from the source. See the source comments
for up-to-date descriptions.)
- int
_YampiPollSleepIn(void)
- Tells the request-layer that the poll is going to sleep. It returns
1 if sleeping is allowed. The caller should call
{_YampiPollSleepOut} whenever {_YampiPollSleepIn} has been called.
- void
_YampiPollSleepOut(void)
- Tells the request-layer that the poll has come out of sleep or never
went to sleep. It must be called when {_YampiPollSleepIn} has been
called. It is allowed to call neither of them.
- void (*_yampiP2PPollWake)(void)
- Wakes up a polling thread. It should be able to wake up a thread
even if it is not in the sleep state yet.
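One common way to meet the requirement that a wake-up works even before the thread is asleep is the self-pipe technique, sketched below. It is not necessarily what any GridMPI transport does. {wake_init}, {poll_wake}, and {poll_sleep} are hypothetical names. {poll_wake} has the same signature as {_yampiP2PPollWake}.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

static int wake_fds[2] = { -1, -1 };   /* [0] read end, [1] write end */

/* Create the pipe and make both ends non-blocking. */
static int wake_init(void)
{
    if (pipe(wake_fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(wake_fds[i], F_GETFL);
        if (fl == -1 || fcntl(wake_fds[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }
    return 0;
}

/* Wake the poller. The byte stays in the pipe, so a poller that has
   not reached select() yet returns from it immediately. */
static void poll_wake(void)
{
    char c = 1;
    ssize_t r = write(wake_fds[1], &c, 1);
    (void)r;   /* a full pipe means a wake-up is already pending */
}

/* Sleep in select() for up to timeout_ms. Returns 1 if woken by
   poll_wake(), 0 on timeout or error. */
static int poll_sleep(int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    assert(wake_fds[0] >= 0 && wake_fds[0] < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(wake_fds[0], &rfds);
    if (select(wake_fds[0] + 1, &rfds, NULL, NULL, &tv) > 0) {
        char buf[64];
        while (read(wake_fds[0], buf, sizeof buf) > 0)
            ;                           /* drain all pending wake-ups */
        return 1;
    }
    return 0;
}
```

Because the wake-up is stored in the pipe rather than delivered as a signal to a sleeping thread, several wake-ups issued before the sleep collapse into a single immediate return.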
GridMPI/YAMPI provides dynamic loading of library modules.
The IMPI-specified CID (Communicator ID) assignment can fail due to
races in a multi-threaded environment.
Two ALLREDUCEs are used in the algorithm. One reduces the CID value.
The other checks for interference with other communicators. The
communicator with the lower CID has precedence. This precedence avoids
locking indefinitely. Note that our algorithm assumes mutexing within
a communicator, that is, no concurrent operations (dup/create/etc.) on
the same communicator.
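The two-reduction structure can be illustrated with a single-process simulation. This is one possible reading of the description, not the GridMPI algorithm. Plain loops over a {used} table stand in for the two ALLREDUCEs, {NRANKS}, {MAXCID}, {lowest_free}, and {agree_cid} are hypothetical, and the precedence rule between concurrent creations and all threading are not modeled.

```c
#include <assert.h>

#define NRANKS 4
#define MAXCID 16

/* used[r][c] != 0 means simulated rank r already uses CID c. */
static int used[NRANKS][MAXCID];

/* Lowest CID >= from that is free on this rank, or MAXCID if none. */
static int lowest_free(int rank, int from)
{
    assert(from >= 0);
    for (int c = from; c < MAXCID; c++)
        if (!used[rank][c])
            return c;
    return MAXCID;
}

/* Agree on a CID that is free on every rank; returns it, or -1. */
static int agree_cid(void)
{
    int from = 0;
    while (from < MAXCID) {
        /* Reduction 1 (MAX): reduce the proposed CID values. */
        int cand = 0;
        for (int r = 0; r < NRANKS; r++) {
            int c = lowest_free(r, from);
            if (c > cand)
                cand = c;
        }
        if (cand >= MAXCID)
            return -1;
        /* Reduction 2 (logical AND): does the candidate collide with
           a CID already in use on any rank? */
        int ok = 1;
        for (int r = 0; r < NRANKS; r++)
            ok = ok && !used[r][cand];
        if (ok) {
            for (int r = 0; r < NRANKS; r++)
                used[r][cand] = 1;      /* commit on all ranks */
            return cand;
        }
        from = cand + 1;                /* retry above the collision */
    }
    return -1;
}
```

In a real MPI program the two loops over ranks would be collective reductions over the parent communicator, with each rank contributing only its own row of the table.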