Components of CI

This document describes the various components provided by the Cluster
Infrastructure (CI) package.


ICS CHANNELS (high-level and low-level)
========================================
An ICS channel is basically a TCP/IP connection which is established with
another node for intra-cluster communication.

Channels are the underlying transport through which cluster service
messages/RPCs are sent.  Therefore, each cluster service needs to map to a
specific ICS channel.  It is possible to group several cluster services
together as subservices sharing the same ICS channel (read more about
subservices below).

Different ICS channels are created for purposes of flow control (also
known as throttling).  When the system is under high load, channels will be 
throttled according to their per-channel throttling variables.

Throttling does not occur on certain "priority" ICS channels.  These
priority channels exist to prevent deadlock situations.  Often an RPC
message will be sent from Node A to Node B, and the server-side routine
on Node B will in turn send an RPC/message back to Node A.  However, Node
A's ICS channel could become throttled at this point.  This is unfortunate
because the RPC/message sent by Node B would help free resources on Node A,
but Node A cannot handle the message due to throttling.  To avoid this kind
of deadlock, the ICS code will automatically raise the priority of an
outgoing RPC/message if it is being sent within the server-side routine of
an incoming RPC.

[ NOTE: In the current code version, ICS throttling is not fully implemented. ]

An ics_prio data member has been added to the task_struct structure to 
keep track of the current ICS priority.  A priority of 0 will
use the normal ICS channel mapped to that specific cluster service.  Any
priority above 0 will use one of the 4 ICS priority channels defined.
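
The channel choice described above can be sketched as follows.  This is
only an illustration: toy_select_chan() and TOY_NUM_PRIO_CHANS are
hypothetical names, not part of the CI code, and the mapping of a priority
value to a particular priority channel (priority p uses priority channel
p-1, clamped to the last one) is an assumption.

```c
#include <assert.h>

/* Hypothetical constant for illustration; not taken from ics.h. */
#define TOY_NUM_PRIO_CHANS 4

/*
 * Pick the channel for an outgoing message.
 *   prio == 0 : use the channel the cluster service is mapped to.
 *   prio  > 0 : use one of the priority channels.  ASSUMPTION: priority p
 *               selects priority channel p-1, clamped to the last one.
 */
int toy_select_chan(int prio, int svc_chan,
                    const int prio_chans[TOY_NUM_PRIO_CHANS])
{
    assert(prio >= 0);
    if (prio == 0)
        return svc_chan;
    if (prio > TOY_NUM_PRIO_CHANS)
        prio = TOY_NUM_PRIO_CHANS;
    return prio_chans[prio - 1];
}
```

In this sketch the priority would come from the ics_prio member of the
current task, so a message sent from inside a server-side routine running
at a raised priority never lands on a throttled service channel.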

There are 2 ICS channels used for sending RPC replies.  The ics_reply_chan
is used for sending replies to non-priority RPC messages, while the 
ics_reply_prio_chan is used to reply to all high priority RPCs.  Like the
priority ICS channels, neither of these reply channels is ever throttled.

The current code defines a maximum of 12 ICS channels in a cluster.  This
can be increased by tuning ICS_MAX_CHANNELS in the header file ics.h.
The current CI code sets up 8 different ICS channels.  There are 4 priority 
channels and 2 low-level reply channels, as mentioned above.  In addition, 
there are 2 channels defined to support cluster services.  One of these 
supports the CLMS, cluster API, and ICS forwarding services, and the other 
is used specifically for probing the CLMS master on bootup.  To be precise,
the ics_clms_probe_chan is a transient channel that is set up and torn down
for each probe sent.  These probes are sent by nodes that are coming up to
query who the CLMS master is.

For more information on how to add your own ICS channel, please read the 
enhancing.txt document.



CLUSTER SERVICES
=================
A cluster service consists of "service routines" that are run on each node 
in the cluster and "service messages" that are sent amongst cluster nodes for
communication.

The current CI code defines 3 cluster services.  These include the CLMS 
service, the cluster API service, and the ICS signal forwarding service.
Each cluster service will need to send messages across the cluster and 
therefore needs to communicate over an ICS channel.

The 3 cluster services are multiplexed on top of one ICS channel, 
ics_clms_chan.  As mentioned above, it is possible for several cluster 
services to share one ICS channel and they will be flow-controlled as one 
unit.  There is a direct mapping from a cluster service to an ICS 
channel+subservice. Thus, every cluster service is designated to send messages 
over a specific ICS channel and it has a subservice number to distinguish its 
messages from other cluster service messages being sent on the same channel. 
The macro used to do this mapping is:

#define ICS_NSC_SVC(_chan,_subservice)  ((_chan << 16) + _subservice)

You can also determine the ICS service channel/subservice number given
the cluster service using these macros:

#define ICS_CHAN(_cluster_svc)          ((_cluster_svc) >> 16)
#define ICS_SSVC(_cluster_svc)          ((_cluster_svc) & ((1 << 16) - 1))
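
As a quick check of how these macros fit together, the sketch below packs a
channel and subservice number into a cluster service number and unpacks it
again.  The macros are restated with their arguments parenthesized and with
the _cluster_svc parameter name used throughout; toy_round_trip_ok() is a
hypothetical helper added only for this example.

```c
#include <assert.h>

/* Macros from the text, restated with arguments parenthesized. */
#define ICS_NSC_SVC(_chan, _subservice)  (((_chan) << 16) + (_subservice))
#define ICS_CHAN(_cluster_svc)           ((_cluster_svc) >> 16)
#define ICS_SSVC(_cluster_svc)           ((_cluster_svc) & ((1 << 16) - 1))

/*
 * Hypothetical helper: returns 1 if (chan, ssvc) survives a pack/unpack
 * round trip unchanged.  The subservice must fit in the low 16 bits.
 */
int toy_round_trip_ok(int chan, int ssvc)
{
    int svc;

    assert(chan >= 0 && ssvc >= 0 && ssvc < (1 << 16));
    svc = ICS_NSC_SVC(chan, ssvc);
    return ICS_CHAN(svc) == chan && ICS_SSVC(svc) == ssvc;
}
```

For example, channel 2 with subservice 3 packs to (2 << 16) + 3 = 131075.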

There is a limit of 6 subservices per ICS service channel, set by 
ICS_NUM_SUBSERVICES in the header file ics.h.  Given that there are
6 ICS channels that can be used for regular cluster services 
(12 - 4 priority - 2 reply channels), this puts a limit of 36 cluster services 
(6 ICS service channels * 6 subservices/channel) that can be defined in a 
cluster.
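
The arithmetic above can be written out as a small sketch, which also shows
how the limit would change if ICS_MAX_CHANNELS were raised.  The values of
ICS_MAX_CHANNELS and ICS_NUM_SUBSERVICES are the ones quoted in this
document; the TOY_ names and toy_max_cluster_services() are hypothetical
and exist only for this example.

```c
#include <assert.h>

/* Values quoted in the text; the real definitions live in ics.h. */
#define ICS_MAX_CHANNELS     12
#define ICS_NUM_SUBSERVICES  6

/* Hypothetical names for the counts of reserved channels. */
#define TOY_NUM_PRIO_CHANS   4
#define TOY_NUM_REPLY_CHANS  2

/* Number of cluster services that can be defined for the given limits. */
int toy_max_cluster_services(int max_chans, int subservices)
{
    int svc_chans = max_chans - TOY_NUM_PRIO_CHANS - TOY_NUM_REPLY_CHANS;

    assert(svc_chans >= 0 && subservices >= 0);
    return svc_chans * subservices;
}
```

With the current values this gives (12 - 4 - 2) * 6 = 36.  Raising the
channel limit to 16, for instance, would give (16 - 4 - 2) * 6 = 60.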

Each cluster service needs to register with ICS so that ICS can call back
the stub routines.  ICS is also informed of the min/max number of available
server handles that should be maintained to handle incoming messages on this
channel.

Icsgen is used to generate client/server stubs for the cluster service 
messages.  The stubs will insert the cluster service number in the ICS 
message headers.  Icsgen also generates a routine which registers the cluster
service with ICS for you.

Please read the ICSGEN document for more information.



CLMS SUBSYSTEMS
================
Cluster services may need to run cleanup or initialization code when nodes 
come up or go down in the cluster. This is done by registering a CLMS 
subsystem.  CLMS subsystems perform callbacks for a cluster service during 
cluster NODEUP and NODEDOWN events.

Each CLMS subsystem calls register_clms_subsys() to register itself with CLMS.
This registration call takes in the name of the CLMS subsystem, function 
pointers to the NODEUP and NODEDOWN callback routines, and the priority band 
of the subsystem.  During NODEUP/NODEDOWN events the subsystem
callbacks are performed (on each surviving cluster node) in order of priority 
band.  Subsystems within the same priority band are called in the order they
were registered with CLMS.  The CI code defines a maximum of 15 CLMS
subsystems in a cluster.  This can be tuned by modifying CLMS_MAX_SUBSYSTEMS
in the header file clms.h.
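
The callback ordering described above can be modeled with a small table
that is kept sorted by priority band.  This is a toy model, not the CLMS
implementation: every toy_ name is hypothetical, the signature of
toy_register_subsys() is not that of register_clms_subsys(), and the sketch
assumes that a lower band number is called first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SUBSYSTEMS 15   /* mirrors the limit quoted in the text */

typedef void (*toy_node_cb)(int node);

struct toy_subsys {
    const char  *name;
    toy_node_cb  nodeup;
    toy_node_cb  nodedown;
    int          band;          /* ASSUMPTION: lower band runs first */
};

struct toy_subsys toy_table[TOY_MAX_SUBSYSTEMS];
int toy_count;

/*
 * Insert a subsystem, keeping the table sorted by band.  Entries with the
 * same band stay in registration order.  Returns 0, or -1 if the table
 * is full.
 */
int toy_register_subsys(const char *name, toy_node_cb up,
                        toy_node_cb down, int band)
{
    int i;

    assert(name != NULL);
    if (toy_count == TOY_MAX_SUBSYSTEMS)
        return -1;
    /* Shift only entries with a strictly larger band to the right. */
    for (i = toy_count; i > 0 && toy_table[i - 1].band > band; i--)
        toy_table[i] = toy_table[i - 1];
    toy_table[i].name = name;
    toy_table[i].nodeup = up;
    toy_table[i].nodedown = down;
    toy_table[i].band = band;
    toy_count++;
    return 0;
}

/* Run the NODEUP callbacks in table order. */
void toy_nodeup(int node)
{
    int i;

    for (i = 0; i < toy_count; i++)
        if (toy_table[i].nodeup != NULL)
            toy_table[i].nodeup(node);
}

/* Run the NODEDOWN callbacks in table order. */
void toy_nodedown(int node)
{
    int i;

    for (i = 0; i < toy_count; i++)
        if (toy_table[i].nodedown != NULL)
            toy_table[i].nodedown(node);
}
```

Registering subsystems "a" (band 2), "b" (band 1), and "c" (band 2) in that
order leaves the table as b, a, c: "b" runs first because of its band, and
"a" runs before "c" because it was registered first.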

For more information on adding a subsystem to CLMS, please read the
enhancing.txt document.



CLMS KEY SERVICES
==================
CLMS key services are created for cluster services that need to be centralized.
A key service is designated to run on one node within the cluster at
any point in time. Cluster nodes specify in their boot entries whether or
not they are able to serve as key service nodes for a particular key
service.

Each key service registers a failover routine with CLMS by calling
clms_register_key_service(). When the node running the key service goes
down, CLMS will select another node to take over the key service and run
the registered failover callbacks on that new node. If the failover
key service node needs to pull data from the surviving nodes in the cluster,
it can do so if the key service has registered a pull_data routine. Once
the pulled data has been gathered, it runs a failover_data routine to
process this data. Finally, the key service is set to ready and the
whole cluster is notified of this new key service node.

Key services can be defined as either critical or non-critical. A critical
key service will delay the forming of a cluster until CLMS has designated a
key service node for it. Similarly, when a node serving a critical key service
goes down, the cluster will panic if CLMS cannot find a failover node for
that key service.
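
The failover sequence and the critical/non-critical distinction can be
sketched as below.  This is a toy model under stated assumptions, not the
CLMS code: all toy_ names and callback signatures are hypothetical, the
new node is assumed to be the first eligible node that is still up, and
the failover callback is assumed to run before any data is pulled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_key_service {
    const char *name;
    int   critical;   /* nonzero for a critical key service */
    int   ready;      /* nonzero once a node is serving it */
    int   node;       /* node currently serving, -1 if none */
    void (*failover)(int new_node);                  /* required */
    int  (*pull_data)(int new_node);                 /* optional */
    void (*failover_data)(int new_node, int pulled); /* optional */
};

enum toy_result { TOY_OK, TOY_NO_NODE, TOY_PANIC };

/* Example callback: counts how many times failover ran. */
int toy_failover_calls;
void toy_count_failover(int new_node)
{
    (void)new_node;
    toy_failover_calls++;
}

/*
 * Fail a key service over after its node went down.
 * eligible[] lists the nodes whose boot entries allow them to serve it;
 * node_up[n] is nonzero if node n is currently up.
 */
enum toy_result toy_failover(struct toy_key_service *ks,
                             const int *eligible, int n_eligible,
                             const int *node_up)
{
    int i, new_node = -1;

    assert(ks != NULL && ks->failover != NULL);
    ks->ready = 0;
    ks->node = -1;

    /* ASSUMPTION: pick the first eligible node that is up. */
    for (i = 0; i < n_eligible; i++) {
        if (node_up[eligible[i]]) {
            new_node = eligible[i];
            break;
        }
    }
    if (new_node < 0)
        return ks->critical ? TOY_PANIC : TOY_NO_NODE;

    ks->failover(new_node);
    if (ks->pull_data != NULL) {
        int pulled = ks->pull_data(new_node);

        if (ks->failover_data != NULL)
            ks->failover_data(new_node, pulled);
    }
    ks->node = new_node;
    ks->ready = 1;  /* the real code would also notify the cluster here */
    return TOY_OK;
}
```

In this model a critical key service with no surviving eligible node yields
TOY_PANIC, while a non-critical one simply stays without a serving node
(TOY_NO_NODE).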

For more information on adding a key service to CLMS, please read the
enhancing.txt document.