This document describes the various components provided by the Cluster
Infrastructure (CI) package.
ICS CHANNELS (high-level and low-level)
An ICS channel is a TCP/IP connection established with another node for
intra-cluster communication.
Channels are the underlying transport through which cluster service
messages/RPCs are sent. Therefore, each cluster service needs to map to a
specific ICS channel. It is possible to group several cluster services
together as subservices sharing the same ICS channel (Read more about
Different ICS channels are created for purposes of flow control (also
known as throttling). When the system is under high load, channels will be
throttled according to their per-channel throttling variables.
Throttling does not occur on certain "priority" ICS channels. These
priority channels exist to prevent deadlock. Consider an RPC sent from Node A
to Node B, where the server-side routine on Node B in turn sends an
RPC/message back to Node A. If Node A's ICS channel has become throttled by
then, Node A cannot handle the message from Node B, even though handling it
would help free resources on Node A. To avoid this kind of
deadlock, the ICS code will automatically raise the priority of an outgoing
RPC/message if it is being sent within the server-side routine of an
[ NOTE: In the current code version, ICS throttling is not fully implemented. ]
An ics_prio data member has been added to the task_struct structure to
keep track of the current ICS priority. A priority of 0 will
use the normal ICS channel mapped to that specific cluster service. Any
priority above 0 will use one of the 4 ICS priority channels defined.
There are 2 ICS channels used for sending RPC replies. The ics_reply_chan
is used for sending replies to non-priority RPC messages, while the
ics_reply_prio_chan is used to reply to all high priority RPCs. Like the
priority ICS channels, neither of these reply channels is ever throttled.
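
The channel choice described above can be sketched in C. This is only an
illustration: every demo_ name and every channel number is hypothetical, and
two details are assumptions not stated in this document, namely that priority
N maps to the Nth priority channel and that a nested RPC is raised by exactly
one priority level.

```c
#include <assert.h>

/* Hypothetical channel numbers; the real CI code may number them differently. */
enum demo_chan {
    DEMO_CLMS_CHAN       = 0, /* normal channel mapped to a cluster service */
    DEMO_PRIO_CHAN_BASE  = 1, /* first of the 4 priority channels (1..4)    */
    DEMO_REPLY_CHAN      = 5, /* replies to non-priority RPCs               */
    DEMO_REPLY_PRIO_CHAN = 6  /* replies to high priority RPCs              */
};

#define DEMO_NUM_PRIO_CHANS 4

/* Priority 0 uses the service's normal channel; anything above 0 uses a
 * priority channel. ASSUMPTION: priority N maps to priority channel N,
 * clamped to the number of priority channels. */
static int demo_pick_request_chan(int normal_chan, int ics_prio)
{
    if (ics_prio <= 0)
        return normal_chan;
    if (ics_prio > DEMO_NUM_PRIO_CHANS)
        ics_prio = DEMO_NUM_PRIO_CHANS;
    return DEMO_PRIO_CHAN_BASE + (ics_prio - 1);
}

/* Replies go to one of the two reply channels, depending on whether the
 * request being answered was a priority RPC. */
static int demo_pick_reply_chan(int request_prio)
{
    return request_prio > 0 ? DEMO_REPLY_PRIO_CHAN : DEMO_REPLY_CHAN;
}

/* ASSUMPTION: an RPC sent from inside a server-side routine is raised by
 * one level over the priority of the request being served. */
static int demo_nested_prio(int serving_prio)
{
    int prio = serving_prio + 1;
    return prio > DEMO_NUM_PRIO_CHANS ? DEMO_NUM_PRIO_CHANS : prio;
}
```

In this sketch, a request at priority 0 stays on its normal channel, while a
message sent while serving that request is raised to priority 1 and therefore
travels on a channel that is never throttled.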
The current code defines a maximum of 12 ICS channels in a cluster. This
can be increased by tuning ICS_MAX_CHANNELS in the header file ics.h.
The current CI code sets up 8 different ICS channels. There are 4 priority
channels and 2 low-level reply channels, as mentioned above. In addition,
there are 2 channels defined to support cluster services. One of these
supports the CLMS, cluster API, and ICS forwarding services, and the other
is used specifically for probing the CLMS master on bootup. To be precise,
the ics_clms_probe_chan is a transient channel that is set up and torn down
for each probe sent. These probes are sent by nodes that are coming up to
query which node is the CLMS master.
For more information on how to add your own ICS channel, please read the
A cluster service consists of "service routines" that are run on each node
in the cluster and "service messages" that are sent amongst cluster nodes for
The current CI code defines 3 cluster services. These include the CLMS
service, the cluster API service, and the ICS signal forwarding service.
Each cluster service will need to send messages across the cluster and
therefore needs to communicate over an ICS channel.
The 3 cluster services are multiplexed on top of one ICS channel,
ics_clms_chan. As mentioned above, it is possible for several cluster
services to share one ICS channel and they will be flow-controlled as one
unit. There is a direct mapping from a cluster service to an ICS
channel+subservice. Thus, every cluster service is designated to send messages
over a specific ICS channel and it has a subservice number to distinguish its
messages from other cluster service messages being sent on the same channel.
The macro used to do this mapping is:
#define ICS_NSC_SVC(_chan,_subservice) (((_chan) << 16) + (_subservice))
You can also determine the ICS service channel/subservice number given
the cluster service using these macros:
#define ICS_CHAN(_cluster_svc) ((_cluster_svc) >> 16)
#define ICS_SSVC(_cluster_svc) ((_cluster_svc) & ((1 << 16) - 1))
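
As a worked example of the mapping, the sketch below packs a channel and
subservice into a cluster service number and unpacks it again. The macros are
restated here with their arguments parenthesized so the block stands alone;
the channel and subservice values (3 and 2) and the demo_ helper are made up
for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Macros from the text, restated with parenthesized arguments. */
#define ICS_NSC_SVC(_chan, _subservice) (((_chan) << 16) + (_subservice))
#define ICS_CHAN(_cluster_svc) ((_cluster_svc) >> 16)
#define ICS_SSVC(_cluster_svc) ((_cluster_svc) & ((1 << 16) - 1))

/* Hypothetical values: channel 3, subservice 2. */
enum { DEMO_CHAN = 3, DEMO_SSVC = 2 };

/* Print a cluster service number together with its two components. */
static void demo_print_svc(int svc)
{
    printf("svc=0x%x chan=%d subservice=%d\n",
           (unsigned)svc, ICS_CHAN(svc), ICS_SSVC(svc));
}

/* Return 1 if packing and then unpacking gives back the same pair. */
static int demo_round_trip_ok(int chan, int ssvc)
{
    int svc = ICS_NSC_SVC(chan, ssvc);
    return ICS_CHAN(svc) == chan && ICS_SSVC(svc) == ssvc;
}
```

With these values the channel occupies the upper 16 bits and the subservice
the lower 16 bits, so ICS_NSC_SVC(3, 2) is 0x30002.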
There is a limit of 6 subservices per ICS service channel, set by
ICS_NUM_SUBSERVICES in the header file ics.h. Given that there are
6 ICS channels that can be used for regular cluster services
(12 - 4 priority - 2 reply channels), this puts a limit of 36 cluster services
(6 ICS service channels * 6 subservices/channel) that can be defined in a
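
The arithmetic behind this limit can be written out as a small C sketch. The
DEMO_ constants only mirror the numbers quoted in this document; the real
definitions live in ics.h and may use different names or values.

```c
#include <assert.h>

/* Values quoted in the text (not copied from ics.h). */
#define DEMO_ICS_MAX_CHANNELS    12
#define DEMO_NUM_PRIO_CHANS       4
#define DEMO_NUM_REPLY_CHANS      2
#define DEMO_ICS_NUM_SUBSERVICES  6

enum {
    /* Channels left over for regular cluster services: 12 - 4 - 2 = 6. */
    DEMO_SVC_CHANS = DEMO_ICS_MAX_CHANNELS
                     - DEMO_NUM_PRIO_CHANS - DEMO_NUM_REPLY_CHANS,
    /* Upper bound on cluster services: 6 channels * 6 subservices = 36. */
    DEMO_MAX_CLUSTER_SVCS = DEMO_SVC_CHANS * DEMO_ICS_NUM_SUBSERVICES
};

/* Same calculation for other tunings of the two limits. */
static int demo_max_cluster_svcs(int max_chans, int num_subservices)
{
    return (max_chans - DEMO_NUM_PRIO_CHANS - DEMO_NUM_REPLY_CHANS)
           * num_subservices;
}
```

Raising the channel limit from 12 to 16 while keeping 6 subservices per
channel would, by the same calculation, allow 60 cluster services.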
Each cluster service needs to register with ICS so that ICS can call back the
service's stub routines. ICS is also informed of the minimum and maximum number
of available server handles that should be maintained to handle incoming
messages on this channel.
Icsgen is used to generate client/server stubs for the cluster service
messages. The stubs will insert the cluster service number in the ICS
message headers. Icsgen also generates a routine which registers the cluster
service with ICS for you.
Please read the ICSGEN document for more information.
Cluster services may need to run cleanup or initialization code when nodes
come up or go down in the cluster. This is done by registering a CLMS
subsystem. CLMS subsystems perform callbacks for a cluster service during
cluster NODEUP and NODEDOWN events.
Each CLMS subsystem calls register_clms_subsys() to register itself with CLMS.
This registration call takes in the name of the CLMS subsystem, function
pointers to the NODEUP and NODEDOWN callback routines, and the priority band
of the subsystem. During NODEUP/NODEDOWN events the subsystem
callbacks are performed (on each surviving cluster node) in order of priority
band. Subsystems within the same priority band are called in the order they
were registered with CLMS. The CI code defines a maximum of 15 CLMS subsystems
in a cluster. This can be tuned by modifying CLMS_MAX_SUBSYSTEMS in the
header file clms.h.
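
The callback ordering can be illustrated with a toy registry in C. This is
not the real register_clms_subsys() interface, whose exact signature is not
given here: all demo_ names are hypothetical, and the sketch assumes that a
lower priority band number is called first.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SUBSYSTEMS 15 /* mirrors the limit quoted in the text */

typedef void (*demo_node_cb)(int node);

struct demo_subsys {
    const char  *name;
    demo_node_cb nodeup;
    demo_node_cb nodedown;
    int          prio_band;
};

static struct demo_subsys demo_table[DEMO_MAX_SUBSYSTEMS];
static size_t demo_count;

/* Insert so the table stays sorted by priority band. Entries in the same
 * band keep their registration order because only strictly greater bands
 * are shifted. Returns 0 on success, -1 if the table is full. */
static int demo_register_subsys(const char *name, demo_node_cb up,
                                demo_node_cb down, int band)
{
    size_t pos;

    if (demo_count >= DEMO_MAX_SUBSYSTEMS)
        return -1;
    pos = demo_count;
    while (pos > 0 && demo_table[pos - 1].prio_band > band) {
        demo_table[pos] = demo_table[pos - 1];
        pos--;
    }
    demo_table[pos] = (struct demo_subsys){ name, up, down, band };
    demo_count++;
    return 0;
}

/* Run every NODEUP callback in table order. */
static void demo_nodeup(int node)
{
    size_t i;

    for (i = 0; i < demo_count; i++)
        if (demo_table[i].nodeup)
            demo_table[i].nodeup(node);
}

/* Sample callbacks that record the order in which they were called. */
static char demo_log[16];
static size_t demo_log_len;

static void demo_record(char tag)
{
    if (demo_log_len < sizeof demo_log - 1)
        demo_log[demo_log_len++] = tag;
}
static void demo_up_a(int node) { (void)node; demo_record('a'); }
static void demo_up_b(int node) { (void)node; demo_record('b'); }
static void demo_up_c(int node) { (void)node; demo_record('c'); }
```

If b (band 2), a (band 1) and c (band 2) are registered in that order, a
NODEUP event in this sketch calls a first, then b, then c.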
For more information on adding a subsystem to CLMS, please read the
CLMS KEY SERVICES
CLMS key services are created for cluster services that need to be centralized.
A key service is designated to run on one node within the cluster at
any point in time. Cluster nodes specify in their boot entries whether or
not they are able to serve as key service nodes for a particular key
Each key service registers a failover routine with CLMS by calling
clms_register_key_service(). When the node running the key service goes
down, CLMS will select another node to take over the key service and run
the registered failover callbacks on that new node. If the failover
key service node needs to pull data from the surviving nodes in the cluster,
it can do so if the key service has registered a pull_data routine. Once
the pulled data has been gathered, it runs a failover_data routine to
process this data. Finally, the key service is set to ready and the
whole cluster is notified of this new key service node.
Key services can be defined as either critical or non-critical. A critical
key service will delay the forming of a cluster until CLMS has designated a
key service node for it. Similarly, when a node serving a critical key service
goes down, the cluster will panic if CLMS cannot find a failover node for
that key service.
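
The failover sequence and the critical/non-critical distinction can be modeled
with a small C sketch. It is a simulation only: the demo_ names, the structure
layout and the choice of the first capable node are assumptions, and the order
of steps (failover callback, then pull_data, then failover_data, then ready)
is one reading of the description above, not the CLMS implementation.

```c
#include <assert.h>
#include <stddef.h>

enum demo_result { DEMO_OK = 0, DEMO_NO_SERVER = 1, DEMO_PANIC = 2 };

struct demo_key_service {
    int   critical;                    /* nonzero: cluster cannot run without it */
    void (*failover)(int new_node);    /* registered failover routine            */
    int  (*pull_data)(int from_node);  /* optional: fetch data from a survivor   */
    void (*failover_data)(int pulled); /* optional: process the gathered data    */
    int   server_node;                 /* node currently serving, -1 if none     */
    int   ready;
};

/* Simulate failover after the serving node went down. 'capable' lists the
 * surviving nodes able to host the service; 'survivors' lists all
 * surviving nodes. ASSUMPTION: the first capable node is chosen. */
static enum demo_result demo_failover(struct demo_key_service *ks,
                                      const int *capable, size_t ncapable,
                                      const int *survivors, size_t nsurvivors)
{
    int pulled = 0;
    size_t i;

    ks->ready = 0;
    ks->server_node = -1;
    if (ncapable == 0)
        return ks->critical ? DEMO_PANIC : DEMO_NO_SERVER;

    ks->server_node = capable[0];
    if (ks->failover)
        ks->failover(ks->server_node);
    if (ks->pull_data) {
        for (i = 0; i < nsurvivors; i++)
            pulled += ks->pull_data(survivors[i]);
        if (ks->failover_data)
            ks->failover_data(pulled);
    }
    ks->ready = 1; /* the real code would now notify the whole cluster */
    return DEMO_OK;
}

/* Sample callbacks: each survivor contributes its node number as "data". */
static int demo_last_pulled;
static int  demo_sample_pull(int from_node) { return from_node; }
static void demo_sample_failover_data(int pulled) { demo_last_pulled = pulled; }
```

With no capable node left, the sketch reports DEMO_PANIC for a critical key
service and DEMO_NO_SERVER for a non-critical one, mirroring the distinction
described above.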
For more information on adding a key service to CLMS, please read the