This project is developing a common infrastructure for
Linux clustering by extending the Cluster Membership Subsystem
("CLMS") and Internode Communication Subsystem ("ICS") of the
The Sourceforge.net project summary page is located
Both the SSI and CI code are being released under the
GNU General Public
License (GPL), version 2. This is the same license used by the
- Get Cluster Infrastructure 0.9.9 for
Red Hat 9
- Run a Virtual CI Cluster with UML
and the 22.214.171.124 Kernel
- To checkout CI, first login to the CVS servers. Press Enter when
prompted for a password:
$ cvs -d:pserver:firstname.lastname@example.org:/cvsroot/ci-linux login
- Many developers work with a Fedora Core 1 version of the
You can access it by checking out the OPENSSI-RH branch
of the repository:
$ cvs -z3 -d:pserver:email@example.com:/cvsroot/ci-linux co -r OPENSSI-RH ci
- The future of the CI kernel is on 2.6. You can access it by
checking out the trunk of the repository:
$ cvs -z3 -d:pserver:firstname.lastname@example.org:/cvsroot/ci-linux co ci
- SourceForge has provided some nice documentation about their CVS services.
- Sign up to receive checkin messages.
You can find contributed patches and such here.
See the mailing list archive for context.
- Discussion list for developers and users
- Notification of CVS checkins
- Provide a cluster infrastructure that can be used as the
basis for many different cluster products and projects,
including HA failover, load leveling, parallel filesystem,
HPC and Single
System Image (SSI).
- Provide a membership service functional enough for each of
the cluster environments with an easy way to add subsystems
to the environment and a set of APIs to build cluster
subsystems and cluster-aware applications.
- Modularity, so other instances of nodedown detection, I/O
fencing, split-brain detection and inter-node communication can
- Flexibility so the infrastructure can be tuned for different
performance constraints and/or membership policy choices.
Also flexibility in when the cluster is formed (SSI needs it
formed at boot time, while more loosely-coupled clusters want the
ability to join and leave at more arbitrary times).
- Minimal kernel hooks that do not impact the performance of
CI projects with available
Integration of CI with DLM
Integration of CI with UML
Dual path interconnect with transparent failover
Jaideep, En Chiang
Ongoing CI projects
Enforce membership via node numbers and/or IP/MAC addresses
Integration with STOMITH
Split brain detection/avoidance
Jaideep, Roger Tsang
Convert code to be dynamically added modules
CI projects not yet started
Clusters across subnets
Integration of CI with Heartbeat
Integration of CI with DRBD
Integration of CI with Beowulf
Integration of CI with EVMS
There are two major components to the Cluster Infrastructure at this
time: Cluster Membership (CLMS) and the Internode Communication
Subsystem (ICS).
Cluster Membership (CLMS)
Internode Communication Subsystem (ICS)
- Configurable to be initiated before/after root mounting.
- Co-ordinated with ICS to set up connections in the kernel
before/after initialization of TCP/IP stack.
- Handles initial cluster formation and adding/losing
nodes at later times
- Online adding of new nodes
- Strict maintenance of membership, even in the face
of arbitrary node failures
- Set of membership APIs (libcluster), including a
membership transition history guaranteed to go forward and
which is synchronized/replicated among all nodes, even
those that boot later
- Coordination of cleanup/teardown when a node goes down
- Ensures a node doesn't rejoin before everyone has
finished all cleanup processing
- Master-driven membership algorithm with rapid failover
- Set of known nodes (known as candidates) which can be
CLMS master (lilo extensions for now)
- Nodedown detection and API driven membership
- Architected to allow a largely separate kernel nodedown
- Architected to allow policy hooks for nodedown and nodeup
- Can signal SIGCLUSTER to processes which want it on
any cluster membership transition
- Optionally manages a node thru a set of states from
booting to appropriate run level to shutdown or failure
- Optionally co-ordinates with init to bring the cluster
as a whole to and thru designed run levels
- Optional registration system so kernel subsystems can
be called for nodeup and nodedown events
- Optional key service management system (more on key
- Could be easily adapted for loosely coupled clusters
(non-SSI, non-shared root HA clusters)
- Has run in clusters of up to 30 nodes and is designed to
accommodate much larger clusters
- Running in a 2.4.x Linux kernel with minimal patches
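
The membership behaviour listed above (a master chosen from a set of candidates, a forward-only transition history, and subsystems registered for nodeup and nodedown events) can be illustrated with a small user-space model. This is only a sketch and not the CLMS code: the `Membership` class, its method names, and the "first live candidate becomes master" policy are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Callback = Callable[[str, int], None]  # called with (event, node number)


@dataclass
class Membership:
    """Toy model of a master-driven membership service (hypothetical)."""
    candidates: List[int]                      # nodes allowed to be master
    members: Set[int] = field(default_factory=set)
    history: List[Tuple[int, str, int]] = field(default_factory=list)
    subscribers: List[Callback] = field(default_factory=list)
    master: int = -1                           # -1 means "no master yet"

    def register(self, callback: Callback) -> None:
        """Let a subsystem be called on every nodeup and nodedown event."""
        self.subscribers.append(callback)

    def _record(self, event: str, node: int) -> None:
        # Transition numbers only increase, so the history goes forward.
        self.history.append((len(self.history) + 1, event, node))
        for callback in self.subscribers:
            callback(event, node)

    def _elect(self) -> None:
        # Assumed policy: the first live candidate in list order is master.
        live = [n for n in self.candidates if n in self.members]
        self.master = live[0] if live else -1

    def node_up(self, node: int) -> None:
        if node in self.members:
            return
        self.members.add(node)
        self._record("nodeup", node)
        if self.master == -1:
            self._elect()

    def node_down(self, node: int) -> None:
        if node not in self.members:
            return
        self.members.discard(node)
        self._record("nodedown", node)
        if node == self.master:
            self._elect()                      # fail over to another candidate


seen = []
cluster = Membership(candidates=[1, 2])
cluster.register(lambda event, node: seen.append((event, node)))
for n in (1, 2, 3):
    cluster.node_up(n)
cluster.node_down(1)                           # the current master fails
print(cluster.master, sorted(cluster.members), cluster.history[-1])
# prints: 2 [2, 3] (4, 'nodedown', 1)
```

In this model node 3 can join but can never become master, because it is not in the candidate list.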
- Architected as a kernel-to-kernel communication subsystem
- Designed to be able to start up connections before/after
initialization of TCP/IP stack.
- Could be used in more loosely coupled cluster environments
- Works with CLMS to form a tightly coupled (membershipwise)
environment where all nodes agree on the membership list and have
communication with all other nodes
- There is a set of communication channels between each pair of
nodes; flow control is per channel
- Supports variable message size (at least 64K messages)
- Queueing of outgoing messages
- Dynamic service pool of kernel server processes
- Out-of-line data type for large chunks of data and transports
that support pull or push DMA
- Priority of messages to avoid deadlock
- Incoming message queueing
- Nodedown interfaces and co-ordination with CLMS and subsystems
- Nodedown code to error out outgoing messages, flush incoming
messages and kill/waitfor server processes processing messages from
the node that went down
- Architected with transport independent and dependent pieces
(has run with tcp/ip and ServerNet)
- Supports 3 communication paradigms:
  - one way messages
  - traditional RPCs, where client must synchronously wait
  - request/response or async RPC, where requestor can
  choose when to wait for response
- Very simple generation language (ICSgen)
- Works with XDR/RPCgen
- Handles signal forwarding from a client node to a node providing
service, to allow interruption or job control
- Operational in a Linux 2.4.x kernel with minimal patches
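
The three communication paradigms (one-way messages, traditional RPC, and async RPC) differ only in when, if ever, the sender waits for a reply. The sketch below models that difference in user space with a thread pool standing in for the pool of server processes. It is not the ICS interface: `handle`, `send_message`, `rpc`, and `rpc_async` are hypothetical names chosen for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the pool of server processes on the remote node.
server_pool = ThreadPoolExecutor(max_workers=4)
log = []


def handle(request: str) -> str:
    """Hypothetical service routine executed on the 'server' side."""
    log.append(request)
    return request.upper()


def send_message(request: str) -> None:
    """One-way message: queue it; the sender receives no reply value."""
    server_pool.submit(handle, request)


def rpc(request: str) -> str:
    """Traditional RPC: the client blocks until the reply arrives."""
    return server_pool.submit(handle, request).result()


def rpc_async(request: str) -> Future:
    """Async RPC: return a handle; the requestor waits when it chooses."""
    return server_pool.submit(handle, request)


send_message("ping")
reply = rpc("status")
pending = rpc_async("sync")
# ... the requestor is free to do other work here ...
late_reply = pending.result()
server_pool.shutdown(wait=True)                # let queued messages finish
print(reply, late_reply, sorted(log))
# prints: STATUS SYNC ['ping', 'status', 'sync']
```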
- The Cluster Infrastructure (CI) download is completely
operational on 32-bit Intel machines (including SMP).
- We have done limited testing of clusters up to 10 nodes in size
and with nodes numbered up to 50
- The nodedown detection code is currently primitive and takes
up to 20 seconds to detect a nodedown. Soon we expect to improve
this to subsecond detection. Nodedown cleanup, reconciliation
and notification after detection have been measured at <100 milliseconds
for 3 node clusters (larger clusters shouldn't take much longer but
haven't been measured).
- ICS channel flow control / throttling isn't working yet. We
have to figure out the appropriate low memory conditions to cause
- Include cluster directory in make. (#ifdef CONFIG_CLUSTER)
- Documentation for Cluster features.
- Add Cluster features to config menu.
- Turn Clustering features on by default.
- Add sys_ssisys to system call jump table. (should
rename this or use /proc to get/receive information)
- Add arch-independent read/write trylock routines.
- Add arch-specific read/write trylock implementations.
- Add FASTCALLs for read/write trylocks.
- Add library implementations of read/write trylocks.
- #define ssisys system call number. (should rename
this or use /proc to get/receive information)
- Ignore SIGCLUSTER by default.
- #define SIGMIGRATE and SIGCLUSTER signals (SIGMIGRATE
actually for process migration and not needed for this project)
- Ignore SIGCLUSTER by default.
- Added routine do_sigtoallproc_local() to signal all
local processes. Used for SIGCLUSTER notification after a
membership event (#ifdef CONFIG_CLUSTER)
- In count_active_tasks(), don't count task if its
is_kthread flag is set and task is in TASK_UNINTERRUPTIBLE
sleep. (#ifdef CONFIG_CLUSTER) (should be able to do this
another way and eliminate this hook)
- In get_pid(), allocate clusterwide pid numbers.
- Increase PID_MAX for clusterwide pids. (#ifdef CPID)
- Include icsprio member in task_struct. (#ifdef CONFIG_ICS)
- Include is_kthread flag in task_struct. (#ifdef
CONFIG_CLUSTER) (should be able to eliminate these hooks)
- Add hooks to process cluster parameters from command
line. (#ifdef CONFIG_CLUSTER)
- Add hooks to call cluster_main_init_preroot /
cluster_main_init_postroot. (#ifdef CONFIG_CLUSTER)
- Define IFF_ICS interface flag for ICS interfaces.
- Add ics_flags member to net_device structure to hold
ICS-related interface flags. (#ifdef CONFIG_ICS)
- Prevent shutdown of interfaces with IFF_ICS flag set.
- Add ics_recvmsg() routine. (#ifdef CONFIG_ICS)
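
Several of the hooks above add read/write trylock routines. As a reminder of what trylock semantics mean for a reader/writer lock, here is a minimal user-space sketch: an attempt either takes the lock immediately or reports failure, and never waits for the lock to become free. The `RWTryLock` class and its method names are hypothetical and are not the kernel routines from the patch.

```python
import threading


class RWTryLock:
    """Toy reader/writer lock offering only non-blocking acquire attempts."""

    def __init__(self) -> None:
        self._guard = threading.Lock()   # protects the two fields below
        self._readers = 0
        self._writer = False

    def read_trylock(self) -> bool:
        """Succeed unless a writer holds the lock."""
        with self._guard:
            if self._writer:
                return False
            self._readers += 1
            return True

    def write_trylock(self) -> bool:
        """Succeed only if there are no readers and no writer."""
        with self._guard:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def read_unlock(self) -> None:
        with self._guard:
            if self._readers == 0:
                raise RuntimeError("read_unlock without a read lock held")
            self._readers -= 1

    def write_unlock(self) -> None:
        with self._guard:
            if not self._writer:
                raise RuntimeError("write_unlock without the write lock held")
            self._writer = False


lock = RWTryLock()
results = [lock.read_trylock(), lock.read_trylock(), lock.write_trylock()]
lock.read_unlock()
lock.read_unlock()
results.append(lock.write_trylock())     # no readers left, so this succeeds
results.append(lock.read_trylock())      # a writer holds the lock, so this fails
lock.write_unlock()
print(results)
# prints: [True, True, False, True, False]
```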
- Reconciliation and integration with STOMITH and split-brain code are not done.
- Kernel hooks could be reduced.
- No clean way for a node to leave the cluster (besides a reboot).
- There is no configuration information on who the allowed cluster members are, just who the possible Membership Masters are.
- Support for overlapping clusters has been requested.
- Support for hierarchical clusters has been requested, where the impact might be that non-core nodes wouldn't be monitored for nodedown nearly as frequently as core nodes containing resources that have to be quickly recovered.
- Have libcluster use a /proc interface instead of or in addition to the system call.
- Clean up SMP locking to not go thru an intermediate set of macros.
- Layer some of the other cluster projects, like