This project is developing a common infrastructure for Linux clustering by extending the Cluster Membership Subsystem ("CLMS") and Internode Communication Subsystem ("ICS") of the OpenSSI project. The SourceForge.net project summary page is located here.
Both the SSI and CI code are released under the GNU General Public License (GPL), version 2. This is the same license used by the Linux kernel.
- Get Cluster Infrastructure 0.9.9 for Red Hat 9
- How to install
- Download now (15M)
- You can download an RPM of the kernel source (38M) for doing things like building third-party drivers against it. Simply install it, and you can find the source under /usr/src/linux-2.4-ci.
- Run a Virtual CI Cluster with UML and the 2.6.8.1 Kernel
- To check out CI, first log in to the CVS server. Press Enter when prompted for a password:
$ cvs -d:pserver:anonymous@ci-linux.cvs.sourceforge.net:/cvsroot/ci-linux login
- Many developers work with a Fedora Core 1 version of the kernel. You can access it by checking out the OPENSSI-RH branch of the repository:
$ cvs -z3 -d:pserver:anonymous@ci-linux.cvs.sourceforge.net:/cvsroot/ci-linux co -r OPENSSI-RH ci
- The future of the CI kernel is on 2.6. You can access it by checking out the trunk of the repository:
$ cvs -z3 -d:pserver:anonymous@ci-linux.cvs.sourceforge.net:/cvsroot/ci-linux co ci
- SourceForge has provided some nice documentation about their CVS services.
- Sign up to receive checkin messages.
You can find contributed patches and such here. See the mailing list archive for context.
- Provide a cluster infrastructure that can be used as the basis for many different cluster products and projects, including HA failover, load leveling, parallel filesystem, HPC and Single System Image (SSI).
- Provide a membership service functional enough for each of the cluster environments with an easy way to add subsystems to the environment and a set of APIs to build cluster subsystems and cluster-aware applications.
- Modularity, so that other implementations of nodedown detection, I/O fencing, split-brain detection and internode communication can be substituted.
- Flexibility, so the infrastructure can be tuned for different performance constraints and/or membership policy choices. Also flexibility in when the cluster is formed (SSI needs it formed at boot time, while more loosely coupled clusters want the ability to join and leave at more arbitrary times).
- Minimal kernel hooks that do not impact the performance of normal operations.
CI projects with available versions:
- Membership: John Byrne
- Internode Communication: John Byrne
- Integration of CI with DLM: Aneesh Kumar
- Integration of CI with UML: Krishna Kumar
- Infiniband interconnect: Stan Smith
- Dual path interconnect with transparent failover: Jaideep, En Chiang

Ongoing CI projects:
- Enforce membership via node numbers and/or IP/MAC addresses: Jaideep
- Integration with STOMITH: Jaideep
- Quorum integration: Jaideep
- Split brain detection/avoidance: Jaideep, Roger Tsang
- Convert code to be dynamically added modules: Laura Ramirez

CI projects not yet started:
- Membership
- Internode Communication
- Clusters across subnets: open
- Other interconnects: open
- Integration of CI with Heartbeat: open
- Integration of CI with DRBD: open
- Integration of CI with Beowulf: open
- Integration of CI with EVMS: open
There are two major components to the Cluster Infrastructure at this time: Cluster Membership (CLMS) and the Internode Communication Subsystem (ICS).

Cluster Membership (CLMS)
- Configurable to be initiated before/after root mounting.
- Co-ordinated with ICS to set up connections in the kernel before/after initialization of the TCP/IP stack.
- Handles initial cluster formation and adding/losing nodes at later times
- Online adding of new nodes
- Strict maintenance of membership, even in the face of arbitrary node failures
- Set of membership APIs (libcluster), including a membership transition history that is guaranteed to move forward and is synchronized/replicated among all nodes, even those that boot later
- Coordination of cleanup/teardown when a node goes down
- Ensures a node doesn't rejoin before everyone has finished all cleanup processing
- Master-driven membership algorithm with rapid failover
- Set of known nodes (known as candidates) that can serve as CLMS master (LILO extensions for now)
- Nodedown detection and API driven membership
- Architected to allow a largely separate kernel nodedown detection technology
- Architected to allow policy hooks for nodedown and nodeup decisions
- Can deliver SIGCLUSTER to processes that request it on any cluster membership transition
- Optionally manages a node through a set of states, from booting to the appropriate run level to shutdown or failure
- Optionally co-ordinates with init to bring the cluster as a whole to and through the designated run levels
- Optional registration system so kernel subsystems can be called for nodeup and nodedown events
- Optional key service management system (more on key services later)
- Could be easily adapted for loosely coupled clusters (non-SSI, non-shared root HA clusters)
- Runs in clusters of up to 30 nodes and is designed to accommodate much larger clusters
- Running in a 2.4.x Linux kernel with minimal patches
Internode Communication Subsystem (ICS)

- Architected as a kernel-to-kernel communication subsystem
- Designed to be able to start up connections before/after initialization of TCP/IP stack.
- Could be used in more loosely coupled cluster environments
- Works with CLMS to form a tightly coupled (membership-wise) environment in which all nodes agree on the membership list and can communicate with all other nodes
- A set of communication channels connects each pair of nodes; flow control is per channel
- Supports variable message sizes (messages of at least 64K)
- Queueing of outgoing messages
- Dynamic service pool of kernel server processes
- Out-of-line data type for large chunks of data and transports that support pull or push DMA
- Priority of messages to avoid deadlock
- Incoming message queueing
- Nodedown interfaces and co-ordination with CLMS and subsystems
- Nodedown code to error out outgoing messages, flush incoming messages, and kill/wait for server processes processing messages from the node that went down
- Architected with transport-independent and transport-dependent pieces (has run over TCP/IP and ServerNet)
- Supports 3 communication paradigms:
- one way messages
- traditional RPCs, where the client waits synchronously for the response
- request/response or async RPC, where the requestor can choose when to wait for the response
- Very simple generation language (ICSgen)
- Works with XDR/RPCgen
- Handles signal forwarding from a client node to a node providing service, to allow interruption or job control
- Operational in a Linux 2.4.x kernel with minimal patches
- The Cluster Infrastructure (CI) download is completely operational on 32-bit Intel machines (including SMP).
- We have done limited testing of clusters of up to 10 nodes, with nodes numbered up to 50.
- The nodedown detection code is currently primitive and takes up to 20 seconds to detect a nodedown; we expect to improve this to subsecond detection soon. Nodedown cleanup, reconciliation and notification after detection is measured at under 100 milliseconds for 3-node clusters (larger clusters shouldn't take much longer, but haven't been measured).
- ICS channel flow control / throttling isn't working yet. We still have to determine the appropriate low-memory conditions that should trigger it.
- Makefile
- Include cluster directory in make. (#ifdef CONFIG_CLUSTER)
- Documentation/Configure.help
- Documentation for Cluster features.
- arch/i386/config.in
arch/alpha/config.in
- Add Cluster features to config menu.
- arch/i386/defconfig
arch/alpha/defconfig
- Turn Clustering features on by default.
- arch/i386/kernel/entry.S
arch/alpha/kernel/entry.S
- Add sys_ssisys to system call jump table. (should rename this or use /proc to get/receive information)
- include/linux/rwsem.h
- Add arch-independent read/write trylock routines.
- include/asm-i386/rwsem.h
- Add arch-specific read/write trylock implementations.
- include/linux/rwsem-spinlock.h
- Add FASTCALLs for read/write trylocks.
- lib/rwsem-spinlock.c
- Add library implementations of read/write trylocks.
- include/asm-i386/unistd.h
include/asm-alpha/unistd.h
- #define ssisys system call number. (should rename this or use /proc to get/receive information)
- arch/i386/kernel/signal.c
arch/alpha/kernel/signal.c
- Ignore SIGCLUSTER by default.
- include/asm-i386/signal.h
include/asm-alpha/signal.h
- #define SIGMIGRATE and SIGCLUSTER signals (SIGMIGRATE actually for process migration and not needed for this project)
- kernel/signal.c
- Ignore SIGCLUSTER by default.
- Added routine do_sigtoallproc_local() to signal all local processes. Used for SIGCLUSTER notification after a membership event (#ifdef CONFIG_CLUSTER)
- kernel/timer.c
- In count_active_tasks(), don't count a task if its is_kthread flag is set and the task is in TASK_UNINTERRUPTIBLE sleep. (#ifdef CONFIG_CLUSTER) (should be able to do this another way and eliminate this hook)
- kernel/fork.c
- In get_pid(), allocate clusterwide pid numbers. (#ifdef CPID)
- include/linux/threads.h
- Increase PID_MAX for clusterwide pids. (#ifdef CPID)
- include/linux/sched.h
- Include icsprio member in task_struct. (#ifdef CONFIG_ICS)
- Include is_kthread flag in task_struct. (#ifdef CONFIG_CLUSTER) (should be able to eliminate these hooks)
- init/main.c
- Add hooks to process cluster parameters from command line. (#ifdef CONFIG_CLUSTER)
- Add hooks to call cluster_main_init_preroot / cluster_main_init_postroot. (#ifdef CONFIG_CLUSTER)
- include/linux/if.h
- Define IFF_ICS interface flag for ICS interfaces. (#ifdef CONFIG_ICS)
- include/linux/netdevice.h
- Add ics_flags member to net_device structure to hold ICS-related interface flags. (#ifdef CONFIG_ICS)
- net/core/dev.c
- Prevent shutdown of interfaces with IFF_ICS flag set.
- net/ipv4/tcp.c
- Add ics_recvmsg() routine. (#ifdef CONFIG_ICS)
- Reconciliation and integration with the STOMITH and split-brain code are not done
- Kernel hooks could be reduced.
- No clean way for a node to leave the cluster (besides a reboot).
- There is no configuration information on who the allowed cluster members are, only who the possible Membership Masters are.
- Support for overlapping clusters has been requested.
- Support for hierarchical clusters has been requested, where the impact might be that non-core nodes wouldn't be monitored for nodedown as frequently as core nodes containing resources that must be recovered quickly.
- Have libcluster use a /proc interface instead of or in addition to the system call.
- Clean up SMP locking so it doesn't go through an intermediate set of macros.
- Layer some of the other cluster projects, such as DLM, LVS, FailSafe, GFS and SSI, on top of CI.
The Linux Clustering Information Center
This file last updated on Friday, 21-Sep-2007 16:17:22 UTC