keepalive(1M)

keepalive -- monitor and respawn processes and daemons

Synopsis

keepalive [-i] [-t interval] [-P node] [-S node]

Description

The keepalive daemon monitors processes and daemons that are registered with it using the spawndaemon(1M) utility. When a registered process or daemon fails, keepalive logs the event using syslog(3G) and normally executes a restart script to restart it. For a complete description of the process monitoring and restart features available with keepalive, refer to spawndaemon(1M).

keepalive is more versatile than init(1M) for respawning processes. Advantages of using keepalive include:

Provides a command-line interface (spawndaemon) with many process/daemon restart options.
Monitors real-time processes; it is completely signal driven.
Monitors daemons.
Restarts a process/daemon on any node in the cluster with a user-specified node selection policy.

keepalive, which is started by init(1M), uses a memory-mapped data file, /etc/keepalive.d/keepalive.data (referred to as the monitored process table), to track the state of processes/daemons it is monitoring. Restart keepalive in any script invoked by init(1M) or shutdown(1M). To kill keepalive permanently, you must execute spawndaemon with the -Q option and edit /etc/inittab to remove the keepalive entries.

keepalive uses syslog to log messages. The ident string is keepalive, and the facility string is LOG_DAEMON. This information is provided for customizing syslogd(1M) configuration.
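Once syslogd(1M) routes LOG_DAEMON messages to a file, the ident string can be used to pick out keepalive entries. The following Python sketch is illustrative only and is not part of keepalive. The function name keepalive_messages is hypothetical, and the sketch assumes the traditional syslogd file layout (timestamp, host, ident with an optional [pid], then the message), which this page does not specify.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host ident[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<ident>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s?"
    r"(?P<message>.*)$"
)


def keepalive_messages(lines, ident="keepalive"):
    """Return the message text of every line logged under the given ident."""
    messages = []
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if match and match.group("ident") == ident:
            messages.append(match.group("message"))
    return messages
```

Lines that do not match the assumed layout are skipped rather than reported, so the result should be treated as a convenience filter, not an audit.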

The keepalive daemon uses standard shell scripts to restart processes/daemons that have terminated. These scripts can have any file name, but must be stored in the /etc/keepalive.d directory. Their group and user IDs must be root and their permissions set to 0755 to disallow write access by others.
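The location, ownership, and permission rules above can be checked before a script is registered. The following Python sketch is illustrative only and is not part of keepalive. The function name check_restart_script is hypothetical, and the sketch assumes that "root" means user ID 0 and group ID 0.

```python
import os
import stat

KEEPALIVE_DIR = "/etc/keepalive.d"  # restart script directory named in this page


def check_restart_script(path, required_dir=KEEPALIVE_DIR):
    """Return a list of problems with a restart script; empty means none found.

    Hypothetical helper: assumes root is uid 0 and gid 0.
    """
    problems = []
    script_dir = os.path.dirname(os.path.abspath(path))
    if script_dir != os.path.abspath(required_dir):
        problems.append("not stored in " + required_dir)
    info = os.stat(path)
    if info.st_uid != 0:
        problems.append("owner is uid %d, expected root (0)" % info.st_uid)
    if info.st_gid != 0:
        problems.append("group is gid %d, expected root (0)" % info.st_gid)
    mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if mode != 0o755:
        problems.append("mode is %04o, expected 0755" % mode)
    return problems
```

For example, check_restart_script("/etc/keepalive.d/restart_example") returns an empty list only when the (hypothetical) script restart_example satisfies all four conditions.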

If the executable for the process/daemon resides on a remote system, you must have root access to start or restart the process. The remote file system must be shared or exported with root permissions enabled. Root permissions must also be enabled on the mount point, in the automount table, or in the /etc/vfstab file.

Any process/daemon started by keepalive has its stdout and stderr redirected to a log file named /var/log/keepalive/daemon_basename.process_id. The process/daemon should call fflush(3S) on stdout and stderr so that buffered output reaches the log file promptly. stdin for the process/daemon is redirected to / (root), causing any attempt to read from stdin to fail.
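A C program would call fflush(stdout) and fflush(stderr) directly. For a monitored process written in Python, the analogous step is to flush the stream after each write. The sketch below is illustrative only. The names expected_log_path, report, and report_error are hypothetical, and the sketch assumes that daemon_basename is the basename of the executable path.

```python
import os
import sys

LOG_DIR = "/var/log/keepalive"  # log directory named in this page


def expected_log_path(executable, pid):
    """Build the log file name, assuming daemon_basename is the executable's basename."""
    return "%s/%s.%d" % (LOG_DIR, os.path.basename(executable), pid)


def report(message, stream=None):
    """Write one line and flush at once so it is not left in a user-space buffer."""
    if stream is None:
        stream = sys.stdout
    stream.write(message + "\n")
    stream.flush()


def report_error(message):
    """Same as report(), but for stderr."""
    report(message, stream=sys.stderr)
```

Flushing after every line trades some throughput for the assurance that output written just before a failure is visible in the log file.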

Files under /var/log/keepalive are not allowed to accumulate. Unregistering a process/daemon causes its logging file to be deleted. If a process/daemon is restarted, the logging file of the previous instance of the process/daemon is deleted. If keepalive is started with the -i option or is shut down, all logging files not in use are deleted.

If you remove the memory-mapped data file (monitored process table) belonging to the keepalive daemon, you should also shut down the keepalive daemon with the spawndaemon -Q command. To shut down keepalive, but leave the monitored process table intact, send the keepalive daemon a SIGTERM signal so it performs a controlled exit.
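The controlled exit described above amounts to delivering SIGTERM to the daemon. The following Python sketch is illustrative only. The function name request_controlled_exit is hypothetical, and the sketch assumes the caller already knows the process ID of keepalive (this page does not say how to obtain it) and has permission to signal it.

```python
import os
import signal


def request_controlled_exit(pid):
    """Send SIGTERM to pid.

    Returns True if the signal was delivered and False if no such process
    exists. A PermissionError from os.kill is left to propagate so that a
    caller without sufficient privilege sees the failure.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    return True
```

SIGKILL is deliberately not used here: it cannot be caught, so the receiving process would have no chance to perform a controlled exit.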

Options

The keepalive command uses the following options and arguments:

-i: Removes the memory-mapped data file, /etc/keepalive.d/keepalive.data, before starting the keepalive daemon. Using the -i option causes the keepalive daemon to start up with a clean monitored process table, such that it is not monitoring any processes.
-t interval: Defines the time in seconds that keepalive uses as a polling interval in rare cases where keepalive must use polling (such as when a call to fork(2) fails due to a node resource problem). The default value is 5. Note that if a fork fails on a node, keepalive tries other nodes in the node set for the process/daemon being (re)started. See spawndaemon(1M) for details.
-P node: Specifies the primary node on which the keepalive process is executed. The keepalive process is pinned on the specified node and cannot be migrated by using the load_leveld(1) utility.
-S node: Specifies the secondary node on which the keepalive process is executed when the primary node is unavailable. The keepalive process is pinned on the specified node and cannot be migrated by using the load_leveld(1) utility.

If one of the specified nodes is down or invalid, keepalive logs a warning and continues execution on another available node.

A keepalive process is always pinned on the node on which it is running, regardless of whether or not the user specifies a node. If a primary or secondary node is not specified on the command line, the system chooses any available node.
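The fallback order described above (primary node, then secondary node, then any available node) can be sketched as follows. This Python sketch is illustrative only and is not the actual selection code. The function name choose_node is hypothetical, nodes are assumed to be comparable identifiers such as node numbers, and picking the lowest available node stands in for "any available node" purely to keep the result deterministic.

```python
def choose_node(available, primary=None, secondary=None):
    """Return (node, warnings) following the primary/secondary fallback order.

    available: set of nodes currently up (assumed input).
    warnings:  messages for each specified node that was down or invalid.
    """
    warnings = []
    for label, node in (("primary", primary), ("secondary", secondary)):
        if node is None:
            continue  # not specified on the command line
        if node in available:
            return node, warnings
        warnings.append("%s node %s is down or invalid" % (label, node))
    if not available:
        raise RuntimeError("no node is available")
    # Assumption: any available node is acceptable; min() is an arbitrary choice.
    return min(available), warnings
```

With nodes 1 and 3 up, choose_node({1, 3}, primary=2, secondary=3) selects node 3 and returns one warning about the primary node.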

Files

/dev/keepalivecfg: Named pipe for receiving commands
/etc/keepalive.d: Directory for process/daemon restart scripts
/etc/keepalive.d/keepalive.data: The keepalive memory-mapped data file (monitored process table) for tracking the state of processes/daemons being monitored.
/var/log/keepalive: Directory containing files into which any process/daemon started by keepalive has its stdout and stderr redirected.

References

fflush(3S), init(1M), load_leveld(1), spawndaemon(1M), syslogd(1M), syslog(3G), vfstab(4)

15 August 2001
Copyright 2001 Compaq Computer Corporation
Cluster-Tools Version 0.5.8