<TITLE>Documentation for spawndaemon</TITLE>

<img src="images/cidocs.png" width="550" height="105">


<HR>
<H1>spawndaemon(1M)</H1>
<HR>
<B>spawndaemon --
<!--Meta NM "spawndaemon"-->
user-level interface to the keepalive daemon
</B>
<!--Meta DN "user-level interface to the keepalive daemon"-->
<P>
<H2>Synopsis</H2>
<PRE>spawndaemon [ [-i cluster_wide] | [-i node_list] ]
      [-n <B><I>times</I></B>] [ [-Z last_node] | [-Z round_robin] ] 
      [-o] [-a] 
      [-c down=<B><I>exit_code</I></B>] [-c reject=<B><I>exit_code</I></B>] 
      -r <B><I>process_cfg_file</I></B>

spawndaemon [ [-i cluster_wide] | [-i node_list] ] 
      [ [-Z last_node] | [-Z round_robin] ] 
      [-o] [-a]
      -R <B><I>process_cfg_file pid</I></B> 

spawndaemon [ [-i cluster_wide] | [-i node_list] ] 
      [ [-Z last_node] | [-Z round_robin] ]
      [-o] -r <B><I>group_cfg_file</I></B>

spawndaemon [ [-i cluster_wide] | [-i node_list] ]  
      [-n <B><I>times</I></B>] 
      -F <B><I>node</I></B> [ [-B <B><I>node</I></B>] . . . ] 
      [ [-Z F_node] | [-Z last_node] | [-Z round_robin] ] 
      [-o] [-a] 
      [-c down=<B><I>exit_code</I></B>] [-c reject=<B><I>exit_code</I></B>] 
      -r <B><I>process_cfg_file</I></B> 

spawndaemon [ [-i cluster_wide] | [-i node_list] ]  
      -F <B><I>node</I></B> [ [-B <B><I>node</I></B>] . . . ] 
      [ [-Z F_node] | [-Z last_node] | [-Z round_robin] ] 
      [-o] [-a] 
      -R <B><I>process_cfg_file pid</I></B> 

spawndaemon [ [-i cluster_wide] | [-i node_list] ]
      -F <B><I>node</I></B> [ [-B <B><I>node</I></B>] . . . ] 
      [ [-Z F_node] | [-Z last_node] | [-Z round_robin] ] 
      [-o] -r <B><I>group_cfg_file</I></B>

spawndaemon [-i cluster_wide] [-n <B><I>times</I></B>] 
      -U [-o] [-a] 
      [-c down=<B><I>exit_code</I></B>]
      -r <B><I>process_cfg_file</I></B> 

spawndaemon [-i cluster_wide]
      -U [-o] [-a] 
      -R <B><I>process_cfg_file pid</I></B> 

spawndaemon [-i cluster_wide] 
      -U [-o]
      -r <B><I>group_cfg_file</I></B>

spawndaemon [-k] -x -D <B><I>full_pathname</I></B> 

spawndaemon [-k] -x -P <B><I>pid</I></B>

spawndaemon [-k] -x -S <B><I>slot</I></B>

spawndaemon [-k] [-a] -d <B><I>full_pathname</I></B> [<B><I>arg_list</I></B>]

spawndaemon [-k] -p <B><I>pid</I></B>

spawndaemon [-k] -g <B><I>group_name</I></B>

spawndaemon [-k] -s <B><I>slot</I></B>

spawndaemon -q 

spawndaemon [-k] -Q 

spawndaemon -L [-v human [processes | keepalive] ]

spawndaemon -L [-v machine [processes | keepalive] ]

spawndaemon -X

spawndaemon -z <B><I>max_processes</I></B>
</PRE>

<H2>Description</H2>
The <B>spawndaemon</B> command provides the command-line interface to the
<B>keepalive</B>(1M)
daemon, which monitors processes and daemons, and
restarts those processes/daemons when they die. <B>keepalive</B> can
monitor processes and daemons individually and in groups.
<P>
Specifically, <B>spawndaemon</B> performs the following tasks:
<UL>
<LI>Processes the <B>spawndaemon</B> command-line options and the 
associated configuration files.
<P><LI>Sends messages to the <B>keepalive</B> daemon, which then performs
all process/daemon startup and monitoring tasks.
<P><LI>Reads information from <B>keepalive</B>'s monitored process table and
displays that information for the system administrator.
</UL>
<P>
To configure a process or daemon to be monitored, perform the following
steps. For more information about particular files and directories, see
the Files section later in this reference manual page. 
<OL>
<LI>Create a process configuration file for the process or daemon
and store it in
the <TT>/etc/spawndaemon.d</TT> directory. For information about
configuration file syntax, owner and permission requirements, see the
Configuration Files section 
later in this reference manual page.
<P><LI>Create a script in one of the <TT>/etc/rc*.d</TT> directories, and
in that script call <B>spawndaemon</B> to register the process/daemon 
with <B>keepalive</B>. Select the appropriate <B>spawndaemon</B> 
syntax from the preceding Synopsis
section; the <B>spawndaemon</B> command references the configuration 
file created in 
the previous step. For information about <B>spawndaemon</B> options, 
see the Options 
section later in this reference manual page.   
<P><LI>Create a startup script in the <TT>/etc/keepalive.d</TT> directory. The
<B>keepalive</B> daemon calls this startup script to start the process
or daemon and (if so configured) to restart the process/daemon if it fails.
<P><LI>Create other (optional) scripts to be used by <B>keepalive</B> 
for handling your process/daemon. Examples of optional scripts include: 
restart on process/daemon failure, restart on node failure, clean up when 
the number of process/daemon failures exceeds the configured limit, and
shut down when directed manually by the <B>spawndaemon</B> command.
These scripts must also reside in the <TT>/etc/keepalive.d</TT> directory.  
<P><LI>Start the process/daemon, either by calling <B>spawndaemon</B> from
the command line of your system, or by restarting your system and
allowing the various scripts in the <TT>/etc/rc*.d</TT> directories and
the <TT>/etc/keepalive.d</TT> directory to start both <B>keepalive</B> and
your processes/daemons.
<P><LI>Verify that your process/daemon is registered correctly by using the
<B>spawndaemon -L</B> and <B>-v</B> options to read the registration
information maintained by <B>keepalive</B>.
</OL>
To configure a process/daemon group for monitoring, each group member must
be registered and have a process configuration file and startup/recovery
scripts similar to an individual process/daemon. In addition, a group
configuration file is required as described under
Configuration Files.  
<H3>Options</H3>
The <B>spawndaemon</B> command uses the following options:
<DL COMPACT>
<DT><B>-a</B><DD>When used for process/daemon registration, the <B>-a</B> option 
specifies that a process/daemon be
registered with <B>keepalive</B> by name and argument list. Including
an argument list in the registration provides a method to distinguish
between processes/daemons having the same name.
The argument list is contained in the process configuration file 
(see <B><I>arg_list</I></B> under 
Configuration Files)
designated with the <B>-r</B> or <B>-R</B> options. 
The <B>-a</B> option is intended to be used with processes that daemonize
themselves. The <B>-a</B> option cannot be used with group registrations.
<P>
When <B>-a</B> is specified, <B>keepalive</B> searches the
system process table using both the process name and <B><I>arg_list</I></B>
to find and register the process ID (PID) of the process after it has
daemonized. When <B>-a</B> is not used, only the process name is used in
the search of the process table to find the PID following daemonization.
<P>
<B><I>arg_list</I></B> does not have to contain all of the arguments
used by the
process/daemon command line(s) for startup and recovery (restart). However,
<B><I>arg_list</I></B> must be a leading portion of that argument list:
one or more arguments, starting with the first argument on the command
line(s), in the same order and with none skipped. 
<P>
When the <B>-a</B> option is used to unregister a process/daemon 
(see <B>-d</B> option) that has
been registered with an argument list, the <B><I>arg_list</I></B> 
must be included with the <B>-d</B> option in the 
<B>spawndaemon</B> command line.
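<P>
The <B><I>arg_list</I></B> rule described above amounts to a prefix match.
The following Python sketch only illustrates that rule; it is not
<B>keepalive</B>'s implementation, and the function name and the sample
daemon arguments are hypothetical.

```python
def arg_list_matches(arg_list, command_args):
    """Return True if arg_list is a non-empty leading run of command_args."""
    if not arg_list:
        return False  # at least one argument is required
    # Compare against the same number of leading command-line arguments.
    return list(command_args[:len(arg_list)]) == list(arg_list)


# Hypothetical daemon started as: mydaemon -c /etc/my.conf -v
command_args = ["-c", "/etc/my.conf", "-v"]
print(arg_list_matches(["-c", "/etc/my.conf"], command_args))  # leading run
print(arg_list_matches(["-c", "-v"], command_args))            # argument skipped
print(arg_list_matches(["/etc/my.conf"], command_args))        # not from the start
```

Only the first call satisfies the rule; the other two skip an argument or
do not start at the beginning of the argument list.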
<P><DT><B>-B <I>node</I></B><DD>
A single use of the <B>-B</B> option specifies the number of the (backup) node
on which the process/daemon is to be executed if the node specified with the
<B>-F</B> option is unavailable. For more information on the nodes in your 
cluster, see <B>cluster</B>(1M).
The <B>-B</B> option can only be used if the <B>-F</B> option is also 
used.
<P>
The <B>-B</B> option can be used more than once. If the node specified by 
the first use of the <B>-B</B> option is unavailable, <B>keepalive</B> 
uses the node specified by the second use of the 
<B>-B</B> option, and so on. The <B>-B</B> option can be used up to 12 
times. Refer to the <B>-Z</B> option for the available restart policies
involving the nodes specified with the <B>-F</B> and <B>-B</B> options.
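<P>
The fallback order described above (the <B>-F</B> node first, then each
<B>-B</B> node in the order given) can be modeled as follows. This Python
sketch is an illustration only: the function name and the representation of
available nodes as a set are assumptions, and the restart policies selected
with <B>-Z</B> are not modeled.

```python
MAX_BACKUP_NODES = 12  # the -B option can be used up to 12 times


def first_available_node(favored, backups, available):
    """Return the first usable node in -F, then -B order, or None."""
    if len(backups) > MAX_BACKUP_NODES:
        raise ValueError("-B can be used at most 12 times")
    for node in [favored, *backups]:
        if node in available:
            return node
    return None  # no node in the allowed set is up


# Favored node 1 is down, so the first backup node that is up is chosen.
print(first_available_node(1, [2, 3], available={2, 3}))
```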
<P><DT><B>-c down=<I>exit_code</I></B><DD>
This option tells <B>keepalive</B> to enable the down feature
for the process/daemon being registered. The down feature
allows a process/daemon to instruct <B>keepalive</B> to take that
process/daemon to the <B>keepalive</B> down state. 
<!--This is a <B>keepalive</B> down state and has nothing to do with down-->
<!--states used by Integrity Cluster/Application Recovery Services-->
<!--(IC/ARS).-->
<B>keepalive</B> will not restart any process/daemon in the down
state. In order to restart such a process/daemon, the <B>spawndaemon</B> 
command with the <B>-x</B> option must be used. 
<P>
When a process/daemon exits to the down state, it must return the exit code you
specify in <B><I>exit_code</I></B>. <B><I>exit_code</I></B> must be an 
integer other than zero (0). If <B><I>exit_code</I></B> is 0 (or if the
<B>-c reject</B> option has already been specified with 
the same <B><I>exit_code</I></B>),
registration fails and the process/daemon is not started. <B>keepalive</B>
communicates the down exit code to the process/daemon through the
<B>KEEPALIVE_PROCESS_DOWN</B> environment variable, the value of which
is the <B><I>exit_code</I></B> you specify in the call to <B>spawndaemon</B>. 
<B>KEEPALIVE_PROCESS_DOWN</B> is only set if <B>spawndaemon</B> is called with
the <B>-c down=<I>exit_code</I></B> option. See the 
Configuration Files section for information
about the <B><I>down_script</I></B> and <B><I>down_script_policy</I></B>
fields in the process configuration file used to specify and control
the execution of the script that
<B>keepalive</B> calls when the process/daemon goes to the down
state. This process/daemon down feature is not supported for group 
registrations.
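<P>
From the monitored process's side, using the down feature means reading
<B>KEEPALIVE_PROCESS_DOWN</B> and exiting with that value. The following
Python sketch shows one way a process might do this. It assumes the variable
holds the exit code as a decimal string; the function names are hypothetical.

```python
import os
import sys


def down_exit_code(environ=os.environ):
    """Return the registered down exit code, or None if the feature is off."""
    # Assumption: the variable holds the exit code as a decimal string.
    value = environ.get("KEEPALIVE_PROCESS_DOWN")
    return int(value) if value else None


def go_down_if_enabled(environ=os.environ):
    """Exit with the down code so that keepalive does not restart this process."""
    code = down_exit_code(environ)
    if code is not None:
        sys.exit(code)
    # Feature not enabled at registration: the caller decides what to do next.
```

If the process was registered without <B>-c down=<I>exit_code</I></B>, the
variable is not set and the second function simply returns.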
<P><DT><B>-c reject=<I>exit_code</I></B><DD>
This option tells <B>keepalive</B> to enable the node-rejection
feature for the process/daemon being registered. The
node-rejection feature allows a process/daemon having a resource
problem (for example, insufficient memory) to reject the node on which
it is running. In rejecting a node, the process/daemon sends
<B><I>exit_code</I></B> to <B>keepalive</B>, which causes <B>keepalive</B> to
fail the process/daemon over to another node. <B>keepalive</B> chooses
the new node based on the node selection policy specified by the <B>-Z</B>
option. 
<P>
If a process/daemon rejects all of the nodes in the cluster,
<B>keepalive</B> clears the list of rejected nodes except for the most 
recently rejected node. However, if that node is the only available
node, <B>keepalive</B> clears it also. With the rejected node list
cleared, <B>keepalive</B> begins anew trying to move the process/daemon
to another node. The error counter for the process/daemon
is not reset. Each node rejection counts as a failure; therefore,
<B>keepalive</B> eventually takes the process/daemon to the down
state (when <B><I>max_errors</I></B> failures occur within
<B><I>probation_period</I></B> seconds as specified in the process 
configuration file).
<P>
When a process/daemon rejects a node, it returns the exit code that
you specify in <B><I>exit_code</I></B>. <B><I>exit_code</I></B> must be an
integer other than zero (0). If <B><I>exit_code</I></B> is 0 (or
if the <B>-c down</B> option has already been specified with 
the same <B><I>exit_code</I></B>),
registration fails and the process/daemon is not started. <B>keepalive</B>
communicates the node rejection exit code to the process/daemon through
the <B>KEEPALIVE_NODE_REJECT</B> environment variable, the value of which
is the <B><I>exit_code</I></B> you specify in the call to <B>spawndaemon</B>. 
<B>KEEPALIVE_NODE_REJECT</B> is only set if <B>spawndaemon</B> is called
with the <B>-c reject=<I>exit_code</I></B> option. 
<P>
If a node failure recovery script has been specified in the
process configuration file, <B>keepalive</B> runs that script
to recover from a node rejection. If no node failure recovery script
has been specified, <B>keepalive</B> runs the process failure recovery
script (if specified) or the startup script. See the Configuration
Files section for information about specifying the script that
<B>keepalive</B> calls when the process/daemon rejects a node. 
The node-rejection feature is not supported for group registrations.  
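<P>
The handling of the rejected node list described above can be modeled with
the following Python sketch. It is an illustration of the documented
behavior, not <B>keepalive</B>'s code: the function name is hypothetical,
and <TT>available</TT> is assumed to be the set of nodes on which the
process/daemon could currently run.

```python
def record_rejection(rejected, node, available):
    """Return the updated rejected-node list after `node` is rejected.

    rejected:  nodes rejected so far, oldest first
    available: nodes on which the process could currently run (assumption)
    """
    # Move the newly rejected node to the end of the list.
    rejected = [n for n in rejected if n != node] + [node]
    if set(available).issubset(rejected):
        # Every node has been rejected: keep only the latest rejection,
        # unless that node is the only one there is to run on.
        rejected = [] if set(available) == {node} else [node]
    return rejected


print(record_rejection([1, 2], 3, available={1, 2, 3}))  # all nodes rejected
print(record_rejection([], 1, available={1}))            # only node rejected
```

The error counter is deliberately left out of the sketch; as stated above,
it is not reset when the list is cleared.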
<P><DT><B>-d <I>full_pathname</I></B> [<B><I>arg_list</I></B>]<DD>
Unregisters the named process/daemon. When specifying
<B><I>full_pathname</I></B>, 
use a full path name as designated by the <B><I>full_path_to_executable</I></B> 
field in the process configuration file. When <B>-d</B> is used with 
the <B>-a</B> option, <B><I>arg_list</I></B> must be specified. 
See the <B>-a</B> option for details.
<P><DT><B>-D <I>full_pathname</I></B><DD> 
Identifies a process/daemon by name to the <B>-x</B> option. When specifying 
<B><I>full_pathname</I></B>, use a full path name as designated by the 
<B><I>full_path_to_executable</I></B> field in the process configuration file.
<P><DT><B>-F <I>node</I></B><DD>
Specifies the number of the (favored) node on which the process/daemon is to be
executed. <B>keepalive</B> pins the process/daemon on the specified node so
that the process/daemon cannot be migrated by using the
<B>load_leveld</B>(1)
utility; the pinned process/daemon ignores any migration request.  
<B>cluster</B>(1M)
can be used to get information on the nodes in your cluster.  
<P>
The nodes designated with the <B>-F</B> and <B>-B</B> options form the
set of nodes on which the process/daemon is allowed to run. The restart
policy option (<B>-Z</B>) determines how nodes are selected from this set
when the process/daemon needs to be restarted or fails to (re)start
on a selected node.  
<P><DT><B>-g <I>group_name</I></B><DD>
Unregisters the processes/daemons in the group specified by
<B><I>group_name</I></B>, where <B><I>group_name</I></B> is defined in the
group configuration file for the group.
<P><DT><B>-i cluster_wide</B> | <B>-i node_list</B><DD>
Enables idempotency enforcement on a cluster-wide or node-list basis.
The <B>-F</B> option is required and <B>-B</B> is optional when
<B>-i node_list</B> is used. Idempotency enforcement is performed at
registration time and persists after registration is completed.
Idempotency for the registered process/daemon instance continues
to be enforced whether or not the <B>-i</B> option is used
on subsequent attempts to register additional instances of the
same process/daemon. A special <B>spawndaemon</B> exit code of 2
(see Return Values) indicates an
idempotency violation.
<P>
When <B>cluster_wide</B> is specified, <B>keepalive</B>'s monitored
process table is searched. In the case of registering a single process/daemon,
if the process/daemon is already
registered, <B>spawndaemon</B> counts this as an idempotency violation and
does not register the new instance. In the case of registering a group,
if any member of the group is already registered for that group,
then <B>spawndaemon</B>
counts this as an idempotency violation and does not register the new
group instance. Otherwise, in both cases, registration proceeds as normal.
<P>
When <B>node_list</B> is specified, <B>keepalive</B>'s monitored
process table is searched. In the case of registering a single
process/daemon, the node list for the new
registered instance must be mutually exclusive with the existing registered
instances of the same process/daemon. If the node lists are not mutually
exclusive, <B>spawndaemon</B> counts the registration
attempt as an idempotency violation and does not register the new instance.
In the case of registering a group, the monitored process table is searched
to verify that the new group instance has a node list that is mutually
exclusive with any other group instances of the same name already registered
with <B>keepalive</B>.
If a group instance of the same name is already registered with a
node list that is not mutually exclusive with the new group instance,
<B>spawndaemon</B> counts this as an idempotency violation and does
not register the new group instance.
<P>
Group registrations, both cluster wide and node list, are only affected
by other registered instances of the same group.
When <B>spawndaemon</B> attempts to register a group of processes and
discovers that one or more processes within the group have been
previously registered as non-grouped processes, it still registers all
the processes specified in the group. If you want two groups containing
the same set of processes to run at the same time, be sure to specify
different names for the two groups.
<P>
Group registrations can prevent single process registrations. If a group
is registered and the <B>-i cluster_wide</B> option is used to attempt
to register a single process/daemon that is a
member of the group, the registration attempt is considered an idempotency
violation. If a group is
registered and the <B>-i node_list</B> option is used to attempt to
register a single process/daemon with the same node list as another
instance of the process/daemon that belongs to the group, the registration
attempt is considered an idempotency violation.
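<P>
For single process/daemon registrations, the two checks described above
reduce to simple set tests. The following Python sketch is a model of the
documented rules, not of <B>keepalive</B>'s monitored process table; the
function names and the list-of-node-lists representation are assumptions.

```python
IDEMPOTENCY_VIOLATION = 2  # documented spawndaemon exit code


def violates_cluster_wide(registered_node_lists):
    """-i cluster_wide: any existing instance is a violation."""
    return bool(registered_node_lists)


def violates_node_list(new_nodes, registered_node_lists):
    """-i node_list: the new node list must not overlap an existing one."""
    new_nodes = set(new_nodes)
    return any(not new_nodes.isdisjoint(existing)
               for existing in registered_node_lists)


# One instance is already registered on nodes 1 and 2.
registered = [[1, 2]]
print(violates_node_list([3, 4], registered))  # node lists are disjoint
print(violates_node_list([2, 3], registered))  # node 2 is shared
print(violates_cluster_wide(registered))
```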
<P><DT><B>-k</B><DD>
Invokes the shutdown script (named in the process configuration
file) for the process/daemon identified by another option in the command line.
Options for identifying the process/daemon include <B>-d</B>, <B>-p</B>,
<B>-s</B>, <B>-D</B>, <B>-P</B>, and <B>-S</B>.
If no shutdown script exists for the process/daemon, a SIGTERM signal is
sent to the process/daemon.
The <B><I>termwait</I></B> option specified in the process configuration
file can be used to define a time
interval (in seconds) that <B>keepalive</B> gives the process/daemon to
shut down before sending it a
SIGKILL signal. The time interval defaults to two seconds.
<P>
<B>keepalive</B> attempts to run the shutdown script on the node where the
process/daemon was last active. If that node is down, <B>keepalive</B> tries
to fork/exec the shutdown script on each of the other nodes in the cluster
until one attempt succeeds or the list of available nodes is exhausted. As
each attempt is made, warning messages are posted to the system log.
If it is impossible to run the shutdown script, an error message is posted
to the system log and system console.
<P>
When <B>-k</B> is used with <B>-g</B> to shut down a group, shutdown scripts
are used for those group members that have them; SIGTERM is used
for those group members that do not have a shutdown script.
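<P>
When no shutdown script exists, the sequence described above is SIGTERM,
a wait of up to <B><I>termwait</I></B> seconds, then SIGKILL. The following
Python sketch reproduces that sequence with the standard
<TT>subprocess</TT> module. It illustrates the pattern only and is not how
<B>keepalive</B> itself is implemented; the function name and the sleeping
child process are stand-ins.

```python
import subprocess
import sys

DEFAULT_TERMWAIT = 2  # seconds; the documented default


def stop_process(proc, termwait=DEFAULT_TERMWAIT):
    """Send SIGTERM, wait up to termwait seconds, then send SIGKILL."""
    proc.terminate()                 # SIGTERM on POSIX systems
    try:
        proc.wait(timeout=termwait)
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL on POSIX systems
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in child process that only sleeps (hypothetical workload).
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```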
<P><DT><B>-L</B><DD>
Displays the state of the monitored process table in an abbreviated form.
If the <B>keepalive</B> daemon has been quiesced (see <B>-q</B> option),
a message to that effect follows the table. The <B>-v</B> (verbose) options
display/read the full details of the monitored process table. 
<P><DT><B>-n <I>times</I></B><DD>
Used with the <B>-r</B> option to specify the number of <B><I>times</I></B>
<B>keepalive</B> registers and starts a process/daemon. If the <B>-i</B>
option is also specified, the processes/daemons are started only if no copies
of the process/daemon are already running (so you can make sure that all old
copies of a process/daemon are gone before starting new ones). The <B>-n</B>
option cannot be used with group registrations.
<P><DT><B>-o</B><DD>
A process registered with the <B>-o</B> option
must not daemonize itself. This option instructs <B>keepalive</B>
to not perform daemonization recovery, thereby optimizing <B>keepalive</B>'s
performance for processes of this type. If <B>-o</B> is used for group
registration, none of the group members can be daemons.
<P><DT><B>-p <I>pid</I></B><DD>
Specifies the process ID of a process/daemon to be unregistered.
<P><DT><B>-P <I>pid</I></B><DD>
Identifies a process/daemon by process ID to the <B>-x</B> option.
<P><DT><B>-q</B><DD>
Quiesces <B>keepalive</B>; that is, prevents <B>keepalive</B> from
monitoring processes or daemons. <B>keepalive</B> continues to maintain
its internal status. <B>spawndaemon</B> is still functional while
<B>keepalive</B> is quiesced; however, monitored processes/daemons are not
restarted if they die with <B>keepalive</B> in this quiesced state.
<P>
Use the <B>-X</B> option to resume normal <B>keepalive</B> operation.

<P><DT><B>-Q</B><DD>
Shuts down the <B>keepalive</B> daemon, cleans up, and exits. As part
of the cleanup procedure, <B>-Q</B> clears <B>keepalive</B>'s monitored
process table, which means that all process/daemon registrations are
lost. This action does not affect the processes or daemons themselves.
They continue to run and are unaffected. Use the <B>-k</B> option
with <B>-Q</B> if you also want to shut down all of the monitored
processes/daemons.
<P>
To shut down <B>keepalive</B>, but leave the monitored process table intact,
send <B>keepalive</B> a SIGTERM signal so it performs a controlled exit.
<P>
For <B>keepalive</B> to remain down, <TT>/etc/inittab</TT> must be edited
to remove the <B>keepalive</B> entries. 
<P><DT><B>-r <I>process_cfg_file</I></B><DD>
Registers the process/daemon named in the process configuration file
specified in <B><I>process_cfg_file</I></B>. After the process/daemon is
registered with <B>keepalive</B>, <B>keepalive</B> starts the process or
daemon by calling its startup script. The process configuration file,
which contains the process/daemon's characteristics, must be stored
in the <TT>/etc/spawndaemon.d</TT> directory and its name must begin with
the prefix <TT>ka_</TT>. The <B>-r</B> and <B>-R</B> options are
mutually exclusive. 
<P><DT><B>-r</B> <B><I>group_cfg_file</I></B><DD>
Registers all the processes/daemons named in the group configuration file
specified by <B><I>group_cfg_file</I></B>. As soon as each process/daemon
is registered, <B>keepalive</B> starts the process/daemon
by calling that process/daemon's startup script. 
<P>
When <B>spawndaemon</B> attempts to register a group of processes/daemons and
discovers that one or more processes/daemons within the group have been
previously registered as non-grouped processes/daemons, it still registers all
the processes/daemons specified in the group. If <B>spawndaemon</B> attempts to
register a group of processes/daemons and discovers that an identically-named
group of processes/daemons has already been registered, it does not register
the new group, but exits with an error (return value of 1).
To register two or more groups containing the same set of
processes/daemons, specify different group names for the groups.
<P>
You must create the group configuration file and a
process configuration file for each group member in the
<TT>/etc/spawndaemon.d</TT> directory. For information about the required
format of the configuration files,
see Configuration Files.
<P><DT><B>-R</B> <B><I>process_cfg_file</I></B> <B><I>pid</I></B><DD>
Registers an already-running process/daemon with <B>keepalive</B>. You must
provide the name of the process/daemon to be registered in the
process configuration file specified by <B><I>process_cfg_file</I></B>. The
configuration file must be stored in the <TT>/etc/spawndaemon.d</TT>
directory. For information about configuration file naming and syntax
requirements, and required access control permissions, see
Configuration Files.  
<P>
You must specify the process ID of the running process/daemon in
<B><I>pid</I></B>. If
the specified process/daemon with that PID is already registered,
<B>spawndaemon</B> logs an error and no new registration takes place.  
<P><DT><B>-s <I>slot</I></B><DD>
Specifies the slot number (in the monitored process table)
of a process/daemon to be unregistered. Process/daemon slot numbers
can be read with the
<B>-v human processes</B> and <B>-v machine processes</B> options.
<P><DT><B>-S <I>slot</I></B><DD>
Identifies a process/daemon by monitored process table slot number
to the <B>-x</B> option. Process/daemon slot numbers can be read with the
<B>-v human processes</B> and <B>-v machine processes</B> options.
<P><DT><B>-U</B><DD>
Directs <B>keepalive</B> to spawn the process/daemon without pinning it to a
particular node. Unpinned processes/daemons can be migrated from one node to
another whenever they are sent a migration request, such as when
the <B>load_leveld</B>(1) utility
is active. If <B>-U</B> is not used, the process/daemon being registered
is pinned to a node when spawned (see <B>-Z</B> option).
<P>
The default migration handler performs the migration; however,
the default handler does not migrate a process/daemon's kernel objects (file
descriptors and so on) to the new node. Those objects remain on the
original node. If the original node fails, the migrated process/daemon can no
longer access its kernel objects, which can lead to unpredictable
behavior from the process/daemon. To preserve your process/daemon's kernel objects
during migration, implement your own migration handler, and
have that handler destroy and rebuild kernel objects (such as file
descriptors) during migration.
<P>
To run multiple instances of the same process/daemon, you must be
consistent in how you use the <B>-U</B>
option. You cannot mix pinned and unpinned instances with the same name;
<B>keepalive</B> requires such consistency in order to perform daemonization
recovery on an instance that has failed. When the <B>-o</B> option is used
to register an instance, this consistency restriction does not apply
(processes registered with <B>-o</B> do not daemonize). 
<P><DT><B>-v human</B> [<B>processes</B> | <B>keepalive</B>]<DD>
When used with the <B>-L</B> option, displays the full contents of
<B>keepalive</B>'s monitored process table in a
human-readable format. The following fields are displayed for each
process/daemon registered with <B>keepalive</B> when the <B>processes</B>
option is specified:
<P>
<TABLE BORDER="BORDER" WIDTH="90%">
<TR>
<TD><CENTER><B>Monitored Process Table Process/Daemon Fields</B></CENTER></TD>
<TD><CENTER><B>Field Description</B></CENTER></TD>
</TR>

<TR>
<TD>state</TD>
<TD>The keepalive state for the process/daemon. See
Process/Daemon States for a description of each state.</TD>
</TR>

<TR>
<TD>pid</TD>
<TD>The current process identification number (PID) of the process/daemon.</TD>
</TR>

<TR>
<TD>node number</TD>
<TD>The number of the node on which the process/daemon is running.</TD>
</TR>

<TR>
<TD>full path to process</TD>
<TD>The complete path to the process/daemon.</TD>
</TR>

<TR>
<TD>argument list</TD>
<TD>The arguments (if any) used by keepalive to distinguish this process/daemon
from others by the same name.</TD>
</TR>

<TR>
<TD>child of keepalive</TD>
<TD>Set TRUE if the process is a child of keepalive; otherwise
(such as when daemonization recovery is underway), set FALSE.</TD>
</TR>

<TR>
<TD>daemonization recovery</TD>
<TD>Set TRUE if the process has daemonized itself; otherwise, set FALSE.</TD>
</TR>

<TR>
<TD>pinned</TD>
<TD>Indicates whether the process/daemon has been designated to run on one or
more specific nodes in the cluster. If set TRUE, the process/daemon is pinned
to one or more nodes. If set FALSE, the process/daemon can
float (migrate) among the nodes in the cluster.</TD></TR>
<TR>
<TD>lastexeced</TD>
<TD>Indicates when the process/daemon was last started.</TD></TR>
<TR>
<TD>process first died</TD>
<TD>Indicates the first time the process/daemon stopped before being
restarted.</TD>
</TR>
<TR>
<TD>process last died</TD>
<TD>Indicates the last time the process/daemon stopped before being
restarted.</TD>
</TR>
<TR>
<TD>min. respawn</TD>
<TD>Specifies the number of seconds the process/daemon must run before it
is eligible for restarting.</TD>
</TR>
<TR>
<TD>num. errors</TD>
<TD>The number of errors (such as process/daemon failures) that have
occurred during the current probation period, which starts when the
process/daemon fails and the error count is set to one (1). The error
count includes process/daemon failures, node rejections by the
process/daemon (see <B>-c reject</B> option), and node failures.</TD>
</TR>
<TR>
<TD>total errors</TD>
<TD>Specifies the total number of errors since the process/daemon was
first started. See num. errors for the events included in the error count.</TD>
</TR>
<TR>
<TD>max. errors during probation</TD>
<TD>Specifies the maximum number of errors (process/daemon failures)
allowed before keepalive will no longer respawn the process/daemon
(leaving the process/daemon in the down state). The maximum number
of errors must occur during the specified probation period in order
for the process/daemon to be left in the down state.</TD>
</TR>
<TR>
<TD>probation period</TD>
<TD>Specifies the time, in seconds, during which the number of errors specified
by max_errors_during_probation must occur in order for the process/daemon
to be taken to the down state.</TD>
</TR>
<TR>
<TD>registration policy</TD>
<TD>One of the following methods by which the process/daemon is registered:
Name, meaning <B>keepalive</B> looks for the process/daemon by
name (the <B>-a</B> and <B>-o</B> options were not used to register the
process/daemon); Argument List, meaning <B>keepalive</B> looks for the
process/daemon by name and argument list (the <B>-a</B> option was used);
PID, meaning <B>keepalive</B> looks for the process/daemon by process ID (<B>-o</B>
option was used).  
</TR>
<TR>
<TD>node selection policy</TD>
<TD>Identifies the node selection policy as specified by the <B>-Z</B>
option.</TD>
</TR>
<TR>
<TD>favored node</TD>
<TD>The node on which the process/daemon is executed, as specified
by the <B>-F</B> option. If no node is specified, this value is
None.</TD>

<TR><TD>backup nodes
<TD>Specifies the nodes on which the process/daemon is executed if the
favored node is unavailable. If no nodes are specified, this value is None.

<TR><TD>rejected nodes
<TD>A list of nodes that the process/daemon has rejected or the
<B>keepalive</B> node selection policy has rejected.

<TR><TD>termwait
<TD>The time interval (in seconds) that keepalive gives the
process/daemon to shut down before sending it a SIGKILL signal.</TD>
</TR>
<TR>
<TD>euid</TD>
<TD>The user identification number of the process/daemon.</TD>
</TR>
<TR>
<TD>egid</TD>
<TD>The group identification number of the process/daemon.</TD>
</TR>
<TR>
<TD>startup script</TD>
<TD>The name of the script that <B>keepalive</B> executes when it starts the
process/daemon.</TD>

<TR><TD>shutdown script
<TD>The name of the script that <B>keepalive</B> executes when it shuts down the
process/daemon.

<TR>
<TD>process failure recovery script</TD>
<TD>The name of the script that <B>keepalive</B> executes when it restarts the
process/daemon after it fails.</TD>
</TR>
<TR>
<TD>node failure recovery script</TD>
<TD>The name of the script that <B>keepalive</B> executes when it restarts a
process/daemon whose node has failed.</TD>
</TR>
<TR>
<TD>down script</TD>
<TD>The name of the script that <B>keepalive</B> executes when a process/daemon
enters the down state.</TD>
</TR>
<TR>
<TD>group</TD>
<TD>The name of the registration group to which the process/daemon
belongs. If the process/daemon does not belong to a group, None is displayed.</TD>
</TR>
<TR>
<TD>critical group process</TD>
<TD>Set to TRUE or FALSE to indicate whether or not the process/daemon is
critical to its group. Set to N/A if the process/daemon is not a member of
a group.</TD>
</TR>
<TR>
<TD>reject node exit code</TD>
<TD>Exit code specified by <B>-c reject</B> option. Set to None if the
process/daemon was not registered with the <B>-c reject</B> option.</TD>
</TR>
<TR>
<TD>down exit code</TD>
<TD>Exit code specified by <B>-c down</B> option. Set to None if the
process/daemon was not registered with the <B>-c down</B> option.</TD>

<TR>
<TD>exit status returned
<TD>If a process/daemon is not running, this is the exit code associated
with the most recent failure/exit. If a process/daemon is running, None is
reported.

<TR><TD>last pid
<TD>The PID of the last failed process/daemon in this slot.

<TR><TD>slot
<TD>The slot number of the process/daemon in this table.
</TABLE>
<P>
The following fields are displayed when the <B>keepalive</B>
option is specified:
<P>
<TABLE BORDER="BORDER" WIDTH="90%">
<TR>
<TD><CENTER><B>Monitored Process Table Keepalive Fields</B></CENTER></TD>
<TD><CENTER><B>Field Description</B></CENTER></TD>
</TR>

<TR><TD>running</TD>
<TD>Set TRUE if <B>keepalive</B> is running; otherwise, set FALSE.</TD></TR>

<TR><TD>quiesce flag
<TD>Set TRUE if <B>keepalive</B> is currently quiesced with the <B>-q</B>
option; otherwise, set FALSE.</TR>

<TR><TD>pid
<TD>The process identification number for <B>keepalive</B>.</TR>

<TR><TD>node number
<TD>The number of the node on which <B>keepalive</B> is running.</TR>

<TR><TD>registered processes
<TD>The total number of processes/daemons currently registered with
<B>keepalive</B>, which equates to the number of entries in the
monitored process table.</TR>

<TR><TD>table size
<TD>The current number of slots in the monitored process table. Each registered
process/daemon uses one slot in the table. "table size" is less than or
equal to "max. possible processes"; if less than "max. possible processes,"
it is because <B>keepalive</B> has not allocated memory for the maximum
number of slots.</TR>

<TR><TD>max. possible processes
<TD>The maximum number of processes/daemons that can be registered, which
equates to the maximum size of the monitored process table. This is a minimum
of 200 and can be increased with the <B>-z</B> option.</TR>

<TR><TD>polling
<TD>Set TRUE or FALSE, indicating whether <B>keepalive</B> detects a
process/daemon failure by polling rather than via child process adoption
(that is, on receipt of the SIGCHLD signal).</TR>

<TR><TD>polling interval
<TD>The time in seconds that <B>keepalive</B> uses as a polling interval.
This can be controlled with the <B>-t</B> option of
<B>keepalive</B>(1M).</TR>

<TR><TD>primary node
<TD>The primary node on which the <B>keepalive</B> process is executed.
This can be controlled with the <B>-P</B> option of
<B>keepalive</B>(1M).</TR>

<TR><TD>secondary node
<TD>The secondary node on which the <B>keepalive</B> process is executed
when the primary node is unavailable.
This can be controlled with the <B>-P</B> option of
<B>keepalive</B>(1M).</TR>
</TABLE>
<P><DT><B>-v machine</B> [<B>processes</B> | <B>keepalive</B>]<DD>
Displays the full contents of <B>keepalive</B>'s monitored process table in a
format that is parsable by shell utilities or by a C program. The
record for each process/daemon is listed on a single line and is
terminated with a newline character. Each field in the record appears in the
format <TT>FIELD="VALUE"</TT> or
<TT>FIELD=VALUE</TT>. Each field/value pair is terminated by
a semi-colon (;). <TT>VALUE</TT> is enclosed within double quotes when the
field is a string, so that a semi-colon appearing inside a string value is not
mistaken for the field delimiter.
In fields that contain lists of values, a comma separates each
<TT>VALUE</TT> or <TT>"VALUE"</TT>. The following field/value pairs are
returned for each process/daemon when the <B>processes</B> option is specified:
<P>
<TABLE BORDER="BORDER" WIDTH="90%">
<TR>
<TD><CENTER><B>Field No.</B></CENTER></TD>
<TD><CENTER><B>Field="Value" or Field=Value</B></CENTER></TD>
</TR>

<TR><TD>1
<TD>state="&lt;current state of the process/daemon&gt;";

<TR><TD>2
<TD>pid="&lt;pid of process/daemon&gt;" or "None";

<TR><TD>3
<TD>node_number="&lt;number of node running process/daemon&gt;" or "None";

<TR><TD>4
<TD>full_path_to_process="&lt;full pathname for process/daemon&gt;";

<TR><TD>5
<TD>arg_list="&lt;list of arguments&gt;" or "" (null string);

<TR><TD>6
<TD>child_of_keepalive=TRUE or FALSE;

<TR><TD>7
<TD>daemonization_recovery=TRUE or FALSE;

<TR><TD>8
<TD>pinned=TRUE or FALSE;

<TR><TD>9
<TD>lastexeced="&lt;time process/daemon was last exec'ed&gt;";

<TR><TD>10
<TD>process_first_died="&lt;time process/daemon first died&gt;" or "Never";

<TR><TD>11
<TD>process_last_died="&lt;time process/daemon last died&gt;" or "Never";

<TR><TD>12
<TD>minrespawn=&lt;minimum time allowed between respawns&gt;;

<TR><TD>13
<TD>num_errors=&lt;number of process/daemon failures within current
probation_period&gt;;

<TR><TD>14
<TD>total_errors=&lt;total number of failures for this process/daemon&gt;;

<TR><TD>15
<TD>max_errors_during_probation=&lt;maximum number of failures allowed within
probation_period&gt;;

<TR><TD>16
<TD>probation_period=&lt;probation period in seconds&gt;;

<TR><TD>17
<TD>registration_policy="&lt;method by which process/daemon is registered&gt;";

<TR><TD>18
<TD>node_selection_policy="&lt;node selection policy&gt;";

<TR><TD>19
<TD>favored_node="&lt;node designated with <B>-F</B> option&gt;" or "None";

<TR><TD>20
<TD>backup_nodes="&lt;node designated with <B>-B</B> option&gt;,
&lt;node designated with <B>-B</B> option&gt;, ..." or "None";

<TR><TD>21
<TD>rejected_nodes="&lt;rejected node&gt;, &lt;rejected node&gt;, ..." or "None";

<TR><TD>22
<TD>termwait=&lt;time allowed (in seconds) to shut down before sending
SIGKILL&gt;;

<TR><TD>23
<TD>euid=&lt;user ID of process/daemon&gt;;

<TR><TD>24
<TD>egid=&lt;group ID of process/daemon&gt;;

<TR><TD>25
<TD>startup_script="&lt;pathname of startup script&gt;";

<TR><TD>26
<TD>shutdown_script="&lt;pathname of shutdown script&gt;" or "None";

<TR><TD>27
<TD>process_failure_recovery_script="&lt;pathname of process failure recovery
script&gt;" or "None";

<TR><TD>28
<TD>node_failure_recovery_script="&lt;pathname of node failure recovery script&gt;"
or "None";

<TR><TD>29
<TD>down_script="&lt;pathname of down script&gt;" or "None";

<TR><TD>30
<TD>group="&lt;group name&gt;" or "None";

<TR><TD>31
<TD>critical_group_process="TRUE" or "FALSE" (if a group member) or "N/A";

<TR><TD>32
<TD>reject_exit_code="&lt;exit code used to reject host node&gt;" or "None";

<TR><TD>33
<TD>down_exit_code="&lt;exit code used to take process/daemon down&gt;" or "None";

<TR><TD>34
<TD>exit_status_returned="&lt;exit code if process/daemon not running&gt;"
or "None";

<TR><TD>35
<TD>last_pid="&lt;pid of last failed process/daemon&gt;" or "None";

<TR><TD>36
<TD>slot=&lt;monitored process table slot number of this process/daemon&gt;;

</TABLE>
<P>
Refer to the <B>-v human</B> option for a description of the fields for
each process/daemon.
The following field/value pairs are returned when the <B>keepalive</B>
option is specified:
<P>
<TABLE BORDER="BORDER" WIDTH="90%">
<TR>
<TD><CENTER><B>Field No.</B></CENTER></TD>
<TD><CENTER><B>Field="Value" or Field=Value</B></CENTER></TD>
</TR>

<TR><TD>1
<TD>running=TRUE or FALSE;

<TR><TD>2
<TD>quiesce_flag=TRUE if keepalive quiesced, FALSE otherwise;

<TR><TD>3
<TD>pid="&lt;pid of <B>keepalive</B>&gt;" or "None";

<TR><TD>4
<TD>node_number="&lt;number of node running <B>keepalive</B>&gt;" or "None";

<TR><TD>5
<TD>registered_processes=&lt;current number of registered processes&gt;;

<TR><TD>6
<TD>table_size=&lt;current size of monitored process table&gt;;

<TR><TD>7
<TD>max_possible_processes=&lt;maximum number of processes/daemons that
can be registered (maximum size of monitored process table)&gt;;

<TR><TD>8
<TD>polling=TRUE if keepalive detects process/daemon failure via polling,
FALSE if SIGCHLD is used to detect failure;

<TR><TD>9
<TD>polling_interval=&lt;keepalive polling interval&gt;;

<TR><TD>10
<TD>primary_node="&lt;primary node on which <B>keepalive</B> is executed&gt;" or "None";

<TR><TD>11
<TD>secondary_node="&lt;secondary node on which <B>keepalive</B> is executed&gt;" or "None";
</TABLE>
<P>
Refer to the <B>-v human</B> option for a description of the fields for
<B>keepalive</B>.
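<P>
The record format described above can be read with a short script. The following is a minimal sketch in Python, not part of the <B>spawndaemon</B> distribution. It assumes that quoted values never contain an embedded double quote (the format description does not say how one would be escaped), and the sample record is hypothetical and shortened to a few fields.

```python
import re

# One FIELD=VALUE; or FIELD="VALUE"; pair. The quoted form is tried first so
# that a semicolon inside a quoted string is not taken for the terminator.
# Assumption: quoted values never contain an embedded double quote.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^;]*));')

# Fields that the tables above describe as lists of values.
LIST_FIELDS = {"backup_nodes", "rejected_nodes"}


def parse_record(line):
    """Parse one line of -v machine output into a dict keyed by field name."""
    record = {}
    for match in PAIR.finditer(line):
        name, quoted, bare = match.groups()
        value = quoted if quoted is not None else bare
        if name in LIST_FIELDS:
            # Accept either one quoted list or individually quoted items.
            items = [v.strip().strip('"') for v in value.split(",")]
            value = [] if items == ["None"] else items
        record[name] = value
    return record


# Hypothetical record, shortened to a few of the 36 process fields.
sample = ('state="ok";pid="4242";arg_list="-f a;b";pinned=TRUE;'
          'backup_nodes="2, 3";slot=7;')
rec = parse_record(sample)
print(rec["state"], rec["arg_list"], rec["backup_nodes"], rec["slot"])
```

<P>
All values are kept as strings; a caller that needs numbers (for example, <TT>slot</TT> or <TT>num_errors</TT>) would convert them after parsing.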
<P><DT><B>-x</B><DD>
If a process/daemon is down for any reason (for example, because it has
reached its maximum allowed error count within the specified probation period),
its error (failure) count is cleared and the process/daemon is restarted.
For information about <B><I>max_errors</I></B> and
<B><I>probation_period</I></B>, see the 
Configuration Files
section of this reference manual page.
<P>
If a process/daemon is running, the <B>-x</B> option clears the error count
so it appears the process/daemon has not failed. If <B>-k</B> is used
with <B>-x</B>, the process/daemon is shut down and restarted, which can
be used to restart a process/daemon that appears hung.
<P><DT><B>-X</B><DD>
Resumes normal <B>keepalive</B> operation. Call <B>spawndaemon</B> with
this option after you quiesce <B>keepalive</B> with the <B>-q</B> option. 
<P><DT><B>-z</B> <B><I>max_processes</I></B><DD>
Increases the size of the monitored process table (the maximum number of
processes/daemons that <B>keepalive</B> can monitor) to the value
specified in <B><I>max_processes</I></B>. By default, <B>keepalive</B>
can monitor 200 processes/daemons; this option can only be used to
increase the maximum size of the table.
<P><DT><B>-Z</B> <B>F_node</B> | <B>round_robin</B> | <B>last_node</B><DD>
Specifies a node selection policy for the process/daemon.
A node selection policy only has 
meaning if the spawned process/daemon is to be pinned to a node; therefore,
do not use this option if the spawned process/daemon is not to be pinned
(see <B>-U</B> option). If the <B>-Z</B> and <B>-U</B> options are
not used to register individual processes/daemons, the node selection policy
defaults to <B>round_robin</B>. For group registrations, the node selection policy
defaults to <B>last_node</B>.
<P>
If you specify <B>F_node</B>, the node specified with the <B>-F</B> option
is used first for the restart attempt. If this favored node is not available,
the node(s) designated with the <B>-B</B> option are tried next. If none
of the designated nodes are available, the restart attempt is delayed
until one of the nodes becomes available.
Use of <B>F_node</B> is not valid unless the <B>-F</B> option is also used. 
<P>
If you specify <B>round_robin</B> without using the <B>-F</B> and <B>-B</B>
options, <B>keepalive</B> picks the first
available node in the cluster on which to restart the process/daemon,
unless the last restart attempt on that node failed, in which
case <B>keepalive</B> picks the next available node. <B>keepalive</B> maintains
a list of visited nodes; when all nodes
in the list have been visited, the list is cleared and reused until the
process/daemon reaches <B><I>max_errors</I></B> within the
<B><I>probation_period</I></B>. Specify <B>round_robin</B> if you do not
care which node runs your process/daemon. Note, however, that the <B>-F</B> and
<B>-B</B> options can be used with round robin to designate the
nodes used in the restart attempts. The <B>round_robin</B> node selection policy
offers the highest level of availability since the node on 
which the process/daemon most recently failed/exited is avoided.  
<P>
If you specify <B>last_node</B>, <B>keepalive</B> first tries to restart 
the process on the last node on which it was running before trying to 
restart it on any other node. If the restart attempt fails, the nodes
designated with the <B>-F</B> and <B>-B</B> options are used on subsequent
restart attempts in the order specified on the <B>spawndaemon</B> command line.
If the <B>-F</B> option is used without the <B>-B</B> option, the restart
attempts are repeated on the <B>-F</B> node as long as it is available. If
no <B>-F</B> (or <B>-B</B>) option is used, the restart is attempted on
an available node in a round robin fashion. The <B>last_node</B> restart
policy is useful in those situations where resources (such as a shared 
memory segment) may persist across process/daemon failures.
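<P>
The three policies can be compared with a small model. The following Python sketch is an illustration of the descriptions above, not <B>keepalive</B>'s actual implementation. The function names, the <TT>visited</TT> set, and the way the visited list is shared by the <B>round_robin</B> and <B>last_node</B> policies are all assumptions made for the example.

```python
def node_set(cluster_nodes, favored=None, backups=()):
    """Nodes eligible for restarts: the -F/-B nodes if given, else the cluster."""
    designated = ([favored] if favored is not None else []) + list(backups)
    return designated if designated else list(cluster_nodes)


def pick_node(policy, eligible, available, last_node=None, visited=None):
    """Return the node for the next restart attempt, or None to wait.

    visited is a mutable set of nodes already tried; it is cleared once
    every currently available candidate has been visited.
    """
    visited = set() if visited is None else visited
    if policy == "F_node":
        order = list(eligible)
        visited.clear()          # always start again from the favored node
    elif policy == "last_node" and last_node is not None:
        order = [last_node] + [n for n in eligible if n != last_node]
    else:                        # round_robin
        order = list(eligible)
    candidates = [n for n in order if n in available]
    if not candidates:
        return None              # delay the restart until a node comes back
    if all(n in visited for n in candidates):
        visited.clear()
    for node in candidates:
        if node not in visited:
            visited.add(node)
            return node


# Round robin over a hypothetical three-node cluster with no -F or -B nodes.
seen = set()
nodes = node_set([1, 2, 3])
print([pick_node("round_robin", nodes, {1, 2, 3}, visited=seen) for _ in range(4)])
```

<P>
For <B>F_node</B>, the caller would pass <TT>node_set(cluster, favored, backups)</TT> so that only the designated nodes are considered, and a return value of <TT>None</TT> corresponds to the delayed restart described above.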
</DL>
<H2>Return Values</H2>
The <B>spawndaemon</B> command returns the following values:
<UL>
<LI>0 - The operation has completed successfully.</LI>
<LI>1 - A fatal error has occurred.</LI>
<LI>2 - An idempotency violation has occurred.</LI>
</UL>
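<P>
A calling script can branch on these values. The following Python sketch is one possible wrapper; the function names are hypothetical, and it assumes only that <B>spawndaemon</B> is on the search path and exits with one of the statuses listed above.

```python
import subprocess

# Exit statuses listed under Return Values.
RETURN_VALUES = {
    0: "operation completed successfully",
    1: "fatal error",
    2: "idempotency violation",
}


def describe_status(status):
    """Map a spawndaemon exit status to the meaning documented above."""
    return RETURN_VALUES.get(status, "undocumented status %d" % status)


def run_spawndaemon(args):
    """Run spawndaemon with the given argument list and describe the result."""
    completed = subprocess.run(["spawndaemon"] + list(args))
    return completed.returncode, describe_status(completed.returncode)


print(describe_status(2))
```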
 
<H2>Files</H2>
<DL COMPACT>
<DT><TT>/dev/keepalivecfg</TT><DD> 
Named pipe that <B>keepalive</B> uses for receiving commands from
the <B>spawndaemon</B> utility. 
<P><DT><TT>/etc/keepalive.d</TT><DD>
Directory where you store the scripts for managing monitored processes/daemons.
For each monitored process/daemon, you <I>must</I>
provide a startup script.  
<P>
Optionally, you can provide additional scripts for <B>keepalive</B> to
call when other events occur, such as process/daemon failure, node
failure, or when the process/daemon goes to the <B>keepalive</B> down
state (meaning, for example, that the monitored process/daemon has produced
more errors than allowed within its specified probation period and,
therefore, is not restarted by <B>keepalive</B>). For more
information about optional scripts, see the 
Configuration Files
section later in this reference manual page.
<P>
The user and group ID must be <TT>root</TT> for all scripts.
In addition, access control permission
for each script must be set to <TT>rwx r-x r-x</TT> (755). 
<P><DT><TT>/etc/keepalive.d/keepalive.data</TT><DD>
<B>keepalive</B> stores the state of all monitored processes and
daemons in this memory-mapped data file referred to as the monitored
process table. If you remove
<TT>/etc/keepalive.d/keepalive.data</TT>, the <B>keepalive</B> daemon
should be shut down with the <B>spawndaemon -Q</B> option.
When no longer able to communicate with
<B>keepalive</B>, <B>spawndaemon</B> displays a message indicating that
it cannot find <B>keepalive</B>.    
<P><DT><TT>/etc/rc*.d</TT><DD>
Set of directories where you register processes/daemons with <B>keepalive</B>
and store the start and stop scripts for system processes/daemons. 
<P><DT><TT>/etc/spawndaemon.d</TT><DD>
Directory where you store configuration files for each monitored
process/daemon. The name of all process and group configuration files
must begin with the prefix <TT>ka_</TT>. The following section
describes configuration files in detail.
</DL> 
<H3>Configuration Files</H3>
All configuration files must reside in the <TT>/etc/spawndaemon.d</TT>
directory and their names must begin with the prefix <TT>ka_</TT>. The user
and group ID must
be <TT>root</TT> for all configuration files. In addition,
access control permission for each configuration file must be set to
<TT>rw- r-- r--</TT> (644). 
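<P>
These requirements can be checked before a file is used for registration. The following Python sketch is a hypothetical helper, not a tool shipped with <B>spawndaemon</B>. It assumes that "root" means user ID 0 and group ID 0, which is conventional but not stated above.

```python
import os
import stat


def check_config_file(path):
    """Return a list of problems found with a spawndaemon configuration file."""
    problems = []
    if not os.path.basename(path).startswith("ka_"):
        problems.append("name does not begin with ka_")
    info = os.stat(path)
    # Assumption: root is user ID 0 and group ID 0.
    if info.st_uid != 0 or info.st_gid != 0:
        problems.append("user and group ID are not both root")
    if stat.S_IMODE(info.st_mode) != 0o644:
        problems.append("permissions are not 644")
    return problems
```

<P>
An empty list means that the file meets the naming, ownership, and permission rules above. The same approach, with mode 755, would apply to the scripts in <TT>/etc/keepalive.d</TT>.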
<P>
The <B>spawndaemon</B> command uses different configuration
file formats depending on whether an individual process/daemon or a group of
processes/daemons is to be registered. However, all members of a group must
have an individual process configuration file as well as being identified
as a group member in a group configuration file.

<H4>Process Configuration Files</H4>
Each process configuration file for registering a single process/daemon
must be formatted as follows:
<P>
[<B><I>group_name</I></B>]:<B><I>full_path_to_executable</I></B>:[<B><I>arg_list</I></B>]:[<B><I>termwait</I></B>]:<B><I>uid</I></B>:<B><I>gid</I></B>:[<B><I>max_errors</I></B>]:<BR>
[<B><I>probation_period</I></B>]:[<B><I>minrespawn</I></B>]:<B><I>startup_script</I></B>:[<B><I>shutdown_script</I></B>]:<BR>
[<B><I>process_failure_recovery_script</I></B>]:[<B><I>node_failure_recovery_script</I></B>]:[<B><I>down_script</I></B>]:[<B><I>down_script_policy</I></B>]
<P>
Each field must be separated by a colon; all fields must be listed on the
same line. 
<P>
<B><I>group_name</I></B> specifies the name of the <B>keepalive</B>
group. This field is required for a process/daemon that is a member
of a group and must be left blank if the process/daemon is
not a member of a group. The <B><I>group_name</I></B> specified here
must be identical to the group's <B><I>group_name</I></B> field in the group
configuration file. The value you specify for <B><I>group_name</I></B>
cannot exceed 16 characters in length.
<P>
<B><I>full_path_to_executable</I></B> specifies the full pathname of the 
process/daemon to be monitored. 
<P>
<B><I>arg_list</I></B> is supplied when the <B>-a</B> option is used for
registration. <B><I>arg_list</I></B> is all (or a subset of) the arguments
used to distinguish this instance of the process/daemon from others of the same
name. <B><I>arg_list</I></B> specifies all (or some) of the arguments used
on the process/daemon command line(s) in the startup and recovery (restart)
scripts, in the same order (with none missing in the sequence provided)
as they appear on the command line(s), starting at the beginning of the
argument list.
<P>
<B><I>termwait</I></B> is the time interval (in seconds) that <B>keepalive</B>
gives the process/daemon to shut down
before sending it a SIGKILL signal. The time interval defaults to
two (2) seconds.
<P>
<B><I>uid</I></B> and <B><I>gid</I></B> specify the name of the user ID
and group ID, respectively, which <B>keepalive</B> uses to spawn the registered
process/daemon.
<B>keepalive</B> calls 
<B>setuid</B>(2)
with <B><I>uid</I></B> and 
<B>setgid</B>(2)
with <B><I>gid</I></B>, respectively, when it forks the monitored
process/daemon.
<P>
<B><I>max_errors</I></B> specifies the maximum number of errors allowed for a
process/daemon before the <B>keepalive</B> daemon stops respawning
it (thereby leaving the process/daemon in a down state). The number of
errors must occur within the number of seconds specified by the
<B><I>probation_period</I></B> field. Specifying zero (0) for either
<B><I>max_errors</I></B> or <B><I>probation_period</I></B> causes
<B>keepalive</B> to
restart the process/daemon an infinite number of times. The default
value for <B><I>max_errors</I></B> is 10 errors. The default value for
<B><I>probation_period</I></B> is 300 seconds. A registered
process/daemon's first error triggers its probation-period timer. If
that process/daemon has fewer than <B><I>max_errors</I></B> errors
within its probation period, the period expires. On the next failure,
<B>keepalive</B> resets the process/daemon's error count to one (1) and
restarts the probation-period timer.
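<P>
The interaction between <B><I>max_errors</I></B> and <B><I>probation_period</I></B> can be modeled in a few lines. The following Python sketch is an illustration only; the class name is hypothetical, and it assumes that reaching <B><I>max_errors</I></B> (rather than exceeding it) takes the process/daemon down, which this page does not state unambiguously.

```python
class ProbationTracker:
    """Error counting within a probation period, as described above."""

    def __init__(self, max_errors=10, probation_period=300):
        self.max_errors = max_errors
        self.probation_period = probation_period
        self.num_errors = 0
        self.period_start = None

    def record_failure(self, now):
        """Record a failure at time now (seconds); return "respawn" or "down"."""
        expired = (self.period_start is None
                   or now - self.period_start >= self.probation_period)
        if expired:
            # First error, or first error after the period ran out:
            # restart the timer with an error count of one.
            self.num_errors = 1
            self.period_start = now
        else:
            self.num_errors += 1
        if self.max_errors == 0 or self.probation_period == 0:
            return "respawn"     # zero in either field: restart without limit
        # Assumption: reaching max_errors is enough to go down.
        return "down" if self.num_errors >= self.max_errors else "respawn"


tracker = ProbationTracker(max_errors=3, probation_period=300)
print([tracker.record_failure(t) for t in (0, 100, 200)])
```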
<P>
<B><I>minrespawn</I></B> is the number of seconds that is considered the 
minimum respawn time. The minrespawn timer starts when the process/daemon
is spawned. If the process/daemon exits in less than the
respawn time, <B>keepalive</B> does not restart the process/daemon
until the respawn
time elapses. The default for <B><I>minrespawn</I></B> is zero (0) seconds. 
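<P>
In other words, the restart is held back by whatever part of the respawn time has not yet elapsed. A minimal Python sketch of that arithmetic follows; the function and parameter names are hypothetical.

```python
def respawn_delay(minrespawn, spawned_at, exited_at):
    """Seconds to wait after exit before the process/daemon may be restarted."""
    ran_for = exited_at - spawned_at
    return max(0, minrespawn - ran_for)


# A process/daemon with minrespawn=30 that exits 10 seconds after it starts.
print(respawn_delay(30, spawned_at=100, exited_at=110))
```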
<P>
<B><I>startup_script</I></B> is the name of the script that <B>keepalive</B>
runs to start the process/daemon. However, <B>keepalive</B> also
executes this script in the following situations: the process/daemon
fails and no <B><I>process_failure_recovery_script</I></B> has been
specified in the configuration file; a node failure occurs and neither a
<B><I>process_failure_recovery_script</I></B> nor a
<B><I>node_failure_recovery_script</I></B> has been
specified in the configuration file. <B><I>startup_script</I></B>
is the only required script.
The script must reside in the <TT>/etc/keepalive.d</TT> directory.
<P>
<B><I>shutdown_script</I></B> is the name of an optional script that
<B>keepalive</B> executes when the administrator terminates the
process/daemon by running <B>spawndaemon</B> with the <B>-k</B> option. The
shutdown script must perform all steps necessary for a controlled
shutdown. To support the shutdown script, <B>keepalive</B> sets
$KEEPALIVE_ACTIVE_PID to the PID of the process/daemon for which the
shutdown has been requested. If no shutdown script is specified,
<B>keepalive</B> issues a SIGTERM signal, which the process/daemon
must handle. If the process/daemon still exists after the
<B><I>termwait</I></B> interval, keepalive sends the process/daemon
a SIGKILL signal. The
shutdown script must reside in the <TT>/etc/keepalive.d</TT> directory. 
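<P>
The default behavior when no shutdown script is specified (SIGTERM, then SIGKILL after <B><I>termwait</I></B> seconds) follows a common pattern. The following Python sketch shows that pattern for a child process on a POSIX system. It is an analogy rather than <B>keepalive</B>'s own code, and the sleeping child is a stand-in for a monitored process/daemon.

```python
import subprocess
import sys


def shut_down(proc, termwait=2):
    """Send SIGTERM, wait up to termwait seconds, then fall back to SIGKILL.

    proc is a subprocess.Popen object; the return value is its exit status.
    """
    proc.terminate()                      # SIGTERM on POSIX systems
    try:
        return proc.wait(timeout=termwait)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX systems
        return proc.wait()


# Stand-in for a monitored process/daemon: a child that sleeps for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
status = shut_down(child)
print("exit status:", status)
```

<P>
A process/daemon that ignores SIGTERM would reach the SIGKILL branch once the <B><I>termwait</I></B> interval runs out.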
<P>
<B><I>process_failure_recovery_script</I></B> is the name of an optional
script that <B>keepalive</B> executes if the process/daemon fails.
<B>keepalive</B> also runs this script if the host node for the
process/daemon fails, but no
<B><I>node_failure_recovery_script</I></B> has been specified in the
process configuration file. The process_failure_recovery_script
must reside in the <TT>/etc/keepalive.d</TT> directory. 
<P>
<B><I>node_failure_recovery_script</I></B> is the name of an optional script
that <B>keepalive</B> executes in the event the host node for the
process/daemon fails. The node failure recovery script must reside
in the <TT>/etc/keepalive.d</TT> directory.
<P>
<B><I>down_script</I></B> is the name of the script that <B>keepalive</B> runs
if the monitored process/daemon goes to the <B>keepalive</B> down
state. Processes/daemons enter the down state when they fail more times
than allowed within their specified probation period or they take
themselves down with a down exit code (see <B>-c down</B> option). A down script
should return the failed process/daemon's resources and perform any
other cleanup-related tasks. <B>keepalive</B> attempts to run the down
script on the same node on which the failed process/daemon was last
running. However, if that node is down, <B>keepalive</B> tries the
other nodes in turn until it exhausts the list of available nodes
(logging each failure in the system log). To support the down script,
<B>keepalive</B> sets
$KEEPALIVE_LAST_PID to <B>keepalive</B>'s last known PID for the process/daemon
before it went to the down state. The <B><I>down_script</I></B> script
must reside in the <TT>/etc/keepalive.d</TT> directory.
<P>
<B><I>down_script_policy</I></B> specifies whether <B>keepalive</B> should run
the down script if the node on which the monitored process/daemon
is running fails (thereby causing the process/daemon to go to the
<B>keepalive</B> down state). A value of zero (0) for
<B><I>down_script_policy</I></B>
means that <B>keepalive</B> does not run the down script. A value of
one (1), the default, indicates that <B>keepalive</B> runs the down
script.
<P>
See Examples for an example of a
process configuration file.

<H4>Group Configuration Files</H4>
A group configuration file must be formatted as follows:
<P>
<B>&lt;keepalive_group&gt;</B>:<B><I>group_name</I></B><BR>
<B><I>member_file</I></B>:[<B><I>wait_time</I></B>]:[<B><I>critical</I></B>]<BR>
<B><I>member_file</I></B>:[<B><I>wait_time</I></B>]:[<B><I>critical</I></B>]<BR>
...<BR> 
where <B>&lt;keepalive_group&gt;</B> and <B><I>member_file</I></B> each
start a new line.
<P>
The first item in the first line of the group configuration
file must be the string <B>&lt;keepalive_group&gt;</B> (the <B>&lt;</B> and
<B>&gt;</B> symbols are required).
<P>
The <B><I>group_name</I></B>, which is limited to no more than 16
characters in length, specifies the name you want to assign the
process/daemon group. The string you specify for <B><I>group_name</I></B>
must match the <B><I>group_name</I></B> string found in the
process configuration files for each of the group members.   
<P>
Each line following the first line applies to a process/daemon that belongs
to the group.
<P>
<B><I>member_file</I></B> specifies the name of a process
configuration file for the group member.
If you specify a group configuration file as a <B><I>member_file</I></B>,
<B>spawndaemon</B> aborts the registration of the group.
<P>
If specified, <B><I>wait_time</I></B> is the number of seconds that
<B>keepalive</B> delays before starting the next process/daemon in
the group. When specified, <B><I>wait_time</I></B> must be a non-negative
integer. If you do not specify <B><I>wait_time</I></B>, <B>spawndaemon</B>
accepts the default value of 0 (zero seconds), meaning that there is
to be no delay. Specifying a delay for the last process/daemon in
the list has no effect.
<P>
<B><I>critical</I></B> is the criticality of the process/daemon within the
<B>keepalive</B> group. Valid criticalities are zero (0) and one (1). A
criticality of 0 (the default) indicates that <B>keepalive</B>
should respawn only this process/daemon if it 
terminates. A value of 1 indicates that <B>keepalive</B> should
respawn the entire group if this process/daemon terminates.
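<P>
The group file format can be read and validated as follows. This Python sketch is a hypothetical reader, not the parser that <B>spawndaemon</B> uses. It assumes that blank lines may be ignored, and the group and member file names in the sample are invented for the example.

```python
GROUP_TAG = "<keepalive_group>"


def parse_group_config(text):
    """Parse a group configuration file into (group_name, members)."""
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty group configuration file")
    tag, _, group_name = lines[0].partition(":")
    if tag != GROUP_TAG:
        raise ValueError("first line must start with " + GROUP_TAG)
    if not group_name or len(group_name) > 16:
        raise ValueError("group_name must be 1 to 16 characters")
    members = []
    for line in lines[1:]:
        # Pad so that omitted optional fields read as empty strings.
        member_file, wait_time, critical = (line.split(":") + ["", ""])[:3]
        wait = int(wait_time) if wait_time else 0      # default: no delay
        crit = int(critical) if critical else 0        # default: not critical
        if wait < 0 or crit not in (0, 1):
            raise ValueError("bad wait_time or critical in: " + line)
        members.append({"member_file": member_file,
                        "wait_time": wait, "critical": crit})
    return group_name, members


# Hypothetical group with two members.
sample = "<keepalive_group>:webgroup\nka_httpd:5:1\nka_logger::\n"
name, members = parse_group_config(sample)
print(name, members)
```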
 
<H3>Examples</H3>
The content of the process configuration file for
<B>cron</B>(1M)
found at <TT>/etc/spawndaemon.d/ka_cron</TT> is as follows: 
<P>
<TT>:/usr/sbin/cron:::root:sys::::cron_startup::cron_restart:::</TT>
<P>
This line indicates that <B>cron</B>:
<UL>
<LI>Is not part of a group,
<LI>Has a full pathname of <TT>/usr/sbin/cron</TT>,
<LI>Uses <TT>root</TT> for the uid and <TT>sys</TT> for the gid,
<LI>Has both a startup_script and a process_failure_recovery_script,
<LI>Has no argument list, shutdown_script,
node_failure_recovery_script, or down_script,
<LI>Takes the default values for termwait, max_errors,
probation_period, and minrespawn.
</UL>
For examples of startup scripts and other types of scripts, refer to the
files located in <TT>/etc/keepalive.d</TT> on your cluster.
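<P>
The <B>cron</B> line above can also be split into its 15 fields programmatically. The following Python sketch is a hypothetical reader that applies the defaults given in the Process Configuration Files section. It assumes that no field contains a colon, which may not hold for every <B><I>arg_list</I></B>.

```python
FIELDS = [
    "group_name", "full_path_to_executable", "arg_list", "termwait",
    "uid", "gid", "max_errors", "probation_period", "minrespawn",
    "startup_script", "shutdown_script", "process_failure_recovery_script",
    "node_failure_recovery_script", "down_script", "down_script_policy",
]

# Defaults stated in the field descriptions above.
DEFAULTS = {"termwait": 2, "max_errors": 10, "probation_period": 300,
            "minrespawn": 0, "down_script_policy": 1}

REQUIRED = ("full_path_to_executable", "uid", "gid", "startup_script")


def parse_process_config(line):
    """Split one process configuration line into a dict of named fields."""
    # Assumption: no field value contains a colon.
    values = line.rstrip("\n").split(":")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d colon-separated fields" % len(FIELDS))
    entry = dict(zip(FIELDS, values))
    for name in REQUIRED:
        if not entry[name]:
            raise ValueError("missing required field: " + name)
    for name, default in DEFAULTS.items():
        entry[name] = int(entry[name]) if entry[name] else default
    return entry


cron = parse_process_config(
    ":/usr/sbin/cron:::root:sys::::cron_startup::cron_restart:::")
print(cron["full_path_to_executable"], cron["termwait"], cron["max_errors"])
```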

<H2>Process/Daemon States</H2>
Processes/daemons monitored by keepalive exist in one of the following states
at all times:
<DL>
<DT><B>start</B>
<DD>The process/daemon is being started and goes to the <B>ok</B> state if the
start is successful. If the start fails on all available members of the
node set, the process/daemon goes to the
<B>dead</B> state. If no members of the node set are available,
the process/daemon remains in the <B>start</B> state.
<DT><B>ok</B>
<DD>The process/daemon is running. A process/daemon in this state goes
to the <B>daemonize</B> state if it has failed and has been registered
without the <B>-o</B> option, to <B>dead</B> if it has failed
and has been registered with the <B>-o</B> option or has rejected
the node, to <B>shutdown</B> if spawndaemon is used
to shut down the process/daemon or if a critical group member has failed
and it is a member of that group, or to <B>down</B> if it is a member of
a group that includes a process/daemon that has gone to <B>down</B> or if
it exits with a down exit code and the down feature (see <B>-c down</B> option)
has been enabled. 
<DT><B>dead</B>
<DD>The process/daemon has failed and is not running. It goes to the
<B>respawn</B> state if
its max_errors have not been exceeded within the probation_period;
if max_errors have been exceeded, it goes to <B>down</B>.
<DT><B>down</B>
<DD>The process/daemon is not running. It has exceeded its max_errors
within the probation_period or is part of a group that includes
a member that has gone down for exceeding its max_errors.
The <B>-x</B> option is used to go to the <B>respawn</B> state.
<DT><B>respawn</B>
<DD>The process/daemon is being restarted. It goes to the <B>ok</B> state
if the restart is successful or to <B>dead</B> if not successful on any of
the available nodes in the node set. If no members of the node set
are available, the process/daemon remains in the <B>respawn</B> state.
<DT><B>shutdown</B>
<DD>The process/daemon is not running and has been shut down from the
<B>ok</B> state (see <B>-k</B> option). The process/daemon either goes to the
<B>respawn</B> state (if <B>-x</B> used with <B>-k</B>) or is
unregistered (see <B>-d</B>, <B>-p</B>, <B>-s</B>, and <B>-Q</B> options). 
<DT><B>daemonize</B>
<DD>The process/daemon has failed and may have daemonized itself,
so keepalive runs the
daemonization recovery algorithm. If daemonization recovery succeeds,
the process/daemon goes to the <B>ok</B> state; if it fails, the process/daemon
goes to the <B>dead</B> state.
</DL>
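<P>
The transitions named in the state descriptions above can be collected into a table. The following Python sketch is a reading aid built only from those descriptions; it is not taken from <B>keepalive</B>'s source, and "unregistered" is a label introduced here for removal from the monitored process table.

```python
# Transitions named in the state descriptions above. "unregistered" is not a
# keepalive state; it marks removal from the monitored process table.
TRANSITIONS = {
    "start": {"ok", "dead", "start"},
    "ok": {"daemonize", "dead", "shutdown", "down"},
    "dead": {"respawn", "down"},
    "down": {"respawn"},
    "respawn": {"ok", "dead", "respawn"},
    "shutdown": {"respawn", "unregistered"},
    "daemonize": {"ok", "dead"},
}


def can_transition(current, target):
    """True if the descriptions above name a move from current to target."""
    return target in TRANSITIONS.get(current, set())


def reachable(start):
    """All states reachable from start by following documented transitions."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        for nxt in TRANSITIONS.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(can_transition("down", "respawn"), can_transition("down", "ok"))
```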
  
<H2>References</H2>
<B>load_leveld</B>(1),
<B>migrate</B>(1),
<B>node_self</B>(1),
<B>onnode</B>(1),
<B>cluster</B>(1M),
<B>init</B>(1M),
<B>keepalive</B>(1M),
<B>syslogd</B>(1M),
<B>setuid</B>(2),
<B>setgid</B>(2),
<B>signal</B>(3bsd),
<B>syslog</B>(3G)
<!-- NAVBEGIN -->
<HR>
<I>
<SMALL>
15 August 2001
<BR>
Copyright 2001 Compaq Computer Corporation
<BR>
Cluster-Tools Version 0.5.8
</SMALL>
</I>


    </blockquote></td>
  </tr>
</table>
<br>
<font face="Arial, Helvetica, sans-serif" size="1">This file last updated on</font> 
<font size="-5" face="Arial, Helvetica, sans-serif">

Tuesday, 14-May-2002 09:35:07 UTC