[CI] problem when I removed the network cable from one node

David Teigland teigland@sistina.com
Mon, 13 Aug 2001 17:51:30 -0500

On Mon, Aug 13, 2001 at 11:34:16AM -0700, Bruce Walker wrote:
> Dave,
>   Perhaps I think of STOMITH in an expanded way but I see it as an
> SBA tool.  Considering just the 2-node case, Split-brain can occur
> for the following reasons:
>   a. on startup the two nodes don't see each other and assume the 
>     other is dead and set themselves up;
>   b. one node is up as a cluster;  the other boots but does not
>     see the first node and sets itself up as the cluster;
>   c. both nodes are up and talking and communication ceases so
>     each thinks the other is down and "runs" the cluster.

> I think we could/should have a longer discussion on how to use the
> disk subsystem or disk pathways for cluster membership but for now
> my point is that for many configurations you can avoid split-brain
> by either shooting the other node or preventing him from accessing
> a critical resource (eg. disk).  An appropriate application of
> STOMITH can do this.  Used as a membership tool, how do you like
> the following algorithm for case "c" above:
>    a. having noticed we can't talk to the other node, try an
> 	alternate communication path.
>    b. if the alternate fails or there is none, try a serial cable
> 	just to see if the other guy is alive;  if he is, decide which
> 	of you is going to continue the cluster and which is going to
> 	reboot/halt.
>    c. still can't talk to him? shoot him or take away his disks.  
> 	Obviously if both nodes do this at exactly the same time they
> 	might both die so staggering helps;  my assumption here is
> 	that taking away someone's disks will cause them to notice and
> 	die.

Ok, I'm learning the difference between SBA and Quorum algorithms.  Quorum
would solve this problem by requiring a majority of the expected votes.  It
makes sense to use STOMITH like this within the SBA system.
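
A minimal sketch of the case "c" sequence quoted above, only to make the
ordering concrete.  Every name here (handle_lost_peer, the probe callables,
the tie-break flag, the stagger delay) is hypothetical and not taken from
any existing cluster manager; the real probes and the fencing method would
be supplied by the caller.

```python
import time
from typing import Callable, Optional


def handle_lost_peer(
    alternate_path_ok: Optional[Callable[[], bool]],
    serial_peer_alive: Optional[Callable[[], bool]],
    we_keep_cluster: bool,
    fence_peer: Callable[[], bool],
    stagger_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Decide what this node does after losing contact with its peer.

    All parameters are illustrative assumptions; None means the
    corresponding path or cable does not exist on this configuration.
    """
    # a. Try an alternate communication path, if there is one.
    if alternate_path_ok is not None and alternate_path_ok():
        return "reconnected"

    # b. Use the serial cable only as a liveness check.  If the peer
    #    answers, the two nodes agree on who continues and who halts.
    if serial_peer_alive is not None and serial_peer_alive():
        return "continue" if we_keep_cluster else "halt"

    # c. Still no contact: shoot the peer or take away its disks.
    #    The per-node delay staggers the attempts so both nodes do not
    #    fence each other at exactly the same moment.
    sleep(stagger_seconds)
    return "peer_fenced" if fence_peer() else "fence_failed"
```

Giving each node a different stagger_seconds value (zero on one node, a
couple of seconds on the other, for example) is one way to get the
staggering mentioned in step c.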

> My assumption is that you can't release any locks the down node had on the
> resource until you are sure the node is really down (no more i/o's going to
> happen);  then you have to "recover" the resource (replay the log) before you
> release so other nodes can't do conflicting operations on the filesystem
> while the log is being replayed.  Am I correct?

What happens with locks is clearly important but not at the core of why STOMITH
is required.  In general, locks from anything but the recovery process will be
blocked during the entire recovery sequence (in a CI/DLM configuration).

STOMITH must precede fs recovery (log replay) simply to be certain that no i/o
will arrive at the disk from the victim once recovery has started.  Before
stomith returns, it even waits an extra few seconds to be sure any final
output on the wire (possibly sent right before the stomith) has arrived at
the disk.
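
The ordering constraint can be sketched roughly as follows.  This is not
the gfs code; the function names, the power_off and replay_log callables,
and the 3-second settle time are all assumptions chosen for illustration
(the text above only says "a few seconds").

```python
import time
from typing import Callable


def stomith(
    victim: str,
    power_off: Callable[[str], bool],
    settle_seconds: float = 3.0,  # assumed value, stands in for "a few seconds"
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fence the victim and return only once no more of its i/o can land."""
    if not power_off(victim):
        raise RuntimeError(f"could not fence {victim}")
    # Give output already on the wire time to reach the disk before
    # anyone is told the fence is complete.
    sleep(settle_seconds)


def recover_filesystem(
    victim: str,
    power_off: Callable[[str], bool],
    replay_log: Callable[[str], None],
    settle_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run log replay strictly after a successful stomith of the victim."""
    stomith(victim, power_off, settle_seconds, sleep)
    replay_log(victim)
```

If the fence fails, stomith raises and the log replay is never started,
which is the property that matters here.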

> > In an arrangement where a software layer can be programmed on the side of
> > the shared resource, STOMITH simplifies to a situation where the software
> > in front of the shared resource blocks access from a STOMITH victim.  This
> > is so simple that it's often not even explicitly pointed out in systems
> > which do it.

> Can't we consider that this software is participating in the membership
> decision or at least the membership enforcement?  With the assumption that
> a node will die or at least not act in a split-brain fashion when
> i/o fenced away from the shared resource, I see this STOMITH variant as
> an important SBA tool.

Ok, but this only applies to the case where you have a machine in front of the
storage, like GNBD.  When using GNBD for the shared storage, the GNBD server is
told which node is the STOMITH victim, and it then rejects any i/o operations from
that machine.  GNBD storage will probably be relatively uncommon for GFS use.
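
To illustrate the idea of a server in front of the storage refusing a
victim's i/o, here is a toy model.  It is not the GNBD implementation or
its interface; the class, its methods, and the in-memory block store are
hypothetical stand-ins.

```python
from typing import Dict, Set


class FencingStorageServer:
    """Toy server that sits in front of shared storage and can fence nodes."""

    def __init__(self) -> None:
        self._fenced: Set[str] = set()
        self._blocks: Dict[int, bytes] = {}

    def fence(self, node: str) -> None:
        # Called when the server is told which node is the STOMITH victim.
        self._fenced.add(node)

    def unfence(self, node: str) -> None:
        # Called once the node has been recovered and may rejoin.
        self._fenced.discard(node)

    def write(self, node: str, block: int, data: bytes) -> bool:
        # Reject any i/o from a fenced node; accept it from everyone else.
        if node in self._fenced:
            return False
        self._blocks[block] = data
        return True

    def read(self, block: int) -> bytes:
        # Unwritten blocks read back as empty in this toy model.
        return self._blocks.get(block, b"")
```

For example, after server.fence("nodeB"), a write from "nodeB" returns
False and leaves the stored block unchanged, while writes from other nodes
still succeed.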

> > So, a cluster where STOMITH happens to be a part of a recovery step still
> > requires some sort of SBA or Quorum in the cluster manager.
> I think there is synergy to allow STOMITH to be part of SBA.  Do you
> still disagree?

I don't think we disagreed; I just didn't understand that STOMITH would also
be important in the SBA system.  

I was trying to separate:

1. the necessity of a cluster manager having an SBA or Quorum capability 
   (which may or may not use STOMITH itself) 

2. the specific recovery procedure of gfs which must involve STOMITH
   (if the cluster manager is doing stomith first, that should be sufficient
    for subsequent gfs recovery)

Dave Teigland  <teigland@sistina.com>