[CI] problem when I removed the network cable from one node
Wed, 08 Aug 2001 18:23:38 -0700
What you've done is simulate a Split-Brain scenario. Unfortunately,
CI currently doesn't have any Split-Brain avoidance code. After you
disconnected Node 2's cable, Node 2 thought Node 1 went down and
therefore failed over to become the CLMS master. Node 1, on the other
hand, thought Node 2 went down and processed a Nodedown event for it.
At this point, you have two CLMS masters (also known as Split-Brain).
When network connectivity is re-established, each node probes the other
and discovers that they disagree about which node is the CLMS master.
Without split-brain avoidance code, the algorithm currently favors the
lower-numbered node as the CLMS master. Therefore, the panic you're seeing on
Node 2 is the correct behavior. The original UnixWare NonStop Clusters
code had support for split-brain avoidance, but that code has not
yet been ported. If anyone is interested in tackling this as a side
project, I'd be happy to send you some of the code.
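
To make the reconnect behavior concrete, here is a minimal Python sketch of
the tie-break described above. It is not the CI kernel code: NodeView,
resolve_on_reconnect and the "continue"/"panic" labels are hypothetical names
chosen for illustration, and the rule that any node holding the losing view
panics is an assumption generalized from the two-node case in this thread.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    node_id: int      # this node's number in the cluster
    clms_master: int  # node this node currently believes is the CLMS master


def resolve_on_reconnect(a: NodeView, b: NodeView) -> dict:
    """Return the action each node takes after the two nodes probe each other."""
    if a.clms_master == b.clms_master:
        # Both nodes agree on the master, so there is nothing to resolve.
        return {a.node_id: "continue", b.node_id: "continue"}
    # Views differ (split-brain): favor the lower-numbered master.
    winner = min(a.clms_master, b.clms_master)
    actions = {}
    for view in (a, b):
        # Assumption: a node whose view lost the tie-break cannot carry on.
        actions[view.node_id] = "continue" if view.clms_master == winner else "panic"
    return actions


# After the cable is pulled, each node has elected itself CLMS master.
node1 = NodeView(node_id=1, clms_master=1)
node2 = NodeView(node_id=2, clms_master=2)
result = resolve_on_reconnect(node1, node2)
print(result)
```

With the two views from this thread, node 1 keeps running as CLMS master and
node 2 is the one that panics. Real split-brain avoidance would have to stop
the second master from being elected in the first place, which this sketch
does not attempt.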
Aneesh Kumar wrote:
> Hi,
> Today something 'strange' happened. I was writing some code that
> will inform me when nodes are added to or removed from the cluster.
> To test it, I ran the binary on one machine. It showed both nodes
> up. Then I removed the network cable from node 2. My monitoring
> program, which was running on node 1, showed that node 2 had gone.
> Fine, happy. But then this 'strange' thing happened. For node 2, it
> is node 1 that is gone, so it became the root node by itself. Now
> when I put the network cable back, node 2 gave me a kernel panic!
> I know it is expected behaviour. But then how will we take care of
> a network failure like the above? If someone in a cluster removes
> the cable of one of the machines, what will happen?
> ci-linux-devel mailing list