Listing: takeover
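#!/bin/bash
# Watch the other server and take over its public IP if it stops answering.
OTHER="brain"               # hostname of the other server (resolves to its real IP)
PUBLIC="208.201.239.37"     # public IP the other server normally serves
PAUSE=3                     # seconds between pings
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/sbin
MISSED=0                    # consecutive pings missed so far

while true; do
    if ! ping -c 1 -w 1 "$OTHER" > /dev/null; then
        ((MISSED++))
    else
        # The other server answered. If we took over its IP
        # (two or more misses), give it back.
        if [ "$MISSED" -ge 2 ]; then
            ifconfig "eth0:$OTHER" down
        fi
        MISSED=0
    fi
    # Exactly two misses in a row: bring up the other server's public IP.
    if [ "$MISSED" -eq 2 ]; then
        ifconfig "eth0:$OTHER" "$PUBLIC"
        #
        # ...but see discussion below...
        #
    fi
    sleep "$PAUSE"
done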
Naturally, set OTHER to "pinky" and
PUBLIC to "208.201.239.36" on the
copy that runs on Brain.
Let's suppose that Brain suddenly stops responding
on 208.201.239.17 (say a network tech accidentally pulled the wrong
plug when working on the rack). After missing two pings in a row, Pinky
will leap into action, bringing up eth0:brain as 208.201.239.37,
the public IP that Brain is supposed to be serving. It will then
continue to watch Brain's real IP address, and
relinquish control when it is back online. The ping -c 1 -w 1
options mean "send one ping packet, and time
out after one second, no matter what happens." ping
will return non-zero if the packet didn't come back
within the one-second time limit.
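The counting in the loop is easy to get subtly wrong, so here is a minimal sketch of the same bookkeeping pulled out into a pure function, assuming the two-miss threshold used in the listing. The names step, probe, and THRESHOLD are hypothetical. Writing it this way makes the release rule explicit: once the counter has reached the takeover threshold, any successful ping must trigger a release, so the comparison is "at least the threshold", not "more than it".

```shell
#!/bin/bash
# Sketch: the takeover loop's bookkeeping as a pure function, so the
# thresholds can be checked without touching the network.
# step, probe and THRESHOLD are hypothetical names.
THRESHOLD=2   # consecutive misses before taking over

# One probe: prints 1 if a reply arrives within one second, else 0.
probe() {
    if ping -c 1 -w 1 "$1" > /dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
}

# step MISSED PING_OK  ->  prints "NEW_MISSED ACTION",
# where ACTION is takeover, release, or none.
step() {
    local missed=$1 ping_ok=$2
    if [ "$ping_ok" -eq 1 ]; then
        # The other server answered; if we had taken over, hand the IP back.
        if [ "$missed" -ge "$THRESHOLD" ]; then
            echo "0 release"
        else
            echo "0 none"
        fi
        return 0
    fi
    missed=$((missed + 1))
    if [ "$missed" -eq "$THRESHOLD" ]; then
        echo "$missed takeover"
    else
        echo "$missed none"
    fi
}
```

A driver loop would call step with the current count and the output of `probe brain`, then run ifconfig only when the printed action is takeover or release.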
But this isn't quite
the entire solution. Although Pinky is now answering for Brain, any
machines on the same network as the two servers (notably, the router
just upstream at your ISP) will have the wrong MAC address cached for
208.201.239.37. With the wrong MAC address cached, no traffic will
flow to Pinky, since it will only respond to packets that bear its
own MAC address. How can we tell all of the machines on the
208.201.239.0 network that the MAC address for 208.201.239.37 has
been updated?
One way is to use the
send_arp
utility from the High Availability Linux
project. This very handy (and tiny) utility will craft an ARP packet
to your specifications and send it to a MAC address of your choice on
the local network. If we specify all ones (i.e.,
ff:ff:ff:ff:ff:ff) for the destination, then it
effectively becomes a broadcast ARP packet. While most routers
won't update their ARP tables when they see
unrequested ARP broadcasts, such a packet will signal them to resend
an ARP request, to which Pinky will obligingly reply. The advantage
of using broadcast is that it will signal all machines on the subnet
simultaneously, instead of having to track all of the MAC addresses
of machines that need updating.
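The broadcast destination is the all-ones address, six bytes of ff, and a single dropped digit turns it into something that is no longer a valid MAC address. A small check before calling send_arp can catch that. This is a minimal sketch; normalize_mac and BROADCAST_MAC are hypothetical names, not part of send_arp. It emits the bare-hex form, which is how the example below writes the broadcast address.

```shell
#!/bin/bash
# Hypothetical helper (not part of send_arp): strip colons from a MAC
# address, lower-case it, and fail unless exactly 12 hex digits remain.
normalize_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
    case $mac in
        *[!0-9a-f]*) return 1 ;;        # a non-hex character slipped in
    esac
    [ "${#mac}" -eq 12 ] || return 1    # 6 bytes = 12 hex digits
    printf '%s\n' "$mac"
}

# The all-ones broadcast destination, in bare-hex form.
BROADCAST_MAC=$(normalize_mac ff:ff:ff:ff:ff:ff)
```

An 11-digit string such as fffffffffff makes normalize_mac return non-zero, so a script can refuse to send a malformed packet.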
The syntax of send_arp is send_arp
[Source IP] [Source MAC] [Target IP] [Target MAC]. For
example, our simple monitoring script above should run the following
when it detects that Brain is down:
send_arp 208.201.239.37 00:11:22:aa:bb:cc 208.201.239.37 ffffffffffff
(Where 00:11:22:aa:bb:cc is the hardware MAC
address of Pinky's eth0.) The script can continue to
watch until Brain's real IP address
(208.201.239.17) becomes available. When it does, we can bring
eth0:brain back down and let Brain worry about updating the ARP cache
again (which it should be set to do on boot).
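Folding the send_arp call into the takeover step could look something like the following sketch. The function names take_over and release, the DRY_RUN switch, and reading Pinky's MAC from /sys/class/net/eth0/address (present on Linux systems with sysfs) are assumptions for illustration, not part of the original script or of send_arp itself. The ARP packet uses the public IP as both source and target, as in the example above, and is sent to the broadcast MAC.

```shell
#!/bin/bash
# Sketch: takeover and release steps with a broadcast ARP announcement.
# Assumptions: the function names, the DRY_RUN switch, and reading the
# MAC from sysfs are illustrative, not part of the original script.
IFACE=${IFACE:-eth0}
OTHER=${OTHER:-brain}
PUBLIC=${PUBLIC:-208.201.239.37}
BCAST_MAC=ffffffffffff          # all ones: every host on the segment

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# This machine's own hardware address (MY_MAC overrides the sysfs lookup).
my_mac() {
    if [ -n "$MY_MAC" ]; then
        echo "$MY_MAC"
    else
        cat "/sys/class/net/$IFACE/address"
    fi
}

# Bring up the other server's public IP, then announce it with an ARP
# packet whose source and target IP are both the public address.
take_over() {
    run ifconfig "$IFACE:$OTHER" "$PUBLIC"
    run send_arp "$PUBLIC" "$(my_mac)" "$PUBLIC" "$BCAST_MAC"
}

# Drop the alias; the other server is expected to re-announce its own
# IP (for example, on boot).
release() {
    run ifconfig "$IFACE:$OTHER" down
}
```

Setting DRY_RUN=1 prints the commands instead of running them, which is a cheap way to check the argument order against the [Source IP] [Source MAC] [Target IP] [Target MAC] syntax before trusting the functions inside the monitoring loop.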
There are a number of improvements that could be made to this
technique. For one thing, just because 208.201.239.17 is up
doesn't guarantee that 208.201.239.37 is also
available. Also, ping isn't the best test for
service availability (a better test might be to actually request a
web page from the other machine and make sure that it has a closing
</html> tag).
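As a starting point for the web-page test, here is a minimal sketch that assumes curl is installed; the function names and the 5-second timeout are hypothetical choices, not something the original script uses. It fetches the front page and succeeds only if a closing </html> tag arrives.

```shell
#!/bin/bash
# Sketch of a service-level check to use in place of ping.
# Assumes curl is installed; function names and the 5-second
# timeout are hypothetical choices.

# Succeed only if the page read on stdin contains a closing </html> tag.
page_is_complete() {
    grep -qi '</html>'
}

# Fetch the front page from a host and check that it arrived in full.
# With -f, curl prints nothing on HTTP errors, so the grep fails too.
web_ok() {
    curl -fs --max-time 5 "http://$1/" 2>/dev/null | page_is_complete
}
```

In the monitoring loop, `if ! web_ok "$OTHER"; then` would take the place of the ping test, so a server that answers pings but no longer serves pages also counts as a miss.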
These improvements are left as an exercise to you, dear reader. Every
site is different, so you'll need to find the
technique that works best with the tools that you have at hand. After all,
that's exactly what a hack is,
isn't it?