Credit: Nicola Larosa
When we have a number of computers connected by a TCP/IP network, we are often interested in monitoring their working state. The pair of programs presented in Example 10-1 and Example 10-2 help you detect when a computer stops working, while having minimal impact on network traffic and requiring very little setup. Note that this does not monitor the working state of single, specific services running on a machine, just that of the TCP/IP stack and the underlying operating system and hardware components.
PyHeartBeat is made up of two files: PyHBClient.py sends UDP packets, while PyHBServer.py listens for such packets and detects inactive clients. The client program, running on any number of computers, periodically sends a UDP packet to the server program that runs on one central computer. In the server program, one thread dynamically builds and updates a dictionary that stores the IP addresses of the client computers and the timestamp of the last packet received from each. At the same time, the main thread of the server program periodically checks the dictionary, noting whether any of the timestamps is older than a defined timeout.
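The exchange described above can be tried in a single process over the loopback interface. What follows is only a minimal sketch in Python 3 syntax, not the recipe's own code: the helper names record_beat and silent_clients are hypothetical, and binding to port 0 (so that the operating system picks a free port) is an assumption made to keep the demonstration self-contained.

```python
import socket
import time

def record_beat(last_seen, recv_sock):
    """Receive one heartbeat and store the sender's IP with the current time."""
    _data, (ip, _port) = recv_sock.recvfrom(16)
    last_seen[ip] = time.time()
    return ip

def silent_clients(last_seen, timeout, now=None):
    """Return the IPs whose last heartbeat is older than `timeout` seconds."""
    if now is None:
        now = time.time()
    return [ip for ip, stamp in last_seen.items() if stamp < now - timeout]

# "Server" side: bind to a loopback port chosen by the OS (assumption: port 0).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)  # do not block forever if the packet never arrives

# "Client" side: send a single heartbeat to that port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"Thump!", server.getsockname())

last_seen = {}
sender = record_beat(last_seen, server)
client.close()
server.close()

print("heartbeat from", sender)
print("silent now:", silent_clients(last_seen, timeout=30))
```

Passing an explicit now value to silent_clients makes it possible to check the timeout logic without actually waiting for the timeout to elapse.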
In this kind of application, there is no need to use reliable TCP connections, since the loss of a packet now and then does not produce false alarms, given that the server-checking timeout is kept suitably larger than the client-sending period. On the other hand, if we have hundreds of computers to monitor, it is best to keep the bandwidth used and the load on the server at a minimum. We do this by periodically sending a small UDP packet, instead of setting up a relatively expensive TCP connection per client.
The packets are sent from each client every 10 seconds, while the server checks the dictionary every 30 seconds, and its timeout defaults to the same interval. These parameters, along with the server IP address and port used, can be adapted to one’s needs.
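The relation between these intervals can be made concrete with a little arithmetic. The sketch below rests on simplifying assumptions (no network jitter, checks that run exactly on schedule), and the function names tolerated_losses and worst_case_detection are hypothetical helpers, not part of the recipe.

```python
def tolerated_losses(period, timeout):
    """Consecutive lost heartbeats that cannot cause a false alarm (no jitter)."""
    # After k lost packets, the gap between two received heartbeats is
    # (k + 1) * period; no alarm is possible while that gap fits in the timeout.
    return max(0, int(timeout // period) - 1)

def worst_case_detection(timeout, check_period):
    """Upper bound, in seconds, between a client's last heartbeat and its report."""
    # The entry must first grow older than the timeout, and then the next
    # periodic check can come up to one full check period later.
    return timeout + check_period

print(tolerated_losses(10, 30), worst_case_detection(30, 30))  # prints: 2 60
```

Under these assumptions, the default settings tolerate two consecutive lost packets per client, and a stopped computer is reported at most about a minute after its last heartbeat.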
Also note that the debug printouts can be turned off by using the -O option of the Python interpreter, as that option sets the __debug__ variable to False. However, some would consider this usage overcute and prefer a more straightforward and obvious approach: have the scripts accept either a -q flag (to keep the script quiet, with verbosity as the default) or a -v flag (to make it verbose, with quiet as the default). The getopt standard module makes it easy for a Python script to accept optional flags of this kind.
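As an illustration of the flag-based alternative, here is a minimal sketch in Python 3 syntax that uses getopt.getopt to handle both flags. The function name parse_verbosity and the usage message are hypothetical; the recipe's scripts do not include this code.

```python
import getopt

def parse_verbosity(argv, default_verbose=True):
    """Return (verbose, remaining_args) after handling -q and -v flags."""
    try:
        # "qv" declares two flags that take no argument.
        opts, args = getopt.getopt(argv, "qv")
    except getopt.GetoptError as err:
        raise SystemExit("usage: script.py [-q | -v] [args...] (%s)" % err)
    verbose = default_verbose
    for flag, _value in opts:
        if flag == "-q":
            verbose = False
        elif flag == "-v":
            verbose = True
    return verbose, args

# Example calls with explicit argument lists, as sys.argv[1:] would supply them.
print(parse_verbosity(["-q", "192.168.0.1", "43278"]))
print(parse_verbosity(["192.168.0.1"]))
```

A script would then call parse_verbosity(sys.argv[1:]) once at startup and test the returned verbose value wherever it currently tests __debug__.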
Example 10-1 shows the PyHBClient.py heartbeat client program, which should be run on every computer on the network, while Example 10-2 shows the heartbeat server program, PyHBServer.py, which should be run on the server computer only.
Example 10-1. PyHeartBeat client
""" PyHeartBeat client: sends an UDP packet to a given server every 10 seconds. Adjust the constant parameters as needed, or call as: PyHBClient.py serverip [udpport] """ from socket import socket, AF_INET, SOCK_DGRAM from time import time, ctime, sleep import sys SERVERIP = '127.0.0.1' # local host, just for testing HBPORT = 43278 # an arbitrary UDP port BEATWAIT = 10 # number of seconds between heartbeats if len(sys.argv)>1: SERVERIP=sys.argv[1] if len(sys.argv)>2: HBPORT=sys.argv[2] hbsocket = socket(AF_INET, SOCK_DGRAM) print "PyHeartBeat client sending to IP %s , port %d"%(SERVERIP, HBPORT) print "\n*** Press Ctrl-C to terminate ***\n" while 1: hbsocket.sendto('Thump!', (SERVERIP, HBPORT)) if _ _debug_ _: print "Time: %s" % ctime(time( )) sleep(BEATWAIT)
Example 10-2. PyHeartBeat server
""" PyHeartBeat server: receives and tracks UDP packets from all clients. While the BeatLog thread logs each UDP packet in a dictionary, the main thread periodically scans the dictionary and prints the IP addresses of the clients that sent at least one packet during the run, but have not sent any packet since a time longer than the definition of the timeout. Adjust the constant parameters as needed, or call as: PyHBServer.py [timeout [udpport]] """ HBPORT = 43278 CHECKWAIT = 30 from socket import socket, gethostbyname, AF_INET, SOCK_DGRAM from threading import Lock, Thread, Event from time import time, ctime, sleep import sys class BeatDict: "Manage heartbeat dictionary" def _ _init_ _(self): self.beatDict = {} if _ _debug_ _: self.beatDict['127.0.0.1'] = time( ) self.dictLock = Lock( ) def _ _repr_ _(self): list = '' self.dictLock.acquire( ) for key in self.beatDict.keys( ): list = "%sIP address: %s - Last time: %s\n" % ( list, key, ctime(self.beatDict[key])) self.dictLock.release( ) return list def update(self, entry): "Create or update a dictionary entry" self.dictLock.acquire( ) self.beatDict[entry] = time( ) self.dictLock.release( ) def extractSilent(self, howPast): "Returns a list of entries older than howPast" silent = [] when = time( ) - howPast self.dictLock.acquire( ) for key in self.beatDict.keys( ): if self.beatDict[key] < when: silent.append(key) self.dictLock.release( ) return silent class BeatRec(Thread): "Receive UDP packets, log them in heartbeat dictionary" def _ _init_ _(self, goOnEvent, updateDictFunc, port): Thread._ _init_ _(self) self.goOnEvent = goOnEvent self.updateDictFunc = updateDictFunc self.port = port self.recSocket = socket(AF_INET, SOCK_DGRAM) self.recSocket.bind(('', port)) def _ _repr_ _(self): return "Heartbeat Server on port: %d\n" % self.port def run(self): while self.goOnEvent.isSet( ): if _ _debug_ _: print "Waiting to receive..." 
data, addr = self.recSocket.recvfrom(6) if _ _debug_ _: print "Received packet from " + `addr` self.updateDictFunc(addr[0]) def main( ): "Listen to the heartbeats and detect inactive clients" global HBPORT, CHECKWAIT if len(sys.argv)>1: HBPORT=sys.argv[1] if len(sys.argv)>2: CHECKWAIT=sys.argv[2] beatRecGoOnEvent = Event( ) beatRecGoOnEvent.set( ) beatDictObject = BeatDict( ) beatRecThread = BeatRec(beatRecGoOnEvent, beatDictObject.update, HBPORT) if _ _debug_ _: print beatRecThread beatRecThread.start( ) print "PyHeartBeat server listening on port %d" % HBPORT print "\n*** Press Ctrl-C to stop ***\n" while 1: try: if _ _debug_ _: print "Beat Dictionary" print `beatDictObject` silent = beatDictObject.extractSilent(CHECKWAIT) if silent: print "Silent clients" print `silent` sleep(CHECKWAIT) except KeyboardInterrupt: print "Exiting." beatRecGoOnEvent.clear( ) beatRecThread.join( ) if _ _name_ _ == '_ _main_ _': main( )
Documentation for the standard library modules socket, threading, and time in the Library Reference; Jeff Bauer has a related program using UDP for logging information known as Mr. Creosote (http://starship.python.net/crew/jbauer/creosote/); UDP is described in UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, Second Edition, by W. Richard Stevens (Prentice-Hall, 1998); for the truly curious, the UDP protocol is described in the two-page RFC 768 (http://www.ietf.org/rfc/rfc768.txt), which, when compared with current RFCs, shows how much the Internet infrastructure has evolved in 20 years.