NodeWatch

Introduction

NodeWatch is an open-source TCP/IP network monitoring tool written in Perl for UNIX.  It watches (that is, polls) a set of network nodes and reacts to changes in node connectivity by making entries in the syslog and executing user-defined commands.  NodeWatch was written from the perspective of a network manager; it tracks only each node's ability to respond to ICMP echo request datagrams with ICMP echo reply datagrams.

Key Features

  • On-call group: the operator can define one group as 'special', escalating unusual events to this group and keeping this group informed via crier pages
  • Crier Pages: at intervals (typically twice daily), NodeWatch announces the current status to the 'special' group: typically "all is well" or "nodes are down", followed by a list of nodes which are not answering pings
  • Dampening: down/up status changes occur only after n successive missed pings or n successive hit pings
  • Partitioning: when appropriately flagged devices go down, NodeWatch suppresses actions for all but critical devices, effectively suppressing notification of devices whose connectivity depends on the now-dead node. This capability is enabled trivially via the configuration file.
  • Redundancy: when devices are named according to a particular naming convention, NodeWatch can determine which devices comprise a redundant set and behave differently (enter partition mode or escalate to the 'special' group) when all members of a redundant set have gone down.
  • Scheduled downtime: this supports scheduled maintenance. Scheduled downtime can be periodic, thanks to NodeWatch's use of the Perl Time::Period module. If nodes go down during such a window, NodeWatch defers notification of the event until the end of the window.
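
The dampening rule above can be illustrated with a small state machine. This is a minimal sketch in Python rather than NodeWatch's own Perl code; the class name, the `threshold` parameter, and the sample results are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DampenedNode:
    """Tracks one node's reported state with dampening (hypothetical names)."""
    threshold: int        # n successive contrary results needed to flip state
    is_up: bool = True    # state currently reported for the node
    streak: int = 0       # successive results that contradict is_up

    def record(self, ping_answered: bool) -> bool:
        """Feed one poll result; return True if the reported state changed."""
        if ping_answered == self.is_up:
            self.streak = 0   # result agrees with the reported state
            return False
        self.streak += 1
        if self.streak >= self.threshold:
            self.is_up = ping_answered
            self.streak = 0
            return True
        return False


# Two misses, one hit, three misses, then three hits.
node = DampenedNode(threshold=3)
results = [False, False, True, False, False, False, True, True, True]
changes = [node.record(r) for r in results]
```

With a threshold of 3, the first two missed pings do not change the reported state because a hit interrupts the run; only the later run of three misses marks the node down, and the run of three hits marks it up again.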

Obtaining the Software

NodeWatch is available via HTTP. It requires Perl, version 5.6.1 or later. QuickPage makes a superb adjunct to NodeWatch's capabilities.

Configuring NodeWatch

See the documentation page for a detailed discussion of how NodeWatch works and how to configure it. Default values are built into the daemon itself.  If NodeWatch cannot find its configuration file, it uses those defaults; otherwise it uses the values in the configuration file.  Beyond the configuration file, three other files require configuration: the node database, the period database, and the action database.  Each is described in the manual, and each database contains hints.
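
The fallback from configuration file to built-in defaults might look roughly like the following. This is a sketch under stated assumptions, not NodeWatch's actual code: the `key = value` file syntax, the option names, and the choice to let file values override defaults key by key are all hypothetical.

```python
import os
import tempfile

# Hypothetical built-in defaults; the real option names are not given here.
DEFAULTS = {"poll_interval": "60", "dampening_count": "3"}


def load_settings(path, defaults=DEFAULTS):
    """Return the defaults if the file is missing, else file values over them."""
    settings = dict(defaults)
    if not os.path.isfile(path):
        return settings                           # no file found: defaults only
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()  # assumed comment syntax
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


with tempfile.TemporaryDirectory() as tmp:
    missing = load_settings(os.path.join(tmp, "absent.conf"))
    conf_path = os.path.join(tmp, "nodewatch.conf")
    with open(conf_path, "w") as handle:
        handle.write("# example settings\npoll_interval = 30\n")
    loaded = load_settings(conf_path)
```

Here `missing` holds only the built-in defaults, while `loaded` takes `poll_interval` from the file and keeps the default for the option the file does not mention.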

Operating NodeWatch

Starting NodeWatch is simple: just run it.  It is a daemon and does not recognize any command-line arguments.  Sending it a signal causes it to produce some statistics.  It automatically recognizes changes to its configuration files, so there is no need to send it a signal to reload its configuration data.
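
One common way for a daemon to notice configuration changes without a reload signal is to compare file metadata on each polling cycle. The sketch below shows that general technique in Python; it is an assumption about how such detection can work, not a description of NodeWatch's mechanism, and the `FileWatcher` class is hypothetical.

```python
import os
import tempfile


class FileWatcher:
    """Reports whether a file's modification time or size has changed."""

    def __init__(self, path):
        self.path = path
        self.last_seen = self._stamp()

    def _stamp(self):
        try:
            info = os.stat(self.path)
        except FileNotFoundError:
            return None                # a missing file is also a state
        return (info.st_mtime_ns, info.st_size)

    def changed(self):
        """Return True once for each observed change since the last call."""
        current = self._stamp()
        if current != self.last_seen:
            self.last_seen = current
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nodes.db")
    with open(path, "w") as handle:
        handle.write("router1\n")
    watcher = FileWatcher(path)
    before_edit = watcher.changed()    # nothing has changed yet
    with open(path, "a") as handle:
        handle.write("router2\n")      # the file grows, so its stamp differs
    after_edit = watcher.changed()
    second_check = watcher.changed()   # the change was already reported
```

A daemon using this approach would call `changed()` at the top of each polling loop and re-read the file only when it returns true.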

Companion Software

There are several tools included in the distribution: nodewatch_up and n_status_watch.  nodewatch_up makes sure that NodeWatch is running.  n_status_watch monitors the syslog to make sure NodeWatch status messages are getting through.

Support

The primary sources of support are the manual page, the comments in the configuration files, and the daemon itself.  Also, the current maintainer, Stuart Kendrick, sbk {insert 'at' sign here} skendric {insert '.' here} com, will respond to queries, time permitting.

Feedback

Comments or questions regarding NodeWatch are welcome.  Please direct feedback to the maintainer.

Authors

Ron Hood wrote the first version of NodeWatch in 1994; the core design and principles remain his creation. In 1996, Patrick Ryan rewrote NodeWatch from scratch, adding many features. In the fall of 2000, Stuart Kendrick adopted the role of maintainer.

Reference

monitoring Lists monitors Notification Software


Prepared by:
Stuart Kendrick

Last modified: 19-Sep-2005