Swatch and TOC Tips

Overview

Swatch is an open-source package which reads a configuration file (/opt/local/etc/swatch/swatch.conf). After reading the config file, swatch generates a new program (which it writes to /home/tocops/.swatch_script.xxxx) and launches it. That program, in turn, launches a 'tail -f /var/log/syslog' process and parses the output, looking for the specified 'watchfor' strings and performing the associated actions.
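As a rough illustration of that tail-and-match loop, here is a minimal shell sketch. It is not the program swatch generates (these notes do not show that script). The function name watch_for, the ALERT prefix, and the patterns in the usage line are all hypothetical.

```shell
# Hypothetical stand-in for the generated watcher: read log lines on stdin
# and print an ALERT line for each one matching an extended regex.
watch_for() {
    pattern=$1
    while IFS= read -r line; do
        # grep -q prints nothing; only its exit status is used.
        if printf '%s\n' "$line" | grep -Eq -- "$pattern"; then
            printf 'ALERT: %s\n' "$line"
        fi
    done
}

# Live use would look something like (patterns are made up):
#   tail -f /var/log/syslog | watch_for 'panic|link down'
```

The real swatch attaches a list of actions to each watchfor string; this sketch collapses that into a single print so the control flow stays visible.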


Our environment: Where Stuff is Located

FHCRC

At the Hutch, klamath hosts the loghost function (aka 'syslog'), whereas swatch runs on zodiac. zodiac mounts klamath:/var/log to /loghost/log. Thus, swatch watches /loghost/log/syslog. The TOC is tightly bound to swatch, so it runs on zodiac, too.

SCCA

At the SCCA, bombard hosts the loghost function (aka 'syslog') and also runs swatch: no need for mounting a remote file system. Thus, swatch watches /var/log/syslog. The TOC is tightly bound to swatch, so it runs on bombard, too.

Both

Swatch's configuration file is /opt/local/etc/swatch/swatch.conf. Swatch itself lives in /opt/local/script. Swatch-related programs, like ping-swatch, pong-swatch, toclogd, page_em, and mail_em, live in /home/tocops/bin.
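The per-site log paths above can be summarized in a small helper. This is only a sketch: syslog_path_for is a hypothetical function, not one of the programs in /home/tocops/bin, and it assumes the short host names used in these notes.

```shell
# Hypothetical helper: print the syslog path that swatch watches on a host.
syslog_path_for() {
    case $1 in
        zodiac)  echo /loghost/log/syslog ;;   # FHCRC: klamath:/var/log mounted here
        bombard) echo /var/log/syslog ;;       # SCCA: loghost and swatch on one box
        *)       echo "unknown swatch host: $1" >&2; return 1 ;;
    esac
}

# Possible use, stripping any domain suffix from the node name:
#   host=$(uname -n); syslog_path_for "${host%%.*}"
```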


Configuration file

Examine swatch.conf


Tell Swatch to quit paging you about something

usage:  silence-swatch {string}
Where {string} is the string which you would like swatch to ignore.

Change watchfor strings and associated actions

usage:  vi /opt/local/etc/swatch/swatch.conf

Tell Swatch to re-read its configuration file

usage:  /etc/init.d/swatch restart

Verify that Swatch is working

usage:  ping-swatch -s yes

Fix Swatch when syslog is frozen

Sometimes, syslog isn't growing. This can happen if syslog on the loghost (klamath or bombard) is dead and not accepting incoming messages. At the Hutch, it can also happen when zodiac is unable to mount klamath:/var/log on /loghost/log. In that case, syslog is growing just fine, but zodiac, and thus swatch, cannot see it. ping-swatch announces this problem by saying "{nodename} sees syslog as frozen". If you see this, your task is to figure out why: is syslog on the loghost (klamath or bombard) broken? Or is zodiac's NFS mount of klamath:/var/log broken? (In the latter case, try running /etc/init.d/netfs restart.)
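One way to tell the two causes apart by hand is to check whether the file grows on each host. The sketch below assumes that "frozen" simply means the file size does not change over a short interval. These notes do not say how ping-swatch makes its decision, and log_is_frozen is a hypothetical name.

```shell
# Return 0 if FILE's size is unchanged after SECONDS (default 10),
# 1 if it changed, 2 if the file cannot be read at all.
log_is_frozen() {
    file=$1
    seconds=${2:-10}
    [ -r "$file" ] || return 2
    # $(( ... )) strips the padding some wc implementations add.
    before=$(( $(wc -c < "$file") ))
    sleep "$seconds"
    [ -r "$file" ] || return 2
    after=$(( $(wc -c < "$file") ))
    [ "$after" -eq "$before" ]
}

# Possible use:
#   on klamath:  log_is_frozen /var/log/syslog 30 && echo "frozen at the source"
#   on zodiac:   log_is_frozen /loghost/log/syslog 30 && echo "frozen as seen over the mount"
```

If the file grows on klamath but not on zodiac, the mount is the likely suspect. If it is frozen on klamath too, look at syslog itself. On a quiet system, pick an interval long enough that some message would normally arrive.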


Fix Swatch when it is hung

Sometimes, Swatch is in the process table but isn't doing anything useful. When this happens, you may see only one swatch process, or perhaps three, rather than the usual two. In any case, if you suspect that swatch is moribund, first kill all swatch processes, then restart Swatch. I start by running '/etc/init.d/swatch stop' a few times and then checking the process table. If the relevant 'swatch' and 'tail' processes are gone, then I start swatch with '/etc/init.d/swatch start'. If any of these processes remain, then I manually send them the 'kill' command. A regular 'kill' does not always work; in that case, I resort to 'kill -9'.
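The kill-then-escalate step can be sketched as a small shell function. This is a hypothetical helper, not something installed in /home/tocops/bin; the name terminate_pid and the default grace period are assumptions.

```shell
# Hypothetical helper: ask PID to exit, wait GRACE seconds, then force it.
terminate_pid() {
    pid=$1
    grace=${2:-5}
    # Polite request first (SIGTERM). If the PID is already gone, we are done.
    kill "$pid" 2>/dev/null || return 0
    sleep "$grace"
    # kill -0 sends no signal; it only tests whether the PID still exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
    fi
    return 0
}

# Possible use, after '/etc/init.d/swatch stop' has left stragglers behind:
#   ps -ef | grep '[s]watch'      # note the PIDs of leftover swatch/tail processes
#   terminate_pid 12345           # 12345 is a placeholder PID
#   /etc/init.d/swatch start
```

The brackets in '[s]watch' keep the grep command from matching itself in the process listing.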


Peruse Swatch's documentation

usage:  perldoc /opt/local/script/swatch

What does toclogd do?

toclogd is a daemon running as the pseudo-user tocops. This is the program which populates the frames of the Tech Operations Console. swatch.conf contains watchfor strings with associated actions which instruct swatch to copy lines to /home/tocops/.tocpipe. toclogd spends its life in an endless loop, reading /home/tocops/.tocpipe, parsing what it finds there, and writing the results to various text files located in /home/tocops/logs, which the TOC web page loads into the frames which your browser displays. If Swatch is dead, then the information the TOC displays is no longer up-to-date.
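To make the read-parse-write loop concrete, here is a toy shell sketch of the same shape. It is not toclogd's code, and the parsing rule is invented for illustration: it assumes each line starts with a frame name followed by a space and the message, and that each frame gets a file named after it. The real line format and file names are not described in these notes.

```shell
# Hypothetical router: read "FRAME message..." lines on stdin and append
# each message to OUTDIR/FRAME.txt.
route_lines() {
    outdir=$1
    while IFS= read -r line; do
        frame=${line%% *}      # first word of the line
        message=${line#* }     # the rest (the whole line if there is no space)
        [ -n "$frame" ] || continue
        # Skip frame names that would not make a safe file name.
        case $frame in *[!A-Za-z0-9_-]*) continue ;; esac
        printf '%s\n' "$message" >> "$outdir/$frame.txt"
    done
}

# A daemon-style endless loop around it might look like this, assuming
# .tocpipe is a named pipe (not confirmed here):
#   while :; do route_lines /home/tocops/logs < /home/tocops/.tocpipe; done
```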


Kick toclogd

If the TOC isn't updating, check to see if Swatch itself is hung. If Swatch is functioning, then perhaps toclogd is hung. Stop and restart toclogd.

usage:  /etc/init.d/toclogd restart