Swatch is an open-source package which reads a configuration file (/opt/local/etc/swatch/swatch.conf). After reading the config file, swatch generates a new program (which it writes to /home/tocops/.swatch_script.xxxx) and launches it. That program, in turn, launches a 'tail -f /var/log/syslog' process and parses the output, looking for the specified 'watchfor' strings and performing the associated actions.
tocops 14807 1 0 09:35 pts/7 00:00:00 /opt/local/bin/perl /opt/local/script/swatch -c /opt/local/etc/swatch/swatch.conf -t /var/log/syslog --awk-field-syntax
tocops 14808 14807 12 09:35 pts/7 00:01:06 /opt/local/bin/perl /home/tocops/.swatch_script.14807
tocops 14840 14808 0 09:35 pts/7 00:00:00 /usr/bin/tail -n 0 -f /var/log/syslog
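The listing above is the healthy pattern: one swatch parent, one generated .swatch_script child, and one tail. As a minimal sketch (count_swatch_procs is a hypothetical name, not one of the tools in /home/tocops/bin), the two perl processes can be counted from 'ps -ef' output like this:

```shell
# Hypothetical helper, not part of swatch or the tocops tools.
# Reads `ps -ef` output on stdin and prints how many swatch perl
# processes it sees: the swatch parent plus the generated script.
count_swatch_procs() {
    # Match "/swatch -c " (the parent) or "/.swatch_script.<pid>" (the child).
    # `|| true` keeps the exit status 0 when grep counts zero matches.
    grep -E -c '/swatch -c |/\.swatch_script\.[0-9]+' || true
}
```

Running 'ps -ef | count_swatch_procs' should print 2 on a healthy host. The 'grep swatch' line that normally pollutes such listings is not counted, because it matches neither pattern.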
At the Hutch, klamath hosts the loghost function (aka 'syslog'), whereas swatch runs on zodiac. zodiac mounts klamath:/var/log to /loghost/log. Thus, swatch watches /loghost/log/syslog. The TOC is tightly bound to swatch, so it runs on zodiac, too.
At the SCCA, bombard hosts the loghost function (aka 'syslog') and also runs swatch: no need for mounting a remote file system. Thus, swatch watches /var/log/syslog. The TOC is tightly bound to swatch, so it runs on bombard, too.
Swatch's configuration file is /opt/local/etc/swatch/swatch.conf. Swatch itself lives in /opt/local/script. Swatch-related programs, like ping-swatch, pong-swatch, toclogd, page_em, and mail_em, live in /home/tocops/bin.
Examine swatch.conf
usage: silence-swatch {string}
Where {string} is the string which you would like swatch to ignore.
zodiac> sudo su tocops
Password:
zodiac> silence-swatch foozle
Inserted 'ignore foozle' in swatch.conf
Shutting down swatch done
Starting swatch done
zodiac>
*** swatch version 3.2.2 (pid:26432) started at Wed Mar 26 08:22:52 PDT 2008
zodiac> silence-swatch foozle
Removed 'ignore foozle' from swatch.conf
Shutting down swatch Caught a SIGTERM -- sending a TERM signal to 14553
done
Starting swatch done
zodiac>
*** swatch version 3.2.2 (pid:27944) started at Wed Mar 26 08:44:13 PDT 2008
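As the two transcripts show, silence-swatch behaves like a toggle: the first call inserts the 'ignore' line, and the second call with the same string removes it. The sketch below illustrates that toggle only. It is not the real silence-swatch script; the function name toggle_ignore and the choice to put the new line at the top of the file are my assumptions.

```shell
# Hypothetical sketch of the toggle behaviour seen in the transcripts;
# this is NOT the real silence-swatch script.
# Usage: toggle_ignore CONF STRING
toggle_ignore() {
    conf=$1
    line="ignore $2"
    tmp=$(mktemp) || return 1
    if grep -qxF -- "$line" "$conf"; then
        # Already silenced: drop the line (grep exits 1 if nothing is left).
        grep -vxF -- "$line" "$conf" > "$tmp" || true
        echo "Removed '$line' from $conf"
    else
        # Not yet silenced: put the line first (placement is an assumption).
        { printf '%s\n' "$line"; cat "$conf"; } > "$tmp"
        echo "Inserted '$line' in $conf"
    fi
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

The real tool also restarts swatch afterwards, as the transcripts show; the sketch leaves that step out so that it can be tried safely on a scratch copy of swatch.conf.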
usage: vi /opt/local/etc/swatch/swatch.conf
cp /opt/local/etc/swatch/swatch.conf /opt/local/config/root/opt/local/etc/swatch/swatch.conf.{yourname}.2008-03-25.08:15:00
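The backup name above is typed by hand. As a small sketch, the same name can be built from the current time; backup_path is a hypothetical helper and is not installed anywhere.

```shell
# Hypothetical helper: build the backup path used in the cp example above.
# Usage: backup_path USERNAME
backup_path() {
    printf '%s.%s.%s\n' \
        /opt/local/config/root/opt/local/etc/swatch/swatch.conf \
        "$1" \
        "$(date '+%Y-%m-%d.%H:%M:%S')"
}
```

For example: cp /opt/local/etc/swatch/swatch.conf "$(backup_path skendric)"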
usage: /etc/init.d/swatch restart
zodiac> sudo su tocops
Password:
zodiac> /etc/init.d/swatch restart
Shutting down swatch done
Starting swatch done
zodiac>
*** swatch version 3.2.2 (pid:28247) started at Wed Mar 26 08:47:14 PDT 2008
usage: ping-swatch -s yes
zodiac> ./ping-swatch -s yes
Starting ./ping-swatch v1.2.0
Swatch is alive
Ending ./ping-swatch v1.2.0
zodiac> whoami
skendric
zodiac> sudo su tocops
Password:
zodiac> whoami
tocops
zodiac> /etc/init.d/swatch stop
Shutting down swatch Caught a SIGTERM -- sending a TERM signal to 10337
done
zodiac> ./ping-swatch -s yes
Starting ./ping-swatch v1.2.0
Swatch is not in the process table
Invoking /etc/init.d/swatch restart
Shutting down swatch
[FAILED]
Starting swatch
*** swatch version 3.2.2 (pid:28353) started at Wed Mar 26 08:48:33 PDT 2008
Swatch has been revived
Ending ./ping-swatch v1.2.0
zodiac>
zodiac> ./ping-swatch -s yes
Starting ./ping-swatch v1.2.0
Swatch is alive
Ending ./ping-swatch v1.2.0
zodiac>
Sometimes, syslog isn't growing. This can happen if syslog on klamath (or bombard) is dead and not accepting incoming messages. It can also happen when zodiac is unable to NFS-mount klamath's log directory on /loghost/log. In that case, syslog is growing just fine, but zodiac, and thus swatch, cannot see it. ping-swatch will announce this problem by saying "{nodename} sees syslog as frozen". If you see this, your task is to figure out why: is syslog on loghost (klamath or bombard) broken? Or is zodiac's NFS mount on /loghost/log broken? (Try running /etc/init.d/netfs restart in the latter case.)
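To confirm by hand that syslog really is frozen, compare its size at two points in time. This is a minimal sketch, not the logic ping-swatch actually uses; the function name log_grew and the interval are my assumptions.

```shell
# Hypothetical check, not the real ping-swatch logic.
# Usage: log_grew FILE SECONDS
# Succeeds if FILE is larger after SECONDS than it was before.
# Returns 2 if FILE cannot be read at all (for example, a missing mount).
log_grew() {
    [ -r "$1" ] || return 2
    before=$(wc -c < "$1")
    sleep "$2"
    after=$(wc -c < "$1")
    [ "$after" -gt "$before" ]
}
```

On zodiac, something like 'log_grew /loghost/log/syslog 60 || echo "syslog looks frozen"' would do. Running the same check against /var/log/syslog on the loghost itself tells you which side is broken: if the file grows there but not on zodiac, suspect the NFS mount.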
Sometimes, Swatch is in the process table but isn't doing anything useful. When this happens, you may see only one swatch process, or perhaps three, rather than the usual two. In any case, if you suspect that swatch is moribund, first kill all swatch processes, then restart Swatch. I start by running '/etc/init.d/swatch stop' a few times and then checking the process table. If the relevant 'swatch' and 'tail' processes are gone, then I start swatch with '/etc/init.d/swatch start'. If any of these processes remain, then I kill them manually with the 'kill' command. In the example below, notice how the regular 'kill' did not work; I had to resort to 'kill -9'.
zodiac> whoami
skendric
zodiac> sudo su tocops
Password:
zodiac> whoami
tocops
zodiac> /etc/init.d/swatch stop; /etc/init.d/swatch stop; /etc/init.d/swatch stop
zodiac> ps -ef | grep swatch
tocops 1548 1 99 09:07 ? 04:27:46 /opt/local/bin/perl /opt/local/script/swatch -c /opt/local/etc/swatch/swatch.conf -t /var/log/syslog --awk-field-syntax
tocops 29932 29874 0 13:36 pts/0 00:00:00 grep swatch
zodiac> kill 1548
zodiac> ps -ef | grep swatch
tocops 1548 1 99 09:07 ? 04:27:55 /opt/local/bin/perl /opt/local/script/swatch -c /opt/local/etc/swatch/swatch.conf -t /var/log/syslog --awk-field-syntax
tocops 29980 29874 0 13:36 pts/0 00:00:00 grep swatch
zodiac> kill -9 1548
zodiac> ps -ef | grep swatch
tocops 30012 29874 0 13:37 pts/0 00:00:00 grep swatch
zodiac> ps -ef | grep tail
tocops 3453 3451 0 00:14 ? 00:00:03 /usr/bin/tail -n 0 -f /var/log/syslog
skendric 25047 22919 0 04:53 pts/11 00:00:00 tail -f syslog
skendric 25841 22846 0 05:02 pts/2 00:00:00 grep tail
zodiac> kill 3453
zodiac> ps -ef | grep tail
skendric 25047 22919 0 04:53 pts/11 00:00:00 tail -f syslog
skendric 26051 22846 0 05:02 pts/2 00:00:00 grep tail
zodiac>
zodiac> /etc/init.d/swatch start
Starting swatch done
zodiac>
*** swatch version 3.2.2 (pid:28594) started at Wed Mar 26 08:51:38 PDT 2008
zodiac> exit
exit
zodiac> whoami
skendric
zodiac>
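The escalation in that example (a plain 'kill' first, then 'kill -9' only if the process survives) can be sketched as a small function. stop_pid is a hypothetical helper, and the 5-second default grace period is an arbitrary choice of mine.

```shell
# Hypothetical helper mirroring the manual steps above.
# Usage: stop_pid PID [GRACE_SECONDS]
# Sends TERM, waits up to GRACE_SECONDS (default 5, an arbitrary choice),
# then falls back to KILL, the equivalent of `kill -9`.
stop_pid() {
    pid=$1
    grace=${2:-5}
    # Give up if the PID cannot be signalled (no such process, or not ours).
    kill -TERM "$pid" 2>/dev/null || return 1
    i=0
    while [ "$i" -lt "$grace" ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0   # exited after TERM
        i=$((i + 1))
    done
    echo "PID $pid ignored TERM; sending KILL"
    kill -KILL "$pid" 2>/dev/null
}
```

Run it as tocops, once per leftover PID from 'ps -ef | grep swatch' and 'ps -ef | grep tail', taking care not to kill somebody else's unrelated 'tail -f syslog'.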
usage: perldoc /opt/local/script/swatch
toclogd is a daemon running as the pseudo-user tocops. This is the program which populates the frames of the Tech Operations Console. swatch.conf contains watchfor strings with associated actions which instruct swatch to copy lines to /home/tocops/.tocpipe ... and toclogd spends its life in an endless loop, reading /home/tocops/.tocpipe, parsing what it finds there, and writing the results to various text files located in /home/tocops/logs ... which the TOC web page loads into the frames which your browser displays. If Swatch is dead, then the information the TOC displays is no longer up-to-date.
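To make the data flow concrete, here is a minimal sketch of the reading half of that loop. It assumes that .tocpipe is a named pipe (FIFO), which this page does not confirm, and it skips the parsing step entirely: every line goes to a single file. drain_pipe is a hypothetical name; the real toclogd does considerably more.

```shell
# Hypothetical sketch, not the real toclogd.
# Usage: drain_pipe PIPE LOGFILE
# Reads lines from PIPE until the writer closes it, appending each to LOGFILE.
drain_pipe() {
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$2"
    done < "$1"
}
```

A long-running daemon would wrap the call in an endless loop so that the pipe is reopened each time a writer closes it.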
If the TOC isn't updating, check to see if Swatch itself is hung. If Swatch is functioning, then perhaps toclogd is hung. Stop and restart toclogd.
usage: /etc/init.d/toclogd restart
zodiac> sudo su tocops
Password:
zodiac> /etc/init.d/toclogd restart
Shutting down toclogd done
Starting toclogd done
zodiac>