Yesterday it was the top BDII, rather than the site BDII, that stopped working. It had crashed: the pid file was still there, but the process was no longer running.
So I changed the script to use a different query, one that works on every level of BDII (resource, site and top): it now looks under o=infosys instead of searching o=grid for a specific attribute.
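A minimal sketch of that kind of probe follows. It assumes the OpenLDAP `ldapsearch` client and the usual BDII port 2170; the host default, the 30-second time limit and the function name `bdii_responds` are illustrative assumptions, not taken from testbdii.sh. Requesting the special attribute list `1.1` returns DNs only, so the check just asks whether anything at all exists under o=infosys.

```shell
#!/bin/sh
# Sketch only: host/port defaults and the function name are assumptions.
BDII_HOST="${BDII_HOST:-localhost}"   # hypothetical default host
BDII_PORT="${BDII_PORT:-2170}"        # usual BDII LDAP port

# Succeeds if the BDII returns at least one entry under o=infosys.
# -z 1 stops after the first entry; "1.1" asks for no attributes, only DNs.
# The pipeline's status is grep's, so a size-limit exit code from
# ldapsearch does not matter, only whether a "dn:" line came back.
bdii_responds() {
    ldapsearch -x -LLL -l 30 -z 1 \
        -H "ldap://${BDII_HOST}:${BDII_PORT}" \
        -b o=infosys 1.1 2>/dev/null | grep -q '^dn:'
}
```

Because the same base works for resource, site and top BDIIs, one check can be pointed at any of them, e.g. by setting `BDII_HOST=top-bdii.example.org` before calling `bdii_responds`.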
I also looked at the bdii startup script, and its stop function does a good job of cleaning up processes and lock/pid files. So I now simply run service bdii restart whether or not the process is still there; only the alert differs between the two cases.
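The restart step could look roughly like the sketch below. The pid file path is a guess (it varies between BDII versions), the alert wording and function names are hypothetical, and the script is assumed to run from cron, which mails any output to the MAILTO address.

```shell
#!/bin/sh
# Sketch only: pid file path, messages and function names are assumptions.
PIDFILE="${PIDFILE:-/var/run/bdii/bdii-update.pid}"   # hypothetical path

# Print "running" if the pid file names a live process, else "dead".
process_state() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo running
    else
        echo dead
    fi
}

# The alert is the only thing that differs between the two cases.
alert_message() {
    if [ "$1" = running ]; then
        echo "BDII not responding but process running: restarting"
    else
        echo "BDII process not running (stale pid file?): restarting"
    fi
}

# Alert, then restart unconditionally: the init script's stop function
# cleans up stale processes and lock/pid files in either case.
recover_bdii() {
    alert_message "$(process_state)"
    service bdii restart
}
```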
The new version is still in the same place:
http://www.sysadmin.hep.ac.uk/svn/fabric-management/processes/monitoring/testbdii.sh