Finding black holes is always a pain. However, the PBS accounting records can help. A simple script that counts the number of jobs each node swallows goes some way towards spotting them:
http://www.sysadmin.hep.ac.uk/svn/fabric-management/torque/jobs/black-holes-finder.sh
I am posting it in case other people are interested.
An example of the output:
# black-holes-finder.sh
Using accounting file 20081127
[...]
bohr5029: 1330
bohr5030: 1803
Clearly, the two nodes above have a problem.
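For anyone who cannot reach the script, the counting idea can be sketched in a few lines of Python. This is not the script linked above, just a rough illustration of the same approach. It assumes the usual Torque accounting layout (`date time;record_type;job_id;key=value ...`), that finished jobs show up as `E` records, and that the nodes are listed in the `exec_host` field as `node/slot+node/slot`. The function names are made up for this example.

```python
import re
import sys
from collections import Counter

# Assumption: each accounting line looks like
#   date time;record_type;job_id;key=value key=value ...
# and the nodes used by the job appear as exec_host=node/slot+node/slot
EXEC_HOST = re.compile(r"\bexec_host=(\S+)")


def count_jobs_per_node(lines):
    """Count finished jobs (record type E) per node (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        # Skip malformed lines and anything that is not an end-of-job record.
        if len(parts) < 4 or parts[1] != "E":
            continue
        match = EXEC_HOST.search(parts[3])
        if not match:
            continue
        # Strip the slot index and count each node once per job.
        nodes = {slot.split("/")[0] for slot in match.group(1).split("+")}
        counts.update(nodes)
    return counts


def report(counts, top=10):
    """Format the busiest nodes as 'node: count' lines, highest first."""
    return [f"{node}: {n}" for node, n in counts.most_common(top)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path to one accounting file as the only argument.
    with open(sys.argv[1]) as fh:
        for row in report(count_jobs_per_node(fh)):
            print(row)
```

Run it with the path to one day's accounting file as the only argument. A node whose count is far above the rest is a candidate black hole, although a high count alone is only a hint and the node still needs checking by hand.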