Saturday 3 November 2007

Manchester CEs and RGMA problems

I still don't know what happened to ce02, why the ops tests didn't work, or why my jobs hung forever while everybody else could run (atlas claims 88 percent efficiency in the last 24 hours). Anyway, I updated ce02 manually (rpm -ihv) to the same set of rpms that are on ce01, and the problem I had, globus hanging, has disappeared. The ops tests are successful again and we are off the atlas blacklist. I also fixed ce01, which picked up the wrong java version yesterday. I need to change a couple of things on the kickstart server so that these incidents don't happen again.
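For the record, a rough sketch of how the rpm sets on the two CEs could be compared before updating by hand. It assumes the output of `rpm -qa` from each node has been saved as text; the function names and the package strings in the example are made up for illustration and are not the real package lists from ce01 or ce02.

```python
def parse_rpm_list(text):
    """Turn the text output of `rpm -qa` into a set of package strings."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_rpm_lists(reference_text, target_text):
    """Compare two `rpm -qa` listings.

    Returns (missing, extra): packages present only on the reference node
    and packages present only on the target node, both sorted.
    """
    reference = parse_rpm_list(reference_text)
    target = parse_rpm_list(target_text)
    missing = sorted(reference - target)
    extra = sorted(target - reference)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical listings; real ones would come from `rpm -qa` on each CE.
    ce01 = "example-ce-3.1.0-1\nexample-java-1.5.0-2\n"
    ce02 = "example-ce-3.0.9-1\nexample-java-1.5.0-2\n"
    missing, extra = diff_rpm_lists(ce01, ce02)
    print("on ce01 but not on ce02:", missing)
    print("on ce02 but not on ce01:", extra)
```

Anything in the first list is a candidate for a manual install on ce02; anything in the second list is worth checking against what the kickstart server hands out.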

I also had to manually kill tomcat, which was not responding, and restart it on the MON box. Accounting published successfully after this.
