Wednesday, 29 April 2009

NFS bug in older SL5 kernel

As mentioned previously ( http://northgrid-tech.blogspot.com/2009/03/replaced-nfs-servers.html ), we recently upgraded our NFS servers and they now run on SL5. Shortly after they went into production, all LHCb jobs stalled at Manchester and we were blacklisted by the VO.

We were advised that it might be a lockd error, and were asked to use the following Python code to diagnose this:

-------------------------------------------------------------------
import fcntl
# Take an exclusive, non-blocking lock on a test file in the NFS-mounted area;
# if NFS locking (lockd) is broken this typically raises IOError or hangs.
fp = open("lock-test.txt", "a")
fcntl.lockf(fp.fileno(), fcntl.LOCK_EX|fcntl.LOCK_NB)
-------------------------------------------------------------------


The code did not give any errors, so we discounted this as the problem. Wind the clock on a fortnight (including a week's holiday over Easter) and we had still not found the problem, so I tried the above code again and, bingo, lockd was the problem. A quick search of the SL mailing list pointed me to this kernel bug:
https://bugzilla.redhat.com/show_bug.cgi?id=459083

A quick update of the kernel and a reboot, and the problem was fixed.
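
For the record, the fix itself was nothing more than the standard SL5 kernel update; roughly as follows (the exact package and kernel version depend on the errata available at the time, so treat this as a sketch rather than the exact commands we ran):

-------------------------------------------------------------------
# Check the running kernel version before and after
uname -r

# Pull in the fixed kernel from the SL5 repositories and reboot into it
yum update kernel
reboot

# After rebooting, re-run the lock test above from the NFS-mounted area;
# with the fixed kernel it completes without errors.
-------------------------------------------------------------------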

Friday, 3 April 2009

Fixed MPI installation

A few months ago we installed MPI using the gLite packages and YAIM.

http://northgrid-tech.blogspot.com/2008/11/mpi-enabled.html

We never really tested it until now, though, and we have found a few problems with YAIM:

YAIM creates an mpirun script that assumes ./ is in the PATH, so the job was landing on the WN but mpirun couldn't find the user's script/executable. I corrected it by prepending `pwd`/ to the script arguments at the end of the script, so that it runs `pwd`/$@ instead of $@. I added this using the YAIM post functionality.
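
Schematically, the change to the generated wrapper looks like this (the exact mpiexec invocation is illustrative; the only real change is `pwd`/$@ in place of $@):

-------------------------------------------------------------------
# End of the YAIM-generated mpirun wrapper (schematic).
# before:
#   $MPIEXEC_PATH/mpiexec $@
# after: prepend `pwd`/ so the user's executable is found even though
# ./ is not in the PATH on the worker node.
$MPIEXEC_PATH/mpiexec `pwd`/$@
-------------------------------------------------------------------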

The if/else statement that is used to build MPIEXEC_PATH is written in a contorted way and needs to be corrected. For example (a sketch of less contorted logic follows the list):

1) MPI_MPIEXEC_PATH is used in the if, but YAIM doesn't write it into any system file that sets the environment variable, such as grid-env.sh where the other MPI_* variables are set.

2) In the else branch there is a hardcoded path, which is actually obtained by splitting the directory from the mpiexec executable that MPI_MPICH_MPIEXEC points to.

3) YAIM doesn't rewrite mpirun once it has been written, so the hardcoded path can't be changed by reconfiguring the node without first deleting mpirun manually. This makes it difficult to update or to correct mistakes.

4) The existence of MPIEXEC_PATH is not checked, and it should be.
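
For illustration, this is roughly the logic we would expect to be used when building MPIEXEC_PATH (a sketch only: the MPI_MPIEXEC_PATH and MPI_MPICH_MPIEXEC variables are the ones mentioned above, but the surrounding code is an assumption, not the actual YAIM function):

-------------------------------------------------------------------
# Prefer MPI_MPIEXEC_PATH if the site sets it, otherwise derive the
# directory from MPI_MPICH_MPIEXEC instead of hardcoding it.
if [ -n "$MPI_MPIEXEC_PATH" ]; then
    MPIEXEC_PATH=$MPI_MPIEXEC_PATH
else
    MPIEXEC_PATH=$(dirname "$MPI_MPICH_MPIEXEC")
fi

# Check that the resulting path actually exists.
if [ ! -d "$MPIEXEC_PATH" ]; then
    echo "ERROR: MPIEXEC_PATH=$MPIEXEC_PATH does not exist" >&2
    exit 1
fi
-------------------------------------------------------------------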

Anyway, eventually we managed to run MPI jobs, and we reported what we had done to the new TMB MPI working group because another site was experiencing the same problems. Hopefully they will correct these problems. Special thanks go to Chris Glasman, who hunted down the initial problem with the path and patiently tested the changes we applied.