Friday 3 April 2009

Fixed MPI installation

A few months ago we installed MPI using the gLite packages and YAIM.

http://northgrid-tech.blogspot.com/2008/11/mpi-enabled.html

We never really tested it until now, though, and we have found a few problems with YAIM:

YAIM creates an mpirun script that assumes ./ is in the PATH, so the job was landing on the WN but mpirun couldn't find the user script/executable. I corrected it by prepending `pwd`/ to the script arguments at the end of the script, so it runs `pwd`/$@ instead of $@. I added this using the YAIM post functionality.
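As a rough illustration of the idea (not the script YAIM generates, and not exactly the one-line change described above), the path fix can be sketched as a small shell function. The name abs_cmd is hypothetical. Unlike a bare `pwd`/$@, this sketch assumes that a path which is already absolute should be left alone; as with `pwd`/$@, only the first word (the executable) gets the prefix.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: print the command line with the
# executable (first argument) anchored to the current working directory.
abs_cmd() {
    prog=$1
    shift
    case "$prog" in
        /*) ;;                        # already absolute: leave it untouched
        *)  prog="$(pwd)/$prog" ;;    # relative: prefix the job's working directory
    esac
    printf '%s' "$prog"
    # Remaining arguments are passed through unchanged.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

Calling abs_cmd hello -n 2 from the job directory prints that directory followed by /hello -n 2, which is what the mpirun wrapper needs to find the user executable when ./ is not in the PATH.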

The if/else statement that is used to build MPIEXEC_PATH is written in a contorted way and needs to be corrected. For example:

1) MPI_MPIEXEC_PATH is used in the if, but YAIM doesn't write it to any system file that sets environment variables, such as grid-env.sh, where the other MPI_* variables are set.

2) In the else branch there is a hardcoded path, which is actually obtained by stripping the mpiexec executable name from the path that MPI_MPICH_MPIEXEC points to, leaving only its directory.

3) YAIM doesn't rewrite mpirun once it has been written, so the hardcoded path can't be changed by reconfiguring the node unless mpirun is manually deleted first. This makes it difficult to update the script or correct mistakes.

4) The existence of MPIEXEC_PATH is not checked, and it should be.
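One way the selection logic could be written to address points 1, 2 and 4 is sketched below. This is only an assumption about what a cleaner version might look like, not the code YAIM ships: the function name resolve_mpiexec_path is hypothetical, while MPI_MPIEXEC_PATH and MPI_MPICH_MPIEXEC are the variables mentioned above.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: work out the directory that
# holds mpiexec and fail loudly if it cannot be found.
resolve_mpiexec_path() {
    if [ -n "${MPI_MPIEXEC_PATH:-}" ]; then
        # An explicitly configured directory wins.
        mpiexec_dir=$MPI_MPIEXEC_PATH
    elif [ -n "${MPI_MPICH_MPIEXEC:-}" ]; then
        # Derive the directory at run time instead of hardcoding it.
        mpiexec_dir=$(dirname "$MPI_MPICH_MPIEXEC")
    else
        echo "neither MPI_MPIEXEC_PATH nor MPI_MPICH_MPIEXEC is set" >&2
        return 1
    fi
    # Check that the directory really exists before using it.
    if [ ! -d "$mpiexec_dir" ]; then
        echo "MPIEXEC_PATH '$mpiexec_dir' does not exist" >&2
        return 1
    fi
    printf '%s\n' "$mpiexec_dir"
}
```

A wrapper could then set MPIEXEC_PATH=$(resolve_mpiexec_path) || exit 1, so that a misconfigured node fails with a clear message instead of a missing-executable error later on. Because the directory is computed each time, there would be no hardcoded path left to go stale when the node is reconfigured.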

Anyway, we eventually managed to run MPI jobs, and we reported what we had done to the new TMB MPI working group because another site was experiencing the same problems. Hopefully they will correct these problems. Special thanks go to Chris Glasman, who hunted down the initial problem with the path and patiently tested the changes we applied.
