Tuesday, 27 July 2010

Moving APEL to SL5

We have moved APEL from the SL4 MON box to the SL5 version, which works standalone without RGMA (finally!). The site BDII has also been moved from the MON box to this machine. This is how it is set up.

*) Request a certificate for the machine if you don't have one already.
*) Kickstart a vanilla SL5 machine with two RAID1 disks.
*) Install mysql-server-5.0.77-3.el5 (it's in the SL5 repository)
*) Remove /var/lib/mysql and recreate it empty (you can skip this, but I had messed around with it earlier and needed a clean dir).
*) Start mysqld

service mysqld start

At this point it will tell you to set the root password.

/usr/bin/mysqladmin -u root password 'pwd-in-site-info.def'
/usr/bin/mysqladmin -u root -h <machine-fqdn> password 'pwd-in-site-info.def'


*) Install the certificate (we have it directly in cfengine).
*) Setup the yum repositories if your configuration tool doesn't do it already

cd /etc/yum.repos.d/
wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-APEL.repo


*) Install glite-APEL

yum install glite-APEL

*) Run yaim: it sets up the database and, most importantly, the ACL. If you have more than one CE to publish, you need to run it once for each CE, changing the name of the CE in site-info.def each time. Alternatively, if you are skilled with SQL, you can set the permissions by hand so that each CE has write access.

/opt/glite/yaim/bin/yaim -s /opt/glite/yaim/etc/site-info.def -c -n glite-APEL
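
The per-CE runs can be scripted. This is only a sketch, and it rests on an assumption: that the CE name is set by a line starting with CE_HOST= in site-info.def. Check which variable your configuration actually uses. The function names are made up for illustration. The temporary copy is written next to site-info.def so that any directories yaim looks for alongside it are still found.

```shell
#!/bin/bash
# Sketch: run the glite-APEL yaim configuration once per CE.
# Assumption: the CE name is set by a line starting with CE_HOST= in site-info.def.
YAIM=/opt/glite/yaim/bin/yaim
SITE_INFO=/opt/glite/yaim/etc/site-info.def

# Print a copy of the given site-info.def with CE_HOST set to the given CE.
site_info_for_ce() {
    local ce="$1" site_info="$2"
    sed "s|^CE_HOST=.*|CE_HOST=${ce}|" "$site_info"
}

# Run yaim for every CE named on the command line.
configure_apel_for_ces() {
    local ce tmp
    for ce in "$@"; do
        # mktemp creates the file readable by the owner only (it holds passwords).
        tmp=$(mktemp "$(dirname "$SITE_INFO")/site-info.def.XXXXXX") || return 1
        site_info_for_ce "$ce" "$SITE_INFO" > "$tmp"
        "$YAIM" -s "$tmp" -c -n glite-APEL
        rm -f "$tmp"
    done
}

# Usage (CE names are placeholders):
#   configure_apel_for_ces ce01.example.org ce02.example.org
```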

*) BUG: APEL still uses Java. Every time it runs it creates a Java keystore with all the CAs and the host certificate added to it. Your machine may end up with both the OS Java version and the one you install (normally 1.6). The tool used to create the keystore file is called by a script that doesn't set the path, so if you have both versions of the command the OS one is likely to be called because it resides in /usr/bin. Needless to say, the OS version is older and doesn't have all the options used in the APEL script. There are a number of ways to fix this: I modified the script to insert the absolute path, but you can also change the link target in /usr/bin or add a modified path to the APEL cron job. The culprit script is this:

/opt/glite/share/glite-apel-publisher/scripts/key_trust_store_maker.sh

and belongs to

glite-apel-publisher-2.0.12-7.rpm

The problem is known and apparently a fix is in certification. My ticket is here

https://gus.fzk.de/ws/ticket_info.php?ticket=60452
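
For the modified-path option, something along these lines could go in front of the publisher command in the cron job. It is a sketch under two assumptions: that the keystore tool in question is the JDK's keytool, and that the JDK you installed lives under /usr/java/latest. Adjust JDK_BIN to your install. The helper names are hypothetical.

```shell
#!/bin/bash
# Sketch: make the installed JDK's tools win over the OS ones in /usr/bin.
# JDK_BIN is an assumption; point it at the bin directory of your Java 1.6 install.
JDK_BIN="${JDK_BIN:-/usr/java/latest/bin}"

# Print a PATH value with the given directory placed in front.
jdk_first_path() {
    local dir="$1" path="$2"
    echo "${dir}:${path}"
}

# Show which keytool the shell would pick up with the current PATH.
which_keytool() {
    command -v keytool || echo "keytool not found in PATH"
}

# Usage:
#   export PATH="$(jdk_first_path "$JDK_BIN" "$PATH")"
#   which_keytool
```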

*) Register the machine in GOCDB, making sure you tick glite-APEL, not APEL, as its service.

*) BUG: UK host certificates have an email attribute, and different clients print this email in different formats. When you register the machine, enter the host DN as it is, then open a GGUS ticket for APEL so they can change it internally. This is also known and tracked in the savannah bug below, but at the moment they have to change it manually.

https://savannah.cern.ch/bugs/?70628


*) Dump the DB on the old MON box with mysqldump. I thought I could just tar it up, but that didn't work, so I used this instead.

mysqldump -C -Q -u root -p accounting | gzip -c > accounting.sql.gz

*) Copy the dump to the new machine and reload it there

zcat accounting.sql.gz | mysql -u root -p accounting
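
Before reloading, it can be worth checking that the dump survived the copy. A minimal sketch: it tests the gzip archive and looks for the "-- Dump completed" comment that mysqldump writes as its last line when comments are enabled (the default). check_dump is a name made up for this example.

```shell
#!/bin/bash
# Sketch: check that a gzipped mysqldump file is intact and complete.
check_dump() {
    local dump="$1"
    # gzip -t verifies the compressed data without writing anything.
    gzip -t "$dump" || { echo "corrupt archive: $dump" >&2; return 1; }
    # A finished dump ends with a "-- Dump completed" comment line.
    gzip -dc "$dump" | tail -n 5 | grep -q '^-- Dump completed' \
        || { echo "dump looks truncated: $dump" >&2; return 1; }
}

# Usage:
#   check_dump accounting.sql.gz && zcat accounting.sql.gz | mysql -u root -p accounting
```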

*) Run APEL manually and see how it goes (the command is in the cron job).

If you are happy with it, go on with the last two steps; otherwise you have hit a problem I haven't come across.

*) Disable the publisher on the old machine, i.e. remove the cron job.

*) Modify parser-config-yaim.xml for all the CEs so they point to the new machine. The line to modify is

<DBURL>jdbc:mysql://<new-machine-fqdn>:3306/accounting</DBURL>
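
With several CEs the edit can be done with sed. A sketch only: set_apel_dburl is a hypothetical helper, the config path is passed in rather than assumed, and it expects the DBURL line to have exactly the form shown above (port 3306, database accounting). It keeps a .bak copy of the original.

```shell
#!/bin/bash
# Sketch: point the <DBURL> line of an APEL parser config at a new DB host.
# Arguments: new DB host FQDN, path to parser-config-yaim.xml.
set_apel_dburl() {
    local new_host="$1" config="$2"
    # Keep a backup next to the original, then rewrite the file from it.
    cp -p "$config" "${config}.bak" || return 1
    sed "s|<DBURL>jdbc:mysql://[^:/]*:3306/accounting</DBURL>|<DBURL>jdbc:mysql://${new_host}:3306/accounting</DBURL>|" \
        "${config}.bak" > "$config"
}

# Usage, on each CE (the path is a placeholder):
#   set_apel_dburl <new-machine-fqdn> /path/to/parser-config-yaim.xml
```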

SWITCHING OFF RGMA

When I was happy with the new APEL machine I turned off RGMA and removed it from the services published by the BDII and the GOCDB. This caused the GlueSite object to disappear from our site BDII. To avoid this, make sure the site BDII is in the list of published services before you remove RGMA.
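
A quick way to see whether the GlueSite object is still published is to query the site BDII directly. This sketch assumes the usual site BDII setup (LDAP on port 2170, base mds-vo-name=<site-name>,o=grid); the two function names are made up for the example.

```shell
#!/bin/bash
# Sketch: look for the GlueSite entry in a site BDII.
# Arguments: site BDII host, site name as published (the mds-vo-name value).
query_gluesite() {
    local bdii_host="$1" site_name="$2"
    ldapsearch -x -LLL -H "ldap://${bdii_host}:2170" \
        -b "mds-vo-name=${site_name},o=grid" \
        '(objectClass=GlueSite)' dn
}

# Count GlueSite entries in ldapsearch output read from stdin.
count_gluesite() {
    grep -ci '^dn: GlueSiteUniqueID='
}

# Usage (0 means the GlueSite object is gone):
#   query_gluesite <site-bdii-fqdn> <site-name> | count_gluesite
```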