Manchester's starting point for the upgrade was:
- EMI-2 APEL node
- EMI-2 APEL parsers on EMI-3 cream CEs
- We have 1 batch system per CE, so I haven't tried a configuration with only 1 batch system and multiple CEs
- In a few months we may move to ARC-CE, so the configuration was done mostly manually
The upgrade steps were:
- Install a new EMI-3 APEL node
- Configure it
- Upgrade the CEs' parsers to EMI-3 and point them to the new node
- Disable the old EMI-2 APEL node and back up its DB
- Run the parsers and fill the new APEL node DB
- Publish all records for the previous month from the new APEL machine
Install a new EMI-3 APEL node
Installed a vanilla VM with
- EMI-3 repositories
- MySQL DB
- Host certificates
- ca-policy-egi-core
- yum install --nogpg emi-release
- yum install apel-ssm apel-client apel-lib
Configure EMI-3 APEL node
I followed the instructions on the official EMI-3 APEL server guide.
There are no special tips here. I only changed the obvious fields, such as site_name and password, plus a few others: the top BDII, because we have a local one, and the location of the host certificate, because ours has a different name.
I didn't install the publisher cron job at this stage because the machine was not yet ready to publish.
Upgrade the CEs' parsers to EMI-3 and point them to the new node
The CEs, as I said, are already on EMI-3; only the APEL parsers were still EMI-2, so I disabled the EMI-2 cron job and installed the new parser
- rm /etc/cron.d/glite-apel-pbs-parser
- yum install apel-parser
NOTE: I find the parser configuration file a bit confusing regarding the batch system name. It states:
# Batch system hostname. This does not need to be a definitive hostname,
# but it should uniquely identify the batch system.
# Example: pbs.gridpp.rl.ac.uk
lrms_server =
It seems you can use any name, though you are of course better off using your batch system server name. We have one batch system per CE, so the configuration file on each CE contains that server's name. In the database this identifies the records coming from each CE. I'm not sure what happens with 1 batch system and several CEs: following the instructions literally, one should put only the batch system name, but then there is no distinction between CEs.
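Since an empty lrms_server is easy to overlook, a small check like the one below can be run on each CE before the first parse. This is only a sketch: the function name get_lrms_server is made up for illustration, and the path /etc/apel/parser.cfg in the usage line is an assumption about where your parser configuration lives.

```shell
# Sketch: print the lrms_server value from a parser config file,
# or fail with a message if it is missing or empty.
# get_lrms_server is a hypothetical helper, not part of APEL.
get_lrms_server() (
    cfg="$1"
    # Keep only the first uncommented "lrms_server = value" line,
    # then strip trailing whitespace from the value.
    value=$(sed -n 's/^[[:space:]]*lrms_server[[:space:]]*=[[:space:]]*//p' "$cfg" \
        | head -n 1 | sed 's/[[:space:]]*$//')
    if [ -z "$value" ]; then
        echo "lrms_server is not set in $cfg" >&2
        exit 1
    fi
    printf '%s\n' "$value"
)
```

For example: get_lrms_server /etc/apel/parser.cfg (adjust the path to your installation).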
Disable the old EMI-2 APEL node and backup its DB
I just removed the old cron job. The machine is still running, but it isn't doing anything while it waits to be decommissioned.
Run the parsers and fill the new APEL node DB
You will need to publish the entire month prior to the one in which you are installing. For us that meant publishing all the June records. Since I didn't want to republish everything we had in the log files, I moved the batch system and blah log files from before mid May to a backup subdirectory and parsed only the log files for the end of May and June. The May days were needed because some jobs that finished in the first days of June had started in May, and one wants the complete record. The first jobs to finish in June in Manchester started on the 25th of May, so you may want to go back a bit with the parsing.
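Moving the old log files aside can be scripted roughly as follows. This is a sketch under assumptions, not what I actually ran: it assumes the log file names end in an 8-digit date (YYYYMMDD), and archive_old_logs is a made-up helper name. Check how your batch system and blah logs are actually named before using anything like it.

```shell
# Sketch: move date-named log files older than a cutoff into a backup
# subdirectory so the parser only sees the recent ones.
# Assumption: file names end in YYYYMMDD, e.g. 20140512 or
# blahp.log-20140512. Anything else is left where it is.
# archive_old_logs is a hypothetical helper, not part of APEL.
archive_old_logs() (
    dir="$1"
    cutoff="$2"   # YYYYMMDD; files dated before this are moved
    mkdir -p "$dir/backup"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Extract the last 8 characters of the name (empty if shorter).
        prefix="${name%????????}"
        stamp="${name#"$prefix"}"
        case "$stamp" in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;
        esac
        if [ "$stamp" -lt "$cutoff" ]; then
            mv "$f" "$dir/backup/"
        fi
    done
)
```

For example, archive_old_logs /path/to/accounting/logs 20140515 would keep everything from mid May onwards, where the directory is whatever your site uses.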
Publish all records for the previous month from the new APEL machine
Finally, on the new machine, now filled with the June records plus some from May, I did a bit of DB clean-up as suggested by the APEL team. If you don't do this step, the APEL team will do it centrally before stitching together the old EMI-2 records and the new ones.
- Delete from JobRecords where EndTime<"2014-06-01";
- Delete from SuperSummaries where Month="5";
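If you are doing this for a month other than June 2014, the two statements change accordingly. The sketch below just prints them for a given year and month so you can review them before pasting them into mysql. cleanup_sql is a made-up helper name, and the statements simply mirror the two above; I haven't checked them against any other table layout.

```shell
# Sketch: print the two clean-up statements for the month being published.
# cleanup_sql is a hypothetical helper; it only prints SQL and runs nothing.
cleanup_sql() (
    year="$1"
    month="$2"   # month being published, 1-12, without a leading zero
    prev=$((month - 1))
    # For January the previous month is December. Note that, like the
    # statements above, this filters summaries on Month only, not on year.
    [ "$prev" -eq 0 ] && prev=12
    printf 'DELETE FROM JobRecords WHERE EndTime < "%04d-%02d-01";\n' "$year" "$month"
    printf 'DELETE FROM SuperSummaries WHERE Month = "%d";\n' "$prev"
)
```

Running cleanup_sql 2014 6 prints the same two statements as above, apart from capitalisation and spacing.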