Friday, 20 May 2011

BDII again

A couple of weeks ago I upgraded the site BDII and top BDII from a very old version without reinstalling, as described in this post. A few days ago I noticed that not everything was working as well as I thought: the BDII was reporting stale numbers in the dynamic attributes, which caused a few problems, among them biomed submitting an unhealthy 12k jobs.

There were two reasons for this:

1) The unprivileged user that runs the BDII is no longer edguser but ldap. Consequently there were some ownership issues in the /opt/glite/var subdirectories and files. This was highlighted in /var/log/bdii/bdii-update.log by permission denied errors, which I overlooked for a bit too long. Permissions should be as follows: /opt/glite/var, /opt/glite/var/lock, /opt/glite/var/tmp and /opt/glite/var/cache should belong to root, and anything below them should belong to ldap. You can check whether there is anything that doesn't belong to ldap by running

find /opt/glite/var/ ! -user ldap -ls


This will also list the top directories above, which you can ignore.
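A variant of the check that skips the top directories themselves could look like the sketch below. The function name list_wrong_owner is hypothetical, and -mindepth assumes GNU find (or another find that supports it); the paths and the ldap user are the ones from the text above.

```shell
#!/bin/sh
# List everything below the given directories (but not the directories
# themselves) that is not owned by the expected user.
# Usage: list_wrong_owner OWNER DIR...
# list_wrong_owner is a hypothetical helper, not part of the BDII packages.
list_wrong_owner() {
    owner=$1
    shift
    # -mindepth 1 excludes the starting directories, which should stay root-owned.
    find "$@" -mindepth 1 ! -user "$owner" -print
}

# Example on a BDII node:
#   list_wrong_owner ldap /opt/glite/var/lock /opt/glite/var/tmp /opt/glite/var/cache
#
# After reviewing the output, ownership could be fixed (as root) with:
#   find /opt/glite/var/lock /opt/glite/var/tmp /opt/glite/var/cache \
#        -mindepth 1 ! -user ldap -exec chown ldap {} +
```

An empty listing means everything below the three directories already belongs to ldap.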

2) bdii-update no longer uses glite-info-wrapper and glite-info-generic, which used to write the .ldif files in the same directory tree above. It now writes what it needs into the databases under /var/run/bdii and a single new.ldif file, calling the scripts in /opt/glite/etc/gip/provider and /opt/glite/etc/gip/plugin directly. I upgraded from an older version, so the old providers weren't deleted and continued to be executed by bdii-update. Some of them still read what are now obsolete .ldif files under the /opt/glite/var/cache tree. I deleted all the .ldif files with an additional numeric extension under /opt/glite/var.
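One way to find those files before deleting them is sketched below. It assumes the obsolete files have names of the form something.ldif.NNNN, i.e. .ldif followed by a dot and digits only; that pattern and the function name list_stale_ldif are assumptions, so check the listing by hand before removing anything.

```shell
#!/bin/sh
# List regular files whose name ends in ".ldif.<digits>" below a directory.
# Usage: list_stale_ldif DIR
# list_stale_ldif is a hypothetical helper; the name pattern is an assumption.
list_stale_ldif() {
    # find narrows the candidates, grep keeps only purely numeric extensions.
    # Exit status is 1 when nothing matches.
    find "$1" -type f -name '*.ldif.*' -print | grep -E '\.ldif\.[0-9]+$'
}

# Example on a BDII node:
#   list_stale_ldif /opt/glite/var
#
# Once the listing looks right, the files could be removed with:
#   list_stale_ldif /opt/glite/var | while IFS= read -r f; do rm -- "$f"; done
```

Plain .ldif files and other suffixes such as .ldif.bak are left out of the listing.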

With these two changes, i.e. fixing the ownership of the directories and deleting the obsolete .ldif files (or the old providers, if one is sure which ones they are), the site BDII started updating the dynamic attributes correctly again.

Finally, a note on making it easier to reinstall: in the previous post I suggested manually adding SLAPD=/usr/sbin/slapd2.4 to the newly installed /opt/bdii/etc/bdii.conf to change the slapd version. However, an easier way to maintain the service in case it needs reinstallation is to add SLAPD=/usr/sbin/slapd2.4 to site-info.def, so that when YAIM runs it gets added to /etc/sysconfig/bdii and no manual step is needed if the machine is reinstalled.
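After rerunning YAIM, a quick check that the setting arrived could look like this sketch. The function name has_slapd24 is hypothetical, and it assumes YAIM writes the line exactly as given in site-info.def, unquoted and without surrounding whitespace.

```shell
#!/bin/sh
# Line to add to site-info.def (value from the post):
#   SLAPD=/usr/sbin/slapd2.4

# Return success if FILE contains exactly that SLAPD line.
# Usage: has_slapd24 FILE
# has_slapd24 is a hypothetical helper, not part of YAIM or the BDII packages.
has_slapd24() {
    # -F: fixed string, -x: whole line must match, -q: no output.
    grep -Fxq 'SLAPD=/usr/sbin/slapd2.4' "$1"
}

# Example after YAIM has run:
#   has_slapd24 /etc/sysconfig/bdii && echo "slapd2.4 configured"
```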

Wednesday, 4 May 2011

BDII follow up

To decrease the need to restart the BDII, and following the discussion on tb-support, I decided to upgrade to openldap2.4. Since I was at it, I also updated both glite-BDII_site and glite-BDII_top (the list of new rpms is below) to the latest repository division, since we still had the older common glite-BDII repo. The newest version of BDII also has new paths for most things. For example, some config files have been moved to /etc/bdii, and /var/run/bdii is the new SLAPD_VAR_DIR. The setup of the repos is peculiar to Manchester, where we mirror the latest version every day but the machines pick up from a stable repository that is updated when needed.

1) rsync glite-BDII_site and glite-BDII_top from Glite-3.2-latest to Glite-3.2 stable

2) Added the rpms to the local external repository from the BDII_top RPMS.external dir so they can also be picked up by BDII_site and, if need be, by CEs and SE.

3) Created new repo files and added them to cvs

4) Edited cf.yaim-repos to copy them

5) Manually installed (yum install) the rpms openldap2.4 and openldap2.4-servers and their dependencies lib64ldap2.4_2 and openldap2.4-extra-schemas on BDII_site. In the glite-BDII_top case they are pulled in as dependencies, so there is no need for this.
# This step can be added in cfengine at a later stage if needed.

6) mv /opt/bdii/etc/bdii.conf.rpmnew /opt/bdii/etc/bdii.conf
# Contains the pointer to the new bdii-slapd.conf which contains the new paths. bdii/slapd won't restart with the old bdii.conf.

7) Add SLAPD=/usr/sbin/slapd2.4 to the new /opt/bdii/etc/bdii.conf
# This can go in yaim post function if one really wants.

8) Rerun YAIM

9) Reduced the frequency at which the cron job checks the bdii from every 5 mins to every 20 mins. The top bdii seemed to take longer to rebuild, probably due to an expired cache, causing a loop.
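Steps 5 to 7 on the site BDII could be scripted roughly as below. This is only a sketch: ensure_slapd is a hypothetical helper, it appends the setting only when no SLAPD= line exists yet (an existing line with a different value is left alone), and it assumes the config file ends with a newline.

```shell
#!/bin/sh
# Append SLAPD=<path> to a config file unless a SLAPD= line is already there.
# Usage: ensure_slapd CONF [SLAPD_PATH]
# ensure_slapd is a hypothetical helper, not part of the BDII packages.
ensure_slapd() {
    conf=$1
    slapd=${2:-/usr/sbin/slapd2.4}
    if ! grep -q '^SLAPD=' "$conf"; then
        printf 'SLAPD=%s\n' "$slapd" >> "$conf"
    fi
}

# Steps 5 to 7 on a site BDII, run as root (commands and paths from the post):
#   yum install openldap2.4 openldap2.4-servers
#   mv /opt/bdii/etc/bdii.conf.rpmnew /opt/bdii/etc/bdii.conf
#   ensure_slapd /opt/bdii/etc/bdii.conf
```

Running the helper twice leaves a single SLAPD line, so it is safe to call again after a later reconfiguration.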

Fingers crossed it will work and stop the BDII from periodically hanging.

New Site BDII RPMS

bdii-5.1.22-1
bdii-config-site-0.9.1-1
glite-BDII_site-3.2.11-1.sl5
glite-yaim-bdii-4.1.12-1

New Top BDII RPMS

bdii-5.1.22-1
bdii-config-top-0.0.9-1
glite-BDII_top-3.2.11-1.sl5
glite-yaim-bdii-4.1.12-1

Openldap2.4 RPMS

lib64ldap2.4_2-2.4.22-1.el5
openldap2.4-2.4.22-1.el5
openldap2.4-extra-schemas-1.3-10.el5
openldap2.4-servers-2.4.22-1.el5

UPDATE 20/