The synchronisation of the DPM database with the data servers file systems has been a long standing issue. Last week we had a crash that made more imperative to check all the files and I eventually wrote a bash script that makes use of the GridPP DPM admin tools. I don't think this should be the final version but I'm quicker with bash than with python and therefore I started with that. Hopefully later in the year I'll have more time to write a cleaner version in python that can be inserted in the admin tools based on this one. It does the following:
1) Create a list of files that are in the DB but not on disk
2) Create a list of files that are on disk but not in the DB
3) Create a list of SURLs from the list of files in the DB but not on disk to declare lost (this is mostly for atlas but could be used by LFC administrators for other VOs)
4) If not in dry run mode proceed to delete the orphan files and the orphan entries in the DB.
5) Print stats of how many files were in either list.
Although I put few protections this script should be run with care and unless in dry run mode shouldn't be run automatically AT ALL. However in dry run mode it will tell you how many files are lost and it is a good metric to monitor regularly as well as when there is a big crash.
If you want to run it, it has to run on the data servers where there is access to the file system. As it is now it requires a modified version of /opt/lcg/etc/DPMINFO that point to the head node rather than localhost because one of the admin tools used does a direct mysql query. For the same reason it also requires dpminfo user to have mysql select privileges from the data servers. This is the part that really could benefit from a rewriting in python and perhaps a proper API use as the other tool does. I also had to heavily parse the output of the tools which weren't created exactly for this purpose and this could also be avoided in a python script. There are no options but all the variables that could be options to customize the script with your local settings (head node, fs mount point, dry_run) are easily found at the top.
To create the lists it takes really little time no more than 3 minutes on my system but it depends mostly on how busy is your head node.
If you want to do a cleanup instead it is proportional to how many files have been lost and can take several hours since it does one DB operation per file. The time to delete the orphan files also depends on how many and how big they are but should take less than DB cleanup.
The script is here: http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-synchronise-disk-db.sh
Northgrid-tech
Saturday, 14 January 2012
Wednesday, 30 November 2011
DPM upgrade 1.7.4 -> 1.8.2 (glite 3.2)
Last week I upgraded our DPM installation. It was a major change because I upgraded not only the DPM version but also the hardware and the backend mysql version.
I didn't take any measures this time before and after. I knew that becoming an alpha site in atlas was taking its toll on the old hardware and many of the timeouts were from gridftp but there had been a reappearance of the mysql ones I talked about in previous posts at the level that even restarting the service was hard.
[ ~]# service mysqld restart
Timeout error occurred trying to stop MySQL Daemon.
Stopping MySQL: [FAILED]
Timeout error occurred trying to start MySQL Daemon.
So I decided that the situation had become unsustainable and it was time to move to better hardware and software versions.
* Hardware: 2 cpu, 4GB mem, 2x250 GB raid1 -> 4 cores (HT on = 8 job slots), 24GB mem, 2x2TB raid1
There is no why here it was ok when we had limited access but the recent load was really too much for the old machine even with all the tuning. Suspected bad blocks on disks could be possible but no red leds nor hardware errors were reported by the machine.
* Mysql: 5.0.77 -> 5.5.10
Why mysql 5.5? Because InnoDB is the default engine and they have improved performance and instrumentation. On top of other things that we might actually start to use. A good blog article about the 5 reasons to move is this one: 5 good reasons to upgrade to mysql 5.5.
MySQL 5.5 is not in EPEL yet, but I found this CentOS community site that has the rpms and the instructions to install them.
After the installation I've also optimized the database partially with what I had already done in July, partly running a handy script mysqltuner.pl. This last one helps with variable you might not even know and even if you know them it tells you if they are too small. You need to be patient and let pass few hours before run it again.
* DPM: 1.7.4 -> 1.8.2
Why DPM 1.8.2 from glite 3.2? I would have gone for the UMD release or even the EMI one but then glite 3.2 was moved to production earlier than those and since I waited for this release since at least April I didn't think about it twice when I saw the escape route. It was really good timing too as it happened when I really couldn't postpone an upgrade anymore. You can find more info in the release notes. Among other reasons to upgrade: srmv2.2 in 1.7.4 has a memory leak which wasn't noticeable until the load was contained but for us exploded in October and is the reason I had to restart it every two days in the past few weeks.
Below the steps I took to reinstall the head node
On the old head node
* Set the site in downtime, drain the queues and kill all the remaining jobs.
* Turn off all the dpm and bdii services on the old head node
* Make a dump of the current database for backup
mysqldump -C -Q -u root -p -B dpm_db cns_db > dpm.sql-20111125.gz
* Download dpm-drop-requests-tables.sql supplied by Jean Philippe last July
wget http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-drop-requests-tables.sql
* Drop the requests tables. This step is really useful to avoid painful reload times as I said in this other post about DPM optimization and because it drastically reduces the size of ibdata1 when you reload which has also benefits (my ibdata1 was reduced from 26GB to 1.7GB). Still you need to plan because it might take few hours depending on the system. On my old hardware it took around 7 hours.
mysql -p < dpm-drop-requests-tables.sql
* Dump reduced version of the database
mysqldump -C -Q -u root -p -B dpm_db cns_db > dpm.sql-20111125-v2.gz
* Copy both to a WEB server where they can be downloaded from in a later stage.
* Update the local repository for DPM head node and DPM disk servers. Since it is still glite I just had to rsync the latest mirror to the static area.
On the new head node
* Install the new machines with a DPM head node profile. This was again easy since it is still glite no changes were required in cfengine.
* Most of the following is not standard and I put it in a script. If you have problems with users IDs created by avahi packages you can uninstall them with yum removing all the dependencies and let them be reinstalled by the bdii dependency chain. It should work also uninstalling them with rpm -e --nodeps. This leaves redhat-lsb (which is what the bdii depends on) untouched but I haven't tried this last method. Here are the commands I executed:
# Get the dpm DB file
rm -rf dpm.sql-20111125-v2.gz*
wget http://ks.tier2.hep.manchester.ac.uk/T2/tmp/dpm.sql-20111125-v2.gz
# Install mysql5.5
rpm -Uvh http://repo.webtatic.com/yum/centos/5/latest.rpm
yum -y remove libmysqlclient5 mysql mysql-*
yum -y clean all
yum -y install mysql55 mysql55-server libmysqlclient5 --enablerepo=webtatic
service mysql stop
rm -rf /var/lib/mysql/*
# Get the local my.cnf
cfagent -vq
service mysqld start
# Install the DPM rpms
yum -y remove cups avahi avahi-compat-libdns_sd avahi-glib
yum -y install glite-SE_dpm_mysql lcg-CA
# Modify sql scripts for mysql5.5
cd
/opt/lcg/share/DPM/ for a in create_dp*.sql; do sed -i.old 's/TYPE/ENGINE/g' $a;done
grep ENGINE *
# Run YAIM and upload old DB
cd
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql
mysql -u root -p -C < /root/dpm.sql-20111125-v2.gz
# NECESSARY FOR THE FINAL UPDATES
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql
* You will need to install the dpm-contrib-admintool rm because it is not in the glite repository it might be in the EMI one. Last time I heard it made it to ETICS. If you can't find it there's still the sysadmin repo version and related notes on the GridPP wiki (Sam or Wahid welcome to leave an update on this one).
* To upgrade the disk servers I just updated the repository, upgraded the rpms and rerun yaim.
I didn't take any measures this time before and after. I knew that becoming an alpha site in atlas was taking its toll on the old hardware and many of the timeouts were from gridftp but there had been a reappearance of the mysql ones I talked about in previous posts at the level that even restarting the service was hard.
[ ~]# service mysqld restart
Timeout error occurred trying to stop MySQL Daemon.
Stopping MySQL: [FAILED]
Timeout error occurred trying to start MySQL Daemon.
So I decided that the situation had become unsustainable and it was time to move to better hardware and software versions.
* Hardware: 2 cpu, 4GB mem, 2x250 GB raid1 -> 4 cores (HT on = 8 job slots), 24GB mem, 2x2TB raid1
There is no why here it was ok when we had limited access but the recent load was really too much for the old machine even with all the tuning. Suspected bad blocks on disks could be possible but no red leds nor hardware errors were reported by the machine.
* Mysql: 5.0.77 -> 5.5.10
Why mysql 5.5? Because InnoDB is the default engine and they have improved performance and instrumentation. On top of other things that we might actually start to use. A good blog article about the 5 reasons to move is this one: 5 good reasons to upgrade to mysql 5.5.
MySQL 5.5 is not in EPEL yet, but I found this CentOS community site that has the rpms and the instructions to install them.
After the installation I've also optimized the database partially with what I had already done in July, partly running a handy script mysqltuner.pl. This last one helps with variable you might not even know and even if you know them it tells you if they are too small. You need to be patient and let pass few hours before run it again.
* DPM: 1.7.4 -> 1.8.2
Why DPM 1.8.2 from glite 3.2? I would have gone for the UMD release or even the EMI one but then glite 3.2 was moved to production earlier than those and since I waited for this release since at least April I didn't think about it twice when I saw the escape route. It was really good timing too as it happened when I really couldn't postpone an upgrade anymore. You can find more info in the release notes. Among other reasons to upgrade: srmv2.2 in 1.7.4 has a memory leak which wasn't noticeable until the load was contained but for us exploded in October and is the reason I had to restart it every two days in the past few weeks.
Below the steps I took to reinstall the head node
On the old head node
* Set the site in downtime, drain the queues and kill all the remaining jobs.
* Turn off all the dpm and bdii services on the old head node
* Make a dump of the current database for backup
mysqldump -C -Q -u root -p -B dpm_db cns_db > dpm.sql-20111125.gz
* Download dpm-drop-requests-tables.sql supplied by Jean Philippe last July
wget http://www.sysadmin.hep.ac.uk/svn/fabric-management/dpm/dpm-drop-requests-tables.sql
* Drop the requests tables. This step is really useful to avoid painful reload times as I said in this other post about DPM optimization and because it drastically reduces the size of ibdata1 when you reload which has also benefits (my ibdata1 was reduced from 26GB to 1.7GB). Still you need to plan because it might take few hours depending on the system. On my old hardware it took around 7 hours.
mysql -p < dpm-drop-requests-tables.sql
* Dump reduced version of the database
mysqldump -C -Q -u root -p -B dpm_db cns_db > dpm.sql-20111125-v2.gz
* Copy both to a WEB server where they can be downloaded from in a later stage.
* Update the local repository for DPM head node and DPM disk servers. Since it is still glite I just had to rsync the latest mirror to the static area.
On the new head node
* Install the new machines with a DPM head node profile. This was again easy since it is still glite no changes were required in cfengine.
* Most of the following is not standard and I put it in a script. If you have problems with users IDs created by avahi packages you can uninstall them with yum removing all the dependencies and let them be reinstalled by the bdii dependency chain. It should work also uninstalling them with rpm -e --nodeps. This leaves redhat-lsb (which is what the bdii depends on) untouched but I haven't tried this last method. Here are the commands I executed:
# Get the dpm DB file
rm -rf dpm.sql-20111125-v2.gz*
wget http://ks.tier2.hep.manchester.ac.uk/T2/tmp/dpm.sql-20111125-v2.gz
# Install mysql5.5
rpm -Uvh http://repo.webtatic.com/yum/centos/5/latest.rpm
yum -y remove libmysqlclient5 mysql mysql-*
yum -y clean all
yum -y install mysql55 mysql55-server libmysqlclient5 --enablerepo=webtatic
service mysql stop
rm -rf /var/lib/mysql/*
# Get the local my.cnf
cfagent -vq
service mysqld start
# Install the DPM rpms
yum -y remove cups avahi avahi-compat-libdns_sd avahi-glib
yum -y install glite-SE_dpm_mysql lcg-CA
# Modify sql scripts for mysql5.5
cd
/opt/lcg/share/DPM/ for a in create_dp*.sql; do sed -i.old 's/TYPE/ENGINE/g' $a;done
grep ENGINE *
# Run YAIM and upload old DB
cd
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql
mysql -u root -p -C < /root/dpm.sql-20111125-v2.gz
# NECESSARY FOR THE FINAL UPDATES
/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n glite-SE_dpm_mysql
* You will need to install the dpm-contrib-admintool rm because it is not in the glite repository it might be in the EMI one. Last time I heard it made it to ETICS. If you can't find it there's still the sysadmin repo version and related notes on the GridPP wiki (Sam or Wahid welcome to leave an update on this one).
* To upgrade the disk servers I just updated the repository, upgraded the rpms and rerun yaim.
Wednesday, 28 September 2011
Friday, 9 September 2011
cvmfs upgrade to 2.0.3
Last week I upgraded the cvmfs on all the WN to cvmfs-2.0.3. The upgrade for us required two steps.
1) change of repository: since Manchester was the first to use the new atlas setup we were pointing to CERN repository. The new setup has now become standard so I just had to remove the override variable CVMFS_SERVER_URL from atlas.cern.ch.local. The file is distributed by cfengine so I just changed it in cvs.
2) rpms upgrade: I had some initial difficulties because I was following the instructions for atlas T3 - which normally work also for T2 - that suggested to install cvmfs-auto-setup rpm. This rpm runs service cvmfs restartautofs and in the instructions it was suggested also to rerun it manually. This on busy machines causes the repositories to disappear and requires a service cvmfs restartclean which wipes the cache off and is not really recommended in production. In reality none of this is really necessary and a simple
yum -y update cvmfs cvmfs-init-scripts
is sufficient. I could add the rpms version in cfengine and that was enough. The change from one version to another happens at the first unmount. Forcing this with a restartautofs is counterproductive (thanks to Ian for pointing this out).
Next week there should be a bug fix version that will take care of slow mount and some slow client tools routines on busy machines.
http://savannah.cern.ch/bugs/?86349
But since the upgrade procedure is so easy and the corrupted files problem
http://savannah.cern.ch/support/?122564
is fixed in cvmfs >2.0.2 I decided to upgrade anyway on Wednesday to avoid further errors in atlas and possibly lhcb.
NOTE: Of course I tested each step on few nodes to check everything worked before rolling out with cfengine on all nodes. Always a good practice not to follow recipes blindly!
1) change of repository: since Manchester was the first to use the new atlas setup we were pointing to CERN repository. The new setup has now become standard so I just had to remove the override variable CVMFS_SERVER_URL from atlas.cern.ch.local. The file is distributed by cfengine so I just changed it in cvs.
2) rpms upgrade: I had some initial difficulties because I was following the instructions for atlas T3 - which normally work also for T2 - that suggested to install cvmfs-auto-setup rpm. This rpm runs service cvmfs restartautofs and in the instructions it was suggested also to rerun it manually. This on busy machines causes the repositories to disappear and requires a service cvmfs restartclean which wipes the cache off and is not really recommended in production. In reality none of this is really necessary and a simple
yum -y update cvmfs cvmfs-init-scripts
is sufficient. I could add the rpms version in cfengine and that was enough. The change from one version to another happens at the first unmount. Forcing this with a restartautofs is counterproductive (thanks to Ian for pointing this out).
Next week there should be a bug fix version that will take care of slow mount and some slow client tools routines on busy machines.
http://savannah.cern.ch/bugs/?86349
But since the upgrade procedure is so easy and the corrupted files problem
http://savannah.cern.ch/support/?122564
is fixed in cvmfs >2.0.2 I decided to upgrade anyway on Wednesday to avoid further errors in atlas and possibly lhcb.
NOTE: Of course I tested each step on few nodes to check everything worked before rolling out with cfengine on all nodes. Always a good practice not to follow recipes blindly!
Wednesday, 6 July 2011
cvmfs installation
Last week after few months delay I finally installed cvmfs. It's since 2002-2003 that I advocate the use of a shared file system for the input sandbox with locally cached data. AFS was successfully used in grid and non grid environment by BaBar users and is still used by local non-LHC users in Manchester for small work. So I'm pretty happy that a light weight caching file system is now available for more robust traffic. This is a really good moment to install cvmfs for two reasons:
1) Lhcb asked for it too.
2) Atlas has moved its condb files from the HOTDISK space token to cvmfs.
And it should reduce drastically errors for both NFS and SE load.
These are my installation notes:
* Install cernvm.repo: you can find it here or you can copy the rpms in your local and install from there. I distribute the file with cfengine but otherwise
cd /etc/yum.repos.d/
wget http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo
* Install the gpg key: yum didn't like the key and was giving errors. I don't know if the problem is only mine (possible) I anyway told the developers and in the meantime I had to remove the key check from the repo file and trust the rpms. But if you want to try it, it might work for you:
cd /etc/pki/rpm-gpg/
wget http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM
* Install the rpms. In the documents there is an additional rpm cvmfs-auto-setup which is not really necessary and was also causing problems due to some migration lines devised for upgrades. Other than that it runs a setup and a restart command that can be run by your configuration tool of choice. S. Traylen also suggested to install SL_no_colorls to avoid ls /cvmfs mounting all the file systems that's why it's in the list.
yum install -y fuse cvmfs−keys cvmfs cvmfs−init−scripts SL_no_colorls
* Install configuration files. Below is what I added. For atlas there is in the docs a nightlies repository but that's not ready yet and isn't going to work. The default QUOTA_LIMIT set in default.local can be overridden in the experiment configuration. For each of this files there is a .conf file and a .local you should edit only .local. If they are not there just create them.
You need to override the CVMFS_SERVER_URL for atlas otherwise you don't get the new setup. While in cern.ch.local I simply inverted the order of the server to get RAL first and then the other two if RAL fails. I also removed CERNVM_SERVER_URL which appears in cern.ch.conf otherwise it goes to CERN first even though it's not apparently defined anywhere.
/etc/cvmfs/default.local
CVMFS_REPOSITORIES=atlas,atlas-condb,lhcb
CVMFS_CACHE_BASE=/scratch/var/cache/cvmfs2
CVMFS_QUOTA_LIMIT=2000
CVMFS_HTTP_PROXY="http://[YOUR-SQUID-CACHE]:3128"
/etc/cvmfs/config.d/atlas.cern.ch.local
CVMFS_QUOTA_LIMIT=10000
CVMFS_SERVER_URL=http://cvmfs-stratum-one.cern.ch/opt/atlas-newns
/etc/cvmfs/config.d/lhcb.cern.ch.local
CVMFS_QUOTA_LIMIT=5000
/etc/cvmfs/domain.d/cern.ch.local
CVMFS_SERVER_URL="http://cernvmfs.gridpp.rl.ac.uk/opt/@org@;http://cvmfs-stratum-one.cern.ch/opt/@org@;http://cvmfs.racf.bnl.gov/opt/@org@"
CVMFS_PUBLIC_KEY=/etc/cvmfs/keys/cern.ch.pub
* Create the cache space. By default it's in /var/cache. However I moved it to the /scratch partition which is bigger.
mkdir -p /scratch/var/cache/cvmfs2
chown cvmfs:cvmfs /scratch/var/cache/cvmfs2
chmod 2755 /scratch/var/cache/cvmfs2
* Run the setup. These are the commands the cvmfs-auto-setup would run at installation time. They also configure fuse although that's only one line added to fuse.conf.
/usr/bin/cvmfs_config setup
service cvmfs restartautofs
chkconfig cvmfs on
service cvmfs restart
* Some parameters need to change for squid. Below is what the documentation suggests. I tuned it to the size of my machine. For example the maximum_object_size and cache_mem were too big and I checked which other parameters were already set to evaluate if it was the case to change them.
collapsed_forwarding on
max_filedesc 8192
maximum_object_size 4096 MB
cache_mem 4096 MB
maximum_object_size_in_memory 32 KB
cache_dir ufs /var/spool/squid 50000 16 256
* Apply changes for Lhcb the VO_LHCB_SW_DIR needs to point to cvmfs. You can change it in YAIM and rerun it or you can do as I've done (still making sure to change YAIM so that freshly installed nodes don't need this hack). Lhcb with this change is good to go.
sed -i.sed.bak 's%/nfs/lhcb%/cvmfs/lhcb.cern.ch%' /etc/profile.d/grid-env.sh
mv /etc/profile.d/grid-env.sh.sed.bak /root
* Apply changes for Atlas. A similar change to VO_ATLAS_SW_DIR is required and you need to set an additional variable that is not handled by YAIM. For now I added it to grid-env.sh but it be better placed in another file not touched by YAIM or a snippet should be added to YAIM to handle the variable. This is enough for the jobs to start using the software area. However you still have to contact the atlas sw team to do their validation tests and enable the condb use. They'll propose a long way and a short way. I took the short because I didn't want to go in downtime and jobs were already running using the new setup.
sed -i.sed.2 's%"/nfs/atlas"%"/cvmfs/atlas.cern.ch/repo/sw"\ngridenv_set "ATLAS_LOCAL_AREA" "/nfs/atlas/local"%' /etc/profile.d/grid-env.sh
mv /etc/profile.d/grid-env.sh.sed.bak /root
* Always for Atlas remove some installed .conf files which install a link in /opt which is not necessary anymore. Second file might not exist, but there is an atlas-nightly.cern.ch.conf. This will surely change in future cvmfs releases.
service cvmfs stop
rm /etc/cvmfs/config.d/atlas.cern.ch.conf
rm /etc/cvmfs/config.d/atlas-condb.cern.ch.conf
service cvmfs start
Update 12/7/2011: Using YAIM
cfengine only installs the rpms and the configuration files (*.local). All the rest is now carried out by a YAIM function I created (config_cvmfs). I put a tar file here.To make it work I also added a node description in node-info.d/cvmfs (also in the tar file) that contains it. In this way I don't have to touch any already existing YAIM files and I can just add -n CVMFS to the YAIM command line we use to configure the WNs. It requires ATLAS_LOCAL_AREA and CVMFS_CACHE_DIR variables to be set in your site-info.def.
CVMFS docs are here
Release Notes
Init Scripts Overview
Examples
Technical Report
RAL T1
Atlas T2/T3 setup
Atlas latest changes
1) Lhcb asked for it too.
2) Atlas has moved its condb files from the HOTDISK space token to cvmfs.
And it should reduce drastically errors for both NFS and SE load.
These are my installation notes:
* Install cernvm.repo: you can find it here or you can copy the rpms in your local and install from there. I distribute the file with cfengine but otherwise
cd /etc/yum.repos.d/
wget http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo
* Install the gpg key: yum didn't like the key and was giving errors. I don't know if the problem is only mine (possible) I anyway told the developers and in the meantime I had to remove the key check from the repo file and trust the rpms. But if you want to try it, it might work for you:
cd /etc/pki/rpm-gpg/
wget http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM
* Install the rpms. In the documents there is an additional rpm cvmfs-auto-setup which is not really necessary and was also causing problems due to some migration lines devised for upgrades. Other than that it runs a setup and a restart command that can be run by your configuration tool of choice. S. Traylen also suggested to install SL_no_colorls to avoid ls /cvmfs mounting all the file systems that's why it's in the list.
yum install -y fuse cvmfs−keys cvmfs cvmfs−init−scripts SL_no_colorls
* Install configuration files. Below is what I added. For atlas there is in the docs a nightlies repository but that's not ready yet and isn't going to work. The default QUOTA_LIMIT set in default.local can be overridden in the experiment configuration. For each of this files there is a .conf file and a .local you should edit only .local. If they are not there just create them.
You need to override the CVMFS_SERVER_URL for atlas otherwise you don't get the new setup. While in cern.ch.local I simply inverted the order of the server to get RAL first and then the other two if RAL fails. I also removed CERNVM_SERVER_URL which appears in cern.ch.conf otherwise it goes to CERN first even though it's not apparently defined anywhere.
/etc/cvmfs/default.local
CVMFS_REPOSITORIES=atlas,atlas-condb,lhcb
CVMFS_CACHE_BASE=/scratch/var/cache/cvmfs2
CVMFS_QUOTA_LIMIT=2000
CVMFS_HTTP_PROXY="http://[YOUR-SQUID-CACHE]:3128"
/etc/cvmfs/config.d/atlas.cern.ch.local
CVMFS_QUOTA_LIMIT=10000
CVMFS_SERVER_URL=http://cvmfs-stratum-one.cern.ch/opt/atlas-newns
/etc/cvmfs/config.d/lhcb.cern.ch.local
CVMFS_QUOTA_LIMIT=5000
/etc/cvmfs/domain.d/cern.ch.local
CVMFS_SERVER_URL="http://cernvmfs.gridpp.rl.ac.uk/opt/@org@;http://cvmfs-stratum-one.cern.ch/opt/@org@;http://cvmfs.racf.bnl.gov/opt/@org@"
CVMFS_PUBLIC_KEY=/etc/cvmfs/keys/cern.ch.pub
* Create the cache space. By default it's in /var/cache. However I moved it to the /scratch partition which is bigger.
mkdir -p /scratch/var/cache/cvmfs2
chown cvmfs:cvmfs /scratch/var/cache/cvmfs2
chmod 2755 /scratch/var/cache/cvmfs2
* Run the setup. These are the commands the cvmfs-auto-setup would run at installation time. They also configure fuse although that's only one line added to fuse.conf.
/usr/bin/cvmfs_config setup
service cvmfs restartautofs
chkconfig cvmfs on
service cvmfs restart
* Some parameters need to change for squid. Below is what the documentation suggests. I tuned it to the size of my machine. For example the maximum_object_size and cache_mem were too big and I checked which other parameters were already set to evaluate if it was the case to change them.
collapsed_forwarding on
max_filedesc 8192
maximum_object_size 4096 MB
cache_mem 4096 MB
maximum_object_size_in_memory 32 KB
cache_dir ufs /var/spool/squid 50000 16 256
* Apply changes for Lhcb the VO_LHCB_SW_DIR needs to point to cvmfs. You can change it in YAIM and rerun it or you can do as I've done (still making sure to change YAIM so that freshly installed nodes don't need this hack). Lhcb with this change is good to go.
sed -i.sed.bak 's%/nfs/lhcb%/cvmfs/lhcb.cern.ch%' /etc/profile.d/grid-env.sh
mv /etc/profile.d/grid-env.sh.sed.bak /root
* Apply changes for Atlas. A similar change to VO_ATLAS_SW_DIR is required and you need to set an additional variable that is not handled by YAIM. For now I added it to grid-env.sh but it be better placed in another file not touched by YAIM or a snippet should be added to YAIM to handle the variable. This is enough for the jobs to start using the software area. However you still have to contact the atlas sw team to do their validation tests and enable the condb use. They'll propose a long way and a short way. I took the short because I didn't want to go in downtime and jobs were already running using the new setup.
sed -i.sed.2 's%"/nfs/atlas"%"/cvmfs/atlas.cern.ch/repo/sw"\ngridenv_set "ATLAS_LOCAL_AREA" "/nfs/atlas/local"%' /etc/profile.d/grid-env.sh
mv /etc/profile.d/grid-env.sh.sed.bak /root
* Always for Atlas remove some installed .conf files which install a link in /opt which is not necessary anymore. Second file might not exist, but there is an atlas-nightly.cern.ch.conf. This will surely change in future cvmfs releases.
service cvmfs stop
rm /etc/cvmfs/config.d/atlas.cern.ch.conf
rm /etc/cvmfs/config.d/atlas-condb.cern.ch.conf
service cvmfs start
Update 12/7/2011: Using YAIM
cfengine only installs the rpms and the configuration files (*.local). All the rest is now carried out by a YAIM function I created (config_cvmfs). I put a tar file here.To make it work I also added a node description in node-info.d/cvmfs (also in the tar file) that contains it. In this way I don't have to touch any already existing YAIM files and I can just add -n CVMFS to the YAIM command line we use to configure the WNs. It requires ATLAS_LOCAL_AREA and CVMFS_CACHE_DIR variables to be set in your site-info.def.
CVMFS docs are here
Release Notes
Init Scripts Overview
Examples
Technical Report
RAL T1
Atlas T2/T3 setup
Atlas latest changes
Wednesday, 22 June 2011
How to remove apel warnings and avoid nagios alerts
Quite few sites have few entries in APEL that don't quite match. They can appear with two messages
OK [ Minor discrepancy in even numbers ]
WARN [ Missing data detected ]
They don't look good on the Sync page and nagios also sends alerts for this problem which is even more annoying.
The problem is caused by few records with the wrong time stamp (StartTime=01-01-1970). These records need to be deleted from the local database and the period were they appear republished with the gap publisher. To delete the records connect to your local APEL mysql and run:
mysql> delete from LcgRecords where StartTimeEpoch = 0;
Then for each month were the entries appear rerun the gap publisher. And finally rerun the publisher in missing records mode to update the SYNC page or you can wait the next proper run if you are not impatient.
Thanks to Cristina for this useful tip she gave me in this ticket.
OK [ Minor discrepancy in even numbers ]
WARN [ Missing data detected ]
They don't look good on the Sync page and nagios also sends alerts for this problem which is even more annoying.
The problem is caused by few records with the wrong time stamp (StartTime=01-01-1970). These records need to be deleted from the local database and the period were they appear republished with the gap publisher. To delete the records connect to your local APEL mysql and run:
mysql> delete from LcgRecords where StartTimeEpoch = 0;
Then for each month were the entries appear rerun the gap publisher. And finally rerun the publisher in missing records mode to update the SYNC page or you can wait the next proper run if you are not impatient.
Thanks to Cristina for this useful tip she gave me in this ticket.
Tuesday, 14 June 2011
DPM optimization next round
After I applied 3 of the mysql parameters changes I talk about in this post I didn't see the improvement I was hoping with atlas jobs time outs.
This is another set of optimizations I put together after further search
First of all I started to systematically count the time TIME_WAIT connections every five minutes. I also correlated them in the same log file to the number of concurrent threads the server keeps mostly in sleep mode. You can get the last bit running mysqladmin -p proc stat or from within a mysql command line. The number of threads was near to the max allowed default value in mysql so I doubled that in my.cnf
max_connections=200
then I halved the kernel time out for TIME_WAIT connections
sysctl -w net.ipv4.tcp_fin_timeout=30
the default value is 60 sec. If you add it to /etc/sysctl.conf it becomes permanent.
Finally I found this article which explicitly talks about mysql tunings to reduce connection timeouts: Mysql Connection Timeouts and I set the following
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.core.somaxconn=512
again add to /etc/sysctl.conf to make it permanent; and added in my.cnf
back_log=500
I calculated my numbers on 500 connections/s because that's what I have observed when I did all this (I obeserved even larger numbers). Admittedly now they are stable at 330 connections per second but we haven't had any heavy ramp up since Saturday. Only a mild one but that didn't cause any time out. I'm waiting for a serious ramp as definitive test. Said that since Saturday we haven't seen any timeout errors not even the low background that was always present. So there is already an improvement.
Update 16/06/2011
Today there was an atlas ramp from almost 0 to >1400 jobs and no time outs so far.
Few timeouts were seen yesterday but they were due to authentication between the head node and a couple of data servers which I will have to investigate but they are a handful, nowhere near the scale observed before and not due to mysql. I will still keep things under observation for a while longer. Just in case.
This is another set of optimizations I put together after further search
First of all I started to systematically count the time TIME_WAIT connections every five minutes. I also correlated them in the same log file to the number of concurrent threads the server keeps mostly in sleep mode. You can get the last bit running mysqladmin -p proc stat or from within a mysql command line. The number of threads was near to the max allowed default value in mysql so I doubled that in my.cnf
max_connections=200
then I halved the kernel time out for TIME_WAIT connections
sysctl -w net.ipv4.tcp_fin_timeout=30
the default value is 60 sec. If you add it to /etc/sysctl.conf it becomes permanent.
Finally I found this article which explicitly talks about mysql tunings to reduce connection timeouts: Mysql Connection Timeouts and I set the following
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.core.somaxconn=512
again add to /etc/sysctl.conf to make it permanent; and added in my.cnf
back_log=500
I calculated my numbers on 500 connections/s because that's what I have observed when I did all this (I obeserved even larger numbers). Admittedly now they are stable at 330 connections per second but we haven't had any heavy ramp up since Saturday. Only a mild one but that didn't cause any time out. I'm waiting for a serious ramp as definitive test. Said that since Saturday we haven't seen any timeout errors not even the low background that was always present. So there is already an improvement.
Update 16/06/2011
Today there was an atlas ramp from almost 0 to >1400 jobs and no time outs so far.
Few timeouts were seen yesterday but they were due to authentication between the head node and a couple of data servers which I will have to investigate but they are a handful, nowhere near the scale observed before and not due to mysql. I will still keep things under observation for a while longer. Just in case.
Subscribe to:
Posts (Atom)
