Elena asked about the groups.conf syntax a few days ago on TB-SUPPORT. Today I investigated a bit further, and the result is that for glite-yaim-core versions >4.0.4-1:
* The syntax with VO= and GROUP= still works but is obsolete. The new syntax is much simpler, as it uses the FQANs directly as reported in the VO cards (if they are maintained).
* The syntax in /opt/glite/yaim/examples/groups.conf.example is correct, and the files in that directory are kept up to date with the correct syntax, although the examples themselves might not be valid.
* Further information can be found either in
/opt/glite/examples/groups.conf.README
or
https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400#Group_configuration_in_YAIM
which is worth reviewing periodically for changes.
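To illustrate the difference between the two syntaxes, here is a minimal sketch that rewrites old-style entries into FQAN-only ones. It assumes that old entries start with a quoted "/VO=<vo>/GROUP=" prefix followed by the FQAN; the function name convert_groups_conf and the sample lines are made up for illustration, so check the result against groups.conf.example before using it.

```shell
#!/bin/sh
# Sketch only: strip the obsolete "/VO=<vo>/GROUP=" prefix so that each
# quoted entry starts directly with the FQAN.
# Assumption: old-style lines look like
#   "/VO=atlas/GROUP=/atlas/ROLE=lcgadmin":::sgm:
# Lines already in the new syntax are passed through unchanged.
convert_groups_conf() {
    # With no file arguments sed reads from standard input.
    sed -E 's#^"/VO=[^/"]+/GROUP=#"#' "$@"
}
```

For example, `convert_groups_conf groups.conf > groups.conf.new` writes the converted entries to a new file and leaves the original untouched.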
Showing posts with label NorthGrid. Show all posts
Monday, 15 December 2008
Tuesday, 25 November 2008
Regional nagios update
I reinstalled the regional nagios with Nagios3 and it works now.
https://niels004.tier2.hep.manchester.ac.uk/nagios
As suggested by Steve I'm also trying the nagios checker plugin
https://addons.mozilla.org/en-US/firefox/addon/3607
instead of the email notifications, but I still have to configure things properly. At the moment Firefox makes a noise every ~30 seconds, and there is also a visual alert in the bottom right corner of the Firefox window showing the number of services in a critical state, which expands to list the services when the cursor points at it. Really nice. :)
Thursday, 20 November 2008
WMS talk
I gave a talk about the WMS for the benefit of the Manchester users. It might be of interest to other people.
The talk can be found here:
WMS Overview
Friday, 7 November 2008
Regional nagios
I installed a regional nagios yesterday. It turned out to be quite easy, and the nagios group was quite helpful. I followed the tutorial given by Steve at EGEE08
https://twiki.cern.ch/twiki/bin/view/EGEE/GridMonitoringNcgYaimTutorial
I updated it while I was going along instead of writing a parallel document.
Below is the URL of the test installation. It might get reinstalled a few times in the next few days to test other features.
https://niels004.tier2.hep.manchester.ac.uk/nagios
Monday, 4 February 2008
Tuesday, 13 November 2007
Some SAM tests don't respect downtime
Sheffield is shown to fail the CE-host-cert-valid test while in downtime. SAM tests should all behave the same. This is on top of the very confusing display of the results in alternate lines. I opened a ticket.
https://gus.fzk.de/ws/ticket_info.php?ticket=28983
Tuesday, 30 October 2007
user certificates: p12 to pem
Since I was renewing my certificate, I added a small script (p12topem.sh) to the subversion repository to convert users' p12 certificates into pem format and set their unix permissions correctly. I linked it from here:
https://www.sysadmin.hep.ac.uk/wiki/CA_Certificates_Maintenance
It assumes $HOME/.globus/user*.pem names. It therefore doesn't handle host certificates, but it could easily be extended.
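For reference, a minimal sketch of what such a conversion can look like. This is not the p12topem.sh from the repository: the function name and its arguments are made up for illustration, and the 644/400 modes are the usual convention for user certificates and keys. Only the openssl pkcs12 and chmod calls are standard.

```shell
#!/bin/sh
# Sketch only, not the script from the repository.
# Usage: p12topem <file.p12> [output-dir]   (output-dir defaults to ~/.globus)
p12topem() {
    p12="$1"
    dir="${2:-$HOME/.globus}"
    mkdir -p "$dir" || return 1

    # Extract the user certificate only (no private key).
    openssl pkcs12 -in "$p12" -clcerts -nokeys -out "$dir/usercert.pem" || return 1

    # Extract the private key only. openssl prompts for the import
    # password and for a passphrase to protect the pem key.
    # The restrictive umask keeps the key unreadable while it is written.
    ( umask 077; openssl pkcs12 -in "$p12" -nocerts -out "$dir/userkey.pem" ) || return 1

    # Certificate world-readable, key readable by the owner only.
    chmod 644 "$dir/usercert.pem" || return 1
    chmod 400 "$dir/userkey.pem"
}
```

Called as `p12topem mycert.p12`, it writes usercert.pem and userkey.pem into $HOME/.globus, which matches the user*.pem naming mentioned above.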
Monday, 29 October 2007
Links to monitoring pages update
I added three links to the FCR, one per experiment, with all the UK sites selected. Hopefully this will make it easier to find out who has been blacklisted.
http://www.gridpp.ac.uk/wiki/Links_Monitoring_pages
I also added a GridMap link and linked Steve's monitoring, both as generic dteam and as atlas, plus the quarterly summary plots.
Monday, 15 October 2007
Availability update
Lancs site availability looks OK for the last month at 94%, which is 13% above the GridPP average, and this includes a couple of weekends lost due to dCache problems. The record from July-September has been updated on Jeremy's page. We still get the occasional failed SAM submission; we have no idea what causes these, but they keep the availability from reaching the high nineties.
- The June-July instability was a dCache issue with the pnfs mount options; it only affected SAM tests where files were created and immediately removed.
- The mid-August problems came from the SL4 upgrade and were caused by a few blackhole WNs. This was tracked to the jpackage repository being down, which broke the auto-install of some WNs.
- The mid-September problems were caused by adding a new dCache pool, which we are not bringing online until the issue is understood.

Friday, 12 October 2007
Sys Admin Requests wiki pages
YAIM has a new wiki page for sys admin requests. Maria has sent an announcement to LCG-ROLLOUT. For bookkeeping, I added a link and an explanation to the sys admin wiki wishlist page, which also links to the ROC admins' management tools requests.
http://www.sysadmin.hep.ac.uk/wiki/Wishlist
Tuesday, 9 October 2007
BDII doc page
After the trouble Sheffield went through with the BDII, I started a BDII page in the sysadmin wiki.
http://www.sysadmin.hep.ac.uk/wiki/BDII
Monday, 8 October 2007
Manchester RGMA fixed
Fixed RGMA in Manchester. For reasons that are still obscure, it had the wrong permissions on the host key files. I started an RGMA troubleshooting page on the sysadmin wiki:
http://www.sysadmin.hep.ac.uk/wiki/RGMA#RGMA
EGEE '07
I gave a talk in the SA1-JRA1 session at the EGEE conference. It seems to have had a positive result, which will hopefully have some follow-up.
The talk can be found at
http://indico.cern.ch/materialDisplay.py?contribId=30&sessionId=49&materialId=slides&confId=18714
and is the sibling of the one we gave in Stockholm at the Ops workshop on problems with SA3 and within SA1.
http://indico.cern.ch/contributionDisplay.py?contribId=25&confId=12807
which had some follow up with SA3 that can be found here
https://savannah.cern.ch/task/?5267
Sunday, 26 August 2007
Some new links about security
This article is an interesting example of how even someone with very little experience can still do some basic forensics.
http://blog.gnist.org/article.php?story=HollidayCracking
I added the link under the forensic section on the sys admin wiki
http://www.sysadmin.hep.ac.uk/wiki/Basic_Security#Forensic
Since I was at it I added a firewall section to
http://www.sysadmin.hep.ac.uk/wiki/Grid_Security#Firewall_configuration_and_Services_ports
Dcache Troubleshooting page
My tests on dcache started to fail for obscure reasons due to gsidcap doors misbehaving. I started a troubleshooting page for dcache
http://www.sysadmin.hep.ac.uk/wiki/DCache_Troubleshooting
Tuesday, 14 August 2007
GOCDB3 permission denied
I can't edit NorthGrid sites anymore. I opened a ticket.
https://gus.fzk.de/pages/ticket_details.php?ticket=25846
I would be mildly curious to know if other people are experiencing the same or if I'm the only one.
Monday, 13 August 2007
lcg_utils bug closed
The ticket about the lcg_util bugs has been answered and closed
https://gus.fzk.de/pages/ticket_details.php?ticket=25406&from=allt
The correct versions of the rpms to install are
[aforti@niels003 aforti]$ rpm -qa GFAL-client lcg_util
lcg_util-1.5.1-1
GFAL-client-1.9.0-2
Update (2007/08/17): The problem was incorrect dependencies expressed in the meta rpms. Maarten opened a savannah bug.
https://savannah.cern.ch/bugs/?28738
Friday, 10 August 2007
Updating Glue schema
This is sort of old news, as the request to update the BDII is one month old.
To update the Glue schema you need to update the BDII on the BDII machine and on the CE and SE (dcache and classic). DPM SE uses BDII instead of globus-mds now so you should check the recipe for that.
The first problem I found was that
yum update glite-BDII
doesn't update the dependencies but only the meta-rpm. Apparently it works for apt-get but not for yum. So if you use yum you have 3 alternatives
1) yum -y update and risk screwing up your machine
2) yum update and check each rpm
3) Look up the list of rpms here
http://glite.web.cern.ch/glite/packages/R3.0/deployment/glite-BDII/3.0.2-12/glite-BDII-3.0.2-12.html
yum update
Reconfiguring the BDII doesn't pose a threat so you can
cd
./scripts/configure_node BDII_site
On the CE and SE... you can upgrade the CE and SE and reconfigure the nodes. But I didn't want to do that because you never know what might happen, and with the farm full of jobs and the SE being dcache I don't see the point of risking it for a schema upgrade. So what follows is a simple recipe to upgrade the glue schema on a CE and an SE other than DPM without reconfiguring the nodes.
service globus-mds stop
yum update glue-schema
cd /opt/glue/schema
ln -s openldap-2.0 ldap
service globus-mds start
To check that it worked:
ps -afx -o etime,args | grep slapd
If your BDII is not on the CE and you find slapd instances on ports 2171-2173, it means a site BDII is also running on your CE; you should turn it off and remove it from the startup services.
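To make the stray site BDII instances easier to spot, the ps output can be filtered on those ports. This is only a sketch: the function name is made up, and it assumes the port shows up in the slapd arguments as ":<port>" (for example in an ldap:// URL), which may not match every setup.

```shell
#!/bin/sh
# Sketch only: read a process listing on standard input and print the
# slapd lines that mention port 2171, 2172 or 2173.
# Assumption: the port appears in the command line as ":<port>".
site_bdii_slapd() {
    grep slapd | grep -E ':(2171|2172|2173)([^0-9]|$)'
}
```

It can be used as `ps -afx -o etime,args | site_bdii_slapd`; no output means no slapd was found on those ports.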
The ldap link is needed because the schema path has changed, and unless you want to edit the configuration file (/opt/globus/etc/grid-info-slapd.conf), the easiest thing is to add a link.
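If the recipe is run more than once, a plain ln -s fails or nests a second link inside the directory. A small sketch that creates the link only when it is missing; the function name is made up, and the default path is the one used in the recipe.

```shell
#!/bin/sh
# Sketch only: create the ldap -> openldap-2.0 link if it is not there yet.
# Usage: ensure_ldap_link [schema-dir]   (defaults to /opt/glue/schema)
ensure_ldap_link() {
    schema_dir="${1:-/opt/glue/schema}"

    # Leave any existing file, directory or link called "ldap" alone.
    if [ -e "$schema_dir/ldap" ] || [ -L "$schema_dir/ldap" ]; then
        echo "ldap already exists in $schema_dir, leaving it alone"
        return 0
    fi

    # Without the target directory the link would be dangling.
    if [ ! -d "$schema_dir/openldap-2.0" ]; then
        echo "no openldap-2.0 directory in $schema_dir" >&2
        return 1
    fi

    # Relative link, the same as "cd /opt/glue/schema; ln -s openldap-2.0 ldap".
    ( cd "$schema_dir" && ln -s openldap-2.0 ldap )
}
```

Run it between stopping and starting globus-mds, in place of the cd and ln -s lines.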
Most of this is in this ticket
https://gus.fzk.de/pages/ticket_details.php?ticket=24586&from=allt
including where to find the new schema documentation.
Wednesday, 8 August 2007
How to check accounting is working properly
Obviously, when you look at the accounting pages there is a graph at the bottom showing running VOs, but that is not straightforward. Two other ways are
The accounting enforcement page showing sites that are not publishing and for how many days they haven't published.
http://www3.egee.cesga.es/acctenfor
which I linked from
https://www.gridpp.ac.uk/wiki/Links_Monitoring_pages#Accounting
or you could set up RSS feeds as suggested in the Apel FAQ.
I also created an Apel page with this information on the sysadmin wiki
http://www.sysadmin.hep.ac.uk/wiki/Apel
Thursday, 2 August 2007