Tuesday, 16 December 2008

Phasing out VO queues

I've started to phase out the per-VO queues and to create queues shared between VOs. The plan is eventually to have 4 queues called, with a leap of imagination, short, medium, long and test, with the following characteristics:

test: 3h/4h; all local VOs
short: 6h/12h; ops, lhcbsgm, atlasgm
medium: 12h/24h; all VOs and roles except production and those in the short queue
long: 24h/48h; all VOs and roles except those that can access the short queue

Installing the queues, adding the group ACLs and publishing them is not difficult. YAIM (glite-yaim-core-4.0.4-1, glite-yaim-lcg-ce-4.0.4-2 or higher) can do it for you. Otherwise it can be done by hand, which is still easy but harder to maintain (the risk of the changes being overwritten is always high, and the files need to be kept in cfengine, CVS or something similar).
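For the by-hand route, here is a minimal sketch of what the Torque side could look like. It only prints the qmgr commands so they can be reviewed first. The helper name queue_cmds is made up for illustration, and I am assuming that the two limits in the list above are CPU time and wall time and that the ACLs are applied through Unix groups with the same names; adjust to your own setup.

```shell
# Hypothetical helper: print (do not execute) the qmgr commands for one queue.
# Usage: queue_cmds NAME MAX_CPUT MAX_WALLTIME GROUP...
# Assumption: limits are cput/walltime and ACLs are Unix groups.
queue_cmds() {
    name=$1 cput=$2 wall=$3
    shift 3
    echo "create queue $name"
    echo "set queue $name queue_type = Execution"
    echo "set queue $name resources_max.cput = $cput"
    echo "set queue $name resources_max.walltime = $wall"
    echo "set queue $name acl_group_enable = True"
    # One acl_groups line per group allowed into the queue
    for group in "$@"; do
        echo "set queue $name acl_groups += $group"
    done
    echo "set queue $name enabled = True"
    echo "set queue $name started = True"
}

# Example: the short queue from the list above
queue_cmds short 06:00:00 12:00:00 ops lhcbsgm atlasgm
```

Once the output looks right it can be piped into qmgr on the Torque server. None of this publishes anything in the information system, which is the part YAIM saves you from doing by hand.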

The problem for me is that this scheme works only if users select the correct ACLs and a queue long enough for their jobs in their JDL. If they don't, the WMS picks a queue at random among those that match, with a high probability of jobs failing because they end up in a queue that is too short or in one that doesn't have the right ACLs. So I'm not sure it's really a good idea, even if it is much easier to maintain and allows slightly more sophisticated setups.
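To make the point concrete, this is roughly what a user would have to put in the JDL to avoid landing in a queue that is too short. It is a sketch under assumptions: write_jdl is a made-up helper, the executable is a placeholder, and I am assuming the first number of each pair in the list above is the CPU limit, published as GlueCEPolicyMaxCPUTime in minutes.

```shell
# Hypothetical helper: write a JDL that only matches queues whose published
# CPU limit is at least MINUTES (GlueCEPolicyMaxCPUTime is in minutes).
# Usage: write_jdl FILE MINUTES
write_jdl() {
    cat > "$1" <<EOF
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements = other.GlueCEPolicyMaxCPUTime >= $2;
EOF
}

# A job needing about 10h of CPU: 10 * 60 = 600 minutes
write_jdl long-job.jdl 600
```

With the limits above, a requirement of 600 minutes would rule out test and short and leave medium and long. Without a Requirements line like this, nothing stops the job from being matched to the test queue.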

Anyway, if you do it with YAIM, all you have to do is add the queue to

QUEUES="my-new-queue other-queues"

add the right VOs/FQANs to the new queue's _GROUP_ENABLE variable (remember to convert the queue name to upper case and to turn . and - into _):

MY_NEW_QUEUE_GROUP_ENABLE="atlas /atlas/ROLE=pilot other-vos-or-fqans"

The syntax of GROUP_ENABLE has to be the same as the one you have used in groups.conf (see previous post http://northgrid-tech.blogspot.com/2008/12/groupsconf-syntax.html).
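The name conversion is easy to get wrong with longer queue names, so here is a small sketch that derives the variable name from the queue name. The function name queue_var is made up for illustration; it just upper-cases the name and replaces . and - with _, as in the example above.

```shell
# Hypothetical helper: derive the YAIM variable name for a queue.
# Upper-cases the queue name and turns '.' and '-' into '_'.
queue_var() {
    printf '%s_GROUP_ENABLE\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

queue_var my-new-queue
```

For my-new-queue this prints MY_NEW_QUEUE_GROUP_ENABLE, which matches the variable used above.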

And finally add to site-info.def


to enable publishing of the ACL in the GIP.

Rerun YAIM on the CE as normal.

To check that everything is OK on the CE:

qmgr -c 'p q my-new-queue'

ldapsearch -x -H ldap://MY-CE.MY-DOMAIN:2170 -b GlueCEUniqueID=MY-CE.MY-DOMAIN:2119/jobmanager-lcgpbs-my-new-queue,Mds-Vo-name=resource,o=grid

Among other things, if the queue is correctly configured, the output should list a GlueCEAccessControlBaseRule for each VO and FQAN you have listed in _GROUP_ENABLE.

If a GlueCEAccessControlBaseRule: DENY:FQAN field appears, that is the ACL for the VOViews, not the access to the queue.
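The ldapsearch output is long, so a small filter helps when comparing it against _GROUP_ENABLE. This is only a sketch: acl_rules is a made-up name, the sample values below are illustrative rather than real output from my CE, and it does not handle values that LDIF has wrapped onto a continuation line.

```shell
# Hypothetical helper: read ldapsearch (LDIF) output on stdin and print the
# distinct GlueCEAccessControlBaseRule values, one per line.
acl_rules() {
    sed -n 's/^GlueCEAccessControlBaseRule: *//p' | sort -u
}

# Illustrative input only; in practice pipe the ldapsearch command into acl_rules
acl_rules <<'EOF'
dn: GlueCEUniqueID=MY-CE.MY-DOMAIN:2119/jobmanager-lcgpbs-my-new-queue,Mds-Vo-name=resource,o=grid
GlueCEAccessControlBaseRule: VO:atlas
GlueCEAccessControlBaseRule: VOMS:/atlas/Role=pilot
GlueCEAccessControlBaseRule: VO:atlas
EOF
```

Any DENY: entries will show up in the same list, since the VOViews sit below the same entry in the tree.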

Thanks to Steve and Maria for pointing me to the right combination of YAIM packages and for confirming the randomness of the WMS matchmaking.
