Friday, 19 March 2010

Latest HC tests in Manchester

While waiting for the storage we are buying under the current tender, which will bring us to 320 TB of usable space, we are fixing the configuration to optimise access on the current 80 TB.
So we have cabled the 4 data servers in the configuration they and their peers will eventually have. The last test showed some progress.

We ran it for 12 hours and had 99% overall efficiency. In particular, compared to the earlier test, the other metrics also look slightly better. The most noticeable thing, rather than the plain mean values, is the shape of the cpu/wall clock time and events/wall clock histograms. They are much healthier, with a bell shape instead of a U shape: the cpu/wall clock behaviour in particular is more predictable, and the tail of jobs towards zero is drastically reduced in this test. This is only one test, and we are still affected by a bad distribution of data in DPM, which is still mostly concentrated on 2 servers out of 4. There are also other things we can tweak to optimise access. The next steps to do with the same test (for comparison) are:

1) Spread the data more evenly across the data servers if we can: se04 was hammered for a good while and had a load of 80-100 for a few hours according to Nagios.

2) Increase the number of jobs that can run at the same time

3) Look at the distribution of jobs on the WNs. This might be useful to know how to do it when we have 8 cores rather than two.

4) Look at the job distribution in time.
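
The checks in steps 1, 3 and 4, together with the cpu/wall clock histogram discussed above, could be scripted along the following lines. This is only a rough sketch: the record layout, the function names, the server names other than se04 and all the numbers are invented for illustration, and are not taken from the test output or from DPM.

```python
from collections import Counter

# Hypothetical job records: (worker_node, start_hour, cpu_seconds, wall_seconds).
# The layout and the values are invented for illustration only.
jobs = [
    ("wn01", 0, 3400, 3600),
    ("wn01", 0, 900, 3600),
    ("wn02", 1, 3300, 3600),
    ("wn03", 1, 3500, 3600),
    ("wn03", 2, 200, 3600),
]

# Hypothetical amount of data (in TB) held by each data server.
data_tb = {"se01": 10, "se02": 10, "se03": 30, "se04": 30}


def efficiency_histogram(jobs, bins=10):
    """Count jobs per cpu/wall clock bin; bin i covers [i/bins, (i+1)/bins)."""
    counts = [0] * bins
    for _wn, _hour, cpu, wall in jobs:
        if wall <= 0:
            continue  # skip jobs with no recorded wall clock time
        ratio = min(cpu / wall, 1.0)
        index = min(int(ratio * bins), bins - 1)  # ratio 1.0 goes in the last bin
        counts[index] += 1
    return counts


def storage_share(data_tb):
    """Fraction of the total data held by each server (step 1)."""
    total = sum(data_tb.values())
    return {server: size / total for server, size in data_tb.items()}


def jobs_per_node(jobs):
    """Number of jobs that ran on each worker node (step 3)."""
    return Counter(wn for wn, _hour, _cpu, _wall in jobs)


def jobs_per_hour(jobs):
    """Number of jobs started in each hour of the test (step 4)."""
    return Counter(hour for _wn, hour, _cpu, _wall in jobs)


if __name__ == "__main__":
    # Crude text histogram: a U shape shows up as peaks at both ends.
    for i, count in enumerate(efficiency_histogram(jobs)):
        print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * count}")
    print(storage_share(data_tb))
    print(jobs_per_node(jobs))
    print(jobs_per_hour(jobs))
```

Running the same script on each test would make the comparison between tests more direct than looking at the mean values alone.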