Tuesday, 17 February 2015

Replacing the Condor Defrag Daemon

I've replaced the standard DEFRAG daemon released with Condor with a simpler version that contains a proportional integral (PI) controller. I hoped this would give us better control over multicore slots. Preliminary results with only the proportional part of the controller show that it fails to keep accurate control over the provision of slots. It is subject to hunting because of the long time lag between the onset of draining and the eventual change in the controlled variable (the number of running mcore jobs). Considering the simplicity of the algorithm, the rate of provision was unexpectedly stable at first, but it degraded over time as the controlled variable became more erratic.
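To illustrate why a proportional-only controller hunts when there is a long lag, here is a minimal toy model. It is not the replacement daemon and uses no Condor API: the names `p_drain_request` and `simulate`, the gain of 0.5, the cap of 50 drains per cycle, the lag measured in control cycles, and the assumption that each drained machine becomes exactly one running mcore job while a fixed 5% of mcore jobs finish each cycle are all hypothetical.

```python
from collections import deque


def p_drain_request(setpoint: float, running_mcore: float, kp: float, max_drain: int) -> int:
    """Proportional-only control: how many machines to start draining this cycle.

    Draining can only add multicore slots, so a negative error (too many
    mcore jobs) just means "drain nothing"; the request is capped at max_drain.
    """
    error = setpoint - running_mcore
    return max(0, min(max_drain, round(kp * error)))


def simulate(lag: int, setpoint: float = 250.0, kp: float = 0.5, steps: int = 60,
             max_drain: int = 50, finish_fraction: float = 0.05) -> list[float]:
    """Hypothetical toy plant: a machine drained now yields one running mcore
    job `lag` cycles later, and a fixed fraction of mcore jobs finishes per cycle."""
    in_progress = deque([0] * lag)  # drain requests still waiting to take effect
    running = 0.0
    history = []
    for _ in range(steps):
        # The controller only sees the current count, not the drains in progress.
        in_progress.append(p_drain_request(setpoint, running, kp, max_drain))
        arrived = in_progress.popleft()
        running = running + arrived - finish_fraction * running
        history.append(running)
    return history
```

With `lag=0` the count settles a little below the setpoint (the steady-state offset typical of proportional-only control). With `lag=10` the controller keeps requesting drains while earlier ones are still in progress, overshoots well past 250, and the same delay then makes it undershoot on the way back down.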

The graph below shows these very preliminary results; the green line on the right of the plot marks a temporary period of stable control. The setpoint is 250 running mcore jobs.

I have now also added an integral component to the controller, and I'm in the process of tuning its reset rate. I hope to show the results of this test soon.
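"Reset rate" is the classic process-control name for integral action expressed as repeats per unit time, so the controller output is `kp * (error + reset_rate * integral_of_error)`. Below is a minimal sketch of such a PI controller. The class name `PIController`, the conditional-integration anti-windup, the output limits and the no-lag toy plant in `simulate_pi` are illustrative assumptions, not the code of the replacement daemon; a real daemon would also have to round the output to a whole number of machines.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Hypothetical PI controller in 'reset rate' form.

    output = kp * (error + reset_rate * integral_of_error)
    With dt = 1, reset_rate is in repeats per control cycle.
    """
    setpoint: float
    kp: float
    reset_rate: float
    out_min: float = 0.0
    out_max: float = 50.0
    dt: float = 1.0
    integral: float = 0.0

    def update(self, measured: float) -> float:
        error = self.setpoint - measured
        candidate = self.integral + error * self.dt
        output = self.kp * (error + self.reset_rate * candidate)
        # Anti-windup by conditional integration: keep the new integral only if
        # the output is unsaturated or the error pulls it back into range.
        if output > self.out_max:
            if error < 0:
                self.integral = candidate
            return self.out_max
        if output < self.out_min:
            if error > 0:
                self.integral = candidate
            return self.out_min
        self.integral = candidate
        return output


def simulate_pi(reset_rate: float, steps: int = 400, setpoint: float = 250.0,
                kp: float = 0.5, finish_fraction: float = 0.05) -> list[float]:
    """Hypothetical no-lag plant: each unit of output becomes a running mcore
    job on the next cycle, and a fixed fraction of mcore jobs finishes per cycle."""
    ctrl = PIController(setpoint=setpoint, kp=kp, reset_rate=reset_rate)
    running = 0.0
    history = []
    for _ in range(steps):
        drain = ctrl.update(running)
        running = running + drain - finish_fraction * running
        history.append(running)
    return history
```

Setting `reset_rate=0.0` reduces this to proportional-only control, and the toy plant settles at about 227, short of the setpoint. With `reset_rate=0.05` the integral term removes that offset and the count settles at 250. A higher reset rate removes the offset faster but, especially with a long draining lag, makes overshoot and hunting more likely, which is why it needs tuning.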