Tuesday, 14 October 2014

Tired of a full /var?

This is how I prevent /var from getting full on any of our servers. I wrote these two scripts, spacemonc.py and spacemond.py. spacemonc.py is a client, and it is installed on each grid system and worker node as a cronjob:
# crontab -l | grep spacemonc.py
50 18 * * * /root/bin/spacemonc.py
Because the server is going to be (almost) single-threaded, I use Puppet to run the client at a different pseudo-random time on each system. I say "almost" because it actually uses method-level locking to hold each thread in a sleep state, so I think it's really a queueing server: it won't drop simultaneous incoming connections, but it's unwise to allow too many of them to occur at once.
        cron { "spacemonc":
          #ensure => absent,
          command => "/root/bin/spacemonc.py",
          user    => root,
          hour    => fqdn_rand(24),
          minute  => fqdn_rand(60),
        }
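The idea behind the fqdn_rand calls above is that each host gets a time slot that looks random but never changes for that host. Here is a minimal Python 3 sketch of the same idea, assuming a plain SHA-256 hash of the hostname; it is not Puppet's actual algorithm, and host_slot and cron_time are hypothetical names used only for illustration.

```python
import hashlib


def host_slot(hostname, modulus, salt=""):
    """Map a hostname to a stable number in range(modulus).

    Illustrative only: this is NOT how Puppet's fqdn_rand is implemented.
    """
    digest = hashlib.sha256((salt + hostname).encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


def cron_time(hostname):
    """Return a stable (hour, minute) pair for a host."""
    # Different salts keep the hour and the minute from being correlated.
    return host_slot(hostname, 24, "hour:"), host_slot(hostname, 60, "minute:")


if __name__ == "__main__":
    for name in ("node01.example.org", "node02.example.org"):
        print(name, cron_time(name))
```

The same hostname always lands in the same slot, so the crontab entry does not change from one Puppet run to the next.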
The client itself is pretty small:
#!/usr/bin/python

import xmlrpclib
import os
import subprocess
from socket import gethostname

proc = subprocess.Popen(["df | perl -p00e 's/\n\s//g' | grep -v ^cvmfs  | grep -v hepraid[0-9][0-9]*_[0-9]"], stdout=subprocess.PIPE, shell=True)
(dfReport, err) = proc.communicate()

s = xmlrpclib.ServerProxy('http://SOMESERVEROROTHER.COM.ph.liv.ac.uk:8000')

status = s.post_report(gethostname(),dfReport)
if (status != 1):
  print("Client failed")
The strange piece of perl in the middle is to stop a bad habit in df of breaking lines that have long fields (I hate that; ldapsearch and qstat also do it.) I don't want to know about cvmfs partitions, nor raid storage mounts.
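For anyone who would rather not shell out to perl, the same line joining can be sketched in Python 3. This is a rough equivalent under the assumption that continuation lines always start with whitespace; unwrap_df and the SAMPLE text are made up for illustration. As far as I know, `df -P` (POSIX output format) also keeps each file system on one line, which may avoid the problem altogether.

```python
def unwrap_df(text):
    """Join df continuation lines onto the line they belong to."""
    joined = []
    for line in text.splitlines():
        if line[:1].isspace() and joined:
            # Continuation line: df wrapped because the device name was long.
            joined[-1] = joined[-1] + " " + line.strip()
        elif line.strip():
            joined.append(line)
    return joined


# Made-up sample showing the wrapping behaviour.
SAMPLE = "\n".join([
    "Filesystem           1K-blocks      Used Available Use% Mounted on",
    "/dev/mapper/VolGroup00-LogVol00",
    "                      10157368   2536168   7096912  27% /",
    "/dev/sda1               101086     20570     75297  22% /boot",
])

if __name__ == "__main__":
    for row in unwrap_df(SAMPLE):
        print(row)
```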

spacemond.py is installed as a service; you'll have to pinch a /etc/init.d script to start and stop it properly (or do it from the command line to start with.) And the code for spacemond.py is pretty small, too:
#!/usr/local/bin/python2.4

import sys
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler
import time
import smtplib
import logging

if (len(sys.argv) == 2):
  limit = int(sys.argv[1])
else:
  limit = 90

# Log to a file (the /var/log/spacemon directory must already exist)
logging.basicConfig(level=logging.DEBUG,
  format='%(asctime)s %(levelname)s %(message)s',
  filename="/var/log/spacemon/log",
  filemode='a')

# Email details
smtpserver = 'hep.ph.liv.ac.uk'
recipients = ['sjones@hep.ph.liv.ac.uk','sjones@hep.ph.liv.ac.uk']
sender = 'root@SOMESERVEROROTHER.COM.ph.liv.ac.uk'
msgheader = "From: root@SOMESERVEROROTHER.COM.ph.liv.ac.uk\r\nTo: YOURNAME@hep.ph.liv.ac.uk\r\nSubject: spacemon report\r\n\r\n"

# Test the server started
session = smtplib.SMTP(smtpserver)
smtpresult = session.sendmail(sender, recipients, msgheader + "spacemond server started\n")
session.quit()

# Restrict to a particular path.
class RequestHandler(SimpleXMLRPCRequestHandler):
  rpc_paths = ('/RPC2',)

# Create server
server = SimpleXMLRPCServer(("SOMESERVEROROTHER.COM", 8000), requestHandler=RequestHandler)
server.logRequests = 0
server.register_introspection_functions()

# Class with a method to process incoming reports
class SpaceMon:
  def post_report(self, hostname, report):
    full_messages = []               # Messages for file systems over the limit

    lines = report.split('\n')
    for l in lines[1:]:
      fields = l.split()
      if (len(fields) >= 5):
        fs = fields[0]
        pc = fields[4][:-1]
        ipc = int(pc)
        if (ipc  >= limit ):
          full_messages.append("File system " + fs + " on " + hostname + " is getting full at " + pc + " percent.\n")
    if (len(full_messages) > 0):
      session = smtplib.SMTP(smtpserver)
      smtpresult = session.sendmail(sender, recipients, msgheader + ("").join(full_messages))
      session.quit()
      logging.info(("").join(full_messages))
    else:
      logging.info("Happy state for " + hostname )
    return 1

# Register and serve
server.register_instance(SpaceMon())
server.serve_forever()
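The threshold check inside post_report can also be tried on its own, without a server or a mail relay. The following is a small Python 3 sketch of that parsing step; over_limit is a hypothetical name, and skipping non-numeric percentage fields is an addition of mine that is not in the script above.

```python
def over_limit(report, limit=90):
    """Return (filesystem, percent) pairs at or above limit."""
    full = []
    for line in report.split("\n")[1:]:      # skip the df header line
        fields = line.split()
        if len(fields) < 5:
            continue                         # blank or malformed line
        pc = fields[4].rstrip("%")
        if not pc.isdigit():
            continue                         # e.g. "-" instead of a percentage
        if int(pc) >= limit:
            full.append((fields[0], int(pc)))
    return full


# Made-up report text in the shape df produces.
REPORT = "\n".join([
    "Filesystem 1K-blocks Used Available Use% Mounted on",
    "/dev/sda1 101086 20570 75297 22% /boot",
    "/dev/sda2 10157368 9446352 711016 93% /var",
    "none 0 0 0 - /proc",
    "",
])

if __name__ == "__main__":
    for fs, pc in over_limit(REPORT):
        print("File system %s is getting full at %d percent." % (fs, pc))
```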
And now I get an email if any of my OS partitions is getting too full. It's surprising how small server software can be when you use a framework like XML-RPC. In the old days, I would have needed 200 lines of parsing code and case statements. Goodbye to all that.
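The scripts above are Python 2; in Python 3 the xmlrpclib and SimpleXMLRPCServer modules became xmlrpc.client and xmlrpc.server. A minimal loopback sketch of the same client/server round trip, assuming Python 3 and a throwaway port on 127.0.0.1, might look like this. The Collector class and the received list are hypothetical stand-ins for SpaceMon, and nothing here sends mail.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

received = []  # reports seen by the server, kept only for this demo


class Collector:
    """Stand-in for SpaceMon: records the report instead of mailing it."""

    def post_report(self, hostname, report):
        received.append((hostname, report))
        return 1


# Port 0 asks the OS for any free port, so the demo does not clash with 8000.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_instance(Collector())
port = server.server_address[1]

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

proxy = ServerProxy("http://127.0.0.1:%d" % port)
status = proxy.post_report("node01", "report text")

server.shutdown()
server.server_close()

if status != 1:
    print("Client failed")
```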