Rocks burn-in

fah-on-rocks-sm
The other day I was talking about how to install Rocks Cluster. Well, today I’ll give you indication on how to test it out a bit. Now this is surely not the *proper* way to test the cluster out, which would be to run some fancy cluster-aware graphics rendering application or something of the sort, but this will put something on there and make it churn out some cpu cycles just to see how things look.

What I like to use for this task is Folding At Home, which is a protein folding program (hey, help cure diseases and stuff, right). You can get things ready by downloading the appropriate version of the client for your machine(s) from the download section. The current one that I am using is the Linux version 6.24 Beta.

Log on to your cluster and create a directory for each node that you want to run the FAH client on. If you only have a couple,. it’s easy to just do that by hand, if not, you can use this simple script:

#!/bin/bash
rockslist=$(rocks list host | grep ‘:’ | cut -d’:’ -f1)
for name in $rockslist
do
mkdir -p $name
done

From there, extract your FAH client file you just downloaded into your headnode directory. Tip: you headnode directory will be named something *other* than compute-?-?. Take the fah5 and mpiexec files from there and copy them to all your compute-?-? directories.

This should really get better instruction, but you’ll want to install screen on all your nodes. If you have things set up well, you should be able to do this as root:

rocks run host “yum -y install screen”

Go into your headnode directory and start your rocks client “./fah6” and answer the configuration questions. Once you get it actually processing a work unit, you can stop it with a control-c.

At this point, copy the client.cfg file from your headnode directory to all the compute node directories.

Now, back in the headnode directory, “screen -d -m ./fah6” which will start your folding at home client in a detached screen session and leave it running.

Now your are ready to start it up like that in your compute nodes too:

for name in compute*
do
echo “Killing $name”
ssh $name killall screen
echo “Restarting $name”
ssh $name “cd $name ; screen -d -m ./fah6”
done

And you can also use that script to periodically stop/restart (or just start again) FAH on your compute nodes as FAH will sometimes hang. I normally run this to restart FAH every couple weeks just to keep things going. Also do jump in occasionally and “screen -x” to look and see if there needs to be an updated client installed occasionally. Either way, this will eat up your spare cpu cycles and make use of your cluster while you learn on it and figure out what else to do with it. It’s also a lot of fun and you can help study/cure diseases too.

Leave a Reply

You must be logged in to post a comment.