Archive for the ‘RedHat’ Category

Rocks burn-in

The other day I was talking about how to install a Rocks cluster. Well, today I'll give you some indication of how to test it out a bit. Now this is surely not the *proper* way to test the cluster out, which would be to run some fancy cluster-aware graphics rendering application or something of the sort, but it will put something on there and make it churn out some CPU cycles just to see how things look.

What I like to use for this task is Folding At Home, which is a protein folding program (hey, help cure diseases and stuff, right?). You can get things ready by downloading the appropriate version of the client for your machine(s) from the download section. The one I am currently using is the Linux version 6.24 beta.

Log on to your cluster and create a directory for each node that you want to run the FAH client on. If you only have a couple, it's easy to just do that by hand; if not, you can use this simple script:

#!/bin/bash
# Create one working directory per node that Rocks knows about.
rockslist=$(rocks list host | grep ':' | cut -d':' -f1)
for name in $rockslist
do
    mkdir -p "$name"
done

From there, extract the FAH client file you just downloaded into your headnode directory. Tip: your headnode directory will be named something *other* than compute-?-?. Take the fah6 and mpiexec files from there and copy them to all your compute-?-? directories.
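
If you have more than a handful of nodes, a quick loop saves the copying. This is a minimal sketch that assumes you are sitting in the parent directory and that your headnode directory is named headnode (yours will be named differently, so adjust). The same loop works for the client.cfg copy a few steps below:

#!/bin/bash
# Copy the FAH binaries into every compute node directory.
# "headnode" is a placeholder for your actual headnode directory name.
for d in compute-*
do
    cp headnode/fah6 headnode/mpiexec "$d"/
done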

This step really deserves better instructions, but you'll want to install screen on all your nodes. If you have things set up well, you should be able to do this as root:

rocks run host "yum -y install screen"

Go into your headnode directory, start the FAH client with "./fah6", and answer the configuration questions. Once you get it actually processing a work unit, you can stop it with a control-c.

At this point, copy the client.cfg file from your headnode directory to all the compute node directories.

Now, back in the headnode directory, run "screen -d -m ./fah6", which will start your Folding At Home client in a detached screen session and leave it running.

Now you are ready to start it up like that on your compute nodes too:

#!/bin/bash
# Kill any running FAH screen session on each compute node, then
# relaunch the client from that node's working directory.
for name in compute-*
do
    echo "Killing $name"
    ssh "$name" killall screen
    echo "Restarting $name"
    ssh "$name" "cd $name ; screen -d -m ./fah6"
done

You can also use that script to periodically stop and restart (or just start again) FAH on your compute nodes, as FAH will sometimes hang; I normally run it every couple of weeks just to keep things going. Also jump in occasionally with "screen -x" to see whether an updated client needs to be installed. Either way, this will eat up your spare CPU cycles and make use of your cluster while you learn on it and figure out what else to do with it. It's also a lot of fun, and you can help study/cure diseases too.

Friday, November 20th, 2009

Throw some Rocks at it!

One part of my day job is dealing with and managing our HPC cluster, an 8 node Rocks cluster that was installed maybe a week after I started. I was a bit green at that point and failed to get a good grasp on some things at the time, like how to maintain and upgrade the thing, and I have recently been paying for that :-)

Apparently, the install we have doesn't have a clear-cut way to do errata and bug fixes; it was an early version of the cluster software. Well, after some heated discussions with our Dell rep about this, I decided what I really needed was a bit of research to see what the deal really was and whether I could get us upgraded to something better and more current.

Along came my June 2009 issue of Linux Journal, which just happened to have a GREAT article in it about installing your very own Rocks cluster (YAY!). Well, I hung on to that issue with the full intention of setting up a development/testing cluster when I had the chance. And that chance came just the other day.

Some of you probably don't have a copy of the article, and I needed to do some things a bit differently anyhow, so I am going to try to summarize here what I did to get my new dev cluster going.

Now what I needed is probably a little different from what most people will need, so you will have to adjust things accordingly, and I'll try to mention the differences as I go along where I can. First off, I needed to run the cluster on RedHat proper and not CentOS (the CentOS route is much easier to get going). I am also running my entire dev cluster virtually on an ESX box, while most of you will be doing this with physical hardware.

To start things off I headed over to the Rocks Cluster website, where I went to the download section and then to the page for Rocks 5.2 (Chimichanga) for Linux. At this point, those of you who do not specifically need RedHat should pick the appropriate version of the Jumbo DVD (either 32 or 64 bit). What I did was to grab the ISOs for the Kernel and Core Rolls. Those two CD images plus my DVD image for RHEL 5.4 are the equivalent of the one Jumbo DVD ISO on the website, which uses CentOS as the default Linux install.

Now at this point you can follow the installation docs there (which are maybe *slightly* outdated?), or just follow along here, as the install is really pretty simple. You will need a head node and one or more compute nodes for your cluster. Your head node should have 2 network interfaces and each compute node 1. The idea here is that your head node will be the only node of your cluster that is directly accessible on your local area network, and it will communicate with the compute nodes on a separate private network. So: plug the eth0 interface of every node, head and compute, into a separate switch, and plug eth1 of your head node into your LAN. Turn on your head node and boot it from the Jumbo DVD or, in the case of the RHEL people, from the Kernel CD.

The Rocks installer is really quite simple. Enter "build" at the welcome screen and soon you will be at the configuration screen. There, choose the "CD/DVD Based Rolls" selection, where you can pick from your Rolls and such. I chose everything except the Sun specific stuff (descriptions of what each Roll does are in the download section). Since I was using RHEL instead of CentOS on the Jumbo DVD, I had to push that "CD/DVD" button once per CD/DVD and select what I needed from each one.

Once the selections are made, the installer asks you for information about the cluster. Only the FQDN and cluster name are really necessary. After that you are given the chance to configure your public (LAN) and private network settings, your root password, time zone and disk partitioning. My best advice here is to go with the defaults where possible, although I did change my private network address settings and they worked perfectly. Letting the partitioner handle your disk partitioning is probably best too.

A quick note about disk space: if you are going to have a lot of disk space anywhere, it's best on the head node, as that space goes into a partition that is shared with the compute nodes. Also, each node should have at least 30GB of disk space for the install to complete correctly. I tried with 16GB on one compute node and the install failed!

After all that (which really is not much at all), you just sit back and wait for your install to complete. Afterwards, the install docs tell you to wait a few minutes for all the post-install configs (behind the scenes, I guess) to finish up before logging in.

Once you are at that point and logged into your head node, it is absolutely trivial to get a compute node running. First, from the command line on your head node, run "insert-ethers" and select "Compute". Then power on your compute nodes (one at a time) and make sure each is set to network boot (PXE). You will see the MAC address and compute node name pop up on your insert-ethers screen, and shortly thereafter the node will install itself from the head node, reboot, and you'll be rockin' and rollin'!

Once your nodes are going, you can get at the shared drive space under /state/partition1. You can run commands across the hosts with "rocks run host uptime", which gives you an uptime for every host in the cluster, and "rocks help" will help you out with more commands. You can also ssh into any one of the nodes by simply doing "ssh compute-0-1" or whichever node you want.

Now the only problem I have encountered so far was a compute node that didn't want to install correctly (probably because I was impatient). I tried reinstalling it and it somehow got a new node name from insert-ethers. In order to delete the bad info in the node database that insert-ethers maintains, I needed to do a "rocks remove host compute-0-1" and then a "rocks sync config" before I was able to make a new compute-0-1 node.
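
For reference, that cleanup is just two commands run on the head node:

rocks remove host compute-0-1   # drop the botched node from the database
rocks sync config               # push the corrected config back out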

So now you and I have a functional cluster. What do you do with it? Well, you can do anything on there that requires the horsepower of multiple computers. Some things come to mind, like graphics rendering, and there are programs and instructions on the web for those. I ran Folding At Home on mine; with a simple shell script I was able to set up and start it on all my nodes, and you could probably do most anything the same way. If any of you find something fantastic you like to run on your cluster, be sure to pass it along and let us know!

Friday, November 13th, 2009

Nagios

Even though I wrote and use OSM, I also use Nagios at work (alongside OSM). Actually, I administer Nagios there; however, I have never actually installed and configured it, as it was in place before I started.

That being said, my manager asked me today how to get it installed and running, as he wants to try using it at home. This sort of spurred me into setting it up at home tonight. It's really nice having a server that can handle a few test VMs, by the way :-)

I decided I would install it on CentOS, because I need to be able to get it running on RedHat for work, so off to Google I went. After a bit of searching I finally came across a WONDERFUL site which provides a quick and dirty script for getting Nagios installed and working lickety split. It works perfectly, and the only adjustments I made to the script, other than changing the passwords in it, were to comment out the SELinux lines because I already have SELinux disabled.

That really was it. Pretty simple. Of course the rub here is actually getting Nagios to monitor your systems, and that is probably beyond the scope of this post, which was really meant as a reference for that install script. Configuring Nagios from the command line is not for the faint of heart. The files you need to pay attention to end up in /usr/local/nagios/etc and /usr/local/nagios/etc/objects. Just keep in mind that the configs reference each other in a cyclical way, so you really need to pay attention. I found some good starter help at the bottom of this website for adding your first non-local machine. Once you get that working you'll understand how to add more, but I still found it a bit of a frustrating experience for a few minutes.
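
To give you a flavor of it, here is a minimal sketch of what adding a remote machine looks like in one of those object files. The template names (linux-server, generic-service) come from the stock sample configs; the host name, address and file name are made up for the example:

# /usr/local/nagios/etc/objects/myserver.cfg (hypothetical file)
define host {
    use        linux-server      ; stock host template from templates.cfg
    host_name  myserver
    alias      My Test Server
    address    192.168.1.50
}

define service {
    use                  generic-service   ; stock service template
    host_name            myserver
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}

You then reference the new file from nagios.cfg with a cfg_file= line and restart Nagios; running "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" first will tell you whether all those cyclical references actually line up.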

I did note, however, that there are quite a few projects out there which claim to configure Nagios for you via a web interface. I hope to give one or two of them a shot in the coming days/nights. Let me know if any of you have tried any and how they fare.

Monday, May 4th, 2009

Building an rpm to install script files

On an rpm-based system, say CentOS, first make sure that the rpm-build package is installed.

In your user account, not as root (bad form and all), make the following directories:


mkdir -p ~/rpm
mkdir -p ~/rpm/BUILD
mkdir -p ~/rpm/RPMS
mkdir -p ~/rpm/SOURCES
mkdir -p ~/rpm/SPECS
mkdir -p ~/rpm/SRPMS
mkdir -p ~/rpm/tmp

And create a ~/.rpmmacros file with the following in it:


%packager Your Name
%_topdir /home/YOUR HOME DIR/rpm
%_tmppath /home/YOUR HOME DIR/rpm/tmp

And now comes the fun part. Go to the ~/rpm/SOURCES directory and create a working package directory under it, named with the package name, a dash, and the major revision number; for example, ~/rpm/SOURCES/linc-1. Into that directory copy all the scripts/files that you wish to have in your package. For example, I might have a script in that directory called myscript.sh that I want installed as part of the linc package.

Once that is done, make a tarball of that directory in the ~/rpm/SOURCES directory, named programname-revision.tar.gz. Using my previous example it would be:

tar czvf linc-1.tar.gz linc-1/

Now for the glue that makes this all stick together. Go to your ~/rpm/SPECS directory and create a spec file for your package. We’ll call mine linc.spec and it’ll look like this:


Summary: My first rpm script package
Name: linc
Version: 1
Release: 1
Source0: linc-1.tar.gz
License: GPL
Group: MyJunk
BuildArch: noarch
BuildRoot: %{_tmppath}/%{name}-buildroot
%description
Make some relevant package description here
%prep
%setup -q
%build
%install
install -m 0755 -d $RPM_BUILD_ROOT/opt/linc
install -m 0755 myscript.sh $RPM_BUILD_ROOT/opt/linc/myscript.sh
%clean
rm -rf $RPM_BUILD_ROOT
%post
echo " "
echo "This will display after rpm installs the package!"
%files
%dir /opt/linc
/opt/linc/myscript.sh

A lot of that file is pretty self explanatory except the install lines and the lines after %files. The install lines tell rpm what to install where, and with what permissions; you also have to do any directory creation there as well (the line with the -d flag). The entries after %files are similar in that they tell rpm's database which files belong to this package. The %dir marks a directory; otherwise the files are listed with their complete paths.

Now that you have all that together, the last thing you need to do is create the package. Just go to ~/rpm and do an "rpmbuild -ba SPECS/linc.spec". You will end up with ~/rpm/RPMS/noarch/linc-1-1.noarch.rpm if all goes well.
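
From there it's easy to check and install what you built:

cd ~/rpm
# List what's inside the package without installing it:
rpm -qpl RPMS/noarch/linc-1-1.noarch.rpm
# Install it; the %post message should print at the end:
sudo rpm -ivh RPMS/noarch/linc-1-1.noarch.rpm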

Monday, February 16th, 2009

RPM help

OK, I must be doing something wrong here, so if you are familiar with building rpms and can help me out, please do!

I am trying to build an rpm which has (for the sake of discussion) a script file in it that I want to install. The first thing I did was to install the rpm-build package so I had the correct tools. Afterwards I made an rpm directory to contain my rpm build trials, and under that dir, other dirs of BUILD, RPMS, SOURCES, SPECS and SRPMS to house my code, etc., as required by the “rpmbuild” program.

I go to ~/rpm/SOURCES and make a dir myscript-1, and in that dir I place my script "myscript". Back in ~/rpm/SOURCES I create the requisite tar file: "tar czvf myscript-1.tar.gz myscript-1".

Now, I flip over to ~/rpm/SPECS to create the spec file myscript.spec, which looks like so:

Summary: Lincs Myscript
Name: myscript
Version: 1
Release: 1
Source0: myscript-1.tar.gz
License: GPL
Group: LINC
%description
This script does things.
%prep
%setup -q
%build
%install
install -m 0755 myscript /usr/local/bin/myscript
%post
echo "HA, yur dun!"
%files
/usr/local/bin/myscript

Now I can go to ~/rpm and actually create the rpm by doing:
sudo rpmbuild -ba --target noarch SPECS/myscript.spec
and this will indeed make an installable rpm file of myscript-1-1.noarch.rpm in the ~/rpm/RPMS/noarch/ directory. All this is fine and what I want EXCEPT:

While building the rpm, the process seems to BUILD/INSTALL everything on the local system as well. This means that after the build I end up with a /usr/local/bin/myscript even though the package has not been installed on my system. For my purposes right now it's not that big of a deal; however, I am sure there will be times when I will NOT want the package I am building installed on the same machine. There just has to be a way around this that I cannot find so far, and it's annoying me to no end. HELP! :-)

Wednesday, February 4th, 2009

Channel Bonding Update

Just a quick update: if you do channel bonding on your ethernet, there apparently is no really good way to tell the speed of your interface. I tried mii-tool to see what the interface looked like, but I was horrified to see it report 10bt half duplex on 2 bonded 100bt full duplex connections. After a half hour of googling I was no closer to an answer, although I found many references saying not to pay attention to anything mii-tool reports about bonded interfaces ;-)
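
For what it's worth, here are two places that may be more truthful than mii-tool; I haven't verified these on that particular hardware, so treat them as a sketch:

# The bonding driver keeps its own status file with per-slave details:
cat /proc/net/bonding/bond0

# ethtool asks the driver rather than the old MII registers:
ethtool eth0
ethtool eth1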

Monday, January 26th, 2009

RedHat Channel Bonding

There is a first time for everything, right? Well, today was my first time setting up ethernet channel bonding on RedHat; RHEL 5.3 x64 to be exact, but any RHEL 5 or CentOS 5 should be exactly the same.

I found a great tutorial at http://www.linuxtopia.org/online_books/rhel5/rhel5_administration/rhel5_s1-networkscripts-interfaces.html. I’ll repost the relevant bits here lest they become lost somehow:

In /etc/sysconfig/network-scripts/ifcfg-bond0:
———————————–
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NETWORK=192.168.1.0
NETMASK=255.255.255.0
IPADDR=192.168.1.10
USERCTL=no
———————————–

And then for /etc/sysconfig/network-scripts/ifcfg-eth0
———————————–
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

———————————–

And pretty much the same for /etc/sysconfig/network-scripts/ifcfg-eth1
———————————–
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
———————————–

Lastly, add to /etc/modprobe.conf:
alias bond0 bonding
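
Depending on your hardware you may also want to pass parameters to the bonding driver on an options line in the same file; miimon (link monitoring interval) and mode (bonding policy) are the common ones in the RHEL docs, though my setup didn't need them:

options bond0 miimon=100 mode=balance-rr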

and reboot! TADA!

Monday, January 26th, 2009

Ossec insmod error

Let me preface this by saying that if you are not running Ossec on at least your external-facing machines, then you should be. It's great software!

This post is here mostly for reference, and maybe to help someone out later via their favorite search engine.

I have been getting a couple of errors reported lately through Ossec emails: "insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters. You may find more information in syslog or the output from dmesg". Well, after checking, the actual error is found in /var/log/messages and is "floppy.o: init_module: No such device". AHA! It just so happens that these machines are servers *with no floppy*. The fix that turns off the errors seems to be to add "alias floppy off" to the /etc/modules.conf file and then run a "depmod -a".
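
In command form, the fix boils down to (as root):

# Tell the module loader there is no floppy device to probe for:
echo "alias floppy off" >> /etc/modules.conf
depmod -a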

Tuesday, October 21st, 2008

Linux training

I am sure some of you have been wondering why I recently dropped off the face of the earth. Well, my company sent me to RedHat training last week.

Now most of you know I have been "doing" Linux for a very long time. Some of you may recall that I used RedHat early on; however, I became disenchanted with them around RedHat 6.0 (pre-Enterprise), when they started messing with their compiler, etc. I switched to Slackware at that time and hadn't really used RH until a year ago, when I was hired as a Linux admin in a primarily RH shop.

All that being said, I went to some intensive RedHat training last week and I have to say that not only did I learn an enormous amount, but after working that hard with RHEL, my opinions have definitely changed. RedHat has come a long way, baby!

The primary problem I used to see with RedHat was rpm. I absolutely hated being stuck in rpm dependency hell, where you would try to install an rpm only to have it tell you that you needed to fill a dependency first, and then have that one tell you the same thing, until you were just fed up with the whole process. Well, this has been addressed with RedHat's adoption of yum, which takes care of dependency tracking and fulfillment similarly to apt-get.
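
As a quick illustration, with httpd standing in for any package you like:

# One command pulls in the package plus everything it depends on,
# apt-get style:
yum -y install httpd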

Once I realized that hurdle was past, I started to appreciate the huge strides they have made in getting their Linux product enterprise-ready. There really is a lot of spit and polish that has gone into things since the last time I looked under the RedHat hood. If you haven't looked in a while, I encourage you to do so.

The thing I was particularly impressed with is the uniformity and ease of service installs. Now I know that many of you are used to installing things like bind and dhcp and apache and sendmail/postfix and what have you on lots of other Linux platforms, but there really seemed to be a uniformity to all this under RedHat, and the initial configurations or supplied config files seemed saner somehow. Most notable to me was the difference in ease of install for bind or sendmail between RHEL 5.x and any recent Ubuntu release. It could be that I had the training manual in hand, but it just seemed more ready to go and easier to reconfigure if you had to.

The other thing I have really come to appreciate recently, partly because of my job, is the enterprise attention to securing the server. RHEL does a good job here, asking you for information during the install to help you start out with a working firewall and SELinux set up and running. Now, while I still see SELinux as a huge pain in the behind, the fact is that it does its job if you let it, and does it well.


And, since I had spent a week doing RHEL and decided it really is a good distribution choice for servers, I wanted to see what I could do for home use. Now RHEL costs some money, and if you are a business, and maybe even personally, the price may be right for support and the use of RHN (the RedHat Network), but for me, I want something a bit more inexpensive. Yeah, I am cheap ;-)

Basically, there are 2 well known RedHat derivatives. The first is Fedora, a community distribution that RHEL is actually based on. Fedora is a lot more bleeding edge than the current RHEL, though, so in some instances things just don't match between the two. My personal criterion, however, is to be able to use something at home that is as similar as possible to what I use at work. For that, I turned to CentOS, a distribution that is compiled directly from the RHEL sources and rebadged.

I have done a couple of installs, a lot of poking around, and some direct comparisons with the RedHat manual in hand, and I can state that this certainly seems to be the case. Everything I have done over the past week has direct application to my CentOS server, with the exception of the logos and color scheme (and I actually like CentOS's better).

Now, I probably won't be using CentOS for a desktop or workstation anytime soon. And I probably won't be using it as the ONLY type of Linux server either; after all, I really still love Slackware. But chances are very good that I will be running CentOS at home somewhere and surely RedHat at work. If you're looking for an enterprise level Linux server environment, you really owe it to yourself to give one of them a try.

Verdict = It’s good stuff!

Saturday, June 7th, 2008