How to successfully build a Linux Compute Cluster
© Altreia Solutions 2005-2006
From our experience in various Linux Compute Cluster (LCC) deployment
projects and from our research, we have compiled a list of topics to
consider when planning a successful Linux compute cluster.
If you are building a Linux compute cluster, check your design against
the following topics; after all, you are planning a multi-dimensional
optimization project:
- KISS: Keep it simple, stupid
  Avoid every component that is not absolutely necessary and only
  makes your cluster more complex. Complexity is likely to cause
  problems in the future; careful planning of the cluster helps to
  avoid later obstacles.
- Architecture of a Linux Compute Cluster
  Among others, the following topics are worth considering when
  designing the architecture of a Linux cluster:
- "Pizzaboxes" or Blades
In general it depends on the specific needs which
you have for your cluster like flexibility in
hardware selection, networking components, performance offers,
vendor independence, cost structure, space considerations.
All these factors have
to be considered when looking at the decision what kind of
hardware would be optimal to deploy.
  - Head Node
    You might want to provide a head node for the cluster, on which
    the master processes of the scheduling software run, data
    management takes place, etc. The head node usually has some
    failsafe measures such as mirrored disks (RAID 1), multiple
    network interfaces, multiple fans, etc. (a minimal health check
    for such a mirror is sketched after this list).
  - Dedicated Network
    A dedicated network for the cluster, including its own switch,
    is reasonable where high, guaranteed network bandwidth and
    resilience against failures are desired.
    Where high bandwidth and low latency are essential for the
    computation, specialized interconnects (InfiniBand, Myrinet, ...)
    should be considered.
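  As an illustration of the head node's mirrored disks: on Linux, the
  state of a software RAID 1 array can be read from /proc/mdstat. The
  following is a minimal sketch, assuming the mirror is managed by the
  kernel's md driver; hardware RAID controllers report their status
  through vendor tools instead.

      import re

      # /proc/mdstat is the kernel's status file for software RAID (md) arrays.
      with open("/proc/mdstat") as f:
          mdstat = f.read()

      # A healthy two-disk mirror reports "[UU]"; a failed member shows up
      # as an underscore, e.g. "[U_]". Assumes redundant (RAID 1) arrays.
      for name, state in re.findall(r"(md\d+) : .*?\[([U_]+)\]", mdstat, re.DOTALL):
          if "_" in state:
              print(f"WARNING: {name} is degraded ({state})")
          else:
              print(f"{name}: healthy ({state})")

  Run periodically (e.g. from cron), such a check turns a silently
  degraded mirror into a visible warning before the second disk fails.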
- Mean Time Between Failure (MTBF)
  Every component in your cluster can fail. In a cluster with
  128 nodes / 256 CPUs a failure is likely to occur every 3-4 weeks
  (a back-of-the-envelope estimate is sketched after this list).
  Methods to cope with failures are:
  - Job Management / Scheduling Software
    Failsafe software that monitors the availability of the nodes
    and schedules computation tasks onto the available nodes helps
    to automate the computation in your cluster (a minimal job
    script is sketched after this list).
    Scheduling middleware to look at are Lava/LSF, Sun Grid Engine,
    and Torque/PBS Pro. In the financial market there are in
    addition Symphony and DataSynapse; in the life-science market,
    United Devices provides scheduling software.
  - Monitoring
    Monitoring the nodes of your cluster (availability, functioning
    of components like fans, disks, ...) is important for
    understanding the operation of the cluster. Tools to look at
    are Ganglia and Nagios.
  - Spare Nodes
    If you need to guarantee a fixed amount of computation power,
    consider providing spare nodes: additional nodes that the
    scheduling software automatically includes in the computation
    when necessary, and that can be used for general computation
    as long as they are not replacing a failed node.
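  The failure-rate claim above can be checked with simple arithmetic.
  A minimal sketch, assuming statistically independent node failures
  and an illustrative per-node MTBF of 80,000 hours (roughly nine
  years); actual figures come from your hardware vendor:

      # Expected time between failures somewhere in the cluster, assuming
      # statistically independent node failures:
      #   cluster MTBF = per-node MTBF / number of nodes
      NODE_MTBF_HOURS = 80_000   # assumed per-node MTBF (~9 years)
      NODES = 128

      cluster_mtbf = NODE_MTBF_HOURS / NODES
      print(f"one failure every {cluster_mtbf:.0f} hours "
            f"(~{cluster_mtbf / (24 * 7):.1f} weeks)")

  With these assumptions the cluster sees a failure about every 625
  hours, i.e. roughly every 3.7 weeks, consistent with the 3-4 week
  figure above.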
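  As an example of scheduling middleware in use, here is a minimal
  Torque/PBS job script; the resource requests and the program
  ./my_program are placeholders, and other schedulers (LSF/Lava, Sun
  Grid Engine) use analogous directives:

      #!/bin/sh
      #PBS -N example_job            # job name
      #PBS -l nodes=2:ppn=2          # request 2 nodes with 2 processors each
      #PBS -l walltime=01:00:00      # maximum run time
      #PBS -j oe                     # merge stdout and stderr

      cd $PBS_O_WORKDIR              # directory the job was submitted from
      ./my_program

  The script would be submitted with qsub, and the scheduler starts it
  on free nodes as soon as the requested resources are available.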
- Temperature
  The ambient temperature is essential for the lifetime of the
  components: the lower the ambient temperature, combined with an
  efficient airflow, the longer the components will last.
- Hardware: Latest and Greatest?
  Two topics are interesting to look at when buying state-of-the-art
  hardware:
  - Price / Performance
    Usually the hardware one level below the best-performing
    hardware available (processor speed, ...) provides the optimal
    price/performance ratio.
  - Linux Compatibility
    Both the supported Linux versions and the free distributions
    often don't provide optimal support for the newest hardware
    available. This alone is a good reason to check in detail
    whether the targeted Linux distribution supports the hardware.
- Administration Effort for a Linux Compute Cluster
  The commodity hardware concept offers great potential to save
  money on low-cost hardware. At the same time the administration
  effort is usually higher than for traditional UNIX hardware with
  the same number of CPUs in SMP (shared-memory) or distributed
  systems.
  Experience shows that the effort to administer a Linux Compute
  Cluster ranges from 2-5 hours per node per year using appropriate
  cluster administration tools, depending on the complexity,
  experience, and size of the cluster. A cluster of 128 nodes
  (256 CPUs) therefore requires 256-640 hours of administration per
  year (see the sketch below).
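  A minimal sketch of this calculation; the hourly rate is an
  illustrative assumption for turning the effort into a cost figure:

      # Annual administration effort, using the 2-5 hours/node/year range above.
      NODES = 128
      HOURLY_RATE_USD = 75   # assumed fully loaded hourly cost of an administrator

      low_hours, high_hours = 2 * NODES, 5 * NODES
      print(f"effort: {low_hours}-{high_hours} hours/year")
      print(f"cost:   {low_hours * HOURLY_RATE_USD:,}-"
            f"{high_hours * HOURLY_RATE_USD:,} USD/year")

  With these assumptions the administration effort corresponds to
  19,200-48,000 USD per year, a figure that belongs in any TCO
  comparison alongside the hardware price.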
- Disks or Network Boot
  In cases where local disks are necessary, e.g. for scratch space,
  it is reasonable to also install the operating system locally. If
  this is not the case, the choice is sometimes a more philosophical
  question. Experience shows that for a bigger cluster network boot
  might become cumbersome; local disks are, of course, an additional
  source of failures. A minimal network boot configuration is
  sketched below.
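  For the network boot option, nodes typically boot via PXE: a DHCP
  server points each node to a TFTP server and a boot loader. A
  minimal sketch for ISC dhcpd, with assumed addresses (10.0.0.1 as
  the head node serving TFTP) and the pxelinux.0 loader from the
  SYSLINUX package:

      # /etc/dhcpd.conf (excerpt) -- addresses are illustrative
      subnet 10.0.0.0 netmask 255.255.255.0 {
          range 10.0.0.100 10.0.0.199;   # addresses handed out to compute nodes
          next-server 10.0.0.1;          # TFTP server (here: the head node)
          filename "pxelinux.0";         # PXE boot loader fetched via TFTP
      }

  The boot loader then fetches a kernel and a ramdisk- or NFS-based
  root filesystem; maintaining that infrastructure for many nodes is
  where network boot tends to become cumbersome.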
- Cluster Administration Systems
  There are many Linux cluster administration systems on the market,
  with different approaches. As this is a chapter of its own, we
  only list selected cluster administration systems into which we
  have more detailed insight:
  - ClusterKnoppix
    Open-source software.
  - CSM
    Commercial cluster administration software from IBM; one of the
    leading commercial tools.
  - OSCAR
    Open-source software; one of the leading free cluster
    administration tools.
  - ROCKS
    Open-source software; one of the leading free cluster
    administration tools. Platform Computing offers an interesting
    bundle of ROCKS with additional software (compilers, ...),
    including Lava, a free version of their workload management
    software LSF; paid maintenance is available.
  - Scali Manage
    Commercial cluster administration software from Scali; one of
    the leading commercial tools.
  - Warewulf
    Open-source software.
Please let us know if a point is missing or if you have additional
questions.
Remarks on Linux Compute Clusters
LANL's Mike Warren:
Example of a self-made Linux cluster
How LANL built "The Space Simulator", a 294-processor Gigabit
Ethernet cluster, for 500,000 USD.
More ...
DaimlerChrysler's John Picklo:
With clusters we lose fewer jobs!
Compared to the job-loss rate of 0.1% due to system failures on SMP
systems, DaimlerChrysler loses only half that rate on their Linux
and Itanium clusters.
More ...
Linux Cluster TCO Calculator available
With our Total Cost of Ownership (TCO) calculator for Linux Compute
Clusters you can work out the distribution of costs (e.g. hardware,
administration, environment) and compare them with your present
system.