Running Parallel Jobs on the Cluster

From NBSWiki

Introduction

This article gives short step-by-step instructions for running parallel MPI tasks on the cluster built according to the article Howto Build a Basic Gentoo Beowulf Cluster. For now, it covers only parallel tasks run with LAM-MPI (the article was written for version 7.1.1; the example session below reports 7.1.2).

Things to know

  • Ideally, we would be using Maui and Torque for scheduling, but technical issues are preventing their use for the moment. So please be courteous and make sure your tasks don't overlap with other users' tasks.
  • All tasks, including compiling and running MPI applications, _must_ run on the nodes. The head node is a 64-bit machine but the slave nodes are 32-bit, and we are presently unable to run parallel tasks in a heterogeneous environment. If you find a way that you know works, please update this article to explain how!
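
Since it is easy to start a build on the 64-bit head node by mistake, a small guard at the top of a build or launch script can help. The following is only a minimal sketch. It assumes the slave nodes report an i386-family machine name such as i686 (consistent with the i686-pc-linux-gnu compiler paths in the build log further down) and that the head node reports something else, such as x86_64. The function name is_32bit_x86 is hypothetical.

```shell
#!/bin/sh
# is_32bit_x86 is a hypothetical helper, not part of LAM-MPI.
# It succeeds (exit status 0) when the given machine name looks like 32-bit x86.
is_32bit_x86() {
    case "$1" in
        i[3-6]86) return 0 ;;   # i386, i486, i586, i686
        *)        return 1 ;;   # x86_64 and anything else
    esac
}

# Usage: warn when the current machine does not look like a 32-bit node.
if ! is_32bit_x86 "$(uname -m)"; then
    echo "warning: $(uname -m) does not look like a 32-bit node" >&2
fi
```

The check only looks at the name printed by uname -m, so treat it as a reminder rather than a guarantee.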

Monitoring the Cluster

Ganglia has been installed on the head node so that you can see what is happening on the cluster. This is the direct link to the monitoring page. It will come in handy since we don't have a PBS (Portable Batch System) scheduler installed at the moment.

Executing Parallel Tasks

There are two steps to running a parallel task (apart from writing and debugging your code): compiling the task and executing it. The following is a working example with step-by-step instructions.

Logging onto the nodes

  • Log onto the head node using SSH (-Y enables trusted X11 forwarding, in case you need graphical applications)
ssh username@142.137.135.124 -Y

If you haven't done so already, follow the instructions on enabling passwordless SSH logon.

  • Log onto one of the nodes
ssh username@thinkbig1 -Y

To see which nodes are available (not in use), open the Ganglia web page.

Compilation

Once on the node, compile and start LAM-MPI applications as you normally would. However, make sure your Makefile doesn't contain stale references to the LAM installation from the old cluster (paths such as /thinkbig/lam/include should not be there). Here's a complete example of the command-line sequence:

ssh eric@142.137.135.124
eric@headless ~ $ ssh thinkbig21
Last login: Fri Aug 18 11:43:02 2006 from gw-02.cluster.local

eric@thinkbig21 ~ $ cd doc_eulanda/

eric@thinkbig21 ~/doc_eulanda $ make clean
rm -f ./obj/gamono.o\
 ./obj/cgapub.o ./obj/baseclassifier.o ./obj/basicrandom.o ./obj/chrono.o ./obj/distancestrategy.o ./obj/euclidiandistance.o ./obj/featuresvector.o ./obj/knnclassifier.o ./obj/randomstrategy.o ./obj/rsknnclassifier.o ./obj/prevote.o ./obj/entropydiversity.o ./obj/qaveragediversity.o ./obj/double_fault.o ./obj/correlation_coefficient.o ./obj/difficultydiversity.o ./obj/disagreement.o ./obj/interraterdiversity.o ./obj/faultmajoritydiversity.o ./obj/kohavi_wolpert.o ./obj/generalized.o ./obj/coincident_failure.o ./obj/margin.o

eric@thinkbig21 ~/doc_eulanda $ make
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/gamono.o ./ga.cpp
In file included from /usr/lib/gcc/i686-pc-linux-gnu/4.1.1/include/g++-v4/backward/iostream.h:31,
 from ./ga.cpp:5:
/usr/lib/gcc/i686-pc-linux-gnu/4.1.1/include/g++-v4/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header. Please consider using one of the 32 headers found in section 17.4.1.2 of the C++ standard. Examples include substituting the <X> header for the <X.h> header for C++ includes, or <iostream> instead of the deprecated header <iostream.h>. To disable this warning use -Wno-deprecated.
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/cgapub.o /export/home/eric/doc_eulanda/src/cgapub.cpp
/export/home/eric/doc_eulanda/src/cgapub.cpp:22:1: warning: "PREVOTEFILE" redefined
In file included from /export/home/eric/doc_eulanda/src/cgapub.cpp:3:
/export/home/eric/doc_eulanda/include/cgapub.hpp:42:1: warning: this is the location of the previous definition
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/baseclassifier.o /export/home/eric/doc_eulanda/eocknn/baseclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/basicrandom.o /export/home/eric/doc_eulanda/eocknn/basicrandom.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/chrono.o /export/home/eric/doc_eulanda/eocknn/chrono.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/distancestrategy.o /export/home/eric/doc_eulanda/eocknn/distancestrategy.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/euclidiandistance.o /export/home/eric/doc_eulanda/eocknn/euclidiandistance.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/featuresvector.o /export/home/eric/doc_eulanda/eocknn/featuresvector.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/knnclassifier.o /export/home/eric/doc_eulanda/eocknn/knnclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/randomstrategy.o /export/home/eric/doc_eulanda/eocknn/randomstrategy.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/rsknnclassifier.o /export/home/eric/doc_eulanda/eocknn/rsknnclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/prevote.o /export/home/eric/doc_eulanda/src/prevote.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/entropydiversity.o /export/home/eric/doc_eulanda/src/entropydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/qaveragediversity.o /export/home/eric/doc_eulanda/src/qaveragediversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/double_fault.o /export/home/eric/doc_eulanda/src/double_fault.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/correlation_coefficient.o /export/home/eric/doc_eulanda/src/correlation_coefficient.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/difficultydiversity.o /export/home/eric/doc_eulanda/src/difficultydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/disagreement.o /export/home/eric/doc_eulanda/src/disagreement.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/interraterdiversity.o /export/home/eric/doc_eulanda/src/interraterdiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/faultmajoritydiversity.o /export/home/eric/doc_eulanda/src/faultmajoritydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/kohavi_wolpert.o /export/home/eric/doc_eulanda/src/kohavi_wolpert.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/generalized.o /export/home/eric/doc_eulanda/src/generalized.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/coincident_failure.o /export/home/eric/doc_eulanda/src/coincident_failure.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/margin.o /export/home/eric/doc_eulanda/src/margin.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -L -L/usr/local/lib -o gamono ./obj/gamono.o ./obj/cgapub.o ./obj/baseclassifier.o ./obj/basicrandom.o ./obj/chrono.o ./obj/distancestrategy.o ./obj/euclidiandistance.o ./obj/featuresvector.o ./obj/knnclassifier.o ./obj/randomstrategy.o ./obj/rsknnclassifier.o ./obj/prevote.o ./obj/entropydiversity.o ./obj/qaveragediversity.o ./obj/double_fault.o ./obj/correlation_coefficient.o ./obj/difficultydiversity.o ./obj/disagreement.o ./obj/interraterdiversity.o ./obj/faultmajoritydiversity.o ./obj/kohavi_wolpert.o ./obj/generalized.o ./obj/coincident_failure.o ./obj/margin.o -lm -lmpi -lgsl -lgslcblas
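
The Makefile check mentioned before the build log can be done with a quick search before running make. This is a minimal sketch, assuming the old installation is always referenced through the /thinkbig/lam prefix given as an example above. The function name has_stale_lam_paths is hypothetical.

```shell
#!/bin/sh
# has_stale_lam_paths is a hypothetical helper name.
# It prints every line (with its line number) of the given file that still
# mentions the old cluster's LAM prefix, and succeeds if at least one is found.
has_stale_lam_paths() {
    grep -n '/thinkbig/lam' "$1"
}
```

For example, `has_stale_lam_paths Makefile && echo "fix these lines before running make"` lists the offending lines, and prints nothing when the Makefile is clean.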

Starting LAM-MPI

eric@thinkbig21 ~/doc_eulanda $ lamboot ~/thinknodes
LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

Note that thinknodes lists the nodes I want to use, one per line. Lines starting with # are comments, so those nodes are skipped:

eric@thinkbig21 ~/doc_eulanda $ cat ~/thinknodes
#thinkbig1
#thinkbig12
thinkbig13
thinkbig16
thinkbig17
thinkbig18
thinkbig19
thinkbig20
thinkbig21
#thinkbig22
thinkbig23
thinkbig24
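
The number of active entries in this file is a natural value for mpirun's -np option when you want one process per node. The sketch below counts them. It assumes a file laid out like the one above, with one host per line and nothing but blank lines or # comments in between. The function name count_active_nodes is hypothetical.

```shell
#!/bin/sh
# count_active_nodes is a hypothetical helper name.
# It prints the number of lines in a boot schema that are neither blank
# nor commented out with a leading '#'.
count_active_nodes() {
    grep -Ecv '^[[:space:]]*(#|$)' "$1"
}
```

For the thinknodes file shown above, `count_active_nodes ~/thinknodes` prints 9.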

Run the MPI application

eric@thinkbig21 ~/doc_eulanda $ mpirun -v -np 9 gamono nsga.init >/data/eric_nsgatest.out
eric@thinkbig21 ~/doc_eulanda $

Once your job has finished, make sure to stop your LAM daemons with lamhalt.
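
The boot, run, and halt steps above can be combined so that the daemons are stopped even when the job fails. This is a minimal sketch rather than a recommended tool. It uses only the lamboot, mpirun -np, and lamhalt invocations shown in this article, and the function name run_lam_job and its argument order are hypothetical.

```shell
#!/bin/sh
# run_lam_job is a hypothetical helper name, not part of LAM-MPI.
# Arguments: boot schema, process count, output file, then the program
# and its own arguments.
run_lam_job() {
    hostfile=$1
    nprocs=$2
    outfile=$3
    shift 3

    # Start the LAM daemons; give up if they cannot be booted.
    lamboot "$hostfile" || return 1

    # Run the job and remember its exit status instead of stopping on failure.
    status=0
    mpirun -np "$nprocs" "$@" > "$outfile" || status=$?

    # Always stop the daemons, then report the job's exit status.
    lamhalt
    return "$status"
}
```

The session above would then become `run_lam_job ~/thinknodes 9 /data/eric_nsgatest.out gamono nsga.init`.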
