Running Parallel Jobs on the Cluster
From NBSWiki
Introduction
This article gives a short sequence of step-by-step instructions for running parallel MPI tasks on the cluster built according to the article Howto Build a Basic Gentoo Beowulf Cluster. For now, it focuses on executing parallel tasks with LAM-MPI version 7.1.1 (lamboot in the example session below reports version 7.1.2).
Things to know
- Ideally, we would be using Maui (a job scheduler) and Torque (a resource manager), but technical issues currently prevent their use. Until they are available, please be courteous and make sure your tasks don't overlap with other people's.
- All tasks, such as compiling and running MPI applications, _must_ be run from the nodes. The head node is a 64-bit machine, but the slave nodes are 32-bit, and we are presently unable to run parallel tasks in a heterogeneous environment. If you find a way that you know works, please update this article to explain how to accomplish it!
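Because everything must be built and run on the 32-bit nodes, a build or job script can check where it is running before doing anything else. The sketch below is a hypothetical helper (require_32bit_node is not an existing command). It assumes the slave nodes report an i686-style machine type from uname -m, as the i686-pc-linux-gnu compiler paths in the compilation log later in this article suggest, and that the 64-bit head node reports something else, such as x86_64.

```shell
# require_32bit_node: hypothetical helper, not part of LAM-MPI or Gentoo.
# Succeeds only on a 32-bit x86 machine (uname -m reports i386 ... i686).
# Assumption: the slave nodes report i686 and the 64-bit head node does not.
# An optional argument overrides uname -m, which makes the check easy to test.
require_32bit_node() {
    arch=${1:-$(uname -m)}
    case $arch in
        i?86)
            return 0 ;;
        *)
            echo "This is a $arch machine; log onto one of the nodes first." >&2
            return 1 ;;
    esac
}
```

Placed at the top of a build or job script, `require_32bit_node || exit 1` stops it on the head node before it compiles or launches anything.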
Monitoring the Cluster
Ganglia has been installed on the head node so that you can see what is happening on the cluster. Its monitoring page will come in handy, since no batch system such as PBS (Portable Batch System) is installed at the moment.
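When the Ganglia page is not at hand, a rough command-line check is to read each node's load average over SSH. The function below is a hypothetical helper (show_node_loads is not a Ganglia or LAM command). It assumes passwordless SSH to the nodes (see Logging onto the nodes) and a host list with one hostname per line and # marking comments, the same format as the ~/thinknodes file described under Starting LAM-MPI. A 1-minute load near 0 suggests an idle node, but it only reflects the moment of the check.

```shell
# show_node_loads: hypothetical helper, not part of Ganglia or LAM-MPI.
# Prints "<host> <1-minute load average>" for every uncommented host in a
# host list file, or "<host> unreachable" when SSH fails.
# Assumes passwordless SSH to each node is already set up.
show_node_loads() {
    # Drop comment lines, then keep the first field of each non-blank line.
    grep -v '^[[:space:]]*#' "$1" | awk 'NF { print $1 }' |
    while read -r host; do
        # -n stops ssh from swallowing the loop's stdin;
        # BatchMode makes it fail instead of prompting for a password.
        load=$(ssh -n -o BatchMode=yes "$host" cat /proc/loadavg 2>/dev/null |
            cut -d' ' -f1)
        echo "$host ${load:-unreachable}"
    done
}
```

For example, `show_node_loads ~/thinknodes` prints one line per active host in that file.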
Executing Parallel Tasks
Apart from coding and debugging, running a parallel task involves two steps: compilation and execution. The following is a working example with step-by-step instructions.
Logging onto the nodes
- Log onto the head node using SSH (-Y enables trusted X11 forwarding, in case you need graphical applications):
ssh username@142.137.135.124 -Y
If you haven't done so already, follow the instructions on enabling passwordless SSH logon.
- From the head node, log onto one of the nodes:
ssh username@thinkbig1 -Y
To find nodes that are available (not in use by someone else), check the Ganglia web page.
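The passwordless SSH instructions themselves are not reproduced in this article, but a common approach is sketched below: create a key pair with an empty passphrase and add its public key to ~/.ssh/authorized_keys. Besides saving typing, this matters because lamboot must start daemons on the other nodes without a password prompt. The helper name is hypothetical, and the sketch assumes home directories are shared between the head node and the nodes (the /export/home/eric paths in the compilation log hint at an NFS export, but that is an assumption), so a single authorized_keys file covers the whole cluster.

```shell
# enable_passwordless_ssh: hypothetical helper, not an existing command.
# Creates an RSA key pair with an empty passphrase (unless one already exists)
# and appends its public key to authorized_keys exactly once.
# Assumption: home directories are shared by all nodes, so this runs once.
# An optional argument selects the .ssh directory (defaults to ~/.ssh).
enable_passwordless_ssh() {
    sshdir=${1:-$HOME/.ssh}
    mkdir -p "$sshdir" && chmod 700 "$sshdir" || return 1
    if [ ! -f "$sshdir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N '' -f "$sshdir/id_rsa" || return 1
    fi
    touch "$sshdir/authorized_keys" && chmod 600 "$sshdir/authorized_keys" || return 1
    pubkey=$(cat "$sshdir/id_rsa.pub") || return 1
    # Append the public key only if that exact line is not already present.
    grep -qxF -e "$pubkey" "$sshdir/authorized_keys" ||
        printf '%s\n' "$pubkey" >> "$sshdir/authorized_keys"
}
```

Afterwards, `ssh thinkbig21 hostname` run from the head node should print the node's name without asking for a password (apart from a one-time host key confirmation).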
Compilation
Once logged onto a node, compile and start LAM-MPI applications as you usually would. However, make sure your Makefile doesn't contain leftover references to the LAM installation of the old cluster (paths such as /thinkbig/lam/include must not appear). Here is a complete example of the command-line sequence:
ssh eric@142.137.135.124
eric@headless ~ $ ssh thinkbig21
Last login: Fri Aug 18 11:43:02 2006 from gw-02.cluster.local
eric@thinkbig21 ~ $ cd doc_eulanda/
eric@thinkbig21 ~/doc_eulanda $ make clean
rm -f ./obj/gamono.o\
./obj/cgapub.o ./obj/baseclassifier.o ./obj/basicrandom.o ./obj/chrono.o ./obj/distancestrategy.o ./obj/euclidiandistance.o ./obj/featuresvector.o ./obj/knnclassifier.o ./obj/randomstrategy.o ./obj/rsknnclassifier.o ./obj/prevote.o ./obj/entropydiversity.o ./obj/qaveragediversity.o ./obj/double_fault.o ./obj/correlation_coefficient.o ./obj/difficultydiversity.o ./obj/disagreement.o ./obj/interraterdiversity.o ./obj/faultmajoritydiversity.o ./obj/kohavi_wolpert.o ./obj/generalized.o ./obj/coincident_failure.o ./obj/margin.o
eric@thinkbig21 ~/doc_eulanda $ make
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/gamono.o ./ga.cpp
In file included from /usr/lib/gcc/i686-pc-linux-gnu/4.1.1/include/g++-v4/backward/iostream.h:31,
                 from ./ga.cpp:5:
/usr/lib/gcc/i686-pc-linux-gnu/4.1.1/include/g++-v4/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header. Please consider using one of the 32 headers found in section 17.4.1.2 of the C++ standard. Examples include substituting the <X> header for the <X.h> header for C++ includes, or <iostream> instead of the deprecated header <iostream.h>. To disable this warning use -Wno-deprecated.
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/cgapub.o /export/home/eric/doc_eulanda/src/cgapub.cpp
/export/home/eric/doc_eulanda/src/cgapub.cpp:22:1: warning: "PREVOTEFILE" redefined
In file included from /export/home/eric/doc_eulanda/src/cgapub.cpp:3:
/export/home/eric/doc_eulanda/include/cgapub.hpp:42:1: warning: this is the location of the previous definition
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/baseclassifier.o /export/home/eric/doc_eulanda/eocknn/baseclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/basicrandom.o /export/home/eric/doc_eulanda/eocknn/basicrandom.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/chrono.o /export/home/eric/doc_eulanda/eocknn/chrono.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/distancestrategy.o /export/home/eric/doc_eulanda/eocknn/distancestrategy.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/euclidiandistance.o /export/home/eric/doc_eulanda/eocknn/euclidiandistance.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/featuresvector.o /export/home/eric/doc_eulanda/eocknn/featuresvector.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/knnclassifier.o /export/home/eric/doc_eulanda/eocknn/knnclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/randomstrategy.o /export/home/eric/doc_eulanda/eocknn/randomstrategy.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/rsknnclassifier.o /export/home/eric/doc_eulanda/eocknn/rsknnclassifier.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/prevote.o /export/home/eric/doc_eulanda/src/prevote.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/entropydiversity.o /export/home/eric/doc_eulanda/src/entropydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/qaveragediversity.o /export/home/eric/doc_eulanda/src/qaveragediversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/double_fault.o /export/home/eric/doc_eulanda/src/double_fault.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/correlation_coefficient.o /export/home/eric/doc_eulanda/src/correlation_coefficient.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/difficultydiversity.o /export/home/eric/doc_eulanda/src/difficultydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/disagreement.o /export/home/eric/doc_eulanda/src/disagreement.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/interraterdiversity.o /export/home/eric/doc_eulanda/src/interraterdiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/faultmajoritydiversity.o /export/home/eric/doc_eulanda/src/faultmajoritydiversity.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/kohavi_wolpert.o /export/home/eric/doc_eulanda/src/kohavi_wolpert.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/generalized.o /export/home/eric/doc_eulanda/src/generalized.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/coincident_failure.o /export/home/eric/doc_eulanda/src/coincident_failure.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -c -o ./obj/margin.o /export/home/eric/doc_eulanda/src/margin.cpp
mpiCC -I. -I/export/home/eric/doc_eulanda/include -I -Wno-deprecated -L -L/usr/local/lib -o gamono ./obj/gamono.o ./obj/cgapub.o ./obj/baseclassifier.o ./obj/basicrandom.o ./obj/chrono.o ./obj/distancestrategy.o ./obj/euclidiandistance.o ./obj/featuresvector.o ./obj/knnclassifier.o ./obj/randomstrategy.o ./obj/rsknnclassifier.o ./obj/prevote.o ./obj/entropydiversity.o ./obj/qaveragediversity.o ./obj/double_fault.o ./obj/correlation_coefficient.o ./obj/difficultydiversity.o ./obj/disagreement.o ./obj/interraterdiversity.o ./obj/faultmajoritydiversity.o ./obj/kohavi_wolpert.o ./obj/generalized.o ./obj/coincident_failure.o ./obj/margin.o -lm -lmpi -lgsl -lgslcblas
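The Makefile check described at the start of this section can be scripted. The hypothetical helper below (not a LAM tool) reads Makefile text or compiler command lines on standard input and flags two things: stale paths into the old /thinkbig/lam installation, and an -I or -L option whose argument is another option. The second case appears in the log above. The sequences -I -Wno-deprecated and -L -L/usr/local/lib look like the result of an empty Makefile variable, in which case GCC takes -Wno-deprecated as an include directory name (which would explain why the deprecation warning is still printed) and -L/usr/local/lib as a library directory name. That diagnosis is inferred from the log, not confirmed by the original Makefile.

```shell
# check_build_lines: hypothetical helper, not part of LAM-MPI.
# Reads lines on stdin and reports:
#   - references to the old cluster's LAM install (/thinkbig/lam...)
#   - an -I or -L option immediately followed by another option, the usual
#     symptom of an empty Makefile variable
# Returns 1 if anything was reported, 0 otherwise.
check_build_lines() {
    status=0
    # The "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            */thinkbig/lam*)
                echo "stale LAM path: $line"
                status=1 ;;
        esac
        case " $line " in
            *" -I -"* | *" -L -"*)
                echo "option missing its argument: $line"
                status=1 ;;
        esac
    done
    return $status
}
```

`check_build_lines < Makefile` inspects the Makefile itself, while `make -n -B | check_build_lines` inspects the commands GNU make would run with all variables expanded (-n prints the commands without executing them, -B treats every target as out of date).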
Starting LAM-MPI
eric@thinkbig21 ~/doc_eulanda $ lamboot ~/thinknodes
LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
Here ~/thinknodes is a LAM boot schema listing the nodes I want to use, one hostname per line; lines starting with # are comments, so those nodes are left out:
eric@thinkbig21 ~/doc_eulanda $ cat ~/thinknodes
#thinkbig1
#thinkbig12
thinkbig13
thinkbig16
thinkbig17
thinkbig18
thinkbig19
thinkbig20
thinkbig21
#thinkbig22
thinkbig23
thinkbig24
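The -np 9 used when running the application matches the number of uncommented hosts in this file. Rather than counting by hand after editing the list, a small hypothetical helper (count_active_hosts is not a LAM command) can do it, assuming one host per line and # comments as shown above.

```shell
# count_active_hosts: hypothetical helper, not part of LAM-MPI.
# Counts the hosts in a LAM boot schema, ignoring blank lines and lines
# commented out with '#'. Prints the count (0 if there are none).
count_active_hosts() {
    grep -v '^[[:space:]]*#' "$1" | awk 'NF { n++ } END { print n + 0 }'
}
```

For example: `mpirun -v -np "$(count_active_hosts ~/thinknodes)" gamono nsga.init >/data/eric_nsgatest.out`. With the file above, the count is 9.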
Run the MPI application
eric@thinkbig21 ~/doc_eulanda $ mpirun -v -np 9 gamono nsga.init >/data/eric_nsgatest.out
eric@thinkbig21 ~/doc_eulanda $
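Here -v makes mpirun report the steps it performs, -np 9 starts nine processes (matching the nine uncommented hosts in ~/thinknodes), and the program's output is redirected to /data/eric_nsgatest.out. Once the job is done, make sure to stop your LAM daemons with lamhalt.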
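Forgetting lamhalt leaves daemons running on nodes that other people may want, so the boot, run and halt steps can be wrapped together. The function below is a hypothetical sketch (run_lam_job is not a LAM command). It boots LAM from a boot schema, passes the remaining arguments to mpirun unchanged, and calls lamhalt whether the job succeeds or fails. Messages from lamboot and lamhalt are sent to standard error, so redirecting standard output, as in the example above, captures only what mpirun and the program print.

```shell
# run_lam_job: hypothetical wrapper, not part of LAM-MPI.
# Usage: run_lam_job BOOT_SCHEMA [MPIRUN_ARGS...]
# Boots LAM on the hosts in BOOT_SCHEMA, runs mpirun with the remaining
# arguments, then always shuts the LAM daemons down with lamhalt.
# Returns mpirun's exit status, or 1 if lamboot fails.
run_lam_job() {
    schema=$1
    shift
    # Send LAM's banner to stderr so stdout carries only the job's output.
    lamboot "$schema" >&2 || return 1
    # If the script is interrupted, halt LAM before exiting.
    trap 'lamhalt >&2; exit 130' INT TERM
    mpirun "$@"
    rc=$?
    trap - INT TERM
    lamhalt >&2
    return $rc
}
```

With this helper, the example run becomes `run_lam_job ~/thinknodes -v -np 9 gamono nsga.init >/data/eric_nsgatest.out`. Call it from a job script rather than your interactive shell: the interrupt trap ends with exit, which would also close a login shell.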
