on Friday, February 12, 2010 by Martin in Computer Tech, Linux
Use TORQUE Resource Manager on Fedora 12
TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.[1]
1. Install TORQUE packages
$ sudo yum install 'torque*' libtorque
2. Configure TORQUE on the server
$ cd /usr/share/doc/torque-2.1.10/
$ sudo vi torque.setup
Change the lines
qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'
qmgr -c 'set server default_queue = batch'
to
qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
# walltime = 72:00:00: a job that does not request a walltime gets a 72-hour limit by default
qmgr -c 'set queue batch resources_default.walltime = 72:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'
# max_running = 2: at most two jobs from this queue run at the same time
qmgr -c 'set queue batch max_running = 2'
# max_user_run = 5: a single user may have at most five jobs from this queue running at once
qmgr -c 'set queue batch max_user_run = 5'
qmgr -c 'set server default_queue = batch'
Then run the script, naming root as the TORQUE administrator:
$ sudo ./torque.setup root
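The queue settings from step 2 can also be printed by a small helper, so the values are easy to review before anything is changed. This is only a sketch: the function name queue_commands and its argument order are made up for illustration and are not part of TORQUE. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): print the qmgr commands
# for one execution queue without executing them.
# Arguments: queue name, default walltime, max_running, max_user_run.
queue_commands() {
    queue=$1
    walltime=$2
    max_running=$3
    max_user_run=$4
    cat <<EOF
qmgr -c 'create queue $queue'
qmgr -c 'set queue $queue queue_type = execution'
qmgr -c 'set queue $queue started = true'
qmgr -c 'set queue $queue enabled = true'
qmgr -c 'set queue $queue resources_default.walltime = $walltime'
qmgr -c 'set queue $queue resources_default.nodes = 1'
qmgr -c 'set queue $queue max_running = $max_running'
qmgr -c 'set queue $queue max_user_run = $max_user_run'
qmgr -c 'set server default_queue = $queue'
EOF
}

# Print the commands for the values used in this post.
queue_commands batch 72:00:00 2 5
```

Once the printed commands look right, they could be pasted into torque.setup in place of the original lines.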
3. Set up the server nodes
The default TORQUE configuration directory on Fedora 12 is /var/torque.
Create the file /var/torque/server_priv/nodes with one line per compute node, like this:
node01 np=2
Here node01 is the node's hostname, and np=2 means the node has 2 processors.
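A line for the nodes file can be built from the machine's own hostname and processor count instead of typing it by hand. The sketch below assumes a Linux system where /proc/cpuinfo has one "processor" line per CPU; nodes_line is a hypothetical helper name, not a TORQUE command.

```shell
#!/bin/sh
# Hypothetical helper: print one server_priv/nodes entry,
# "<hostname> np=<processor count>".
# With no arguments it uses this machine's hostname and counts the
# "processor" lines in /proc/cpuinfo (an assumption about the platform).
nodes_line() {
    host=${1:-$(uname -n)}
    np=${2:-$(grep -c '^processor' /proc/cpuinfo)}
    printf '%s np=%s\n' "$host" "$np"
}

# Explicit values reproduce the example entry above.
nodes_line node01 2    # prints: node01 np=2
```

On a single-machine setup, running nodes_line with no arguments and writing its output to /var/torque/server_priv/nodes (as root) should give an equivalent file.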
4. Initialize/Configure TORQUE on Each Compute Node
Create the file /var/torque/mom_priv/config, which pbs_mom reads at startup, like this:
$pbsserver localhost   # hostname of the machine running pbs_server
$logevent 255          # bitmap of which events to log
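5. Enable and start the daemon services
chkconfig only enables the services at boot:
$ sudo chkconfig pbs_mom on
$ sudo chkconfig pbs_sched on
$ sudo chkconfig pbs_server on
To start them right away without rebooting, also run:
$ sudo service pbs_server start
$ sudo service pbs_sched start
$ sudo service pbs_mom start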
6. Test service configuration
Verify that all nodes are reporting correctly:
$ pbsnodes -a
View the server and queue configuration:
$ qmgr -c 'p s'
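The pbsnodes -a listing is long when there are many nodes. A short filter can reduce it to one "node state" pair per line. This sketch assumes the usual layout, where each node name starts at the left margin and its attributes follow on indented "name = value" lines; node_states is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `pbsnodes -a` output on stdin and print
# "<node> <state>" for every node.
# Assumes node names start in column 1 and attributes are indented
# lines of the form "state = free".
node_states() {
    awk '
        /^[^ \t]/                  { node = $1 }       # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }  # indented state line
    '
}

# Intended use on a configured server:
#   pbsnodes -a | node_states
```

A node that is working and idle should show the state free; a node shown as down usually means pbs_mom is not running on it or cannot reach the server.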
The setup is now complete and ready to use. To submit a job to the queue, use the qsub command:
$ qsub batchjob
Here batchjob is a file containing the job's settings and the command lines to run.
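As an illustration, a minimal batchjob file might look like the sketch below. The job name hello_torque and the 5-minute walltime are example values chosen here, not settings from this post; the #PBS lines are read by qsub as options, and the shell treats them as comments.

```shell
#!/bin/sh
#PBS -N hello_torque
#PBS -q batch
#PBS -l nodes=1,walltime=00:05:00
#PBS -j oe

# TORQUE sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory so the script also runs by hand.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Print the job ID (set by TORQUE) and the node the job landed on.
job_banner() {
    echo "job ${PBS_JOBID:-local} running on $(uname -n)"
}

job_banner
date
```

Here -N sets the job name, -q picks the queue, -l requests resources, and -j oe merges standard error into the standard output file. After qsub batchjob, the job can be followed with qstat.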
This is only a simple configuration for using TORQUE on Fedora 12. More detailed configuration is described on the clusterresources.com site.
References
[1] http://www.clusterresources.com/products/torque-resource-manager.php
[2] ClusterResources. TORQUE Administrator’s Guide. v2.3