Frequently Asked Questions (FAQ)

How do I monitor the status of nodes on the LPL HPC?

There are several ways to monitor the overall status of the PACMAN or HiPAS cluster. The two described here are text mode and GUI mode; text mode is the quickest.

Text Mode

If you do not have a DISPLAY redirected to your local machine, you can monitor the cluster by typing beostatus -c. You will see a screen very similar to top that automatically updates with information about the nodes.

GUI Mode

If you do have a display redirected, you can use the graphical monitoring tool. Access it by typing beostatus. After a short delay, a color GUI appears that you can use to monitor various aspects of the cluster. You can change the style of the graphs from the Mode menu. To exit the program, either close the window or select Quit from the File menu.
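The choice between the two modes can be scripted. The following is a minimal sketch for a POSIX shell. The function name beostatus_cmd is hypothetical, and it assumes that an empty or unset DISPLAY means no display is redirected. It only prints the command, so you can check the result before running it.

```shell
# Hypothetical helper: print the beostatus invocation suited to this session.
# Assumption: an empty or unset DISPLAY means no X display is redirected.
beostatus_cmd() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "beostatus -c"   # text mode
    else
        echo "beostatus"      # GUI mode
    fi
}
```

To run the selected command directly, type $(beostatus_cmd).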

Monitoring Job Status

Check the status of your job using qstat. Here's an example with output:

$ qsub myjob && watch qstat -n
 
master:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
15.hipas         bjosh   default  myjob         --    1  --    --  00:01 Q   --
    --

The watch command reruns qstat -n at a fixed interval (every 2 seconds by default), which lets you follow the job as its state changes. Press Control-C to interrupt watch.
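If you only need the job state, the S column can be pulled out of the listing. This sketch assumes the 11-column layout shown in the example above, where the state is the tenth whitespace-separated field and job rows start with a numeric job ID. The function name job_states is hypothetical.

```shell
# Hypothetical helper: read `qstat -n` output on stdin and print
# "JOBID STATE" for each job row.
# Assumption: job rows begin with a numeric ID such as "15.hipas" and the
# state (S) is the 10th whitespace-separated column, as in the sample above.
job_states() {
    awk '$1 ~ /^[0-9]+\./ { print $1, $10 }'
}
```

Typical use would be qstat -n | job_states. Header lines and the node line under each job are skipped because they do not start with a numeric job ID. A job name containing spaces would shift the columns and break this sketch.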

Some Helpful Commands

Command               Purpose
ps -ef | bpstat -P    Display all running jobs, with node number for each.
qstat -Q              Display status of all queues.
qstat -n              Display status of queued jobs.
qstat -f JOBID        Display very detailed information about JOBID.
qstat -Q -f           Display status of all queues in more detail.
pbsnodes -a           Display status of all nodes.

How to Find Which Nodes Your Job is Using

1. qstat -an
   Note your jobid(s).

2. qstat -f jobid
   Note the process id(s) of your job(s).

3. ps -ef | bpstat -P | grep yourname
   The number of the node running your job will be displayed in the first column of output.
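As an alternative to the bpstat approach, the node list can often be read from the exec_host attribute that qstat -f reports for a running job. This is a sketch of a different technique, not the procedure above. It assumes exec_host appears on a single line in the form "exec_host = n0/0+n0/1+n1/0"; the node names and the function name job_nodes are hypothetical, and long host lists that wrap onto continuation lines are not handled.

```shell
# Hypothetical helper: read `qstat -f JOBID` output on stdin and print each
# node named in the exec_host attribute once.
# Assumption: exec_host is on one line, e.g. "exec_host = n0/0+n0/1+n1/0".
job_nodes() {
    awk -F' = ' '/exec_host = / { print $2 }' |  # keep the value after " = "
        tr '+' '\n' |                             # one node/slot entry per line
        sed 's|/.*$||' |                          # strip the "/slot" suffix
        sort -u                                   # list each node once
}
```

Typical use would be qstat -f JOBID | job_nodes. A job that is still queued has no exec_host, so the function prints nothing for it.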

Where To Find Job Output

When your job terminates, Torque stores its output and error streams in files in the script's work directory (by default, the directory from which the job was submitted).

The output file is [JOBNAME].o[JOBID] by default. You can override that using the qsub -o PATH option.

The error file is [JOBNAME].e[JOBID] by default. You can override that using the qsub -e PATH option.

The qsub -j oe option can be used to join the output and error streams into a single file.
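To make the naming rule concrete, the sketch below prints the default file names for a given job. It assumes that [JOBID] in the file name is the numeric part of the full job ID (for example, 15 for 15.hipas). The function name default_job_files is hypothetical.

```shell
# Hypothetical helper: print the default output and error file names for a job.
# Assumption: [JOBID] in the file name is the numeric part of the full job ID,
# e.g. "15" for "15.hipas".
default_job_files() {
    jobname=$1
    seq=${2%%.*}                 # drop everything from the first dot onward
    echo "${jobname}.o${seq}"    # output stream
    echo "${jobname}.e${seq}"    # error stream
}
```

For the job in the earlier example, default_job_files myjob 15.hipas prints myjob.o15 and myjob.e15. If the job is submitted with qsub -j oe, expect only the output file.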