Using the job scripts to set up and submit a model run

The Linux desktop job script (tiegcm-linux.job)

Take a look at the default Linux job script tiegcm-linux.job. Near the top are several shell variables, with their default settings, which configure the job script (variables and values may vary between model versions):

set modeldir = tiegcm_trunk
set execdir  = /hao/aim/$user/tiegcm_trunk/tiegcm.exec
set tgcmdata = /hao/aim/tgcm/data/tiegcm2.0
set input    = $modeldir/scripts/tiegcm_res5.0_default.inp
set output   = tiegcm.out
set make     = Make.intel_hao64
set modelres = 5.0
set mpi      = TRUE  # must be TRUE for tiegcm2.0 and later
set nproc    = 4
set debug    = FALSE
set exec     = TRUE
set utildir  = $modeldir/scripts

Following are brief explanations of the job script shell variables:

Note

Absolute or relative paths are acceptable when specifying directories. Relative paths should be relative to the working directory (workdir). In practice, modeldir is usually relative to the working directory, and execdir and tgcmdata are usually absolute paths.

modeldir

The model root directory (modeldir from the source code download). The example above assumes the user has checked out the trunk revision as “tiegcm_trunk”. This directory contains subdirectories src/, scripts/, doc/, tgcmrun/, and benchmarks/.

execdir

This is the execution directory (execdir), in which the model will be built and executed. It will be created if it does not already exist. It is typically on a large temporary disk. This directory will also contain the model output netCDF history files (see also NetCDF History Output Files).

tgcmdata

Directory containing startup history files and data files for model input. It is normally on a large temporary disk. These files are available from the data download tar file (separate downloads for each model resolution). Note that setting tgcmdata in the job script is optional: if it is specified, it will override any setting of the TGCMDATA environment variable. If it is not specified, the job script will use the TGCMDATA environment variable. If neither is set, tgcmdata will default to the current working directory.
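For example, instead of setting tgcmdata in the job script, you can set the TGCMDATA environment variable once in your login shell (a csh/tcsh sketch, using the data path from the example above):

$ setenv TGCMDATA /hao/aim/tgcm/data/tiegcm2.0

The job script will then fall back to this setting whenever its tgcmdata variable is left unset.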

input

The namelist input file. The default namelist file is in the scripts directory under the model root with file name tiegcm_res5.0_default.inp (for 5-degree resolution), or tiegcm_res2.5_default.inp (for 2.5-degree resolution). The default input file can be copied to the working directory, modified, and renamed for your own runs. In that case, be sure to point the input variable in the job script at the new file.
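For example, to run with your own namelist settings (the file name myrun.inp below is illustrative):

$ cp tiegcm_trunk/scripts/tiegcm_res5.0_default.inp myrun.inp
$ vi myrun.inp

Then point the job script at the new file:

set input = myrun.inp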

make

Make file containing platform-specific compiler flags, library locations, etc. If not otherwise specified with a path, the job script will look for this file in the scripts/ directory. This file is included in the main Makefile (scripts/Makefile). The user can either make necessary adjustments to an existing make file, or write their own for a different platform/compiler system.

There are three such makefiles available in the scripts/ directory for the Linux desktop platform: Make.intel_hao64 (Intel), Make.pgi_hao64 (PGI), and Make.gfort_hao64 (gfortran).

You will need to set the paths to your local netCDF and ESMF libraries in these makefiles.
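To locate the lines that need editing, you can search an existing makefile for the library settings (a sketch; the exact variable names inside the makefile may vary between model versions):

$ grep -in -E 'netcdf|esmf' tiegcm_trunk/scripts/Make.intel_hao64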

output

Name of the file to receive stdout output from the model. If this file already exists, it will be overwritten when the model is executed. Here is an example stdout file from the root mpi task of a 4-processor run (5-degree resolution) on a Linux desktop machine: tiegcm_task0000.out

mpi

Logical flag indicating whether or not to link the MPI library for a multi-processor parallel run.

Warning

For tiegcm versions 2.0 and later, non-MPI runs (mpi=FALSE) are NOT supported. However, mpi runs (mpi=TRUE) with a single processor (nproc=1) ARE supported.

nproc

Number of processors to use in parallel execution. This will be the number of MPI tasks made available for the domain decomposition. On Linux desktops, this is typically 4. For tiegcm on Linux supercomputer clusters (e.g., the NCAR yellowstone system, which has 16 processors per node), the recommended number is 16 for 5.0-degree resolution, or 64 for 2.5-degree resolution. For debug purposes, nproc=1 is supported. The model has been tested with the following processor counts: 1, 4, 8, 12, 16, 24, 32, 48, 64, 72, and 80. See the performance table for performance estimates at recommended processor counts and timesteps.
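For example, a minimal single-task configuration for debugging would combine these two settings in the job script:

set mpi   = TRUE   # must be TRUE for tiegcm2.0 and later
set nproc = 1      # single MPI task, for debugging only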

modelres
Model resolution. Two resolutions are supported:
  • modelres = 5.0 sets 5-degree lat x lon horizontal, and dz=0.50 vertical
  • modelres = 2.5 sets 2.5-degree lat x lon horizontal, and dz=0.25 vertical

If the resolution is changed, the model should be recompiled before re-executing the job script (type “gmake clean” in the execdir).

For more information, see Grid Structure and Resolution.

debug

If debug = TRUE, the job script will compile the model with debug flags set. Debug flags specific to the compiler are set in the make file. If debug is changed, the code should be recompiled (type “gmake clean” in the execdir before re-executing the job script).
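For example, a clean rebuild after changing modelres or debug might look like this (the execdir path is the Linux default from the job script above; substitute your own):

$ cd /hao/aim/$user/tiegcm_trunk/tiegcm.exec
$ gmake clean

Then re-execute the job script from your working directory.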

exec

If exec = TRUE, the job script will execute the model after compilation, otherwise, the job script will stop after compilation without execution.

utildir

The utility directory containing supporting scripts. This is normally the scripts/ subdirectory in the model root directory modeldir.

You are now ready to build and execute a default run. To do this, simply execute the job script as follows:

$ tiegcm-linux.job &

The compilation output will be displayed. If the build is successful (and exec=TRUE), the model will be executed, and stdout will go to the specified output file. If the job is successful, you can edit and rename the namelist input file, reset the input file name in the job script, and re-execute the job script. If there has been no change to the source code, the model will not be recompiled, and the pre-existing executable will be used.
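While the model is running, you can follow its progress by tailing the stdout file (assuming the default output setting shown above; note that per-task stdout files may instead be named with the task number, e.g. tiegcm_task0000.out):

$ tail -f tiegcm.out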

The yellowstone supercomputer job script (tiegcm-ys.job)

Note

This section contains information that is specific to users of the NCAR Linux supercomputer yellowstone:

Linux yslogin3 2.6.32-358.el6.x86_64 #1 SMP
Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

For more information about the NCAR yellowstone system, see https://www2.cisl.ucar.edu/resources/yellowstone

The model can be built and executed on yellowstone using the Intel compiler and the intelmpi implementation. To do this, copy and modify the job script tiegcm-ys.job from the scripts directory.
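For example (assuming the trunk checkout name used in this section):

$ cp tiegcm_trunk/scripts/tiegcm-ys.job .
$ vi tiegcm-ys.job

Edit the shell variables and #BSUB directives as needed for your run.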

The yellowstone job script tiegcm-ys.job has the same user-settable shell variables as the Linux job script, but the default settings are slightly different:

set modeldir = tiegcm_trunk
set execdir  = /glade/scratch/$user/tiegcm_trunk/tiegcm.exec
set tgcmdata = /glade/p/hao/tgcm/data/tiegcm2.0
set input    = $modeldir/scripts/tiegcm_res5.0_default.inp
set output   = tiegcm.out
set modelres = 5.0
set make     = Make.intel_ys
set mpi       = TRUE   # must be TRUE for tiegcm2.0 and later
set debug     = FALSE
set exec      = TRUE
set utildir   = $modeldir/scripts
set runscript = run.lsf

In this example, it is assumed the user has checked out the trunk revision as “tiegcm_trunk”. Note the execdir name and the make file Make.intel_ys. The model resolution in this case is 5.0 degrees.

Also note the special “#BSUB” directives near the top of the yellowstone job script (descriptions in the right-hand column are for this document only, and are not in the script itself):

#BSUB -J tiegcm                # job name
#BSUB -P P28100036             # authorized project number
#BSUB -q premium               # premium queue
#BSUB -o tiegcm.%J.out         # stdout file
#BSUB -e tiegcm.%J.out         # stderr file
#BSUB -N
#BSUB -u $LOGNAME@ucar.edu     # email notification address
#BSUB -W 1:00                  # wallclock limit hours:minutes
#BSUB -n 16                    # number of processors (mpi tasks)
#BSUB -R "span[ptile=16]"      # use 16 processors per node

These are resource settings for the Load Sharing Facility (LSF), the batch queuing system from Platform Computing that is used for scheduling jobs on the yellowstone system at NCAR. This job will be submitted to the premium queue, requesting 16 processors with a wallclock limit of 1 hour.
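For example, to run the 2.5-degree resolution with the recommended 64 processors (see nproc above), the processor directives would change as follows (a sketch; the other directives are unchanged):

#BSUB -n 64                    # 64 mpi tasks for 2.5-degree resolution
#BSUB -R "span[ptile=16]"      # still 16 processors per node (4 nodes)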

To submit the yellowstone job, simply execute the job script on the command line. It will build the model on the interactive node, and if successful, the runscript (run.lsf by default) will be created and submitted to the LSF via the bsub command.
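If you later need to resubmit the runscript without rebuilding the model, you can submit it to LSF directly (assuming the default runscript name from the job script above):

$ bsub < run.lsf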

Watch the progress of your LSF job with the command:

$ bjobs

You can kill an LSF job with this command:

$ bkill job_ID

where job_ID is the job identifier reported by the bjobs command.

For more information about the LSF, see the Wikipedia site:

http://en.wikipedia.org/wiki/Platform_LSF

or the Platform Computing site:

http://www.platform.com/workload-management/high-performance-computing/lp