Zest | Slurm Support

Zest is the Syracuse University Research Computing high-performance computing (HPC) cluster. Zest is a non-interactive Linux environment intended for analyses that require extensive parallelism or extended runtimes.

IDEs and Development

Zest is not intended to be used as a development environment. Activities on the cluster should be limited to submitting jobs, doing light editing with a text editor such as nano or vim, and running small tests that use a single core for no more than a few minutes. The use of IDEs such as Jupyter, Spyder, VSCode, etc. is prohibited, as these programs can interfere with other users of the system or, in the worst case, impact the system as a whole. If you need a development environment, please contact us and other, more appropriate resources can be provided.



Looking for OrangeGrid? While similar, the Zest and OrangeGrid clusters are unique environments. Information about OrangeGrid is available on the OrangeGrid (OG) | HTCondor Support home page




Accessing Zest

To access Zest, make an SSH connection using your NetID and specifying the login node you have been assigned. Refer to the access email you received from Research Computing staff for your login node number. The cluster supports connections from the Windows command prompt (CMD) and programs like PuTTY, as well as file transfer via SCP and WinSCP.

Example SSH Connection
ssh netid@its-zest-loginX.syr.edu
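
For file transfers, SCP uses the same connection details. Below is a minimal sketch; the file names and paths are placeholders only, and WinSCP provides the same capability graphically.

Example SCP Transfer
# Copy a local job script to your home directory on the cluster.
scp job1.sh netid@its-zest-loginX.syr.edu:/home/netid/
# Copy an output file from the cluster back to the current local directory.
scp netid@its-zest-loginX.syr.edu:/home/netid/slurm-781.out .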



Campus Wi-Fi or Off Campus?

Campus Wi-Fi and off-campus networks cannot access the cluster directly. Users off campus need to connect to campus via Remote Desktop Services (RDS). RDS will provide you with a Windows 10 desktop that has access to your G: drive, campus OneDrive, and the research clusters. Once connected, SSH and SCP are available from the Windows command prompt, or you can use PuTTY and WinSCP, which are also installed. Full instructions and details for connecting to RDS are available on the RDS home page. Note that Azure VPN is an alternative option, but it is not available for all users. See the Azure VPN page for more details.

In rare cases where RDS is not an option, the research computing team may provide remote access via a bastion host.






Zest | Slurm Commands & Cluster Info

 Slurm Commands

Once connected, use the basic commands below to get started.

# Show node information. Use this to view available nodes and resources.
sinfo

# Show the job queue.
squeue

# Display job accounting information.
sacct

# Submit a job script.
sbatch [script name]

# Start an interactive job.
salloc [options] [command]

# Launch non-MPI parallel job steps, usually run inside an SBATCH script.
srun [options] [command]

# Launch MPI parallel job steps, usually run inside an SBATCH script (be sure to load the associated modules).
mpirun [options] [command]

# Display running job status.
sstat [jobid]

# Cancel a job.
scancel [jobid]

Learn more basics with the Slurm Quick Start User Guide.

Zest Cluster Local Storage

Note the default local storage locations. 

Resource      | Description
/home/NetID/  | NFS-based user home directory available throughout the cluster
/tmp/         | Temporary fast local storage, persistent only for the duration of the current job
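
Because /tmp is cleared when the job ends, a common pattern is to stage input there, compute against the fast local copy, and copy results back to your home directory before the job finishes. Below is a minimal sketch; the program and file names are placeholders only.

#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#
# Stage input from the NFS home directory to fast local /tmp.
cp /home/netid/input.dat /tmp/
# Run against the local copy (placeholder program and file names).
my_program /tmp/input.dat > /tmp/results.out
# Copy results back to the home directory before the job ends, since /tmp does not persist.
cp /tmp/results.out /home/netid/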



Lmod Commands

Lmod is also available on the Zest cluster; examples are below.

# Show all available modules.
module avail

# Load a module into the environment.
module load [name]

# Search for module names matching a string.
module spider [string]

# Search module names and descriptions for a keyword.
module keyword [string]

# List currently loaded modules.
module list

# Unload a module from the environment.
module unload [name]

# Remove all loaded modules.
module purge

# Save the currently loaded modules to a named collection.
module save [name]

# Show all saved collections.
module savelist

# Restore modules from a named collection.
module restore [name]

# Display all Lmod options.
module help
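
As one possible workflow combining these commands, the sketch below loads a compiler and MPI stack and saves the set for reuse. The module names are taken from the module list later on this page; 'mpi_env' is an arbitrary collection name.

# Start from a clean environment and load a compiler and MPI implementation.
module purge
module load gnu12
module load openmpi4/4.1.6
# Confirm what is loaded and save it as a named collection.
module list
module save mpi_env
# In a later session, restore the same set of modules.
module restore mpi_env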



Zest Cluster Partitions

The Zest cluster has multiple partitions, including partitions configured for CPU-intensive work, GPU utilization, and extended runtimes. Users can submit jobs to any partitions that meet their requirements. The currently available partitions are listed below.

Partition        | General Purpose                                                      | Max Runtime (Days)
normal (default) | Designed for CPU-intensive workloads.                                | 20
compute_zone2    | Designed for CPU-intensive workloads.                                | 20
longjobs         | Designed for CPU-intensive workloads that require extended runtimes. | 40
gpu              | Tailored for GPU-heavy computations.                                 | 20
gpu_zone2        | Tailored for GPU-heavy computations.                                 | 20

If no partitions are specified, the default partition will be used. The current default is the 'normal' partition. If one or more partitions are specified, only those will be considered to run that job. 

To point a job to particular partitions, simply add them either individually or as a comma-separated list in the submission file, as in the examples below.



# Submit a job to a single partition.
#SBATCH --partition=compute_zone2

# Submit a job to the CPU partitions.
#SBATCH --partition=compute_zone2,normal

# Submit a job to the GPU partitions.
#SBATCH --partition=gpu_zone2,gpu
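
To review the available partitions and their limits before choosing, sinfo's format options can summarize them. The sketch below lists each partition with its time limit, availability, and node count; the exact output depends on the current cluster configuration.

# Show each partition with its time limit, availability, and node count.
sinfo --format="%P %l %a %D"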








Submitting Jobs (with SBATCH Examples)

Submitting jobs on the Zest cluster requires the creation of an SBATCH script. Below are common examples including the use of MPI and GPUs.


Basic SBATCH Example

Below is a basic SBATCH example. 

#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu   # replace netid with your NetID
#
# This runs hostname three times (tasks) on a single node
#
srun hostname

Assuming the above is 'job1.sh', use the 'sbatch' command to submit the job as seen below. 

netid@its-zest-login1:[~]$ sbatch job1.sh
Submitted batch job 781
netid@its-zest-login1:[~]$ more slurm-781.out
node1002
node1002
node1002
netid@its-zest-login1:[~]$



Note that the default output for jobs will be located in slurm-{jobid}.out.
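
Once a job is submitted, it can be monitored with the commands listed earlier. For example, using job ID 781 from the sample session above:

# Show your own queued and running jobs.
squeue -u $USER
# Show accounting information for a specific job.
sacct -j 781
# Cancel the job if needed.
scancel 781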



MPI SBATCH Example

Use mpirun for MPI. Note that you'll want to ensure you have the necessary modules loaded, either directly in the SBATCH file or in your ~/.bash_profile. If a script requires a module for compiling, ensure it is loaded prior to compiling.

#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
# Load required modules; run 'module avail' for the latest list
module load imb              # Loads the Intel MPI Benchmarks (IMB)
module load openmpi4/4.1.6   # Loads the openmpi4 module
mpirun IMB-MPI1

Example GPU worker interactive job

Use an interactive shell allocation to compile and test applications.

# This will start an interactive shell on a Supermicro GPU system
# using 20 CPUs and 2 GPUs. If the resource is open, you'll get a shell on
# a worker node. Otherwise, srun will hang until the resource is available.
netid@its-zest-login1:[~]$ srun --pty -p geforce -c 20 --gres=gpu:2 bash
[netid@node1024 ~]$ cp -Rp /usr/local/cuda/samples CUDA_SAMPLES
[netid@node1024 ~]$ cd CUDA_SAMPLES
[netid@node1024 ~/CUDA_SAMPLES]$ make -j20 all
[netid@node1024 ~/CUDA_SAMPLES]$ exit
netid@its-zest-login1:[~]$



GPU SBATCH Example

Ensure you are using the GPU-enabled Slurm partitions, gpu and gpu_zone2.

#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
nvidia-smi

GPU SBATCH Using MPI Example

Note the use of mpirun with the full path to the MPI-enabled binary.

#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
module load cuda
module load openmpi4/4.1.6   # Load the openmpi4 module
mpirun /home/netid/CUDA_SAMPLES/bin/x86_64/linux/release/simpleMPI




Advanced SBATCH Commands and Examples

SBATCH files can include some additional parameters depending on your computational needs. Below are some of the more common advanced parameters as well as an example SBATCH file.  



Advanced SBATCH Commands

Note that it is recommended that researchers specify job computing requirements rather than requesting specific models of equipment, such as a particular GPU, unless the code itself requires it. This ensures that every node capable of running the job can be used as soon as it becomes available, which can greatly reduce the time your work spends waiting in the queue.

# Specify the minimum memory required per node, in megabytes (M) or gigabytes (G).
--mem=<memory>                        # ex. '4G'

# Set the maximum time allowed for the job, formatted as [days-]hours:minutes:seconds.
--time=<time>                         # ex. '1-00:00:00' sets a maximum runtime of 1 day

# Submit a job to specific partitions. The default partition is used if none is specified.
--partition=<partition_names>         # ex. 'gpu_zone2,gpu' restricts the job to these two partitions

# Add a constraint for the node.
--constraint=<constraint_string>      # ex. 'gpu_type:A40' restricts the job to A40 GPUs

# Manually set the output and error file names.
--output=<filename>
--error=<filename>

# Specify a dependency on another job or jobs.
--dependency=<dependency_type:jobid>  # ex. 'afterok:12345' means the current job can start only after job 12345 completes successfully

# Submit a job array with tasks identified by indexes given as a single number or a range.
--array=<indexes>

# Set the job to requeue if it fails.
--requeue
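
Dependencies can also be set at submission time rather than inside the script. The sketch below uses sbatch's --parsable option, which prints only the job ID, to chain two jobs; job1.sh and job2.sh are placeholder script names.

# Submit the first job and capture its job ID.
jobid=$(sbatch --parsable job1.sh)
# Submit a second job that starts only if the first completes successfully.
sbatch --dependency=afterok:$jobid job2.sh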







Advanced SBATCH Example

The following SBATCH example specifies a job that:

  • requires 4 GB of memory per node

  • has a maximum runtime of 2 days

  • is submitted to the 'gpu_zone2' and 'gpu' partitions

  • is limited to nodes with an A40 GPU

  • has exclusive node access

  • depends on the successful completion of job 12345

  • is part of a job array with tasks numbered 1 through 10

  • will be requeued on system failure

The %A and %a in the output and error file names are replaced by the job ID and the array index, respectively, providing unique filenames for each task in the array.

#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=2-00:00:00
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --constraint=gpu_type:A40
#SBATCH --output=job_%A_%a_output.txt
#SBATCH --error=job_%A_%a_error.txt
#SBATCH --exclusive
#SBATCH --dependency=afterok:12345
#SBATCH --array=1-10
#SBATCH --requeue
#
srun advanced_script.sh
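
Within an array job, each task can read the SLURM_ARRAY_TASK_ID environment variable to select its own piece of work. Below is a sketch of what a script such as advanced_script.sh might contain; the program and input file names are placeholders only.

#!/bin/bash
# Each array task (1-10 in the example above) receives its own index.
echo "Running array task ${SLURM_ARRAY_TASK_ID}"
# Process a different input file per task (placeholder program and file names).
my_program input_${SLURM_ARRAY_TASK_ID}.dat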






Zest FAQ

Can I Use Docker with Zest?

The cluster does not support Docker directly; however, you can import Docker containers into Singularity. More information on Singularity is available at https://docs.sylabs.io/guides/3.6/user-guide/.
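
Below is a sketch of the typical workflow for pulling a Docker image into Singularity; the image name is an arbitrary example, and the available Singularity module can be confirmed with 'module spider singularity'.

# Load the Singularity module listed on this page.
module load singularity
# Pull a Docker image and convert it to a Singularity image file (SIF).
singularity pull docker://ubuntu:22.04
# Run a command inside the resulting container image.
singularity exec ubuntu_22.04.sif cat /etc/os-release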

What packages are available on the login and worker nodes? 




Package                          | Description
yasm                             | Modular Assembler
gnu8-compilers-ohpc              | The GNU C Compiler and Support Files
ohpc-gnu8-io-libs                | OpenHPC IO libraries for GNU
ohpc-autotools                   | OpenHPC autotools
openblas-gnu8-ohpc               | An optimized BLAS library based on GotoBLAS2
openmpi3-pmix-slurm-gnu8-ohpc    | A powerful implementation of MPI
lmod-defaults-gnu8-openmpi3-ohpc | OpenHPC default login environments
imb-gnu8-openmpi3-ohpc           | Intel MPI Benchmarks (IMB)
kernel-devel                     | Development package for building kernel modules
kernel-headers                   | Header files for the Linux kernel for use by glibc
dkms                             | Dynamic Kernel Module Support Framework
libstdc++                        | GNU Standard C++ Library
boost-gnu8-openmpi3-ohpc         | Boost free peer-reviewed portable C++ source libraries
hwloc-ohpc                       | Portable Hardware Locality
scalapack-gnu8-openmpi3-ohpc     | A subset of LAPACK routines
singularity-ohpc                 | Application and environment virtualization
gnuplot                          | A program for plotting mathematical expressions and data
motif-devel                      | Development libraries and header files
tcl-devel                        | Tcl scripting language development environment
tk-devel                         | Tk graphical toolkit development files
qt                               | Qt toolkit
qt-devel                         | Development files for the Qt toolkit
libXScrnSaver                    | X.Org X11 libXss runtime library



 What Lmod modules are available on the login and worker nodes?


The following list will be updated periodically. For a real-time list, enter 'module spider' when logged into the cluster. 

Module                               | Description
adios: adios/1.13.1                  | The Adaptable IO System (ADIOS)
anaconda3: anaconda3/2023.9          | Python environment
autotools                            | Autotools developer utilities
boost: boost/1.81.0                  | Boost free peer-reviewed portable C++ source libraries
cmake: cmake/3.24.2                  | CMake is an open-source, cross-platform family of tools designed to build, test and package software
cuda: cuda/12.3                      | NVIDIA CUDA libraries
gnu12: gnu12/12.3.0                  | GNU Compiler Family (C/C++/Fortran for x86_64)
gromacs: gromacs/2023.2              |
hdf5: hdf5/1.10.8                    | A general purpose library and file format for storing scientific data
hwloc: hwloc/2.7.2                   | Portable Hardware Locality
imb: imb/2021.3                      | Intel MPI Benchmarks (IMB)
libfabric: libfabric/1.19.0          | Development files for the libfabric library
mpich: mpich/3.4.3-ofi               | MPICH MPI implementation
mvapich2: mvapich2/2.3.7             | OSU MVAPICH2 MPI implementation
netcdf: netcdf/4.9.0                 | C Libraries for the Unidata network Common Data Form
netcdf-cxx: netcdf-cxx/4.3.1         | C++ Libraries for the Unidata network Common Data Form
netcdf-fortran: netcdf-fortran/4.6.0 | Fortran Libraries for the Unidata network Common Data Form
ohpc: ohpc                           |
openblas: openblas/0.3.21            | An optimized BLAS library based on GotoBLAS2
openmpi4: openmpi4/4.1.6             | A powerful implementation of MPI
phdf5: phdf5/1.10.8                  | A general purpose library and file format for storing scientific data
pmix: pmix/4.2.1                     |
pnetcdf: pnetcdf/1.12.3              | A Parallel NetCDF library (PnetCDF)
prun: prun/2.2                       | Job launch utility for multiple MPI families
scalapack: scalapack/2.2.0           | A subset of LAPACK routines redesigned for heterogenous computing
singularity: singularity/3.7.1       | Application and environment virtualization
ucx: ucx/1.15.0                      | UCX is a communication library implementing high-performance messaging







Getting Help

Questions about Research Computing? Any questions about using or acquiring research computing resources or access can be directed to researchcomputing@syr.edu.