Accessing Zest
To access Zest, make an SSH connection using your NetID and the login node you have been assigned; the example below uses 'its-zest-login2.syr.edu'. Refer to the access email you received from Research Computing staff for your login node information. The cluster supports connections from the Windows command prompt (CMD), SSH programs like PuTTY, and file transfers via SCP and WinSCP.
Code Block |
---|
language | bash |
---|
theme | RDark |
---|
title | Example SSH Connection |
---|
|
ssh netid@its-zest-login2.syr.edu |
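File transfers follow the same pattern over SCP. Below is a minimal sketch; the file and directory names are placeholders, not actual paths on the cluster.
Code Block |
---|
|
# Copy a file from your machine to your Zest home directory
scp data.tar.gz netid@its-zest-login2.syr.edu:/home/netid/
# Copy results from Zest back to the current local directory
scp netid@its-zest-login2.syr.edu:/home/netid/results.tar.gz . |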
Tip |
---|
title | Campus Wi-Fi or Off Campus? |
---|
|
Campus Wi-Fi and off-campus networks cannot access the cluster directly. Users on these networks need to connect to campus via Remote Desktop Services (RDS). RDS will provide you with a Windows 10 desktop that has access to your G: drive, campus OneDrive, and the research clusters. Once connected, SSH and SCP are available from the Windows command prompt, or you can use PuTTY and WinSCP, which are also installed. Full instructions and details for connecting to RDS are available on the RDS home page. Note that Azure VPN is an alternative option, but it is not available for all users; see the Azure VPN page for more details. In rare cases where RDS is not an option, the research computing team may provide remote access via a bastion host. Expand |
---|
title | Connecting Via a Bastion Host |
---|
| To connect via a bastion host, first SSH to the bastion host specified by Research Computing staff. Note, however, that this connection will require a Google Authenticator passcode; if you have not already configured the Google Authenticator app, instructions are provided below.
Once on the bastion host, SSH as usual to the login node for which you have been provided an account, as shown in the sketch below.
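As a sketch, the two-hop connection looks like the following; the bastion hostname shown here is a placeholder for the one Research Computing staff give you.
Code Block |
---|
|
# First hop: the bastion host (placeholder hostname), which prompts for your NetID
# password and a Google Authenticator verification code
ssh netid@bastion-host.syr.edu
# Second hop: from the bastion, connect to your assigned login node
ssh netid@its-zest-login2.syr.edu |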
Steps to Set Up Google Authenticator
1) If not already installed, download and install the Google Authenticator application from the App Store (Apple) or Google Play (Android).
2) Use your SSH client to connect to its-condor-t1.syr.edu. If you need an SSH client, PuTTY is a good option for Windows and Unix and can be downloaded from the PuTTY website. Apple users can use the built-in Terminal application.
3) Maximize your SSH window (you will need a large window to display the QR code that you will scan with the Google Authenticator application).
4) When prompted, use your SU NetID and password to log in.
5) It will display basic instructions for setting up your two-factor authentication and then wait at a prompt before continuing. AGAIN, BE SURE TO MAXIMIZE the SSH session window before you go to the next step so the barcode is fully displayed on the screen.
6) Once you continue, it will display a key and barcode. Use the Google Authenticator application you installed in step 1 to scan the barcode or enter the key.
7) This should log you in successfully. On subsequent logins, enter your NetID password at the Password prompt, then the 6-digit Google Authenticator one-time password at the Verification prompt. Enter the 6-digit code without any spaces, even if Google Authenticator shows a space in the number string.
|
|
Zest | Slurm Commands & Cluster Info
Slurm Commands
Once connected, you can use the basic commands below to get started.
Code Block |
---|
|
# Show node information. Use this to view available nodes and resources.
sinfo
# Show job queue.
squeue
# Display job accounting information.
sacct
# Submit job script.
sbatch [script name]
# Start an interactive job.
salloc [options][command]
# Launch non-MPI parallel job steps, usually run in an SBATCH script.
srun [options][command]
# Launch MPI parallel job steps, usually run in an SBATCH script (be sure to load the associated modules).
mpirun [options][command]
# Display running job status.
sstat [jobid]
# Cancel a job.
scancel [jobid] |
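A few common variations of these commands are shown below; the options used are standard Slurm flags and the job ID is illustrative.
Code Block |
---|
|
# Show only your own jobs in the queue
squeue -u $USER
# Show accounting details for a specific job
sacct -j 781
# Cancel that job
scancel 781 |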
Learn more basics with the Slurm Quick Start User Guide.
Zest Cluster Local Storage
Note the default local storage locations.
Resource | Description |
---|
/home/NetID/ | NFS based user home directory available throughout the cluster |
/tmp/ | Temporary fast local storage only persistent for the current job |
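Because /tmp/ is only kept for the duration of a job, a common pattern is to stage work there and copy results back to your home directory before the job ends. Below is a minimal sketch; the input, output, and application names are placeholders.
Code Block |
---|
|
# Inside a job script: stage data on fast local /tmp, then copy results home
WORKDIR=/tmp/${SLURM_JOB_ID}
mkdir -p "$WORKDIR"
cp /home/$USER/input.dat "$WORKDIR"/
cd "$WORKDIR"
./my_analysis input.dat > output.dat
cp output.dat /home/$USER/ |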
Lmod Commands
Lmod is also available on the Zest cluster; examples are below.
Code Block |
---|
|
# Show all available modules.
module avail
# Load a module into your environment.
module load [name]
# Search for module names matching a string.
module spider [string]
# Search module name or description.
module keyword [string]
# List currently loaded modules.
module list
# Unload a module from environment.
module unload [name]
# Remove all modules.
module purge
# Save currently loaded modules to collection name.
module save [name]
# Shows all saved collections.
module savelist
# Restore modules from collection name.
module restore [name]
# Display all Lmod options.
module help |
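As a usage example, the following finds and loads the CUDA module listed later on this page, then saves the loaded set as a collection for future sessions.
Code Block |
---|
|
# Find available CUDA versions and load one
module spider cuda
module load cuda/12.3
module list
# Save the loaded set as a named collection and restore it later
module save my_gpu_env
module restore my_gpu_env |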
Submitting Jobs (with SBATCH Examples)
Submitting jobs on the Zest cluster requires the creation of an SBATCH script. Below are common examples, including the use of MPI and GPUs.
Basic SBATCH Example
Below is a basic SBATCH example.
Code Block |
---|
|
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu # replace netid with your NetID
#
# This runs hostname three times (tasks) on a single node
#
srun hostname |
Assuming the above is 'job1.sh', use the 'sbatch' command to submit the job as seen below.
Code Block |
---|
|
netid@its-zest-login1:[~]$ sbatch job1.sh
Submitted batch job 781
netid@its-zest-login1:[~]$ more slurm-781.out
node1002
node1002
node1002
netid@its-zest-login1:[~]$ |
Note |
---|
Note that the default output for jobs will be located in slurm-{jobid}.out. |
Zest Cluster Partitions
Note that the Zest cluster has multiple partitions, including ones configured for CPU-intensive work, GPU utilization, and longer runtimes. Users can submit jobs to any partitions they feel meet their requirements. Below is a list of currently available partitions.
Partition | General Purpose | Max Runtime (Days) |
---|
normal (default) | Designed for CPU-intensive workloads. | 20 |
compute_zone2 | Designed for CPU-intensive workloads. | 20 |
longjobs | Designed for CPU-intensive workloads that require extended runtimes. | 40 |
gpu | Tailored for GPU-heavy computations. | 20 |
gpu_zone2 | Tailored for GPU-heavy computations. | 20 |
If no partitions are specified, the default partition will be used. The current default is the 'normal' partition. If one or more partitions are specified, only those will be considered to run that job.
To point a job to particular partitions, add them either individually or as a list to the submission file; examples are below.
Code Block |
---|
|
# Submit a job to a single partition.
#SBATCH --partition=compute_zone2
# Submit a job to the CPU partitions.
#SBATCH --partition=compute_zone2,normal
# Submit a job to the GPU partitions.
#SBATCH --partition=gpu_zone2,gpu |
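To check the state of a particular partition before submitting, sinfo accepts a partition filter; these are standard Slurm options, and the output format string is just one example.
Code Block |
---|
|
# Summarize the GPU partitions
sinfo -p gpu,gpu_zone2
# Show partition, availability, time limit, node count, and state
sinfo -p longjobs -o "%P %a %l %D %t" |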
MPI SBATCH Example
Use mpirun for MPI. Note that you'll want to ensure you have the necessary modules loaded, either directly in the SBATCH file or in your ~/.bash_profile. If a script requires a module for compiling, ensure it is loaded prior to compiling.
Code Block |
---|
|
#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
# Load required modules, run 'module avail' for the latest versions
module load imb # Loads the Intel MPI Benchmarks (IMB)
module load openmpi4/4.1.6 # Load the openmpi4 module
mpirun IMB-MPI1 |
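If an MPI job fails with an error that mpirun cannot be found, the MPI module was likely not loaded. A quick interactive check, shown only as an illustration:
Code Block |
---|
|
module load openmpi4/4.1.6
which mpirun
mpirun --version |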
Advanced SBATCH Commands and Examples
SBATCH files can include some additional parameters depending on your computational needs. Below are some of the more common advanced parameters, as well as an example SBATCH file.
Note that it is recommended that researchers specify the memory and runtime requirements for their jobs.
Code Block |
---|
|
# Specify the minimum memory required per node specified in megabytes (M) or gigabytes (G).
--mem=<memory, ex. '4G'>
# Set the maximum time allowed for the job format as [days-]hours:minutes:seconds.
--time=<time, ex. '1-00:00:00' sets a maximum time of 1 day for the job>
# Submit a job to a specific partition.
--partition=<partition_name>
# Add a constraint for the node.
--constraint=<constraint_string, ex. 'gpu_type:A40' would restrict to A40 GPUs>
# Manually set the output and error file names.
--output=<filename>
--error=<filename>
# Specify a dependency on another job or jobs.
--dependency=<dependency_type:jobid, ex. 'afterok:12345' means the current job can start after job 12345 completes successfully>
# Submits a job array with tasks identified by indexes specified by a single number or range.
--array=<indexes>
# Set the job to requeue if it fails.
--requeue
# Request exclusive use of the allocated node(s); no other jobs will share them.*
--exclusive |
Warning |
---|
|
Please use the exclusive flag cautiously, as it will also prevent other researchers from using resources on the node that your job does not utilize. As an example, a job using only 32 cores may get picked up by a node with 128 cores. Those additional cores could otherwise be working for additional research efforts if not blocked off by the exclusive flag. |
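Instead of --exclusive, requesting only the resources the job actually needs leaves the rest of the node available to others; the directives below are a sketch with illustrative values.
Code Block |
---|
|
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --mem=64G |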
Advanced SBATCH Example
The following SBATCH example specifies a job that:
- requires 4 GB of memory per node
- has a maximum runtime of 2 days
- is submitted to the 'gpu' partition
- is limited to nodes with an A40 GPU
- has exclusive node access
- depends on the successful completion of job 12345
- is part of a job array with tasks numbered 1 through 10
- will be requeued on system failure
The %A and %a in the output and error file names are replaced by the job ID and the array index, respectively, providing unique filenames for each task in the array. Code Block |
---|
|
|
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=2-00:00:00
#SBATCH --partition=gpu
#SBATCH --constraint=gpu_type:A40
#SBATCH --output=job_%A_%a_output.txt
#SBATCH --error=job_%A_%a_error.txt
#SBATCH --exclusive
#SBATCH --dependency=afterok:12345
#SBATCH --array=1-10
#SBATCH --requeue
#
srun advanced_script.sh |
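Submitting the example above and managing the resulting array might look like the following; the script name and job ID are illustrative, and scancel accepts either a single array task or the entire array.
Code Block |
---|
|
sbatch advanced_job.sh
squeue -u $USER
# Cancel a single task of the array, or the entire array
scancel 12346_3
scancel 12346 |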
Example GPU worker interactive job
Use an interactive shell allocation to compile and test applications.
Code Block |
---|
|
# This will start an interactive shell on a Supermicro GPU system
# using 20 CPUs and 2 GPUs. If the resource is open, you’ll get a shell on
# a worker node. Otherwise, srun will hang until the resource is available.
netid@its-zest-login1:[~]$ srun --pty -p geforce -c 20 --gres=gpu:2 bash
[netid@node1024 ~]$ cp -Rp /usr/local/cuda/samples CUDA_SAMPLES
[netid@node1024 ~]$ cd CUDA_SAMPLES
[netid@node1024 ~/CUDA_SAMPLES]$ make -j20 all
[netid@node1024 ~/CUDA_SAMPLES]$ exit
netid@its-zest-login1:[~]$ |
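Once the interactive shell starts on the worker node, nvidia-smi is a quick way to confirm which GPUs were allocated; Slurm also typically exports CUDA_VISIBLE_DEVICES for the allocated devices, though that depends on the cluster's GRES configuration.
Code Block |
---|
|
[netid@node1024 ~]$ nvidia-smi
[netid@node1024 ~]$ echo $CUDA_VISIBLE_DEVICES |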
GPU SBATCH Example
Ensure you are using the GPU-supported Slurm partitions, gpu and gpu_zone2.
Code Block |
---|
|
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
nvidia-smi |
GPU SBATCH Using MPI Example
Note the use of mpirun with the MPI path.
Code Block |
---|
|
#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=gpu_zone2,gpu
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
module load cuda
module load openmpi4/4.1.6 # Load the openmpi4 module
mpirun /home/netid/CUDA_SAMPLES/bin/x86_64/linux/release/simpleMPI |
Zest FAQ
Can I Use Docker with Zest?
The cluster doesn't support Docker directly; however, you can import Docker containers into Singularity. More info on Singularity is available here: https://docs.sylabs.io/guides/3.6/user-guide/.
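For example, a public Docker image can be pulled and run through Singularity roughly as follows; the image name is only an illustration.
Code Block |
---|
|
module load singularity
singularity pull docker://ubuntu:22.04
singularity exec ubuntu_22.04.sif cat /etc/os-release |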
What packages are available on the login and worker nodes?
Expand |
---|
title | Click here for a full list of packages... |
---|
|
Package | Description |
---|
yasm | Modular Assembler |
gnu8-compilers-ohpc | The GNU C Compiler and Support Files |
ohpc-gnu8-io-libs | OpenHPC IO libraries for GNU |
ohpc-autotools | OpenHPC autotools |
openblas-gnu8-ohpc | An optimized BLAS library based on GotoBLAS2 |
openmpi3-pmix-slurm-gnu8-ohpc | A powerful implementation of MPI |
lmod-defaults-gnu8-openmpi3-ohpc | OpenHPC default login environments |
imb-gnu8-openmpi3-ohpc | Intel MPI Benchmarks (IMB) |
kernel-devel | Development package for building kernel modules |
kernel-headers | Header files for the Linux kernel for use by glibc |
dkms | Dynamic Kernel Module Support Framework |
libstdc++ | GNU Standard C++ Library |
boost-gnu8-openmpi3-ohpc | Boost free peer-reviewed portable C++ source libraries |
hwloc-ohpc | Portable Hardware Locality |
scalapack-gnu8-openmpi3-ohpc | A subset of LAPACK routines |
singularity-ohpc | Application and environment virtualization |
gnuplot | A program for plotting mathematical expressions and data |
motif-devel | Development libraries and header files |
tcl-devel | Tcl scripting language development environment |
tk-devel | Tk graphical toolkit development files |
qt | Qt toolkit |
qt-devel | Development files for the Qt toolkit |
libXScrnSaver | X.Org X11 libXss runtime library |
|
What Lmod modules are available on the login and worker nodes?
Expand |
---|
title | Click here for a list of available modules... |
---|
|
The following list will be updated periodically. For a real-time list, enter 'module spider' when logged into the cluster.
Module | Description |
---|
adios: adios/1.13.1 | The Adaptable IO System (ADIOS) |
anaconda3: anaconda3/2023.9 | Python environment |
autotools | Autotools Developer utilities |
boost: boost/1.81.0 | Boost free peer-reviewed portable C++ source libraries |
cmake: cmake/3.24.2 | CMake is an open-source, cross-platform family of tools designed to build, test and package software |
cuda: cuda/12.3 | NVIDIA CUDA libraries |
gnu12: gnu12/12.3.0 | GNU Compiler Family (C/C++/Fortran for x86_64) |
gromacs: gromacs/2023.2 | |
hdf5: hdf5/1.10.8 | A general purpose library and file format for storing scientific data |
hwloc: hwloc/2.7.2 | Portable Hardware Locality |
imb: imb/2021.3 | Intel MPI Benchmarks (IMB) |
libfabric: libfabric/1.19.0 | Development files for the libfabric library |
mpich: mpich/3.4.3-ofi | MPICH MPI implementation |
mvapich2: mvapich2/2.3.7 | OSU MVAPICH2 MPI implementation |
netcdf: netcdf/4.9.0 | C Libraries for the Unidata network Common Data Form |
netcdf-cxx: netcdf-cxx/4.3.1 | C++ Libraries for the Unidata network Common Data Form |
netcdf-fortran: netcdf-fortran/4.6.0 | Fortran Libraries for the Unidata network Common Data Form |
ohpc: ohpc | |
openblas: openblas/0.3.21 | An optimized BLAS library based on GotoBLAS2 |
openmpi4: openmpi4/4.1.6 | A powerful implementation of MPI |
phdf5: phdf5/1.10.8 | A general purpose library and file format for storing scientific data |
pmix: pmix/4.2.1 | |
pnetcdf: pnetcdf/1.12.3 | A Parallel NetCDF library (PnetCDF) |
prun: prun/2.2 | Job launch utility for multiple MPI families |
scalapack: scalapack/2.2.0 | A subset of LAPACK routines redesigned for heterogeneous computing |
singularity: singularity/3.7.1 | Application and environment virtualization |
ucx: ucx/1.15.0 | UCX is a communication library implementing high-performance messaging |
|
Additional Zest Resources
Slurm Quick Start User Guide
Slurm Command and Variables Cheat Sheet
Slurm Tutorials and Instructions
Additional Research Computing Resources
ITS Research Computing Home
ITS Remote Desktop Services (RDS)
OrangeGrid/HTCondor Support Home Page
Research Computing Events and Colloquia
Getting Help
Questions about Research Computing? Any questions about using or acquiring research computing resources or access can be directed to researchcomputing@syr.edu.