Accessing Zest
To access Zest, make an SSH connection using your NetID and the login node you have been assigned. The example below uses 'its-zest-login2.syr.edu'; refer to the access email you received from Research Computing staff for your node information. The cluster supports connections from the Windows command prompt, clients such as PuTTY, and file transfer via SCP and WinSCP (see the scp example below).
```bash
ssh netid@its-zest-login2.syr.edu
```
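For file transfer, the same connection details work with scp. The example below is a minimal sketch; the file and directory names are placeholders, and you should substitute your assigned login node.

```bash
# Copy a file from your local machine to your Zest home directory
# (the file name is a placeholder; replace netid and the login node as assigned)
scp results.csv netid@its-zest-login2.syr.edu:~/

# Copy a directory from the cluster back to your local machine
scp -r netid@its-zest-login2.syr.edu:~/project_output ./project_output
```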
Tip: Off-campus networks cannot access the cluster directly. Users off campus need to connect to campus via Remote Desktop Services (RDS). RDS provides a Windows 10 desktop with access to your G: drive, campus OneDrive, and the research clusters. Once connected, SSH and SCP are available from the Windows command prompt, or you can use PuTTY and WinSCP, which are also installed. Full instructions and details for connecting to RDS are available on the RDS home page. Note that Azure VPN is an alternative option, but it is not available for all users. See the Azure VPN page for more details.
Zest Slurm Commands & Cluster Info
Slurm Commands
Once connected, you can use the basic commands below to get started.
```bash
# Show node information
sinfo

# Show the job queue
squeue

# Display job accounting information
sacct

# Submit a job script
sbatch [script name]

# Start an interactive job
salloc [options] [command]

# Launch parallel job steps, usually run in an SBATCH script
srun [options] [command]

# Display running job status
sstat [jobid]

# Cancel a job
scancel [jobid]
```
Learn more basics with the Slurm Quick Start User Guide.
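As a quick sketch of day-to-day use, the commands below limit the queue view to your own jobs and pull accounting fields for a single job; the job ID is a placeholder.

```bash
# Show only your own jobs in the queue
squeue -u $USER

# Show selected accounting fields for one job (781 is a placeholder job ID)
sacct -j 781 --format=JobID,JobName,Partition,State,Elapsed

# Cancel all of your pending jobs (use with care)
scancel -u $USER --state=PENDING
```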
Lmod Commands
Lmod is also available on the Zest cluster; see the command reference and example workflow below.
Command | Description
---|---
module avail | Show all available modules
module load [name] | Load the module environment
module spider [string] | Search for module names matching string
module keyword [string] | Search module names and descriptions
module list | List currently loaded modules
module unload [name] | Unload a module from the environment
module purge | Remove all loaded modules
module save [name] | Save currently loaded modules to collection [name]
module savelist | Show all saved collections
module restore [name] | Restore modules from collection [name]
module help | Display all Lmod options
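A typical Lmod workflow is to search for a module, load it, and confirm what is loaded. The sketch below uses the cuda module mentioned later on this page; the module names and versions actually available on Zest may differ.

```bash
# Search for modules whose names match "cuda"
module spider cuda

# Load the module and confirm the loaded environment
module load cuda
module list

# Save the current set of modules as a named collection and restore it later
module save myproject
module restore myproject
```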
Zest Cluster Architecture
All worker nodes have an InfiniBand interconnect. The table below summarizes node hardware; a sinfo sketch for querying this information on the cluster follows the table.
Hostname | Hardware | Function
---|---|---
its-zest-login1 | 4 CPUs, 16 GB memory | Login node for Biomed users
its-zest-login2 | 4 CPUs, 16 GB memory | Login node for ECS users
node[1000-1011] | 80 CPUs, 187 GB memory each | Worker nodes (Dell)
node[1012-1023,1028-1030] | 36 CPUs, 188 GB memory each | Worker nodes (Supermicro)
node[1024-1027] | 36 CPUs, 188 GB memory, 2 GPUs each | GPU worker nodes
node[1031-1040] | 48 CPUs, 188 GB memory each | Worker nodes (Dell)
node[1041-1062] | 112 CPUs, 512 GB memory each | Worker nodes (Dell)
node[1063-1067] | 112 CPUs, 503 GB memory, 1 GPU each | GPU worker nodes
node[1069-1078] | 64 CPUs, 2 TB memory each | Worker nodes
node[1079-1096] | 128 CPUs, 354 GB memory each | Worker nodes
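To view this hardware information directly from the scheduler, sinfo accepts a format string listing CPUs, memory, GRES (GPUs), and state per node. This is standard Slurm usage, not specific to Zest.

```bash
# List each node with its CPU count, memory (MB), generic resources (GPUs),
# and current state
sinfo -N -o "%N %c %m %G %T"
```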
Zest Cluster Storage and Partitions
Below is general information about local storage locations and partition details; a sketch of how they are typically combined in a job script follows the partition table.
Resource | Description
---|---
/home/NetID/ | NFS-based user home directory, available throughout the cluster
/tmp/ | Fast local temporary storage, only persistent for the duration of the current job
Partition | Node Assignment | Description
---|---|---
normal | node[1010-1023] | Computational worker nodes
geforce | node[1024-1027] | Supermicro nodes with 2 x GeForce RTX 2080 Ti GPUs each
a30 | node[1063-1067] | Dell nodes with one NVIDIA A30 Tensor Core GPU each
hyperv | node[1069-1096] | Nodes with Intel Xeon Gold 6338 CPUs
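The sketch below shows how these resources are commonly combined in a job script: request a partition, stage intermediate files in fast node-local /tmp, and copy results back to the NFS home directory before the job ends. The program and file names are placeholders.

```bash
#!/bin/bash
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Work in fast node-local storage; /tmp is cleared after the job finishes
WORKDIR=/tmp/${USER}_${SLURM_JOB_ID}
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Placeholder for your actual program
srun ./my_program --output result.dat

# Copy results back to the persistent NFS home directory
cp result.dat /home/$USER/
```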
Submitting Jobs (with Examples)
Submitting jobs on the Zest cluster requires the creation of an SBATCH script. Below are common examples including the use of MPI and GPUs.
Basic SBATCH Example
Below is a basic SBATCH example.
```bash
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
# Replace netid with your NetID
#SBATCH --mail-user=netid@syr.edu
#
# This runs hostname three times (tasks) on a single node
#
srun hostname
```
Assuming the above is 'job1.sh', use the 'sbatch' command to submit the job as seen below.
```
netid@its-zest-login1:[~]$ sbatch job1.sh
Submitted batch job 781
netid@its-zest-login1:[~]$ more slurm-781.out
node1002
node1002
node1002
netid@its-zest-login1:[~]$
```
Note: The default output for a job is written to slurm-{jobid}.out.
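If you prefer named output files, the standard Slurm --output and --error directives can be added to the script; %j expands to the job ID. The file names below are only examples.

```bash
#SBATCH --job-name=job1
# %j below expands to the numeric job ID
#SBATCH --output=job1-%j.out
#SBATCH --error=job1-%j.err
```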
MPI SBATCH Example
Use srun instead of mpirun, since the cluster's OpenMPI supports launching through Slurm.
```bash
#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
# Load the Intel MPI Benchmarks module
module load imb
srun IMB-MPI1
```
Example GPU worker interactive job
Use an interactive shell allocation to compile and test applications.
```
# This will start an interactive shell on a Supermicro GPU system
# using 20 CPUs and 2 GPUs. If the resource is open, you'll get a shell on
# a worker node. Otherwise, srun will hang until the resource is available.

netid@its-zest-login1:[~]$ srun --pty -p geforce -c 20 --gres=gpu:2 bash
[netid@node1024 ~]$ cp -Rp /usr/local/cuda/samples CUDA_SAMPLES
[netid@node1024 ~]$ cd CUDA_SAMPLES
[netid@node1024 ~/CUDA_SAMPLES]$ make -j20 all
[netid@node1024 ~/CUDA_SAMPLES]$ exit
netid@its-zest-login1:[~]$
```
GPU SBATCH Example
Ensure you are using the geforce Slurm partition.
```bash
#!/bin/bash
#
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=geforce
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
nvidia-smi
```
GPU SBATCH Using MPI Example
Note the use of srun with the full path to the MPI-enabled binary.
```bash
#!/bin/bash
#
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=geforce
#SBATCH --gres=gpu:2
#SBATCH --mail-type=ALL
#SBATCH --mail-user=netid@syr.edu
#
module load cuda
srun /home/netid/CUDA_SAMPLES/bin/x86_64/linux/release/simpleMPI
```
Zest FAQ
Can I Use Docker with Zest?
The cluster does not support Docker directly; however, you can import Docker containers into Singularity. More information on Singularity is available at https://docs.sylabs.io/guides/3.6/user-guide/.
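As a minimal sketch, assuming the singularity command is available on the worker nodes (either directly or via a module), a public Docker image can be converted and run as follows; the image name is only an example.

```bash
# Pull a Docker Hub image and convert it to a Singularity image file (.sif)
singularity pull docker://python:3.10-slim

# Run a command inside the container
singularity exec python_3.10-slim.sif python3 --version

# Or start an interactive shell inside the container
singularity shell python_3.10-slim.sif
```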
What packages are available on the login and worker nodes?
What Lmod modules are available on the login and worker nodes?
Additional Research Computing Resources
ITS Remote Desktop Services (RDS)
OrangeGrid/HTCondor Support Home Page
Research Computing Events and Colloquia
Getting Help
Questions about Research Computing? Any questions about using or acquiring research computing resources or access can be directed to researchcomputing@syr.edu.