Swing

Quick Facts

  • 6x public compute nodes
  • 2x public login nodes
  • 8x NVIDIA A100 40GB GPUs per node
  • 1TB DDR4 and 320GB GPU memory per node
  • 128 CPU cores per compute node
  • InfiniBand HDR interconnect

Available Partitions

Swing has a single partition, named gpu, which is also the default. By default, each GPU you request comes with 1/8th of the node's resources (16 cores + 128GB RAM per GPU).

Nodes can be shared by multiple jobs from multiple users until their resources are fully consumed (for example, 8 jobs with 1 GPU each, 1 job with 8 GPUs, or anything in between).
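
The sharing rule can be worked through with a little arithmetic. The sketch below is illustrative only: it takes the node totals from the Quick Facts, treats 1TB as 1024GB, and uses two hypothetical helper functions (default_mem_gb and jobs_per_node) that are not part of Slurm or any LCRC tool.

```shell
#!/bin/bash
# Node totals from the Quick Facts; 1TB is taken as 1024GB (an assumption).
gpus_per_node=8
mem_gb_per_node=1024

# Default memory that accompanies a request for N GPUs: N/8 of the node.
default_mem_gb() {
    local gpus=$1
    echo $(( mem_gb_per_node * gpus / gpus_per_node ))
}

# Number of identical N-GPU jobs that fit on one node.
jobs_per_node() {
    local gpus=$1
    echo $(( gpus_per_node / gpus ))
}

for n in 1 2 4 8; do
    echo "${n} GPU(s): $(default_mem_gb "$n")GB RAM by default, $(jobs_per_node "$n") such job(s) per node"
done
```

A 1-GPU request therefore comes with 128GB of RAM and leaves room for seven more such jobs, while an 8-GPU request takes the whole node.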

You MUST request at least 1 GPU to run a job; otherwise, you will see the following error:

srun: error: Please request at least 1 GPU in the partition 'gpu'
srun: error: e.g '#SBATCH --gres=gpu:1')
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
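
A batch script that avoids this error might look like the minimal sketch below. The --gres=gpu:1 directive is the one suggested by the error message and the partition name comes from this page; the job name and time limit are illustrative assumptions, and your project may need further options (see Running Jobs on Swing).

```shell
#!/bin/bash
# Hypothetical job name, chosen for illustration.
#SBATCH --job-name=gpu-check
# Swing's only partition (also the default).
#SBATCH --partition=gpu
# Request one GPU, as the error message suggests.
#SBATCH --gres=gpu:1
# Illustrative wall-time limit.
#SBATCH --time=00:10:00

# Slurm typically exports CUDA_VISIBLE_DEVICES for GPU jobs; fall back to a
# placeholder so the script also runs outside a job.
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${visible_gpus}"

# List the allocated GPUs if the NVIDIA tools are on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
```

Saved under a name of your choosing (for example gpu-check.sh, a hypothetical file name), the script would be submitted with sbatch gpu-check.sh.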

  • Partition Name: gpu
  • Number of Nodes: 6
  • GPUs Per Node: 8x NVIDIA A100 40GB
  • GPU Memory Per Node: 320GB
  • CPUs Per Node: 2x AMD EPYC 7742 64-Core Processor (128 Total Cores)
  • DDR4 Memory Per Node: 1TB
  • Operating System: Ubuntu 20.04.2 LTS

File Storage

On Swing, users who want local scratch space can use a small scratch area held in the node's memory (located at /scratch, a 20GB tmpfs). Otherwise, users have access to the same GPFS filesystems as on our other resources, including home, project and group space.
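
As a rough illustration of working in the node-local scratch space, the sketch below creates a private directory under /scratch, checks the remaining space, and cleans up on exit. The directory prefix and file name are hypothetical, and the fallback to /tmp exists only so the sketch runs on machines without /scratch.

```shell
#!/bin/bash
# /scratch is the node-local tmpfs described above; fall back to /tmp so the
# sketch also runs on machines without it.
scratch_root=/scratch
if [ ! -d "$scratch_root" ] || [ ! -w "$scratch_root" ]; then
    scratch_root="${TMPDIR:-/tmp}"
fi

# Create a private working directory and remove it when the script exits,
# since tmpfs contents occupy node memory.
workdir=$(mktemp -d "${scratch_root}/job.XXXXXX")
trap 'rm -rf "$workdir"' EXIT

# Show how much space is left before staging anything (the tmpfs is only 20GB).
df -h "$scratch_root"

# Stage a small example file and work on the local copy.
echo "example input" > "${workdir}/input.txt"
wc -c < "${workdir}/input.txt"
```

Because the scratch space is small and lives in memory, large datasets are usually better left on the GPFS filesystems.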

Please see our detailed description of the file storage used in LCRC here.

Architecture

Each Swing node has 2x AMD EPYC 7742 64-core processors and 8x NVIDIA A100 40GB GPUs.

Swing also uses an InfiniBand HDR interconnect for its network. This matters for MPI programs, which can use the InfiniBand libraries for communication.

Running Jobs on Swing

For detailed information on how to run jobs on Swing, see our documentation: Running Jobs on Swing.

Swing uses the Slurm Workload Manager for job management. Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Allocations Note
Swing, unlike other LCRC clusters, charges allocation time based on GPU Hours instead of Core Hours. Please factor this in when applying for time on Swing.

Please see GPU Hour Usage for more details.
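
To get a feel for the numbers when planning a request, the sketch below estimates GPU hours for a job. It assumes that one GPU hour means one GPU allocated for one hour of wall time; the actual charging rules are those described under GPU Hour Usage, and the gpu_hours helper is hypothetical.

```shell
#!/bin/bash
# Estimated GPU hours, assuming the charge is GPUs x wall-clock hours.
# Wall time is given in minutes so short jobs keep their fractional hours.
gpu_hours() {
    local gpus=$1 minutes=$2
    awk -v g="$gpus" -v m="$minutes" 'BEGIN { printf "%.2f\n", g * m / 60 }'
}

gpu_hours 1 60    # one GPU for one hour
gpu_hours 8 90    # all 8 GPUs of a node for 90 minutes
```

Under this assumption the two calls print 1.00 and 12.00, so a full-node job uses its allocation eight times faster than a single-GPU job of the same length.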