- 6x public compute nodes
- 2x public login nodes
- 8x NVIDIA A100 GPUs per node
- 1-2TB DDR4 and 320-640GB GPU memory per node
- 128 CPU cores per compute node
- InfiniBand HDR interconnect
Swing has two partitions, the default being named gpu. By default, you will be allocated 1/8th of the node resources per GPU.
Nodes accept multiple jobs from multiple users until their resources are fully consumed (8 jobs with 1 GPU each per node, 1 job with 8 GPUs per node, and everything in between).
You MUST request at least 1 GPU to run a job; otherwise, you will see the following error:
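As a rough illustration of the 1/8th-per-GPU default, the sketch below estimates what a job receives for a given GPU count. The per-node figures come from the partition table on this page. The names `NodeSpec` and `default_share` are hypothetical, the conversion 1 TB = 1024 GB is an assumption, and the values the scheduler actually enforces may differ slightly.

```python
from dataclasses import dataclass

GPUS_PER_NODE = 8  # every Swing compute node has 8 GPUs


@dataclass(frozen=True)
class NodeSpec:
    cpu_cores: int
    ddr4_gb: int        # assumes 1 TB = 1024 GB
    gpu_memory_gb: int


# Per-node totals copied from the partition table on this page.
PARTITIONS = {
    "gpu": NodeSpec(cpu_cores=128, ddr4_gb=1024, gpu_memory_gb=320),
    "gpu-large": NodeSpec(cpu_cores=128, ddr4_gb=2048, gpu_memory_gb=640),
}


def default_share(partition: str, gpus: int) -> NodeSpec:
    """Estimate the default resources for a job requesting `gpus` GPUs."""
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"gpus must be between 1 and {GPUS_PER_NODE}")
    node = PARTITIONS[partition]
    # Each GPU brings 1/8th of the node's cores and memory with it.
    return NodeSpec(
        cpu_cores=node.cpu_cores * gpus // GPUS_PER_NODE,
        ddr4_gb=node.ddr4_gb * gpus // GPUS_PER_NODE,
        gpu_memory_gb=node.gpu_memory_gb * gpus // GPUS_PER_NODE,
    )


if __name__ == "__main__":
    print(default_share("gpu", 1))
    print(default_share("gpu-large", 2))
```

Under these assumptions, one GPU in the gpu partition corresponds to about 16 cores, 128GB of DDR4 and 40GB of GPU memory.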
srun: error: Please request at least 1 GPU in the partition 'gpu'
srun: error: e.g '#SBATCH --gres=gpu:1')
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
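To avoid this error, every submission needs a `--gres=gpu:N` line with N of at least 1. The sketch below is one way to generate a batch script that always includes it. The helper `render_batch_script` and its default time limit are hypothetical, not part of any LCRC tooling, and the example command is a placeholder. The `--partition`, `--gres` and `--time` options are standard Slurm `sbatch` directives.

```python
MAX_GPUS_PER_NODE = 8  # Swing nodes have 8 GPUs each


def render_batch_script(command: str, gpus: int = 1, partition: str = "gpu",
                        time_limit: str = "01:00:00") -> str:
    """Return the text of a Slurm batch script that requests `gpus` GPUs."""
    if gpus < 1:
        # Mirrors the scheduler's rule: a job without a GPU is rejected.
        raise ValueError("request at least 1 GPU, e.g. --gres=gpu:1")
    if gpus > MAX_GPUS_PER_NODE:
        raise ValueError(f"a node has only {MAX_GPUS_PER_NODE} GPUs")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={time_limit}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command; substitute your own program.
    print(render_batch_script("python train.py", gpus=2))
```

The generated text can be saved to a file and submitted with `sbatch`.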
| Partition Name | Number of Nodes | GPUs Per Node | GPU Memory Per Node | CPUs Per Node | DDR4 Memory Per Node | Local Scratch Disk | Operating System |
|---|---|---|---|---|---|---|---|
| gpu | 5 | 8x NVIDIA A100 40GB | 320GB | 2x AMD EPYC 7742 64-Core Processor (128 Total Cores) | 1TB | 14TB | Ubuntu 20.04.2 LTS |
| gpu-large | 1 | 8x NVIDIA A100 80GB | 640GB | 2x AMD EPYC 7742 64-Core Processor (128 Total Cores) | 2TB | 28TB | Ubuntu 20.04.2 LTS |
On Swing, users who want local scratch space can use a small scratch area held in the node's memory (a 20GB tmpfs located at /scratch). Otherwise, users have access to the same GPFS filesystems as on our other resources, including home, project and group space.
Please see our detailed description of the file storage used in LCRC.
Swing runs with 2x AMD EPYC 7742 64-Core Processors and 8x NVIDIA A100 GPUs per node.
Swing also uses an InfiniBand HDR interconnect for its network. This matters for MPI programs that use an InfiniBand library for communication.
Running Jobs on Swing
For detailed information on how to run jobs on Swing, see our documentation: Running Jobs on Swing.
Swing utilizes the Slurm Workload Manager for job management. Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Swing, unlike other LCRC clusters, charges allocation time based on GPU Hours instead of Core Hours. Please factor this in when applying for time on Swing.
Please see GPU Hour Usage for more details.
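When sizing an allocation request, a back-of-the-envelope estimate can help. The sketch below assumes the simplest possible charging model, GPU hours = GPUs requested × wall-clock hours. That formula is an assumption made for illustration, and `gpu_hours` is a hypothetical helper; the GPU Hour Usage page has the actual accounting rules.

```python
def gpu_hours(gpus: int, wall_hours: float) -> float:
    """Estimate GPU hours, assuming charge = GPUs x wall-clock hours."""
    if gpus < 1:
        raise ValueError("a Swing job always uses at least 1 GPU")
    if wall_hours < 0:
        raise ValueError("wall_hours must be non-negative")
    return gpus * wall_hours


if __name__ == "__main__":
    # A full node (8 GPUs) for 12 hours.
    print(gpu_hours(8, 12))
    # A single GPU for two and a half hours.
    print(gpu_hours(1, 2.5))
```

Under this assumption, a full 8-GPU node held for 12 hours costs 96 GPU hours, while the same job would be counted as 1536 core hours on a cluster charged by its 128 cores.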