Swing, like the other LCRC clusters, utilizes the Slurm Workload Manager for job management. Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
The simplest way to become familiar with Slurm and its basic commands is to follow their Quick Start User Guide. In the rest of this page, we’ll cover specific examples and commands. If at any time something becomes unclear, please do contact LCRC support.
Please note that unlike other LCRC clusters, Swing is using Ubuntu as the operating system instead of CentOS.
Swing, unlike other LCRC clusters, charges allocation time based on GPU Hours instead of Core Hours. Please factor this in when applying for time on Swing.
Please see GPU Hour Usage below for more details.
Logging Into Swing
Please be sure to follow our Getting Started documentation to make sure you’ve completed the necessary steps to log in to Swing. Once you’ve done this, you can SSH to Swing by running the following:
ssh <your_argonne_username>@swing.lcrc.anl.gov
The LCRC login nodes should not be used to run jobs. Doing so may impact other users and require these login nodes to be rebooted.
If you need to add a new SSH key because you may not have logged in for a while, please read through our documentation here.
Swing also shares the same global GPFS filesystem as other LCRC clusters. All of your home and project directories noted in our storage documentation will be available between clusters.
Projects Used for Job Submission
LCRC resources require a valid project with an allocation to submit jobs. Projects keep track of your quarterly allocations. Please see the following page for more information about Projects in LCRC.
To see how much time will be deducted from your project when running jobs on Swing, please see the following on GPU Hour Usage.
When logging into Swing for the first time, you’ll want to change your default project (as a reference, what LCRC calls projects are referred to as accounts in Slurm).
All LCRC clusters currently use separate allocation/time databases. Your time and balances on one cluster will not be the same on the other. If you need to check your current account’s (project’s) balance(s), change your default account, etc., please see our documentation below or reference the information here: Project Allocation Queries and Management.
- All Argonne Employee Swing users will have a default project set to startup-<username> upon first login with a project balance of 100 GPU hours.
- All Non-Argonne Employee Swing users will have a default project set to external upon first login, which has no time allocated, so you will not be able to submit jobs.
Query your Default Project on Swing
You can check your default project on Swing and make sure this is set correctly with this command:
lcrc-sbank -q default
Setting a Default Project on Swing
You can set your default project on Swing with the following command:
lcrc-sbank -s default <project_name>
You can also specify the project name on Swing in your job submission if you’d like to use something other than your default. With SBATCH, this can be done with:
#SBATCH -A <project_name>
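For instance, a minimal job script that charges a specific project might look like the sketch below; the project name and application are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH -A <project_name>   # charge this project instead of your default
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# ./my-application is a placeholder for your own program
srun ./my-application
```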
Query Project Balances on Swing
You can query your project balances on Swing to see how much time you have available and how much you have used.
Query all of your project balances on Swing:
lcrc-sbank -q balance
Query a specific project balance on Swing:
lcrc-sbank -q balance <project_name>
Query a Project Transaction History on Swing
If you’d like to see the transaction history for a project on Swing, you can run the command below.
lcrc-sbank -q trans <project_name>
lcrc-sbank Help Menu
If you need to view the lcrc-sbank help menu at any time, simply run the command below.
lcrc-sbank -h
Software Environment Using Lmod
Swing uses Lmod (Lua Environment Modules) for environment variable management. Lmod has several advantages. For example, it prevents you from loading multiple versions of the same package at the same time. It also prevents you from having multiple compilers and MPI libraries loaded at the same time. See the Lmod User Guide for information on how to use Lmod.
The software environment is unique to Swing. We have already built some basic software. We will continue to install some standard software, but complex applications should be built by the user.
Please note that Python modules are available in the ‘anaconda3’ module and are not listed separately via Lmod.
Using Slurm to Submit Jobs
Swing uses Slurm as the job resource manager and scheduler for the cluster.
Your best source of information on using Slurm is the Slurm quickstart guide here or the man pages.
Below we will outline some general information on the Swing Slurm partitions and supply some basic submission information to get you started.
Partition Limits
Swing currently enforces the following limits on publicly available partitions:
- 4 Running Jobs per user.
- 10 Queued Jobs per user.
- 1 Day (24 Hours) Maximum Walltime.
- 1 Hour Default Walltime if not specified.
- 16 GPUs (2 full nodes) Max in use at one time.
- gpu is the default partition.
Available Partitions
Swing has two partitions, the default being named gpu. By default, you will be allocated 1/8th of the node resources per GPU.
Nodes allow for multiple jobs from multiple users up until the resources are fully consumed (8 jobs with 1 GPU each per node, 1 job with 8 GPUs per node, and everything in between). Slurm will clean up temporary files when all of your jobs on a node exit.
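As a sketch, a job that requests two GPUs on one node (and therefore roughly 2/8 of that node’s CPUs and memory under the default allocation) could look like the following; the application name is a placeholder:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:2     # 2 of the node's 8 GPUs
#SBATCH --time=00:30:00

# Under the default 1/8th-per-GPU policy, this job shares the node
# with other jobs using the remaining 6 GPUs.
srun ./my-application
```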
You MUST request at least 1 GPU to run a job; otherwise, you will see the following error:
srun: error: Please request at least 1 GPU in the partition 'gpu'
srun: error: (e.g '#SBATCH --gres=gpu:1')
srun: error: Unable to allocate resources: Invalid generic resource (gres) specification
Here’s a short list of node specifications:
Partition Name | Number of Nodes | GPUs Per Node | GPU Memory Per Node | CPUs Per Node | DDR4 Memory Per Node | Local Scratch Disk | Operating System |
---|---|---|---|---|---|---|---|
gpu | 5 | 8x NVIDIA A100 40GB | 320GB | 2x AMD EPYC 7742 64-Core Processor (128 Total Cores) | 1TB | 14TB | Ubuntu 20.04.2 LTS |
gpu-large | 1 | 8x NVIDIA A100 80GB | 640GB | 2x AMD EPYC 7742 64-Core Processor (128 Total Cores) | 2TB | 28TB | Ubuntu 20.04.2 LTS |
Job Submission Commands
The 3 most common tools you will use to submit jobs are sbatch, srun and salloc.
You can reference the table below for a simple, quick cheat sheet on a few examples about jobs in Slurm:
Slurm Command | Description |
---|---|
sbatch <job_script> | Submit <job_script> to the Scheduler |
srun <options> | Run Parallel Jobs |
salloc <options> | Request an Interactive Job |
squeue | View Job Information |
scancel <job_id> | Delete a Job |
Example sbatch Job Submission
Let’s go through a simple compile and job submission using sbatch.
Here, we have a simple Hello World application for GPUs:
[[email protected] ~]$ cat hello.cu
#include <stdio.h>

#define NUM_BLOCKS 2
#define BLOCK_WIDTH 16

__global__ void hello()
{
    printf("Hello world! I'm thread %d in block %d\n", threadIdx.x, blockIdx.x);
}

int main(int argc, char **argv)
{
    // launch the kernel
    hello<<<NUM_BLOCKS, BLOCK_WIDTH>>>();

    // force the printf()s to flush
    cudaDeviceSynchronize();

    printf("That's all!\n");
    return 0;
}
Load an appropriate module and build your application:
[[email protected] ~]$ module load nvhpc
[[email protected] ~]$ nvcc hello.cu -o hello-gpu-run
Let’s take a look at a simple submit script:
[[email protected] ~]$ cat hello.sh
#!/bin/bash
#
#SBATCH --job-name=gpu-test
#SBATCH --account=<my_lcrc_project_name>
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --time=00:05:00

srun ./hello-gpu-run
Now, submit your job:
[[email protected] ~]$ sbatch hello.sh
Submitted batch job 398
You can use squeue to check the status of your job. Once finished, check that your application ran successfully:
[[email protected] ~]$ cat slurm-398.out
Hello world! I'm thread 0 in block 0
Hello world! I'm thread 1 in block 0
Hello world! I'm thread 2 in block 0
Hello world! I'm thread 3 in block 0
Hello world! I'm thread 4 in block 0
Hello world! I'm thread 5 in block 0
Hello world! I'm thread 6 in block 0
Hello world! I'm thread 7 in block 0
Hello world! I'm thread 8 in block 0
Hello world! I'm thread 9 in block 0
Hello world! I'm thread 10 in block 0
Hello world! I'm thread 11 in block 0
Hello world! I'm thread 12 in block 0
Hello world! I'm thread 13 in block 0
Hello world! I'm thread 14 in block 0
Hello world! I'm thread 15 in block 0
Hello world! I'm thread 0 in block 1
Hello world! I'm thread 1 in block 1
Hello world! I'm thread 2 in block 1
Hello world! I'm thread 3 in block 1
Hello world! I'm thread 4 in block 1
Hello world! I'm thread 5 in block 1
Hello world! I'm thread 6 in block 1
Hello world! I'm thread 7 in block 1
Hello world! I'm thread 8 in block 1
Hello world! I'm thread 9 in block 1
Hello world! I'm thread 10 in block 1
Hello world! I'm thread 11 in block 1
Hello world! I'm thread 12 in block 1
Hello world! I'm thread 13 in block 1
Hello world! I'm thread 14 in block 1
Hello world! I'm thread 15 in block 1
That's all!
Hello world! I'm thread 0 in block 0
Hello world! I'm thread 1 in block 0
Hello world! I'm thread 2 in block 0
Hello world! I'm thread 3 in block 0
Hello world! I'm thread 4 in block 0
Hello world! I'm thread 5 in block 0
Hello world! I'm thread 6 in block 0
Hello world! I'm thread 7 in block 0
Hello world! I'm thread 8 in block 0
Hello world! I'm thread 9 in block 0
Hello world! I'm thread 10 in block 0
Hello world! I'm thread 11 in block 0
Hello world! I'm thread 12 in block 0
Hello world! I'm thread 13 in block 0
Hello world! I'm thread 14 in block 0
Hello world! I'm thread 15 in block 0
Hello world! I'm thread 0 in block 1
Hello world! I'm thread 1 in block 1
Hello world! I'm thread 2 in block 1
Hello world! I'm thread 3 in block 1
Hello world! I'm thread 4 in block 1
Hello world! I'm thread 5 in block 1
Hello world! I'm thread 6 in block 1
Hello world! I'm thread 7 in block 1
Hello world! I'm thread 8 in block 1
Hello world! I'm thread 9 in block 1
Hello world! I'm thread 10 in block 1
Hello world! I'm thread 11 in block 1
Hello world! I'm thread 12 in block 1
Hello world! I'm thread 13 in block 1
Hello world! I'm thread 14 in block 1
Hello world! I'm thread 15 in block 1
That's all!
Assuming everything went well, you’ve successfully compiled and submitted your test job!
Example Interactive Job Submission
There are a couple of ways to run an interactive job:
First, you can just get a session on a node by using the srun command in the following way:
srun --gres=gpu:1 --pty bash
This will drop you onto one node with 1 GPU allocated to you. Once you exit the node, the allocation will be relinquished.
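Once the session starts, you can sanity-check what was allocated to you; this sketch assumes the CUDA tools (e.g. nvidia-smi) are available on the compute node:

```shell
# Which GPU index(es) did Slurm assign to this session?
echo $CUDA_VISIBLE_DEVICES

# List the GPU(s) visible to you on this node
nvidia-smi
```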
If you want more flexibility, you can instead have the system first allocate resources for the job using the salloc command:
salloc -N 2 --gres=gpu:4 -t 00:30:00
This job will allocate 2 nodes with 4 GPUs each for 30 minutes. You should get the job number from the output. This command will not log you into any of your allocated nodes by default.
You can get a list of your allocated nodes and the other SLURM environment variables set by the salloc command by running:

printenv | grep SLURM
After the resources are allocated and the session is granted, use the srun command to run your job.
When you allocate resources via salloc, you can also now freely SSH to the nodes in your allocation as well if you prefer to run jobs from the nodes themselves.
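A typical salloc workflow, sketched end to end:

```shell
salloc -N 2 --gres=gpu:4 -t 00:30:00   # request 2 nodes with 4 GPUs each
srun hostname                          # launch a command on the allocated nodes
exit                                   # release the allocation when finished
```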
Checking Queues and Jobs
To view job and job step information use squeue.
Here’s a quick example of what the output may look like:
squeue
JOBID PARTITION      NAME  USER ST    TIME NODES NODELIST(REASON)
  999       gpu test-joba user2  R 2:40:31     2 gpu[3-4]
  998       gpu test-job2 user1  R   45:20     1 gpu1
  997       gpu test-job1 user1  R    3:04     1 gpu1
Here are also some common options for squeue:
Option | Description |
---|---|
-a | Display information about all jobs in all partitions. This is the default when running squeue with no options. |
-u <user_list> | Request jobs or job steps from a comma-separated list of users. The list can consist of user names or user ID numbers. |
-j <job_id_list> | Request a comma-separated list of job IDs to display. Defaults to all jobs. |
-l | Report more of the available information for the selected jobs or job steps. |
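These options can be combined; for example, to get a long-format listing of your own jobs, or to ask Slurm for estimated start times of your pending jobs:

```shell
squeue -u $USER -l       # long-format listing of your own jobs
squeue --start -u $USER  # estimated start times for your pending jobs
```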
Deleting a Job
To delete a job, use scancel. This command takes the job ID as its argument. Your job ID is printed when you submit the job, and you can also retrieve it from the squeue command detailed above.
scancel <job_id>
Other Useful Slurm Commands
scontrol – can be used to report more detailed information about nodes, partitions, jobs, job steps, and configuration.
Common examples:
Command | Description |
---|---|
scontrol show node <node-name> | Shows detailed information about the nodes. |
scontrol show partition <partition-name> | Shows detailed information about a specific partition. |
scontrol show job <job-id> | Shows detailed information about a specific job, or all jobs if no job ID is given. |
scontrol update job <job-id> | Change attributes of a submitted job. |
For an extensive list of formatting options, please consult the scontrol man page.
sinfo – view information about jobs, nodes and partitions located in the Slurm scheduling queue.
Common options:
Option | Description |
---|---|
-a, --all | Display information about all partitions. |
-t, --states <states> | Display nodes in a specific state. Example: idle |
-i <seconds>, --iterate=<seconds> | Print the state on a periodic basis. Sleep for the indicated number of seconds between reports. |
-l, --long | Print more detailed information. |
-n <nodes>, --nodes=<nodes> | Print information only about the specified node(s). Multiple nodes may be comma separated or expressed using a node range expression, for example “gpu[1-2]”. |
-o <output_format>, --format=<output_format> | Specify the information to be displayed using an sinfo format string. |
For an extensive list of formatting options, please consult the sinfo man page.
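For example, a custom sinfo format string can show the partition, availability, time limit, node count, state, and node list in one line:

```shell
# %P = partition, %a = availability, %l = time limit,
# %D = number of nodes, %t = state, %N = node list
sinfo -o "%P %a %l %D %t %N"
```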
sacct – displays accounting data for all jobs and job steps, and can be used to display information about completed jobs.
Common options:
Option | Description |
---|---|
-S, --starttime | Select jobs in any state after the specified time. |
-E end_time, --endtime=end_time | Select jobs in any state before the specified time. |
Valid time formats are:
- HH:MM[:SS] [AM|PM]
- MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
- MM/DD[/YY]-HH:MM[:SS]
- YYYY-MM-DD[THH:MM[:SS]]
Example:
# sacct -S2014-07-03-11:40 -E2014-07-03-12:00 -X -ojobid,start,end,state
JobID      Start                 End                   State
---------  --------------------  --------------------  ------------
2          2014-07-03T11:33:16   2014-07-03T11:59:01   COMPLETED
3          2014-07-03T11:35:21   Unknown               RUNNING
4          2014-07-03T11:35:21   2014-07-03T11:45:21   COMPLETED
5          2014-07-03T11:41:01   Unknown               RUNNING
For an extensive list of formatting options, please consult the sacct man page.
sprio – view the factors that comprise a job’s scheduling priority.
sprio is used to view the components of a job’s scheduling priority when the multi-factor priority plugin is installed. sprio is a read-only utility that extracts information from the multi-factor priority plugin. By default, sprio returns information for all pending jobs. Options exist to display specific jobs by job ID and user name.
For an extensive list of formatting options, please consult the sprio man page.
GPU Hour Usage
As mentioned, submitting jobs to Swing requires time allocated to a Project (or what Slurm calls an Account). Our documentation has an extensive write up on this on the following page: Projects in LCRC
Whenever a computing job runs on any computing node, the time the job uses will be counted and recorded as computing used by the associated project. A job must have a project in order to run on the computing nodes and will be assigned to your default project if none has been specified in your job script. ALL jobs submitted via sbatch, srun or salloc will deduct computing GPU hours from your project. Please note that unlike other LCRC clusters, Swing charges time based on GPU hours instead of CPU core hours. You should factor this in when applying for time on Swing.
On Swing, the compute nodes charge as follows for each job:
GPU Nodes: # of Nodes * # GPUs Per Node * Time Used
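The arithmetic can be sketched with a small shell helper (hypothetical, not an LCRC tool): a 2-node job using all 8 GPUs per node for 3 hours is charged 2 * 8 * 3 = 48 GPU hours.

```shell
# Estimate the GPU hours a job will be charged:
# charge = nodes * gpus_per_node * wall_hours
gpu_hours() {
  local nodes=$1 gpus_per_node=$2 wall_hours=$3
  echo $(( nodes * gpus_per_node * wall_hours ))
}

gpu_hours 2 8 3   # prints 48
```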
Compute Node Scratch Space
Swing currently writes all temporary files on the compute nodes to a 20 GB tmpfs at /scratch. Please note that all data will be deleted from this directory once your job completes. You can also change your environment’s TMPDIR variable in your job script if you want to set an alternate path.
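If your job writes more temporary data than the 20 GB tmpfs can hold, one option is to point TMPDIR at a directory you own; the path and application below are placeholders for illustration:

```shell
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Redirect temporary files away from the node-local /scratch tmpfs.
# Replace the path with a directory in your own project space.
export TMPDIR=/path/to/your/project/tmp
mkdir -p "$TMPDIR"

srun ./my-application
```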
Why Isn’t My Job Running Yet?
If today is NOT LCRC Maintenance Day and you find that your job is in the pending (PD) state after running squeue, Slurm will provide a reason, shown in the squeue output. Here are a few of the most common reasons your job may not be running.
First, check the reason code by querying your job number in Slurm:
squeue -j <job_id>
Then, you can determine why the job has not started by deciphering this sample reason list:
Reason Code | Description |
---|---|
AccountNotAllowed | The job isn’t using an account that is allowed on the partition. |
AssocGrpBillingMinutes | The job doesn’t have enough time in the banking account to begin. |
BadConstraints | The job’s constraints cannot be satisfied. |
BeginTime | The job’s earliest start time has not yet been reached. |
Cleaning | The job is being requeued and still cleaning up from its previous execution. |
Dependency | This job is waiting for a dependent job to complete. |
JobHeldAdmin | The job is held by a system administrator. |
JobHeldUser | The job is held by the user. |
NodeDown | A node required by the job is down. |
PartitionNodeLimit | The number of nodes required by this job is outside of its partition’s current limits. Can also indicate that required nodes are DOWN or DRAINED. |
PartitionTimeLimit | The job’s time limit exceeds its partition’s current time limit. |
Priority | One or more higher priority jobs exist for this partition or advanced reservation. |
QOSMaxJobsPerUserLimit | The job’s QOS has reached its maximum job count for the user at one time. |
ReqNodeNotAvail | During LCRC Maintenance Day, you may see this reason; otherwise, some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job’s “reason” field as “UnavailableNodes”. Such nodes will typically require the intervention of a system administrator to make available. |
Reservation | The job is waiting for its advanced reservation to become available. |
Resources | The job is waiting for resources to become available. |
TimeLimit | The job exhausted its time limit. |
While this is not every reason code, these are the most common. You can view the full list of Slurm reason codes here.
Assuming your job is pending with the Priority or Resources reason, you can use the sprio command to get a better idea of when your job may start based on the priorities of other pending jobs. The priority is the sum of age, fairshare, job size and QOS (quality of service) factors.
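For example (the job ID is a placeholder):

```shell
sprio -l            # long listing: priority components of all pending jobs
sprio -j <job_id>   # priority breakdown for a specific pending job
```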
Application Specific Examples
Command Line Quick Reference Guide
Command | Description |
---|---|
sbatch <script_name> | Submit a job. |
scancel <job_id> | Delete a job. |
squeue | Show queued jobs via the scheduler. |
squeue -u <username> | Show queued jobs from a specific user. |
scontrol show job <job_id> | Provide a detailed status report for a specified job via the scheduler. |
sinfo -t idle | Get a list of all free/idle nodes. |
lcrc-sbank -q balance <project_name> | Query a specific project balance. |
lcrc-sbank -q balance | Query all of your project balances. |
lcrc-sbank -q default | Query your default project. |
lcrc-sbank -s default <project_name> | Change your default project. |
lcrc-sbank -q trans <project_name> | Query all transactions on a project. |
lcrc-quota | Query your global filesystem disk usage. |
Contact Information
Please contact [email protected] with any questions you may have regarding Swing usage.