Quick Facts
- 1024 public nodes
- 128 GB DDR4 (Intel Broadwell) or 96 GB DDR4 plus 16 GB MCDRAM (Intel Knights Landing) of memory on each node
- 36 cores (Intel Broadwell) / 64 cores (Intel Knights Landing) per compute node
- Omni-Path Fabric Interconnect
Available Partitions
Bebop has several partitions defined; a partition is similar to a queue. Use the -p option with srun or sbatch to select a partition (see the example after the table below). The default partition is bdwall.
| Bebop Partition Name | Description | Number of Nodes | CPU Type | Cores Per Node | Memory Per Node | Local Scratch Disk |
|---|---|---|---|---|---|---|
| bdwall | All Broadwell Nodes | 664 | Intel Xeon E5-2695v4 | 36 | 128 GB DDR4 | 15 GB or 4 TB |
| bdw | Broadwell Nodes with 15 GB /scratch | 600 | Intel Xeon E5-2695v4 | 36 | 128 GB DDR4 | 15 GB |
| bdwd | Broadwell Nodes with 4 TB /scratch | 64 | Intel Xeon E5-2695v4 | 36 | 128 GB DDR4 | 4 TB |
| bdws | Broadwell Shared Nodes (Oversubscription / Non-Exclusive) | 8 | Intel Xeon E5-2695v4 | 36 | 128 GB DDR4 | 15 GB |
| knlall | All Knights Landing Nodes | 348 | Intel Xeon Phi 7230 | 64 | 96 GB DDR4 / 16 GB MCDRAM | 15 GB or 4 TB |
| knl | Knights Landing Nodes with 15 GB /scratch | 284 | Intel Xeon Phi 7230 | 64 | 96 GB DDR4 / 16 GB MCDRAM | 15 GB |
| knld | Knights Landing Nodes with 4 TB /scratch | 64 | Intel Xeon Phi 7230 | 64 | 96 GB DDR4 / 16 GB MCDRAM | 4 TB |
| knls | Knights Landing Shared Nodes (Oversubscription / Non-Exclusive) | 4 | Intel Xeon Phi 7230 | 64 | 96 GB DDR4 / 16 GB MCDRAM | 15 GB |
| knl-preemptable | All Knights Landing Nodes with restrictions including preemption. Click here for more details. | 348 | Intel Xeon Phi 7230 | 64 | 96 GB DDR4 / 16 GB MCDRAM | 15 GB or 4 TB |
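For example, you can target a specific partition with the -p option. The script name, node counts, and time limits below are placeholders, and you may also need to specify your project account with -A (omitted here):

```bash
# Batch job on the Broadwell partition with the 4 TB /scratch (script name is a placeholder)
sbatch -p bdwd -N 2 --time=01:00:00 myjob.sh

# Interactive shell on a Knights Landing node
srun -p knl -N 1 --time=00:30:00 --pty /bin/bash
```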
File Storage
There are no physical disks in most of the Bebop nodes, so the OS on those nodes runs in a diskless environment. Users who currently take advantage of local scratch space on Blues will still have the option of using a scratch space in the node's memory (15 GB located at /scratch). A subset of the Broadwell and KNL nodes instead have a 4 TB /scratch available. For details on which partitions have which scratch space, please refer to the partition table above. The 15 GB scratch space is essentially a RAM disk and consumes memory on the node, so take this into account if you are running a large job that requires a substantial amount of memory.
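As a minimal sketch, a batch script might stage data through the node-local /scratch and copy results back before the job ends. The application name and file paths are placeholders, and the assumption that /scratch is not preserved after the job is illustrative rather than guaranteed by this page:

```bash
#!/bin/bash
#SBATCH -p bdwd              # partition whose nodes have the 4 TB /scratch
#SBATCH -N 1
#SBATCH --time=02:00:00

# Stage input into node-local scratch, run there, then copy results home.
# Assumption: /scratch is node-local and not kept after the job finishes.
cp "$HOME/input.dat" /scratch/
cd /scratch
"$HOME/bin/my_app" input.dat > output.dat
cp output.dat "$HOME/results/"
```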
Please see our detailed description of the file storage used in LCRC here.
Architecture
Bebop runs mostly on Intel Broadwell and Knights Landing processors. Broadwell nodes can take advantage of the AVX2 and AVX instruction sets, while Knights Landing nodes can use AVX-512 in addition to AVX2 and AVX.
Bebop uses an Intel Omni-Path interconnect for its network. This matters for MPI programs that would otherwise use an InfiniBand library for communication. Omni-Path has its own communication layer, PSM2, which works only with Omni-Path hardware and allows for higher performance than you would see from ibverbs or PSM. This means you should recompile your code and use one of the MPI implementations on Bebop that supports PSM2.
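A minimal sketch of rebuilding and launching an MPI application against one of Bebop's MPI installations follows; the module name is a placeholder, so check module avail for the PSM2-capable MPI builds actually provided:

```bash
# Load a PSM2-capable MPI build (module name is illustrative)
module load intel-mpi

# Recompile rather than reusing binaries built against ibverbs/PSM on another cluster
mpicc -O2 -o my_mpi_app my_mpi_app.c

# Launch through Slurm so the Omni-Path PSM2 transport is used
srun -n 72 ./my_mpi_app
```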
Running Jobs on Bebop
For detailed information on how to run jobs on Bebop, you can follow our documentation by clicking here: Running Jobs on Bebop.
Bebop utilizes the Slurm Workload Manager (formerly known as Simple Linux Utility for Resource Management or SLURM) for job management. Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
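These three functions correspond to the standard Slurm commands you will use day to day; the resource requests and application name below are placeholders:

```bash
salloc -p bdwall -N 2 --time=01:00:00   # 1. allocate compute nodes for a period of time
srun -N 2 -n 72 ./my_app                # 2. start and monitor parallel work on the allocation
squeue -u $USER                         # 3. inspect the queue of pending and running work
```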