Blues

Blues is one of the clusters that make up the computational power of LCRC, featuring ~350 publicly usable compute nodes and over 6,000 cores available to all users. Overall, Blues comprises ~800 compute nodes of varying architectures, including private condo nodes. With roughly twice the computational power of the previous cluster, Fusion, Blues has some similarities to and some differences from that system that anyone planning to use Blues should become familiar with.

Quick Facts

  • ~350 public nodes
  • 64 GB (Intel Sandy Bridge)/128 GB (Intel Haswell) of memory on each node
  • 16 cores (Intel Sandy Bridge)/32 cores (Intel Haswell) per compute node
  • QLogic QDR InfiniBand Interconnect (fat-tree topology)
  • Over 6,000 compute cores available
  • Theoretical peak performance of 107.8 TFlops

Available Partitions

Blues has several publicly available partitions defined. The default partition is sball. Blues condo node partitions are not listed below. You can get a list of all partition names on Blues that you have access to by running sinfo -o "%P". Any partition that is not sball, haswell, shared, ivy, or biggpu is considered a condo partition.

Partition Name | Description                                                  | Number of Nodes | CPU Type                    | Co-Processors            | Cores Per Node | Memory Per Node | Local Scratch Disk
sball          | Sandy Bridge Nodes                                           | 300             | Intel Xeon E5-2670 2.6GHz   | -                        | 16             | 64 GB           | 15 GB
shared         | Sandy Bridge Shared Nodes (Oversubscription / Non-Exclusive) | 4               | Intel Xeon E5-2670 2.6GHz   | -                        | 16             | 64 GB           | 15 GB
haswell        | Haswell Nodes                                                | 60              | Intel Xeon E5-2698v3 2.3GHz | -                        | 32             | 128 GB          | 15 GB
ivy            | Ivy Bridge Nodes                                             | 1               | Intel Xeon E5-2670v2 2.5GHz | -                        | 20             | 64 GB           | 15 GB
biggpu         | Sandy Bridge Nodes                                           | 6               | Intel Xeon E5-2670 2.6GHz   | 2x NVIDIA Tesla K40m GPU | 16             | 768 GB          | 1 TB
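
For example, the commands below show how you might list the partitions your account can access and submit a job to a specific one. The script name job.sh and the choice of the haswell partition are placeholders:

    # List the partitions your account has access to
    sinfo -o "%P"

    # Submit a batch script to a specific partition, e.g. the Haswell nodes
    sbatch --partition=haswell job.sh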

File Storage

On Blues, there are no disks on the nodes themselves; the compute nodes are completely diskless. Users who want local scratch space can still use a small scratch area backed by the node's memory (located at /scratch). This scratch space is essentially a RAM disk, so anything written to it consumes node memory; take this into account if you are running a large job that requires a substantial amount of memory.
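
As an illustration, a job might stage data through the in-memory /scratch area as in the sketch below. The input/output paths and the program name my_app are hypothetical, and anything written to /scratch counts against the node's memory:

    # Hypothetical staging pattern for the RAM-backed /scratch space
    cp /home/$USER/input.dat /scratch/           # stage input into node memory
    cd /scratch
    ./my_app input.dat > output.dat              # my_app is a placeholder program
    cp /scratch/output.dat /home/$USER/results/  # copy results back before the job ends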

Please see our detailed description of the file storage used in LCRC here.

Architecture

Blues runs mostly on Intel Sandy Bridge processors, which support the AVX (Advanced Vector Extensions) instruction set for 256-bit wide vector floating-point operations. Code built to take advantage of AVX can run noticeably faster on these nodes. All compilers currently installed on Blues, which are among the latest versions available, support the AVX instruction set.
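
For example, a serial C code could be built with AVX enabled roughly as follows. The flags shown are the standard ones for GCC and the Intel compiler; the exact compiler versions and module names available on Blues may differ:

    # GCC: enable AVX code generation
    gcc -O2 -mavx -o my_app my_app.c

    # Intel compiler: generate AVX instructions
    icc -O2 -xAVX -o my_app my_app.c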

Blues mostly uses a QLogic QDR InfiniBand interconnect for its network. This matters for MPI programs that would otherwise use the standard ibverbs library for communication: QLogic provides its own transport layer, called InfiniPath or PSM, that only works with QLogic hardware but delivers higher performance than plain ibverbs. You should therefore recompile your code against one of the MPI implementations on Blues that supports PSM.
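
As a rough sketch, rebuilding and launching an MPI code with a PSM-aware MPI might look like the following. The module name is an assumption; check module avail on Blues for the actual PSM-enabled MPI builds installed there:

    # Load an MPI build with PSM (InfiniPath) support -- module name is an assumption
    module load mvapich2

    # Recompile your application with the MPI compiler wrapper
    mpicc -O2 -o my_mpi_app my_mpi_app.c

    # Launch through Slurm so the PSM transport is used across nodes
    srun -n 64 ./my_mpi_app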

Diagram of Blues Network
Blues Network Diagram

Running Jobs on Blues

For detailed information on how to run jobs on Blues, see our documentation: Running Jobs on Blues.

Blues utilizes the Slurm Workload Manager (formerly known as Simple Linux Utility for Resource Management or SLURM) for job management. Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
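
To make this concrete, a minimal Slurm batch script for Blues might look like the sketch below. The job name, partition, resource requests, account name, and application name are placeholders that you should adjust for your own project:

    #!/bin/bash
    #SBATCH --job-name=example          # job name shown in the queue
    #SBATCH --partition=sball           # public partition (sball is the default)
    #SBATCH --nodes=2                   # number of compute nodes
    #SBATCH --ntasks-per-node=16        # one task per core on a Sandy Bridge node
    #SBATCH --time=01:00:00             # wall-clock limit (hh:mm:ss)
    #SBATCH --account=myproject         # placeholder project/allocation name

    # Launch the application on the allocated nodes
    srun ./my_mpi_app

You would then submit the script with sbatch (e.g., sbatch myjob.sh) and monitor it with squeue.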