LCRC is a heavily used laboratory resource, so good citizenship practices make it easier for everyone to use its computational resources to advance the scientific mission of the laboratory. LCRC is a mid-sized computing facility (typically hosting 10,000 cores or fewer) with over 300 active users and over 100 active projects.
Currently, Blues and Fusion together provide about 7,500 cores for general lab-wide use. The maximum available computing time is therefore approximately 56 million core-hours per year, assuming an 85% utilization factor (7,500 x 24 x 365 x 0.85). The utilization factor accounts for time lost to monthly maintenance, unscheduled power outages, hardware issues, etc. Based on these figures, a typical LCRC project is intended to use on the order of 200,000 to 500,000 core-hours. Given the number of cores and users, LCRC recommends the following job submission practices to ensure a “fair share” of the computational resources and to prevent unusually long wait times for all users.
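As a quick check of the capacity figures above, the estimate can be reproduced in a few lines of Python. This is only a sketch: the function name is illustrative, and dividing the capacity evenly across 100 projects is a simplifying assumption, not an LCRC allocation rule.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_core_hours(cores, utilization):
    """Usable core-hours per year for a cluster with `cores` cores."""
    return cores * HOURS_PER_YEAR * utilization


# Figures taken from the text: about 7,500 cores at 85% utilization.
capacity = yearly_core_hours(7_500, 0.85)
print(f"Yearly capacity: {capacity:,.0f} core-hours")

# Assumption for illustration only: an even split over 100 active projects.
print(f"Even share per project: {capacity / 100:,.0f} core-hours")
```

An even share comes out slightly above 500,000 core-hours, which is consistent with the intended project size quoted above given that LCRC has more than 100 active projects.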
Typical job sizes should be about 5% of your overall yearly core-hour allocation (or less). At this level of core-hour use per case, a project could run about 20 different cases over the year (or more if the job sizes are smaller). Users are encouraged to compute the core-hours for each submitted job (requested wall time in hours x requested cores) and to ensure that the job does not exceed the core-hours available, which can be checked using the lcrc-qbank command. For instance, a user with a 250,000 core-hour allocation (half-yearly) would exceed their entire allocation by submitting a 2,000-core job for a week (2,000 x 24 x 7 = 336,000 core-hours). Not only does this considerably slow down the queue for other users, it also prevents the project from running more cases unless it is approved for more time by the allocations committee. Project PIs are requested to ensure that their project members use the allocated core-hours judiciously. PIs can check the transactions of individual project members using the lcrc-qbank -q trans command.
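The core-hour check described above can be sketched as a small Python helper. The function names and the returned dictionary are hypothetical, and the remaining balance is assumed to be a number you have read from lcrc-qbank yourself; this sketch does not call the command.

```python
def job_core_hours(cores, walltime_hours):
    """Core-hours charged for a job: requested cores x requested wall time."""
    return cores * walltime_hours


def check_job(cores, walltime_hours, allocation, remaining, max_fraction=0.05):
    """Compare a planned job against the remaining balance and the 5% guideline.

    `allocation` is the project's total core-hour allocation and `remaining`
    is the balance still available (assumed to be looked up separately).
    """
    cost = job_core_hours(cores, walltime_hours)
    return {
        "core_hours": cost,
        "exceeds_remaining": cost > remaining,
        "exceeds_guideline": cost > max_fraction * allocation,
    }


# The example from the text: a 2,000-core job running for one week
# against a 250,000 core-hour allocation.
result = check_job(2000, 24 * 7, allocation=250_000, remaining=250_000)
print(result)
```

For the example in the text, the job costs 336,000 core-hours, so both flags come back true: the job exceeds the whole allocation and is far above the 5% guideline of 12,500 core-hours.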
Minimize single-core jobs
LCRC resources are typically intended for projects using scalable parallel codes that cannot run on standard desktops and laptops. Since an entire node (16 cores on Blues) is allocated to a single job, a single-core job leaves most of the node idle. Users who submit many single-core jobs also severely slow down the queue. Please contact LCRC support if your project requires running multiple single-core jobs. Users are also requested to consider parallelizing their code(s) or using Swift (a parallel scripting language) to run multiple single-core jobs on a single node.
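To illustrate why single-core jobs waste capacity, the following sketch compares node usage when each task gets its own node with node usage when tasks are packed 16 to a node. It is plain arithmetic in Python, not Swift, and the function names are illustrative; only the 16 cores per node figure comes from the text.

```python
import math

CORES_PER_NODE = 16  # Blues, as stated in the text


def nodes_needed(num_tasks, tasks_per_node):
    """Whole nodes occupied when `tasks_per_node` single-core tasks share a node."""
    return math.ceil(num_tasks / tasks_per_node)


def core_utilization(num_tasks, tasks_per_node, cores_per_node=CORES_PER_NODE):
    """Fraction of the allocated cores that actually run a task."""
    nodes = nodes_needed(num_tasks, tasks_per_node)
    return num_tasks / (nodes * cores_per_node)


tasks = 64
for per_node in (1, CORES_PER_NODE):
    print(
        f"{per_node:2d} task(s) per node: "
        f"{nodes_needed(tasks, per_node)} nodes, "
        f"{core_utilization(tasks, per_node):.1%} of allocated cores busy"
    )
```

With 64 single-core tasks, one task per node occupies 64 nodes at 1/16 utilization, while packing 16 tasks per node occupies 4 nodes with every core busy.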
Streamline your job submissions
Users are requested to limit the number of jobs in the queue to 30 or fewer. Submitting a large number of jobs simultaneously increases the wait time for other users. Please consider streamlining your workflow so that your project uses no more than 20% of the cores at any given time (summed over all currently running jobs).
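The two limits above can be expressed as a simple pre-submission check. This is a sketch under stated assumptions: the function name is hypothetical, the job counts and core counts are values you supply yourself (nothing here queries the scheduler), and the 5,000-core cluster size used in the example is an illustrative round number for Blues.

```python
MAX_QUEUED_JOBS = 30      # guideline from the text
MAX_CORE_FRACTION = 0.20  # guideline from the text


def within_guidelines(queued_jobs, running_job_cores, cluster_cores):
    """Return True if a project stays inside both submission guidelines.

    `queued_jobs` is the number of jobs the project has in the queue and
    `running_job_cores` lists the core count of each currently running job.
    """
    cores_in_use = sum(running_job_cores)
    return (
        queued_jobs <= MAX_QUEUED_JOBS
        and cores_in_use <= MAX_CORE_FRACTION * cluster_cores
    )


# Assumed example: five running 128-core jobs on a cluster of about 5,000 cores.
print(within_guidelines(queued_jobs=12, running_job_cores=[128] * 5, cluster_cores=5_000))
```

In this example the project uses 640 cores, under the 1,000-core cap implied by 20% of 5,000 cores, and has fewer than 30 jobs queued, so the check passes.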
Minimize submitting large core-count jobs
Blues has about 320 nodes (approximately 5,000 cores) for general use. The typical job size is 4 to 8 nodes, so 40 to 80 jobs from different users can run simultaneously. Please avoid submitting jobs that require more than 64 nodes for extended periods of time, and contact LCRC support if your project requires large core-count jobs. LCRC typically allows users to run large core-count jobs, including full-machine jobs, for short periods (1 to 12 hours) immediately following maintenance on the second Monday of each month. Large core-count jobs (> 64 nodes) are typically run for performance/scaling studies and should be used sparingly for production runs.
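The concurrency figures above follow directly from the node count, as the short sketch below shows. The function names are illustrative, and the calculation assumes all jobs are the same size and that the full 320 nodes are available.

```python
BLUES_NODES = 320      # approximate general-use node count from the text
LARGE_JOB_NODES = 64   # threshold for a "large core-count" job from the text


def concurrent_jobs(nodes_per_job, total_nodes=BLUES_NODES):
    """How many equally sized jobs fit on the machine at once."""
    return total_nodes // nodes_per_job


def is_large_job(nodes):
    """True if a job exceeds the 64-node threshold."""
    return nodes > LARGE_JOB_NODES


for size in (4, 8, 64, 128):
    label = "large" if is_large_job(size) else "typical or moderate"
    print(f"{size:3d}-node jobs: {concurrent_jobs(size)} at once ({label})")
```

At 4 to 8 nodes per job the machine holds 80 to 40 jobs at once, while a single 128-node job takes the space of 16 to 32 typical jobs, which is why such runs are best kept to the post-maintenance window.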
Industry projects to be run exclusively on Fusion
Industry projects are granted their full requested allocations but are required to run exclusively on Fusion. PIs and members of industry projects should ensure that their jobs are submitted only to Fusion and not to Blues.