Jazz scheduling policy
The current Jazz job scheduling policy is pretty flexible, allowing
people to submit nearly any kind of job mix into the job queue. We
recently implemented a policy that prevents more than 32 jobs
for any one user from running at one time. In addition, we
have a process for handling overdrawn projects and a priority
scheduling policy, both of which are described below.
Process for Overdrawn LCRC Projects
- The system automatically checks for LCRC projects that have
exceeded their balance.
- The queue priority of overdrawn projects will drop automatically
to priority level 5. For details on the scheduling policy, please
see the priority scheduling section of this document.
- Overdrawn projects will be limited to running only one job per
user at a time.
- A member of the LCRC staff will notify overdrawn projects via email
(to all Project members) within one business day. We will also
invite the PIs to submit a revised allocation request if needed.
- The PIs may request a change in the distribution of the current
project allocation (e.g., moving some or all of their second-half time
into the first half). LCRC staff may move up to 30K hours; larger requests
need approval by the Allocations Committee.
- Alternatively, the PIs may submit a revised allocation request,
asking for more time. Increments of less than 30K hours will be
managed by LCRC staff and reported to the Allocations Committee;
larger requests need approval by the Allocations Committee.
- If no request is received, the overdrawn project will be suspended
(unable to start new jobs) when its time use exceeds 100% overdrawn
or 25K hours overdrawn, whichever comes first. This gives the
project a cushion to finish up, to put in their revised request, or
to make other arrangements.
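
The overdrawn-project rules above can be sketched as a small status check. This is only an illustration, not the code LCRC runs: the function and constant names are hypothetical, "100% overdrawn" is read here as an overdraft larger than the allocation itself, and a project that has submitted a request is assumed to stay in the overdrawn state rather than being suspended.

```python
# Illustrative sketch only; all names here are hypothetical.
MAX_RUNNING_JOBS_PER_USER = 32         # normal per-user cap from the policy
OVERDRAWN_RUNNING_JOBS_PER_USER = 1    # cap for users of overdrawn projects
SUSPEND_HOURS_OVERDRAWN = 25_000       # "25K hours overdrawn"


def project_status(allocated_hours, used_hours, request_received=False):
    """Return 'ok', 'overdrawn', or 'suspended' for a project."""
    overdraft = used_hours - allocated_hours
    if overdraft <= 0:
        return "ok"
    if request_received:
        # Assumption: a pending request holds off suspension.
        return "overdrawn"
    # Assumed reading: "100% overdrawn" means the overdraft exceeds the
    # allocation itself. Either limit triggers suspension, whichever is
    # reached first.
    if overdraft > allocated_hours or overdraft > SUSPEND_HOURS_OVERDRAWN:
        return "suspended"
    return "overdrawn"


def may_start_job(status, user_running_jobs):
    """Return True if a user may start another job under the per-user caps."""
    if status == "suspended":
        return False  # suspended projects cannot start new jobs
    if status == "overdrawn":
        return user_running_jobs < OVERDRAWN_RUNNING_JOBS_PER_USER
    return user_running_jobs < MAX_RUNNING_JOBS_PER_USER
```

For example, under these assumptions a project with a 10,000-hour allocation is overdrawn once it has used more than 10,000 hours and would be suspended once it has used more than 20,000 hours.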
Projects are encouraged to contact LCRC staff when their project needs
change, particularly when there are special needs. We will make every
effort to find ways to meet your research needs. The Allocations
Committee meets quarterly.
The LCRC External Resources area has pointers to other computing
resources available to researchers, including a page listing known
Calls for Proposals. If you know of other opportunities, please send
the information to support@lcrc.anl.gov.
Jazz priority scheduling
The Jazz priority scheduling policy is implemented using PBS queues.
Jobs are submitted to a default queue, and PBS then routes the
jobs to the appropriate priority queue based on the priority assigned
to the job owner. The larger the priority number, the later a job
will be run (i.e. a job with priority 1 will be run before a job with
priority 2, if possible). Each user is assigned a priority based on
their memberships in Jazz projects. A user who belongs to multiple
projects is assigned the largest priority number among those projects.
The CSAC Allocation Committee has assigned a priority to each active
project on Jazz. If a project's balance goes negative, its priority is
set to level 5. For example, if
project A has been assigned priority 3 by the committee and has a
positive balance, jobs of members of project A are assigned priority
3. If project A goes negative, jobs of members of project A will then
be assigned priority 5.
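
As a rough illustration of the assignment rule described above, the following sketch computes a user's priority from their project memberships. The function names and the (priority, balance) tuple layout are hypothetical and chosen only for this example; the actual assignment is done through PBS queue configuration.

```python
# Illustrative sketch only; names and data layout are hypothetical.
OVERDRAWN_PRIORITY = 5


def project_priority(assigned_priority, balance_hours):
    """Priority in effect for a project: level 5 once its balance is negative."""
    return OVERDRAWN_PRIORITY if balance_hours < 0 else assigned_priority


def user_priority(projects):
    """Priority for a user, given (assigned_priority, balance_hours) pairs.

    A member of several projects gets the largest priority number
    among them, as stated in the policy.
    """
    return max(project_priority(p, b) for p, b in projects)
```

With these definitions, a user in a priority 1 project and a priority 3 project, both with positive balances, is assigned priority 3. If either project goes negative, the user is assigned priority 5.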
PBS schedules jobs FIFO within each priority level. Priority 0 jobs
are scheduled FIFO until no more priority 0 jobs will fit, then
priority 1 jobs are scheduled FIFO until no more priority 1 jobs will
fit, etc. Priority affects jobs only when there are not enough
resources to schedule every job in the queue. If two jobs of differing
priority cannot both fit within existing resources, but either of them
will fit, the job with the lower priority number will be run first.
Jazz does not use job preemption, so once a job has started, it will
run until it finishes, aborts with an error, or is deleted by the
owner or an administrator.
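
The scheduling order described above can be approximated with a short simulation. This is a simplified sketch, not the PBS scheduler itself: the Job tuple and schedule function are hypothetical, resources are modeled as a single count of free nodes, and "until no more jobs will fit" is read as trying every queued job at a level before moving to the next level.

```python
from collections import namedtuple

# Hypothetical job record: name, priority level, and nodes requested.
Job = namedtuple("Job", "name priority nodes")


def schedule(queued_jobs, free_nodes):
    """Return the names of the jobs started, in start order.

    queued_jobs must be in submission (FIFO) order.
    """
    started = []
    # sorted() is stable, so jobs within one priority level keep
    # their FIFO order while lower priority numbers come first.
    for job in sorted(queued_jobs, key=lambda j: j.priority):
        if job.nodes <= free_nodes:
            free_nodes -= job.nodes
            started.append(job.name)
    # No preemption: jobs that are already running are never touched here.
    return started
```

For instance, with 300 free nodes and two queued 200-node jobs at priorities 3 and 1, only the priority 1 job starts, whichever was submitted first. If both jobs fit, both start and priority makes no difference.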
General Scheduling Guidelines
Jazz is expected to support many different kinds of usage, and we do not
want to unnecessarily limit the kinds of work that could be done on
Jazz. We would prefer to maintain a flexible scheduling policy based
on the needs of the community rather than impose a strict policy.
As a consequence of this liberal scheduling policy, there have been
instances in the past where people inappropriately overused the machine,
resulting in delays for everyone else. Unfortunately, as Jazz becomes
more heavily used, there are more people overusing the machine.
We ask that all users follow these guidelines:
- Follow good queue etiquette. This means, among other
things, don't submit jobs that both use a large number of nodes and
run for a long time. If your job uses several hundred nodes, limit
the run time to a few hours. Likewise, if your job runs for several
hundred hours, limit the number of nodes to fewer than a dozen or so.
- Clearly, some work will require use of the machine beyond those
bounds. In those cases, please send a quick note to
support@lcrc.anl.gov so that we are aware of them. If
necessary, we can arrange for a reservation and notify the
community.
- If we get well-founded complaints from other users of the
system about your jobs, we will attempt to contact you to determine
the best course of action. However, under some circumstances, we
may have to kill running jobs. We don't want to do this, but it has
been necessary a few times.
- We strongly encourage checkpointing. Checkpointing not
only allows you to recover from a job that has died unexpectedly,
but can also allow you to break a long-running job into smaller
chunks that are therefore easier to schedule.
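
One possible way to structure a checkpointed job is sketched below. It is a generic illustration rather than a Jazz-specific recipe: the function names, the JSON state layout, and the summation used as stand-in work are all hypothetical. Each job submission calls run_chunk once, which resumes from the last saved state, does a bounded amount of work, and saves again.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    A job killed mid-write then leaves the previous checkpoint intact
    instead of a truncated file.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_chunk(path, total_steps, steps_per_job, checkpoint_every=1):
    """Advance by at most steps_per_job steps, checkpointing along the way."""
    state = load_checkpoint(path)
    stop = min(state["step"] + steps_per_job, total_steps)
    while state["step"] < stop:
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # always save at the end of the chunk
    return state
```

A long run then becomes a series of short jobs that each call run_chunk with the same checkpoint path until the returned step count reaches total_steps.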