Slurm memory limits

Slurm imposes a memory limit on each job. By default, it is deliberately small: 100 MB per node. If your job uses more than that, it fails with the error Exceeded job memory limit, and the step that ran out of memory is recorded in the OUT_OF_MEMORY state. For example, sacct shows OUT_OF_ME+ (the truncated form of OUT_OF_MEMORY) in the State column (second to last) for that step:

[jharvard@holylogin05 ~]$ sacct -j 13860026
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
13860026     parallel_+ test       jharvard_+          8 FAILED          1:0
13860026.ba+ batch                 jharvard_+          8 FAILED          1:0
13860026.ex+ extern                jharvard_+          8 COMPLETED       0:0
13860026.0   matlab                jharvard_+          8 OUT_OF_ME+    0:125

Also note that the memory usage Slurm records is inaccurate for a job that was terminated for running out of memory. To get an accurate measurement, use a job that completes successfully; Slurm then records the true peak memory usage.
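The State column can also be checked from a script. The following is a minimal sketch, assuming pipe-delimited input such as that produced by sacct with --format=JobID,State --parsable2 --noheader, which prints state names in full. The function name has_oom is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# has_oom: read "JobID|State" lines on stdin (sacct --parsable2 format)
# and succeed (exit status 0) if any line has the state OUT_OF_MEMORY.
# NOTE: has_oom is a hypothetical helper, not a Slurm command.
has_oom() {
    awk -F'|' '$2 == "OUT_OF_MEMORY" { found = 1 } END { exit !found }'
}

# Example usage on a cluster (job ID taken from the sacct example above):
#   sacct -j 13860026 --format=JobID,State --parsable2 --noheader | has_oom \
#       && echo "at least one step ran out of memory"
```

Because the function only reads standard input, it can be tried on saved sacct output without access to the scheduler.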

Requesting more memory

To set a larger limit, add to your job submission:

#SBATCH --mem X

where X is the maximum amount of memory your job will use per node, in MB. To specify the value in GB instead, append G to X. For example, to request 4 GB:

#SBATCH --mem 4G
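In a batch script, the directive sits with the other #SBATCH lines at the top of the file. The following is a minimal sketch rather than a recommended template: the partition name test is taken from the sacct example, while the core count, time limit, and the report_limit function are illustrative assumptions. SLURM_MEM_PER_NODE is the environment variable Slurm sets inside a job when --mem is given (the value is typically in MB).

```shell
#!/bin/bash
# Partition name from the sacct example; adjust for your cluster.
#SBATCH --partition test
# Assumed for illustration: one core and a 10-minute time limit.
#SBATCH --cpus-per-task 1
#SBATCH --time 00:10:00
# Memory per node; a plain number would be read as MB.
#SBATCH --mem 4G

# report_limit is a hypothetical helper: it prints the per-node memory
# limit that Slurm exported to the job, or "unset" outside a Slurm job.
report_limit() {
    echo "Per-node memory limit: ${SLURM_MEM_PER_NODE:-unset}"
}

report_limit
# ... run your program here ...
```

Printing the limit at the start of the job makes it easy to confirm from the job's output file that the request was applied.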

The larger your working data set, the larger --mem needs to be. However, it is easier for the scheduler to find a place to run your job when you request smaller memory amounts. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use jobstats to check how much memory your job is using or has used (see details below).

Monitor memory usage

jobstats allows you to see the current memory usage of a running job or the memory usage of a completed job (again, for jobs that exited with OUT_OF_MEMORY, the memory usage is not accurate). To see memory usage with jobstats, execute:

jobstats JOBID

where JOBID is the ID of the job you are interested in. Request slightly more memory than jobstats reports, since --mem defines a hard upper limit.
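The advice to request slightly more than the measured peak can be made concrete with a small calculation. This sketch adds a safety margin to a measured peak and rounds up to a whole MB. The function name suggest_mem and the 20% default margin are assumptions for illustration, not a site recommendation.

```shell
#!/bin/bash
# suggest_mem PEAK_MB [MARGIN_PERCENT]
# Print a --mem value in MB: the measured peak plus a safety margin,
# rounded up to a whole MB.
# Hypothetical helper; the 20% default margin is an assumption.
suggest_mem() {
    peak_mb=$1
    margin=${2:-20}
    echo $(( (peak_mb * (100 + margin) + 99) / 100 ))
}

# Example: a job whose reported peak was 3200 MB.
echo "#SBATCH --mem $(suggest_mem 3200)"
# prints: #SBATCH --mem 3840
```

A smaller margin keeps the request easier to schedule, while a larger one protects against runs whose memory use varies with the input.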

You can also use seff-account to get summary information about jobs over a period of time.

Multi-node jobs

Note that for parallel jobs spanning multiple nodes, the reported memory usage is the maximum used on any one node. If you do not set an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could report very different values when run at different times.

jobstats is particularly useful for multi-node jobs because it shows the current memory usage of a running job or the memory usage of a completed job.
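To keep the per-node figure stable from run to run, the task layout can be pinned so that every node hosts the same number of tasks. The sketch below derives a per-node --mem value from a per-task estimate. The function mem_per_node, the layout of 2 nodes with 4 tasks each, and the figure of 1500 MB per task are all illustrative assumptions; real tasks may also share memory or use uneven amounts.

```shell
#!/bin/bash
# mem_per_node TASKS_PER_NODE MB_PER_TASK
# With an even task layout, each node needs roughly
# (tasks on that node) * (memory per task).
# Hypothetical helper for illustration only.
mem_per_node() {
    echo $(( $1 * $2 ))
}

# Assumed layout: 2 nodes, 4 tasks on each, about 1500 MB per task.
# Print the directives that would go at the top of the batch script.
cat <<EOF
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 4
#SBATCH --mem $(mem_per_node 4 1500)
EOF
```

The last directive printed is #SBATCH --mem 6000, which is the per-node total for four tasks of 1500 MB each.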

© The President and Fellows of Harvard College.
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.