Slurm memory limits
Slurm imposes a memory limit on each job. By default, it is deliberately small: 100 MB per node. If your job uses more than that, it fails with the error "Exceeded job memory limit" and the job state is recorded as OUT_OF_MEMORY. In sacct output, the State column (second to last) truncates this to OUT_OF_ME+, as for step 13860026.0 in this example:
[jharvard@holylogin05 ~]$ sacct -j 13860026
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
13860026     parallel_+       test jharvard_+          8     FAILED      1:0
13860026.ba+      batch            jharvard_+          8     FAILED      1:0
13860026.ex+     extern            jharvard_+          8  COMPLETED      0:0
13860026.0       matlab            jharvard_+          8 OUT_OF_ME+    0:125
Also note that the memory usage Slurm records will be inaccurate if the job was terminated for running out of memory. To get an accurate measurement, you need a job that completes successfully, because only then does Slurm record the true memory peak.
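As a rough sketch, the check described above can be scripted. The helper name find_oom_steps is hypothetical and not part of Slurm. It assumes sacct is run with --parsable2 and --noheader, so that fields are pipe-delimited and the state is printed in full rather than truncated.

```shell
# find_oom_steps (hypothetical helper): read pipe-delimited sacct output
# on stdin, in the form JobID|State, and print the IDs of the job steps
# whose state is OUT_OF_MEMORY.
find_oom_steps() {
    awk -F'|' '$2 == "OUT_OF_MEMORY" { print $1 }'
}

# Intended use on a node where Slurm is available:
#   sacct -j 13860026 --format=JobID,State --parsable2 --noheader | find_oom_steps
```

If nothing is printed, no step of that job was recorded as out of memory.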
Requesting more memory
To set a larger limit, add to your job submission:
#SBATCH --mem X
where X is the maximum amount of memory your job will use per node, in MB. To give the value in GB instead, append G to X. For example, to request 4 GB:
#SBATCH --mem 4G
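For context, here is a minimal sketch of where that line sits in a batch script. The job name is a placeholder, and the echo line is only there to show the request. It relies on the SLURM_MEM_PER_NODE environment variable, which Slurm is expected to set inside the job when --mem is given.

```shell
#!/bin/bash
# Hypothetical job name, for illustration only.
#SBATCH --job-name mem_demo
# Raise the per-node memory limit from the default to 4 GB.
#SBATCH --mem 4G

# Inside a Slurm job, SLURM_MEM_PER_NODE reflects the --mem request;
# outside Slurm it is unset, so the fallback text is printed instead.
mem_report="Requested memory per node (MB): ${SLURM_MEM_PER_NODE:-not running under Slurm}"
echo "$mem_report"

# Your actual workload would follow here.
```

The #SBATCH lines must come before the first executable command in the script, otherwise Slurm ignores them.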
The larger your working data set, the larger --mem needs to be. However, it is easier for the scheduler to find a place to run your job when you request smaller memory amounts. To determine an appropriate value, start relatively large (job slots on average have about 4000 MB per core, but that’s much larger than needed for most jobs) and then use jobstats to look at how much your job is actually using or used (see details below).
Monitor memory usage
jobstats allows you to see the current memory usage of a running job or the memory usage of a completed job (again, for jobs that exited with OUT_OF_MEMORY, the memory usage is not accurate). To see memory usage with jobstats, execute:
jobstats JOBID
where JOBID is the job you are interested in. You should set the memory you request to something a little larger than what jobstats reports, since --mem defines a hard upper limit.
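The "a little larger" rule can be turned into simple arithmetic. The sketch below is an illustration only: suggest_mem_mb is a hypothetical helper, and the default headroom of 20% is an assumed value, not a site recommendation. You read the peak usage from jobstats yourself and pass it in as a number of MB.

```shell
# suggest_mem_mb (hypothetical helper): given a peak memory usage in MB
# and an optional headroom percentage (default 20, an assumed value),
# print a --mem request in MB, rounded up to a whole megabyte.
suggest_mem_mb() {
    local peak_mb=$1
    local headroom_pct=${2:-20}
    echo $(( (peak_mb * (100 + headroom_pct) + 99) / 100 ))
}

# Example: a job that peaked at 3000 MB.
suggest_mem_mb 3000    # prints 3600
```

The result can be used directly, for example as #SBATCH --mem 3600.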
You can also use seff-account to get summary information about jobs over a period of time.
Multi-node jobs
Note that for parallel jobs spanning multiple nodes, the memory figure that matters is the maximum memory used on any one node, because --mem is a per-node limit. If you are not setting an even distribution of tasks per node (e.g. with --ntasks-per-node), the same job could show very different values when run at different times.
jobstats is particularly useful for multi-node jobs because it shows the current memory usage of a running job or the memory usage of a completed job.
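A minimal sketch of a multi-node request with an even task distribution follows. The node count, task count and memory size are illustrative values. The echo line only reports the layout and relies on the SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE environment variables, which Slurm is expected to set inside the job.

```shell
#!/bin/bash
# Two nodes with four tasks on each, so every node carries the same load.
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 4
# --mem applies to each node separately: 8 GB on each node here.
#SBATCH --mem 8G

# Inside a Slurm job these variables describe the allocation; outside
# Slurm they are unset and the fallback "n/a" is printed.
layout="nodes=${SLURM_JOB_NUM_NODES:-n/a} tasks_per_node=${SLURM_NTASKS_PER_NODE:-n/a}"
echo "$layout"

# A parallel workload would be launched here, for example with srun.
```

With the task count fixed per node, the per-node memory peak should be more repeatable from run to run, which makes the jobstats figure a more reliable basis for choosing --mem.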
