aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorDavid Rientjes <rientjes@google.com>2014-01-30 15:46:11 -0800
committerBen Hutchings <ben@decadent.org.uk>2014-04-02 00:58:46 +0100
commit41cf82c2b1e6a027d487c1f92dc3a0e811e1529a (patch)
treeb4ab924817fb4bd27c9d2147e14a1531ecd48248
parent60745e70f4b4725a1ae9387af914de0d25470491 (diff)
downloadkernel_samsung_smdk4412-41cf82c2b1e6a027d487c1f92dc3a0e811e1529a.zip
kernel_samsung_smdk4412-41cf82c2b1e6a027d487c1f92dc3a0e811e1529a.tar.gz
kernel_samsung_smdk4412-41cf82c2b1e6a027d487c1f92dc3a0e811e1529a.tar.bz2
mm, oom: base root bonus on current usage
commit 778c14affaf94a9e4953179d3e13a544ccce7707 upstream. A 3% of system memory bonus is sometimes too excessive in comparison to other processes. With commit a63d83f427fb ("oom: badness heuristic rewrite"), the OOM killer tries to avoid killing privileged tasks by subtracting 3% of overall memory (system or cgroup) from their per-task consumption. But as a result, all root tasks that consume less than 3% of overall memory are considered equal, and so it only takes 33+ privileged tasks pushing the system out of memory for the OOM killer to do something stupid and kill dhclient or other root-owned processes. For example, on a 32G machine it can't tell the difference between the 1M agetty and the 10G fork bomb member. The changelog describes this 3% boost as the equivalent to the global overcommit limit being 3% higher for privileged tasks, but this is not the same as discounting 3% of overall memory from _every privileged task individually_ during OOM selection. Replace the 3% of system memory bonus with a 3% of current memory usage bonus. By giving root tasks a bonus that is proportional to their actual size, they remain comparable even when relatively small. In the example above, the OOM killer will discount the 1M agetty's 256 badness points down to 179, and the 10G fork bomb's 262144 points down to 183500 points and make the right choice, instead of discounting both to 0 and killing agetty because it's first in the task list. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: existing code changes 'points' directly rather than using 'adj' variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
-rw-r--r--Documentation/filesystems/proc.txt4
-rw-r--r--mm/oom_kill.c2
2 files changed, 3 insertions, 3 deletions
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 0ec91f0..404a097 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1293,8 +1293,8 @@ may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000. If it is using half of its allowed memory, its score will be 500.
-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.
The amount of "allowed" memory depends on the context in which the oom killer
was called. If it is due to the memory assigned to the allocating task's cpuset
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 069b64e..4dda948 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -213,7 +213,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
* implementation used by LSMs.
*/
if (has_capability_noaudit(p, CAP_SYS_ADMIN))
- points -= 30;
+ points -= (points * 3) / 100;
/*
* /proc/pid/oom_score_adj ranges from -1000 to +1000 such that it may