From 30106b8ce2cc2243514116d6f29086e6deecc754 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 4 May 2011 15:38:19 +0200 Subject: slub: Fix the lockless code on 32-bit platforms with no 64-bit cmpxchg The SLUB allocator use of the cmpxchg_double logic was wrong: it actually needs the irq-safe one. That happens automatically when we use the native unlocked 'cmpxchg8b' instruction, but when compiling the kernel for older x86 CPUs that do not support that instruction, we fall back to the generic emulation code. And if you don't specify that you want the irq-safe version, the generic code ends up just open-coding the cmpxchg8b equivalent without any protection against interrupts or preemption. Which definitely doesn't work for SLUB. This was reported by Werner Landgraf , who saw instability with his distro-kernel that was compiled to support pretty much everything under the sun. Most big Linux distributions tend to compile for PPro and later, and would never have noticed this problem. This also fixes the prototypes for the irqsafe cmpxchg_double functions to use 'bool' like they should. [ Btw, that whole "generic code defaults to no protection" design just sounds stupid - if the code needs no protection, there is no reason to use "cmpxchg_double" to begin with. So we should probably just remove the unprotected version entirely as pointless. - Linus ] Signed-off-by: Thomas Gleixner Reported-and-tested-by: werner Acked-and-tested-by: Ingo Molnar Acked-by: Christoph Lameter Cc: Pekka Enberg Cc: Jens Axboe Cc: Tejun Heo Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds --- mm/slub.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'mm') diff --git a/mm/slub.c b/mm/slub.c index 94d2a33..9d2e5e4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1940,7 +1940,7 @@ redo: * Since this is without lock semantics the protection is only against * code executing on this cpu *not* from access by other cpus. */ - if (unlikely(!this_cpu_cmpxchg_double( + if (unlikely(!irqsafe_cpu_cmpxchg_double( s->cpu_slab->freelist, s->cpu_slab->tid, object, tid, get_freepointer(s, object), next_tid(tid)))) { @@ -2145,7 +2145,7 @@ redo: set_freepointer(s, object, c->freelist); #ifdef CONFIG_CMPXCHG_LOCAL - if (unlikely(!this_cpu_cmpxchg_double( + if (unlikely(!irqsafe_cpu_cmpxchg_double( s->cpu_slab->freelist, s->cpu_slab->tid, c->freelist, tid, object, next_tid(tid)))) { -- cgit v1.1 From a1fde08c74e90accd62d4cfdbf580d2ede938fe7 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 4 May 2011 21:30:28 -0700 Subject: VM: skip the stack guard page lookup in get_user_pages only for mlock The logic in __get_user_pages() used to skip the stack guard page lookup whenever the caller wasn't interested in seeing what the actual page was. But Michel Lespinasse points out that there are cases where we don't care about the physical page itself (so 'pages' may be NULL), but do want to make sure a page is mapped into the virtual address space. So using the existence of the "pages" array as an indication of whether to look up the guard page or not isn't actually so great, and we really should just use the FOLL_MLOCK bit. But because that bit was only set for the VM_LOCKED case (and not all vma's necessarily have it, even for mlock()), we couldn't do that originally. Fix that by moving the VM_LOCKED check deeper into the call-chain, which actually simplifies many things. Now mlock() gets simpler, and we can also check for FOLL_MLOCK in __get_user_pages() and the code ends up much more straightforward. Reported-and-reviewed-by: Michel Lespinasse Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- mm/memory.c | 7 +++---- mm/mlock.c | 5 +---- 2 files changed, 4 insertions(+), 8 deletions(-) (limited to 'mm') diff --git a/mm/memory.c b/mm/memory.c index 607098d4..27f4253 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1359,7 +1359,7 @@ split_fallthrough: */ mark_page_accessed(page); } - if (flags & FOLL_MLOCK) { + if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) { /* * The preliminary mapping check is mainly to avoid the * pointless overhead of lock_page on the ZERO_PAGE @@ -1552,10 +1552,9 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, } /* - * If we don't actually want the page itself, - * and it's the stack guard page, just skip it. + * For mlock, just skip the stack guard page. */ - if (!pages && stack_guard_page(vma, start)) + if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start)) goto next_page; do { diff --git a/mm/mlock.c b/mm/mlock.c index 6b55e3e..516b2c2 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -162,7 +162,7 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma, VM_BUG_ON(end > vma->vm_end); VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem)); - gup_flags = FOLL_TOUCH; + gup_flags = FOLL_TOUCH | FOLL_MLOCK; /* * We want to touch writable mappings with a write fault in order * to break COW, except for shared mappings because these don't COW @@ -178,9 +178,6 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma, if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) gup_flags |= FOLL_FORCE; - if (vma->vm_flags & VM_LOCKED) - gup_flags |= FOLL_MLOCK; - return __get_user_pages(current, mm, addr, nr_pages, gup_flags, NULL, NULL, nonblocking); } -- cgit v1.1 From a09a79f66874c905af35d5bb5e5f2fdc7b6b894d Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Mon, 9 May 2011 13:01:09 +0200 Subject: Don't lock guardpage if the stack is growing up Linux kernel excludes guard page when performing mlock on a VMA with down-growing stack. However, some architectures have up-growing stack and locking the guard page should be excluded in this case too. This patch fixes lvm2 on PA-RISC (and possibly other architectures with up-growing stack). lvm2 calculates number of used pages when locking and when unlocking and reports an internal error if the numbers mismatch. [ Patch changed fairly extensively to also fix /proc//maps for the grows-up case, and to move things around a bit to clean it all up and share the infrstructure with the /proc bits. Tested on ia64 that has both grow-up and grow-down segments - Linus ] Signed-off-by: Mikulas Patocka Tested-by: Tony Luck Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- mm/memory.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) (limited to 'mm') diff --git a/mm/memory.c b/mm/memory.c index 27f4253..61e66f0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1412,9 +1412,8 @@ no_page_table: static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr) { - return (vma->vm_flags & VM_GROWSDOWN) && - (vma->vm_start == addr) && - !vma_stack_continue(vma->vm_prev, addr); + return stack_guard_page_start(vma, addr) || + stack_guard_page_end(vma, addr+PAGE_SIZE); } /** @@ -1551,12 +1550,6 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, continue; } - /* - * For mlock, just skip the stack guard page. - */ - if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start)) - goto next_page; - do { struct page *page; unsigned int foll_flags = gup_flags; @@ -1573,6 +1566,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, int ret; unsigned int fault_flags = 0; + /* For mlock, just skip the stack guard page. */ + if (foll_flags & FOLL_MLOCK) { + if (stack_guard_page(vma, start)) + goto next_page; + } if (foll_flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; if (nonblocking) -- cgit v1.1 From 42c36f63ac1366ab0ecc2d5717821362c259f517 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 9 May 2011 17:44:42 -0700 Subject: vm: fix vm_pgoff wrap in upward expansion Commit a626ca6a6564 ("vm: fix vm_pgoff wrap in stack expansion") fixed the case of an expanding mapping causing vm_pgoff wrapping when you had downward stack expansion. But there was another case where IA64 and PA-RISC expand mappings: upward expansion. This fixes that case too. Signed-off-by: Hugh Dickins Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- mm/mmap.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'mm') diff --git a/mm/mmap.c b/mm/mmap.c index e27e0cf..772140c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1767,10 +1767,13 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address) size = address - vma->vm_start; grow = (address - vma->vm_end) >> PAGE_SHIFT; - error = acct_stack_growth(vma, size, grow); - if (!error) { - vma->vm_end = address; - perf_event_mmap(vma); + error = -ENOMEM; + if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) { + error = acct_stack_growth(vma, size, grow); + if (!error) { + vma->vm_end = address; + perf_event_mmap(vma); + } } } vma_unlock_anon_vma(vma); -- cgit v1.1