aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc
Commit message (Collapse)AuthorAgeFilesLines
* Merge remote-tracking branch 'korg/linux-3.0.y' into cm-13.0rogersb112015-11-1014-17/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: crypto/algapi.c drivers/gpu/drm/i915/i915_debugfs.c drivers/gpu/drm/i915/intel_display.c drivers/video/fbmem.c include/linux/nls.h kernel/cgroup.c kernel/signal.c kernel/timeconst.pl net/ipv4/ping.c Change-Id: I1f532925d1743df74d66bcdd6fc92f05c72ee0dd
| * powerpc: Fix parameter clobber in csum_partial_copy_generic()Paul E. McKenney2013-10-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d9813c3681a36774b254c0cdc9cce53c9e22c756 upstream. The csum_partial_copy_generic() uses register r7 to adjust the remaining bytes to process. Unfortunately, r7 also holds a parameter, namely the address of the flag to set in case of access exceptions while reading the source buffer. Lacking a quantum implementation of PowerPC, this commit instead uses register r9 to do the adjusting, leaving r7's pointer uncorrupted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/vio: Fix modalias_show return valuesPrarit Bhargava2013-10-131-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e82b89a6f19bae73fb064d1b3dd91fcefbb478f4 upstream. modalias_show() should return an empty string on error, not -ENODEV. This causes the following false and annoying error: > find /sys/devices -name modalias -print0 | xargs -0 cat >/dev/null cat: /sys/devices/vio/4000/modalias: No such device cat: /sys/devices/vio/4001/modalias: No such device cat: /sys/devices/vio/4002/modalias: No such device cat: /sys/devices/vio/4004/modalias: No such device cat: /sys/devices/vio/modalias: No such device Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/iommu: Use GFP_KERNEL instead of GFP_ATOMIC in iommu_init_table()Nishanth Aravamudan2013-10-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1cf389df090194a0976dc867b7fffe99d9d490cb upstream. Under heavy (DLPAR?) stress, we tripped this panic() in arch/powerpc/kernel/iommu.c::iommu_init_table(): page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); Before the panic() we got a page allocation failure for an order-2 allocation. There appears to be memory free, but perhaps not in the ATOMIC context. I looked through all the call-sites of iommu_init_table() and didn't see any obvious reason to need an ATOMIC allocation. Most call-sites in fact have an explicit GFP_KERNEL allocation shortly before the call to iommu_init_table(), indicating we are not in an atomic context. There is some indirection for some paths, but I didn't see any locks indicating that GFP_KERNEL is inappropriate. With this change under the same conditions, we have not been able to reproduce the panic. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Handle unaligned ldbrx/stdbrxAnton Blanchard2013-09-261-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 230aef7a6a23b6166bd4003bfff5af23c9bd381f upstream. Normally when we haven't implemented an alignment handler for a load or store instruction the process will be terminated. The alignment handler uses the DSISR (or a pseudo one) to locate the right handler. Unfortunately ldbrx and stdbrx overlap lfs and stfs so we incorrectly think ldbrx is an lfs and stdbrx is an stfs. This bug is particularly nasty - instead of terminating the process we apply an incorrect fixup and continue on. With more and more overlapping instructions we should stop creating a pseudo DSISR and index using the instruction directly, but for now add a special case to catch ldbrx/stdbrx. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Use -mtraceback=noAnton Blanchard2013-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit af9719c3062dfe216a0c3de3fa52be6d22b4456c upstream. gcc 4.7 will be more strict about parsing the -mtraceback option: gcc: error: unrecognized argument in option '-mtraceback=none' gcc: note: valid arguments to '-mtraceback=' are: full no part gcc used to do a 2 char compare so both "no" and "none" would match. Switch to using -mtraceback=no should work everywhere. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/modules: Module CRC relocation fix causes perf issuesAnton Blanchard2013-08-042-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0e0ed6406e61434d3f38fb58aa8464ec4722b77e upstream. Module CRCs are implemented as absolute symbols that get resolved by a linker script. We build an intermediate .o that contains an unresolved symbol for each CRC. genksysms parses this .o, calculates the CRCs and writes a linker script that "resolves" the symbols to the calculated CRC. Unfortunately the ppc64 relocatable kernel sees these CRCs as symbols that need relocating and relocates them at boot. Commit d4703aef (module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y) added a hook to reverse the bogus relocations. Part of this patch created a symbol at 0x0: # head -2 /proc/kallsyms 0000000000000000 T reloc_start c000000000000000 T .__start This reloc_start symbol is causing lots of confusion to perf. It thinks reloc_start is a massive function that stretches from 0x0 to 0xc000000000000000 and we get various cryptic errors out of perf, including: problem incrementing symbol count, skipping event This patch removes the reloc_start linker script label and instead defines it as PHYSICAL_START. We also need to wrap it with CONFIG_PPC64 because the ppc32 kernel can set a non zero PHYSICAL_START at compile time and we wouldn't want to subtract it from the CRCs in that case. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: fix numa distance for form0 device treeVaidyanathan Srinivasan2013-05-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream. The following commit breaks numa distance setup for old powerpc systems that use form0 encoding in device tree. commit 41eab6f88f24124df89e38067b3766b7bef06ddb powerpc/numa: Use form 1 affinity to setup node distance Device tree node /rtas/ibm,associativity-reference-points would index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or form1 encoding detected by ibm,architecture-vec-5 property. All modern systems use form1 and current kernel code is correct. However, on older systems with form0 encoding, the numa distance will get hard coded as LOCAL_DISTANCE for all nodes. This causes task scheduling anomaly since scheduler will skip building numa level domain (topmost domain with all cpus) if all numa distances are same. (value of 'level' in sched_init_numa() will remain 0) Prior to the above commit: ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE) Restoring compatible behavior with this patch for old powerpc systems with device tree where numa distance are encoded as form0. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/spufs: Initialise inode->i_ino in spufs_new_inode()Michael Ellerman2013-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6747e83235caecd30b186d1282e4eba7679f81b7 upstream. In commit 85fe402 (fs: do not assign default i_ino in new_inode), the initialisation of i_ino was removed from new_inode() and pushed down into the callers. However spufs_new_inode() was not updated. This exhibits as no files appearing in /spu, because all our dirents have a zero inode, which readdir() seems to dislike. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Add isync to copy_and_flushMichael Neuling2013-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 29ce3c5073057991217916abc25628e906911757 upstream. In __after_prom_start we copy the kernel down to zero in two calls to copy_and_flush. After the first call (copy from 0 to copy_to_here:) we jump to the newly copied code soon after. Unfortunately there's no isync between the copy of this code and the jump to it. Hence it's possible that stale instructions could still be in the icache or pipeline before we branch to it. We've seen this on real machines and it's results in no console output after: calling quiesce... returning from prom_init The below adds an isync to ensure that the copy and flushing has completed before any branching to the new instructions occurs. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being ↵Michael Wolf2013-04-121-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | performed before the ANDCOND test commit 9fb2640159f9d4f5a2a9d60e490482d4cbecafdb upstream. Some versions of pHyp will perform the adjunct partition test before the ANDCOND test. The result of this is that H_RESOURCE can be returned and cause the BUG_ON condition to occur. The HPTE is not removed. So add a check for H_RESOURCE, it is ok if this HPTE is not removed as pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a specific HPTE to remove. So it is ok to just move on to the next slot and try again. Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorerBen Hutchings2013-04-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix cputable entry for 970MP rev 1.0Benjamin Herrenschmidt2013-03-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d63ac5f6cf31c8a83170a9509b350c1489a7262b upstream. Commit 44ae3ab3358e962039c36ad4ae461ae9fb29596c forgot to update the entry for the 970MP rev 1.0 processor when moving some CPU features bits to the MMU feature bit mask. This breaks booting on some rare G5 models using that chip revision. Reported-by: Phileas Fogg <phileas-fogg@mail.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/kexec: Disable hard IRQ before kexecPhileas Fogg2013-02-281-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 8520e443aa56cc157b015205ea53e7b9fc831291 upstream. Disable hard IRQ before kexec a new kernel image. Not doing it can result in corrupted data in the memory segments reserved for the new kernel. Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge remote-tracking branch 'kernelorg/linux-3.0.y' into 3_0_64Andrew Dodd2013-02-2715-51/+75
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/Kconfig arch/arm/include/asm/hwcap.h arch/arm/kernel/smp.c arch/arm/plat-samsung/adc.c drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_drv.h drivers/mmc/core/sd.c drivers/net/tun.c drivers/net/usb/usbnet.c drivers/regulator/max8997.c drivers/usb/core/hub.c drivers/usb/host/xhci.h drivers/usb/serial/qcserial.c fs/jbd2/transaction.c include/linux/migrate.h kernel/sys.c kernel/time/timekeeping.c lib/genalloc.c mm/memory-failure.c mm/memory_hotplug.c mm/mempolicy.c mm/page_alloc.c mm/vmalloc.c mm/vmscan.c mm/vmstat.c scripts/Kbuild.include Change-Id: I91e2d85c07320c7ccfc04cf98a448e89bed6ade6
| * powerpc: fix wii_memory_fixups() compile error on 3.0.y treeShuah Khan2013-01-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [not upstream as the code involved was removed in the 3.3.0 release] Fix wii_memory_fixups() the following compile error on 3.0.y tree with wii_defconfig on 3.0.y tree. CC arch/powerpc/platforms/embedded6xx/wii.o arch/powerpc/platforms/embedded6xx/wii.c: In function ‘wii_memory_fixups’: arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format] arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format] cc1: all warnings being treated as errors make[2]: *** [arch/powerpc/platforms/embedded6xx/wii.o] Error 1 make[1]: *** [arch/powerpc/platforms/embedded6xx] Error 2 make: *** [arch/powerpc/platforms] Error 2 Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * KVM: PPC: 44x: fix DCR read/writeAlexander Graf2013-01-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | commit e43a028752fed049e4bd94ef895542f96d79fa74 upstream. When remembering the direction of a DCR transaction, we should write to the same variable that we interpret on later when doing vcpu_run again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/vdso: Remove redundant locking in update_vsyscall_tz()Shan Hai2013-01-171-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ce73ec6db47af84d1466402781ae0872a9e7873c upstream. The locking in update_vsyscall_tz() is not only unnecessary because the vdso code copies the data unproteced in __kernel_gettimeofday() but also introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which causes user space process to loop forever in vdso code. The following patch removes the locking from update_vsyscall_tz(). Locking is not only unnecessary because the vdso code copies the data unprotected in __kernel_gettimeofday() but also erroneous because updating the tb_update_count is not atomic and introduces a hard to reproduce race condition between update_vsyscall() and update_vsyscall_tz(), which further causes user space process to loop forever in vdso code. The below scenario describes the race condition, x==0 Boot CPU other CPU proc_P: x==0 timer interrupt update_vsyscall x==1 x++;sync settimeofday update_vsyscall_tz x==2 x++;sync x==3 sync;x++ sync;x++ proc_P: x==3 (loops until x becomes even) Because the ++ operator would be implemented as three instructions and not atomic on powerpc. A similar change was made for x86 in commit 6c260d58634 ("x86: vdso: Remove bogus locking in update_vsyscall_tz") Signed-off-by: Shan Hai <shan.hai@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n buildAnton Blanchard2013-01-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 11ee7e99f35ecb15f59b21da6a82d96d2cd3fcc8 upstream. If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n, the kernel fails when we run at a non zero offset. It turns out we were incorrectly wrapping some of the relocatable kernel code with CONFIG_CRASH_DUMP. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Keep thread.dscr and thread.dscr_inherit in syncAnton Blanchard2012-12-172-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 00ca0de02f80924dfff6b4f630e1dff3db005e35 upstream. When we update the DSCR either via emulation of mtspr(DSCR) or via a change to dscr_default in sysfs we don't update thread.dscr. We will eventually update it at context switch time but there is a period where thread.dscr is incorrect. If we fork at this point we will copy the old value of thread.dscr into the child. To avoid this, always keep thread.dscr in sync with reality. This issue was found with the following testcase: http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Update DSCR on all CPUs when writing sysfs dscr_defaultAnton Blanchard2012-12-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1b6ca2a6fe56e7697d57348646e07df08f43b1bb upstream. Writing to dscr_default in sysfs doesn't actually change the DSCR - we rely on a context switch on each CPU to do the work. There is no guarantee we will get a context switch in a reasonable amount of time so fire off an IPI to force an immediate change. This issue was found with the following test case: http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/ptrace: Fix build with gcc 4.6Benjamin Herrenschmidt2012-12-171-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e69b742a6793dc5bf16f6eedca534d4bc10d68b2 upstream. gcc (rightfully) complains that we are accessing beyond the end of the fpr array (we do, to access the fpscr). The only sane thing to do (whether anything in that code can be called remotely sane is debatable) is to special case fpscr and handle it as a separate statement. I initially tried to do it it by making the array access conditional to index < PT_FPSCR and using a 3rd else leg but for some reason gcc was unable to understand it and still spewed the warning. So I ended up with something a tad more intricated but it seems to build on 32-bit and on 64-bit with and without VSX. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Restore correct DSCR in context switchAnton Blanchard2012-09-142-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 714332858bfd40dcf8f741498336d93875c23aa7 upstream. During a context switch we always restore the per thread DSCR value. If we aren't doing explicit DSCR management (ie thread.dscr_inherit == 0) and the default DSCR changed while the process has been sleeping we end up with the wrong value. Check thread.dscr_inherit and select the default DSCR or per thread DSCR as required. This was found with the following test case, when running with more threads than CPUs (ie forcing context switching): http://ozlabs.org/~anton/junkcode/dscr_default_test.c With the four patches applied I can run a combination of all test cases successfully at the same time: http://ozlabs.org/~anton/junkcode/dscr_default_test.c http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix DSCR inheritance in copy_thread()Anton Blanchard2012-09-141-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1021cb268b3025573c4811f1dee4a11260c4507b upstream. If the default DSCR is non zero we set thread.dscr_inherit in copy_thread() meaning the new thread and all its children will ignore future updates to the default DSCR. This is not intended and is a change in behaviour that a number of our users have hit. We just need to inherit thread.dscr and thread.dscr_inherit from the parent which ends up being much simpler. This was found with the following test case: http://ozlabs.org/~anton/junkcode/dscr_default_test.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix wrong divisor in usecs_to_cputimeAndreas Schwab2012-08-092-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9f5072d4f63f28d30d343573830ac6c85fc0deff upstream. Commit d57af9b (taskstats: use real microsecond granularity for CPU times) renamed msecs_to_cputime to usecs_to_cputime, but failed to update all numbers on the way. This causes nonsensical cpu idle/iowait values to be displayed in /proc/stat (the only user of usecs_to_cputime so far). This also renames __cputime_msec_factor to __cputime_usec_factor, adapting its value and using it directly in cputime_to_usecs instead of doing two multiplications. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Add "memory" attribute for mfmsr()Tiejun Chen2012-08-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | commit b416c9a10baae6a177b4f9ee858b8d309542fbef upstream. Add "memory" attribute in inline assembly language as a compiler barrier to make sure 4.6.x GCC don't reorder mfmsr(). Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/ftrace: Fix assembly trampoline register usageroger blofeld2012-08-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream. Just like the module loader, ftrace needs to be updated to use r12 instead of r11 with newer gcc's. Signed-off-by: Roger Blofeld <blofeldus@yahoo.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/xmon: Use cpumask iterator to avoid warningAnton Blanchard2012-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bc1d7702910c7c7e88eb60b58429dbfe293683ce upstream. We have a bug report where the kernel hits a warning in the cpumask code: WARNING: at include/linux/cpumask.h:107 Which is: WARN_ON_ONCE(cpu >= nr_cpumask_bits); The backtrace is: cpu_cmd cmds xmon_core xmon die xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still open coding this but iterating above nr_cpu_ids is definitely a bug. This patch iterates through all possible cpus, in case we issue a system reset and CPUs in an offline state call in. Perhaps the old code was trying to handle CPUs that were in the partition but were never started (eg kexec into a kernel with an nr_cpus= boot option). They are going to die way before we get into xmon since we haven't set any kernel state up for them. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix kernel panic during kernel module loadSteffen Rumler2012-06-171-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3c75296562f43e6fbc6cddd3de948a7b3e4e9bcf upstream. This fixes a problem which can causes kernel oopses while loading a kernel module. According to the PowerPC EABI specification, GPR r11 is assigned the dedicated function to point to the previous stack frame. In the powerpc-specific kernel module loader, do_plt_call() (in arch/powerpc/kernel/module_32.c), GPR r11 is also used to generate trampoline code. This combination crashes the kernel, in the case where the compiler chooses to use a helper function for saving GPRs on entry, and the module loader has placed the .init.text section far away from the .text section, meaning that it has to generate a trampoline for functions in the .init.text section to call the GPR save helper. Because the trampoline trashes r11, references to the stack frame using r11 can cause an oops. The fix just uses GPR r12 instead of GPR r11 for generating the trampoline code. According to the statements from Freescale, this is safe from an EABI perspective. I've tested the fix for kernel 2.6.33 on MPC8541. Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com> [paulus@samba.org: reworded the description] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge linux-3.0.31 from korg into jellybeancodeworkx2012-09-187-14/+28
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/mm/proc-v7.S drivers/base/core.c drivers/gpu/drm/i915/i915_gem_execbuffer.c drivers/gpu/drm/i915/intel_display.c drivers/gpu/drm/i915/intel_lvds.c drivers/gpu/drm/radeon/evergreen.c drivers/gpu/drm/radeon/r100.c drivers/gpu/drm/radeon/radeon_connectors.c drivers/gpu/drm/radeon/rs600.c drivers/usb/core/hub.c drivers/usb/host/xhci-pci.c drivers/usb/host/xhci.c drivers/usb/serial/qcserial.c fs/proc/base.c Change-Id: Ia98b35db3f8c0bfd95817867d3acb85be8e5e772
| * powerpc/pmac: Fix SMP kernels on pre-core99 UP machinesBenjamin Herrenschmidt2012-03-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 78c5c68a4cf4329d17abfa469345ddf323d4fd62 upstream. The code for "powersurge" SMP would kick in and cause a crash at boot due to the lack of a NULL test. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com> Reported-by: Adam Conrad <adconrad@ubuntu.com> Tested-by: Adam Conrad <adconrad@ubuntu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc/perf: power_pmu_start restores incorrect values, breaking frequency ↵Anton Blanchard2012-02-291-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events commit 9a45a9407c69d068500923480884661e2b9cc421 upstream. perf on POWER stopped working after commit e050e3f0a71b (perf: Fix broken interrupt rate throttling). That patch exposed a bug in the POWER perf_events code. Since the PMCs count upwards and take an exception when the top bit is set, we want to write 0x80000000 - left in power_pmu_start. We were instead programming in left which effectively disables the counter until we eventually hit 0x80000000. This could take seconds or longer. With the patch applied I get the expected number of samples: SAMPLE events: 9948 Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exitLi Zhong2012-01-122-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e4f387d8db3ba3c2dae4d8bdfe7bb5f4fe1bcb0d upstream. Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen as following, which could cause incorrect preempt count. __trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry => get_cpu_var => preempt_disable __trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit => put_cpu_var => preempt_enable where: A => B and A -> B means A calls B, but => means A will call B through function name, and B will definitely be called. -> means A will call B through function pointer, so B might not be called if the function pointer is not set. So error happens when only one of probe_hcall_entry and probe_hcall_exit get called during a hcall. This patch tries to move the preempt count operations from probe_hcall_entry and probe_hcall_exit to its callers. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * powerpc/time: Handle wrapping of decrementerAnton Blanchard2012-01-123-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 37fb9a0231ee43d42d069863bdfd567fca2b61af upstream. When re-enabling interrupts we have code to handle edge sensitive decrementers by resetting the decrementer to 1 whenever it is negative. If interrupts were disabled long enough that the decrementer wrapped to positive we do nothing. This means interrupts can be delayed for a long time until it finally goes negative again. While we hope interrupts are never be disabled long enough for the decrementer to go positive, we have a very good test team that can drive any kernel into the ground. The softlockup data we get back from these fails could be seconds in the future, completely missing the cause of the lockup. We already keep track of the timebase of the next event so use that to work out if we should trigger a decrementer exception. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | samsung update 1codeworkx2012-06-025-54/+0
|/
* powerpc: Copy down exception vectors after feature fixupsAnton Blanchard2011-11-216-2/+28
| | | | | | | | | | | | | | | | | commit d715e433b7ad19c02fc4becf0d5e9a59f97925de upstream. kdump fails because we try to execute an HV only instruction. Feature fixups are being applied after we copy the exception vectors down to 0 so they miss out on any updates. We have always had this issue but it only became critical in v3.0 when we added CFAR support (breaks POWER5) and v3.1 when we added POWERNV (breaks everyone). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc/ps3: Fix lost SMP IPIsGeoff Levand2011-11-213-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 72f3bea075287785ed32b777b6dd2636aa7002e8 upstream. Fixes the PS3 bootup hang introduced in 3.0-rc1 by: commit 317f394160e9beb97d19a84c39b7e5eb3d7815a sched: Move the second half of ttwu() to the remote cpu Move the PS3's LV1 EOI call lv1_end_of_interrupt_ext() from ps3_chip_eoi() to ps3_get_irq() for IPI messages. If lv1_send_event_locally() is called between a previous call to lv1_send_event_locally() and the coresponding call to lv1_end_of_interrupt_ext() the second event will not be delivered to the target cpu. The PS3's SMP IPIs are implemented using lv1_send_event_locally(), so if two IPI messages of the same type are sent to the same target in a relatively short period of time the second IPI event can become lost when lv1_end_of_interrupt_ext() is called from ps3_chip_eoi(). Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: Fix deadlock in icswx codeAnton Blanchard2011-11-111-6/+6
| | | | | | | | | | | | | | | | | | | | | commit 8bdafa39a47265bc029838b35cc6585f69224afa upstream. The icswx code introduced an A-B B-A deadlock: CPU0 CPU1 ---- ---- lock(&anon_vma->mutex); lock(&mm->mmap_sem); lock(&anon_vma->mutex); lock(&mm->mmap_sem); Instead of using the mmap_sem to keep mm_users constant, take the page table spinlock. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc/eeh: Fix /proc/ppc64/eeh creationThadeu Lima de Souza Cascardo2011-11-111-1/+1
| | | | | | | | | | | | | | | | | commit 8feaa43494cee5e938fd5a57b9e9bf1c827e6ccd upstream. Since commit 188917e183cf9ad0374b571006d0fc6d48a7f447, /proc/ppc64 is a symlink to /proc/powerpc/. That means that creating /proc/ppc64/eeh will end up with a unaccessible file, that is not listed under /proc/powerpc/ and, then, not listed under /proc/ppc64/. Creating /proc/powerpc/eeh fixes that problem and maintain the compatibility intended with the ppc64 symlink. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc/pseries: Avoid spurious error during hotplug CPU addAnton Blanchard2011-11-111-0/+4
| | | | | | | | | | | | | | | | commit 9c740025c51a26ab00192cfc464064d4ccbfe3fc upstream. During hotplug CPU add we get the following error: Unexpected Error (0) returned from configure-connector ibm,configure-connector returns 0 for configuration complete, so catch this and avoid the error. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probeAnton Blanchard2011-11-113-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a11940978bd598e65996b4f807cf4904793f7025 upstream. If we echo an address the hypervisor doesn't like to /sys/devices/system/memory/probe we oops the box: # echo 0x10000000000 > /sys/devices/system/memory/probe kernel BUG at arch/powerpc/mm/hash_utils_64.c:541! The backtrace is: create_section_mapping arch_add_memory add_memory memory_probe_store sysdev_class_store sysfs_write_file vfs_write SyS_write In create_section_mapping we BUG if htab_bolt_mapping returned an error. A better approach is to return an error which will propagate back to userspace. Rerunning the test with this patch applied: # echo 0x10000000000 > /sys/devices/system/memory/probe -bash: echo: write error: Invalid argument Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc/numa: Remove double of_node_put in hot_add_node_scn_to_nidAnton Blanchard2011-11-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6083184269fd723affca4f6340e491950267622a upstream. During memory hotplug testing, I got the following warning: ERROR: Bad of_node_put() on /memory@0 of_node_release kref_put of_node_put of_find_node_by_type hot_add_node_scn_to_nid hot_add_scn_to_nid memory_add_physaddr_to_nid ... of_find_node_by_type() loop does the of_node_put for us so we only need the handle the case where we terminate the loop early. As suggested by Stephen Rothwell we can do the of_node_put unconditionally outside of the loop since of_node_put handles a NULL argument fine. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* thp: share get_huge_page_tail()Andrea Arcangeli2011-11-111-11/+0
| | | | | | | | | | | | | | | | | | | | | | | commit b35a35b556f5e6b7993ad0baf20173e75c09ce8c upstream. This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: gup_huge_pmd() return 0 if pte changesAndrea Arcangeli2011-11-111-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | commit cf592bf768c4fa40282b8fce58a80820065de2cb upstream. powerpc didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: gup_hugepte() support THP based tail recountingAndrea Arcangeli2011-11-111-1/+23
| | | | | | | | | | | | | | | | | | | | | commit 3526741f0964c88bc2ce511e1078359052bf225b upstream. Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: gup_hugepte() avoid freeing the head page too many timesAndrea Arcangeli2011-11-111-3/+2
| | | | | | | | | | | | | | | | | | | | commit 8596468487e2062cae2aad56e973784e03959245 upstream. We only taken "refs" pins on the head page not "*nr" pins. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: get_hugepte() don't put_page() the wrong pageAndrea Arcangeli2011-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 405e44f2e312dd5dd63e5a9f459bffcbcd4368ef upstream. "page" may have changed to point to the next hugepage after the loop completed, The references have been taken on the head page, so the put_page must happen there too. This is a longstanding issue pre-thp inclusion. It's totally unclear how these page_cache_add_speculative and pte_val(pte) != pte_val(*ptep) checks are necessary across all the powerpc gup_fast code, when x86 doesn't need any of that: there's no way the page can be freed with irq disabled so we're guaranteed the atomic_inc will happen on a page with page_count > 0 (so not needing the speculative check). The pte check is also meaningless on x86: no need to rollback on x86 if the pte changed, because the pte can still change a CPU tick after the check succeeded and it won't be rolled back in that case. The important thing is we got a reference on a valid page that was mapped there a CPU tick ago. So not knowing the soft tlb refill code of ppc64 in great detail I'm not removing the "speculative" page_count increase and the pte checks across all the code, but unless there's a strong reason for it they should be later cleaned up too. If a pte can change from huge to non-huge (like it could happen with THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only so I'm not altering that. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* powerpc: remove superfluous PageTail checks on the pte gup_fastAndrea Arcangeli2011-11-111-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 2839bdc1bfc0af76a2f0f11eca011590520a04fa upstream. This part of gup_fast doesn't seem capable of handling hugetlbfs ptes, those should be handled by gup_hugepd only, so these checks are superfluous. Plus if this wasn't a noop, it would have oopsed because, the insistence of using the speculative refcounting would trigger a VM_BUG_ON if a tail page was encountered in the page_cache_get_speculative(). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mm: thp: tail page refcounting fixAndrea Arcangeli2011-11-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 70b50f94f1644e2aa7cb374819cfd93f3c28d725 upstream. Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* arch/powerpc/sysdev/fsl_rio.c: correct IECSR register clear valueLiu Gang-B341822011-10-031-2/+3
| | | | | | | | | | | | | | | | | | commit 671ee7f0ce62e4b991b47fcf1c161c3f710dabbc upstream. This bug causes the IECSR register clear failure. In this case, the RETE (retry error threshold exceeded) interrupt will be generated and cannot be cleared. So the related ISR may be called persistently. The RETE bit in IECSR is cleared by writing a 1 to it. Signed-off-by: Liu Gang <Gang.Liu@freescale.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>