From ec3918604c896df59632d47bd2ed874cbc2f262b Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 11 Feb 2013 14:52:36 +0000 Subject: x86/mm: Check if PUD is large when validating a kernel address commit 0ee364eb316348ddf3e0dfcd986f5f13f528f821 upstream. A user reported the following oops when a backup process reads /proc/kcore: BUG: unable to handle kernel paging request at ffffbb00ff33b000 IP: [] kern_addr_valid+0xbe/0x110 [...] Call Trace: [] read_kcore+0x17a/0x370 [] proc_reg_read+0x77/0xc0 [] vfs_read+0xc7/0x130 [] sys_read+0x53/0xa0 [] system_call_fastpath+0x16/0x1b Investigation determined that the bug triggered when reading system RAM at the 4G mark. On this system, that was the first address using 1G pages for the virt->phys direct mapping, so the PUD is pointing to a physical address, not a PMD page. The problem is that the page table walker in kern_addr_valid() is not checking pud_large() and treats the physical address as if it were a PMD. If it happens to look like pmd_none, then it'll silently fail, probably returning zeros instead of real data. If the data happens to look like a present PMD, though, it will be walked, resulting in the oops above. This patch adds the necessary pud_large() check. Unfortunately the problem was not readily reproducible, and they are now running the backup program without accessing /proc/kcore, so the patch has not been validated, but I think it makes sense. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Reviewed-by: Michal Hocko Acked-by: Johannes Weiner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 5 +++++ arch/x86/mm/init_64.c | 3 +++ 2 files changed, 8 insertions(+) (limited to 'arch') diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 884507e..6be9909 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -142,6 +142,11 @@ static inline unsigned long pmd_pfn(pmd_t pmd) return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT; } +static inline unsigned long pud_pfn(pud_t pud) +{ + return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT; +} + #define pte_page(pte) pfn_to_page(pte_pfn(pte)) static inline int pmd_large(pmd_t pte) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index bbaaa00..44b93da 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -831,6 +831,9 @@ int kern_addr_valid(unsigned long addr) if (pud_none(*pud)) return 0; + if (pud_large(*pud)) + return pfn_valid(pud_pfn(*pud)); + pmd = pmd_offset(pud, addr); if (pmd_none(*pmd)) return 0; -- cgit v1.1 From 3339af37f35aff045db6a830185dddfac424c937 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Thu, 24 Jan 2013 13:11:10 +0000 Subject: x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. commit 13d2b4d11d69a92574a55bfd985cfb0ca77aebdc upstream.
This fixes CVE-2013-0228 / XSA-42. Drew Jones, while working on CVE-2013-0190, found a flaw that an unprivileged user in a 32-bit PV guest can use to crash the guest with a panic like this: ------------- general protection fault: 0000 [#1] SMP last sysfs file: /sys/devices/vbd-51712/block/xvda/dev Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1 EIP: 0061:[] EFLAGS: 00010086 CPU: 0 EIP is at xen_iret+0x12/0x2b EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010 ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0 DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069 Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000) Stack: 00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000 Call Trace: Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00 8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40 10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02 EIP: [] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0 general protection fault: 0000 [#2] ---[ end trace ab0d29a492dcd330 ]--- Kernel panic - not syncing: Fatal exception Pid: 1250, comm: r Tainted: G D --------------- 2.6.32-356.el6.i686 #1 Call Trace: [] ? panic+0x6e/0x122 [] ? oops_end+0xbc/0xd0 [] ? do_general_protection+0x0/0x210 [] ? error_code+0x73/ ------------- Petr says: " I've analysed the bug and I think that xen_iret() cannot cope with mangled DS, in this case zeroed out (null selector/descriptor) by either xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT entry was invalidated by the reproducer. " Jan took a look at the preliminary patch and came up with a fix that solves this problem: "This code gets called after all registers other than those handled by IRET got already restored, hence a null selector in %ds or a non-null one that got loaded from a code or read-only data descriptor would cause a kernel mode fault (with the potential of crashing the kernel as a whole, if panic_on_oops is set)." The way to fix this is to realize that we can only rely on the registers that IRET restores. The two that are guaranteed are %cs and %ss, as they are always fixed GDT selectors. They are also inaccessible from user mode, so they cannot be altered. This is the approach taken in this patch. Another option suggested by Jan would be to rely on the subtle fact that %ebp- and %esp-relative references use the %ss segment, in which case we could switch from using %eax to %ebp and would not need the %ss overrides. That would also require one extra instruction to compensate for the one place where the register is used as a scaled index. However, Andrew pointed out that this is too subtle, and if further work were to be done in this code path it could escape folks' attention and lead to accidents.
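For illustration only (not part of the patch): the reproducer idea described above - invalidating, from unprivileged userspace, the LDT entry that backs a data segment selector - can be approximated with modify_ldt(). The exact descriptor contents below are an assumption for this sketch, not taken from the original reproducer:

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <asm/ldt.h>

	int main(void)
	{
		struct user_desc desc;

		memset(&desc, 0, sizeof(desc));
		desc.entry_number = 0;
		desc.seg_not_present = 1;	/* invalidate the LDT slot */

		/* After this, a reload of %ds from the corresponding LDT
		 * selector yields a null/invalid segment, which the
		 * pre-patch xen_iret path could not cope with. */
		return syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
	}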
Reviewed-by: Petr Matousek Reported-by: Petr Matousek Reviewed-by: Andrew Cooper Signed-off-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/xen-asm_32.S | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S index b040b0e..7328f71 100644 --- a/arch/x86/xen/xen-asm_32.S +++ b/arch/x86/xen/xen-asm_32.S @@ -88,11 +88,11 @@ ENTRY(xen_iret) */ #ifdef CONFIG_SMP GET_THREAD_INFO(%eax) - movl TI_cpu(%eax), %eax - movl __per_cpu_offset(,%eax,4), %eax - mov xen_vcpu(%eax), %eax + movl %ss:TI_cpu(%eax), %eax + movl %ss:__per_cpu_offset(,%eax,4), %eax + mov %ss:xen_vcpu(%eax), %eax #else - movl xen_vcpu, %eax + movl %ss:xen_vcpu, %eax #endif /* check IF state we're restoring */ @@ -105,11 +105,11 @@ ENTRY(xen_iret) * resuming the code, so we don't have to be worried about * being preempted to another CPU. */ - setz XEN_vcpu_info_mask(%eax) + setz %ss:XEN_vcpu_info_mask(%eax) xen_iret_start_crit: /* check for unmasked and pending */ - cmpw $0x0001, XEN_vcpu_info_pending(%eax) + cmpw $0x0001, %ss:XEN_vcpu_info_pending(%eax) /* * If there's something pending, mask events again so we can @@ -117,7 +117,7 @@ xen_iret_start_crit: * touch XEN_vcpu_info_mask. */ jne 1f - movb $1, XEN_vcpu_info_mask(%eax) + movb $1, %ss:XEN_vcpu_info_mask(%eax) 1: popl %eax -- cgit v1.1 From dbb694e810c87e7e1760527a783437f26ac5a547 Mon Sep 17 00:00:00 2001 From: "H. Peter Anvin" Date: Thu, 31 Jan 2013 13:53:10 -0800 Subject: x86-32, mm: Remove reference to resume_map_numa_kva() commit bb112aec5ee41427e9b9726e3d57b896709598ed upstream. Remove reference to removed function resume_map_numa_kva(). Signed-off-by: H. Peter Anvin Cc: Dave Hansen Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/mmzone_32.h | 6 ------ arch/x86/power/hibernate_32.c | 2 -- 2 files changed, 8 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h index ffa037f..a6a6414 100644 --- a/arch/x86/include/asm/mmzone_32.h +++ b/arch/x86/include/asm/mmzone_32.h @@ -14,12 +14,6 @@ extern struct pglist_data *node_data[]; #include -extern void resume_map_numa_kva(pgd_t *pgd); - -#else /* !CONFIG_NUMA */ - -static inline void resume_map_numa_kva(pgd_t *pgd) {} - #endif /* CONFIG_NUMA */ #ifdef CONFIG_DISCONTIGMEM diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c index 3769079..a09ecb9 100644 --- a/arch/x86/power/hibernate_32.c +++ b/arch/x86/power/hibernate_32.c @@ -130,8 +130,6 @@ static int resume_physical_mapping_init(pgd_t *pgd_base) } } - resume_map_numa_kva(pgd_base); - return 0; } -- cgit v1.1 From 20807f47cffc5e85f9127c260621d12dc3b7814b Mon Sep 17 00:00:00 2001 From: Stefan Bader Date: Fri, 15 Feb 2013 09:48:52 +0100 Subject: xen: Send spinlock IPI to all waiters commit 76eaca031f0af2bb303e405986f637811956a422 upstream. There is a loophole between Xen's current implementation of pv-spinlocks and the scheduler. This was triggerable through a testcase until v3.6 changed the TLB flushing code. The problem potentially is still there just not observable in the same way. What could happen was (is): 1. CPU n tries to schedule task x away and goes into a slow wait for the runq lock of CPU n-# (must be one with a lower number). 2. 
CPU n-#, while processing softirqs, tries to balance domains and goes into a slow wait for its own runq lock (for updating some records). Since this is a spin_lock_irqsave in softirq context, interrupts will be re-enabled for the duration of the poll_irq hypercall used by Xen. 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives an interrupt (e.g. endio) and when processing the interrupt, tries to wake up task x. But that is in schedule and still on_cpu, so try_to_wake_up goes into a tight loop. 4. The runq lock of CPU n-# gets unlocked, but the message only gets sent to the first waiter, which is CPU n-# and that is busily stuck. 5. CPU n-# never returns from the nested interruption to take and release the lock because the scheduler uses a busy wait. And CPU n never finishes the task migration because the unlock notification only went to CPU n-#. To avoid this, and since the unlocking code has no real sense of which waiter is best suited to grab the lock, just send the IPI to all of them. This causes the waiters to return from the hypercall (those not interrupted, at least) and do active spinlocking. BugLink: http://bugs.launchpad.net/bugs/1011792 Acked-by: Jan Beulich Signed-off-by: Stefan Bader Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/spinlock.c | 1 - 1 file changed, 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index cc9b1e1..d99537f 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -313,7 +313,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl) if (per_cpu(lock_spinners, cpu) == xl) { ADD_STATS(released_slow_kicked, 1); xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR); - break; } } } -- cgit v1.1 From 58c9ce6fad8e00d9726447f939fe7e78e2aec891 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Fri, 25 Jan 2013 15:34:15 +0100 Subject: s390/kvm: Fix store status for ACRS/FPRS commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream. On store status we need to copy the current state of registers into a save area. Currently we might save stale versions: the SIE state descriptor doesn't have fields for guest ACRS/FPRS; those registers are simply stored in the host registers. The host program must copy these away if needed. We do that in vcpu_put/load. If we now do a store status in KVM code between vcpu_put/load, the saved values are not up-to-date. Let's collect the ACRS/FPRS before saving them. This also fixes some strange problems with hotplug and virtio-ccw, since the low level machine check handler (on hotplug a machine check will happen) will revalidate all registers with the content of the save area. Signed-off-by: Christian Borntraeger Signed-off-by: Gleb Natapov Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/kvm-s390.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'arch') diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 2ada634..25ab200 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -584,6 +584,14 @@ int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) } else prefix = 0; + /* + * The guest FPRS and ACRS are in the host FPRS/ACRS due to the lazy
Lets update our copies before we save + * it into the save area + */ + save_fp_regs(&vcpu->arch.guest_fpregs); + save_access_regs(vcpu->run->s.regs.acrs); + if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs), vcpu->arch.guest_fpregs.fprs, 128, prefix)) return -EFAULT; -- cgit v1.1 From 928de5bcadf8540f58ba6b12c6b7547d33dcde89 Mon Sep 17 00:00:00 2001 From: Igor Grinberg Date: Sun, 13 Jan 2013 13:49:47 +0200 Subject: ARM: PXA3xx: program the CSMSADRCFG register commit d107a204154ddd79339203c2deeb7433f0cf6777 upstream. The Chip Select Configuration Register must be programmed to 0x2 in order to achieve the correct behavior of the Static Memory Controller. Without this patch, devices wired to DFI and accessed through the SMC cannot be accessed after resume from S2. Instead of relying on the boot loader to program the CSMSADRCFG register, program it in the kernel smemc module. Signed-off-by: Igor Grinberg Acked-by: Eric Miao Signed-off-by: Haojian Zhuang Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-pxa/include/mach/smemc.h | 1 + arch/arm/mach-pxa/smemc.c | 15 ++++++++++++++- 2 files changed, 15 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-pxa/include/mach/smemc.h b/arch/arm/mach-pxa/include/mach/smemc.h index 654adc9..301bf0e 100644 --- a/arch/arm/mach-pxa/include/mach/smemc.h +++ b/arch/arm/mach-pxa/include/mach/smemc.h @@ -37,6 +37,7 @@ #define CSADRCFG1 (SMEMC_VIRT + 0x84) /* Address Configuration Register for CS1 */ #define CSADRCFG2 (SMEMC_VIRT + 0x88) /* Address Configuration Register for CS2 */ #define CSADRCFG3 (SMEMC_VIRT + 0x8C) /* Address Configuration Register for CS3 */ +#define CSMSADRCFG (SMEMC_VIRT + 0xA0) /* Chip Select Configuration Register */ /* * More handy macros for PCMCIA diff --git a/arch/arm/mach-pxa/smemc.c b/arch/arm/mach-pxa/smemc.c index 7992305..f38aa89 100644 --- a/arch/arm/mach-pxa/smemc.c +++ b/arch/arm/mach-pxa/smemc.c @@ -40,6 +40,8 @@ static void pxa3xx_smemc_resume(void) __raw_writel(csadrcfg[1], CSADRCFG1); __raw_writel(csadrcfg[2], CSADRCFG2); __raw_writel(csadrcfg[3], CSADRCFG3); + /* CSMSADRCFG wakes up in its default state (0), so we need to set it */ + __raw_writel(0x2, CSMSADRCFG); } static struct syscore_ops smemc_syscore_ops = { @@ -49,8 +51,19 @@ static int __init smemc_init(void) { - if (cpu_is_pxa3xx()) + if (cpu_is_pxa3xx()) { + /* + * The only documentation we have on the + * Chip Select Configuration Register (CSMSADRCFG) is that + * it must be programmed to 0x2. + * Moreover, in the bit definitions, the second bit + * (CSMSADRCFG[1]) is called "SETALWAYS". + * Other bits are reserved in this register. + */ + __raw_writel(0x2, CSMSADRCFG); + register_syscore_ops(&smemc_syscore_ops); + } return 0; } -- cgit v1.1 From 534aaed9080ba82974e8d2a71e43a362918138f8 Mon Sep 17 00:00:00 2001 From: Phileas Fogg Date: Sat, 23 Feb 2013 00:32:19 +0100 Subject: powerpc/kexec: Disable hard IRQ before kexec commit 8520e443aa56cc157b015205ea53e7b9fc831291 upstream. Disable hard IRQs before kexec'ing a new kernel image. Not doing so can result in corrupted data in the memory segments reserved for the new kernel.
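For illustration only (not part of the patch): on 64-bit powerpc, local_irq_disable() is only a "soft" disable - the kernel marks interrupts as off but the CPU can still take a hardware interrupt and record it as pending - whereas hard_irq_disable() actually clears MSR[EE]. A minimal sketch of the quiesce sequence the patch establishes on each CPU (the function name is made up for the sketch):

	static void kexec_quiesce_interrupts(void)
	{
		local_irq_disable();	/* soft-disable (lazy-EE bookkeeping) */
		hard_irq_disable();	/* really clear MSR[EE] in hardware */
		mb();			/* order the disables before reporting
					 * KEXEC_STATE_IRQS_OFF to other CPUs */
	}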
Signed-off-by: Phileas Fogg Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/machine_kexec_64.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 583af70..cac9d2c 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -163,6 +163,8 @@ static int kexec_all_irq_disabled = 0; static void kexec_smp_down(void *arg) { local_irq_disable(); + hard_irq_disable(); + mb(); /* make sure our irqs are disabled before we say they are */ get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF; while(kexec_all_irq_disabled == 0) @@ -245,6 +247,8 @@ static void kexec_prepare_cpus(void) wake_offline_cpus(); smp_call_function(kexec_smp_down, NULL, /* wait */0); local_irq_disable(); + hard_irq_disable(); + mb(); /* make sure IRQs are disabled before we say they are */ get_paca()->kexec_state = KEXEC_STATE_IRQS_OFF; @@ -282,6 +286,7 @@ static void kexec_prepare_cpus(void) if (ppc_md.kexec_cpu_down) ppc_md.kexec_cpu_down(0, 0); local_irq_disable(); + hard_irq_disable(); } #endif /* SMP */ -- cgit v1.1 From ef96576ef50b8bbd4c63b9cebab372cb7cf4ba67 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Mon, 14 Jan 2013 19:45:00 -0500 Subject: Purge existing TLB entries in set_pte_at and ptep_set_wrprotect MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 7139bc1579901b53db7e898789e916ee2fb52d78 upstream. This patch goes a long way toward fixing the minifail bug, and it  significantly improves the stability of SMP machines such as the rp3440.  When write  protecting a page for COW, we need to purge the existing translation.  Otherwise, the COW break doesn't occur as expected because the TLB may still have a stale entry which allows writes. [jejb: fix up checkpatch errors] Signed-off-by: John David Anglin Signed-off-by: James Bottomley Signed-off-by: Greg Kroah-Hartman --- arch/parisc/include/asm/pgtable.h | 13 ++++++++++--- arch/parisc/kernel/cache.c | 18 ++++++++++++++++++ 2 files changed, 28 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h index 22dadeb..9d35a3e 100644 --- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -12,11 +12,10 @@ #include #include +#include #include #include -struct vm_area_struct; - /* * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel * memory. 
For the return value to be meaningful, ADDR must be >= @@ -40,7 +39,14 @@ struct vm_area_struct; do{ \ *(pteptr) = (pteval); \ } while(0) -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval) + +extern void purge_tlb_entries(struct mm_struct *, unsigned long); + +#define set_pte_at(mm, addr, ptep, pteval) \ + do { \ + set_pte(ptep, pteval); \ + purge_tlb_entries(mm, addr); \ + } while (0) #endif /* !__ASSEMBLY__ */ @@ -464,6 +470,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, old = pte_val(*ptep); new = pte_val(pte_wrprotect(__pte (old))); } while (cmpxchg((unsigned long *) ptep, old, new) != old); + purge_tlb_entries(mm, addr); #else pte_t old_pte = *ptep; set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte)); diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c index 83335f3..5241698 100644 --- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -421,6 +421,24 @@ void kunmap_parisc(void *addr) EXPORT_SYMBOL(kunmap_parisc); #endif +void purge_tlb_entries(struct mm_struct *mm, unsigned long addr) +{ + unsigned long flags; + + /* Note: purge_tlb_entries can be called at startup with + no context. */ + + /* Disable preemption while we play with %sr1. */ + preempt_disable(); + mtsp(mm->context, 1); + purge_tlb_start(flags); + pdtlb(addr); + pitlb(addr); + purge_tlb_end(flags); + preempt_enable(); +} +EXPORT_SYMBOL(purge_tlb_entries); + void __flush_tlb_range(unsigned long sid, unsigned long start, unsigned long end) { -- cgit v1.1 From ffbf1423cc7e8ed018d0ba8fbe3f9f3bb816fa4c Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 6 Feb 2013 12:55:23 +0100 Subject: iommu/amd: Initialize device table after dma_ops commit f528d980c17b8714aedc918ba86e058af914d66b upstream. When dma_ops are initialized the unity mappings are created. The init_device_table_dma() function makes sure DMA from all devices is blocked by default. This opens a short window in time where DMA to unity mapped regions is blocked by the IOMMU. Make sure this does not happen by initializing the device table after dma_ops. Signed-off-by: Joerg Roedel Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/amd_iommu_init.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/amd_iommu_init.c b/arch/x86/kernel/amd_iommu_init.c index 33df6e8..d86aa3f 100644 --- a/arch/x86/kernel/amd_iommu_init.c +++ b/arch/x86/kernel/amd_iommu_init.c @@ -1363,6 +1363,7 @@ static struct syscore_ops amd_iommu_syscore_ops = { */ static int __init amd_iommu_init(void) { + struct amd_iommu *iommu; int i, ret = 0; /* @@ -1411,9 +1412,6 @@ static int __init amd_iommu_init(void) if (amd_iommu_pd_alloc_bitmap == NULL) goto free; - /* init the device table */ - init_device_table(); - /* * let all alias entries point to itself */ @@ -1463,6 +1461,12 @@ static int __init amd_iommu_init(void) if (ret) goto free_disable; + /* init the device table */ + init_device_table(); + + for_each_iommu(iommu) + iommu_flush_all_caches(iommu); + amd_iommu_init_api(); amd_iommu_init_notifier(); -- cgit v1.1 From 93b6b299f7a97a8ce2f8ab7b14195f8d2d131905 Mon Sep 17 00:00:00 2001 From: "H. Peter Anvin" Date: Wed, 27 Feb 2013 12:46:40 -0800 Subject: x86: Make sure we can boot in the case the BDA contains pure garbage commit 7c10093692ed2e6f318387d96b829320aa0ca64c upstream. 
On non-BIOS platforms it is possible that the BIOS data area contains garbage instead of being zeroed or something equivalent (firmware people: we are talking of 1.5K here, so please do the sane thing.) We need on the order of 20-30K of low memory in order to boot, which may grow up to < 64K in the future. We probably want to avoid the lowest of the low memory. At the same time, it seems extremely unlikely that a legitimate EBDA would ever reach down to the 128K (which would require it to be over half a megabyte in size.) Thus, pick 128K as the cutoff for "this is insane, ignore." We may still end up reserving a bunch of extra memory on the low megabyte, but that is not really a major issue these days. In the worst case we lose 512K of RAM. This code really should be merged with trim_bios_range() in arch/x86/kernel/setup.c, but that is a bigger patch for a later merge window. Reported-by: Darren Hart Signed-off-by: H. Peter Anvin Cc: Matt Fleming Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head.c | 53 ++++++++++++++++++++++++++++++++------------------ 1 file changed, 34 insertions(+), 19 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c index af0699b..f6c4674 100644 --- a/arch/x86/kernel/head.c +++ b/arch/x86/kernel/head.c @@ -5,8 +5,6 @@ #include #include -#define BIOS_LOWMEM_KILOBYTES 0x413 - /* * The BIOS places the EBDA/XBDA at the top of conventional * memory, and usually decreases the reported amount of @@ -16,17 +14,30 @@ * chipset: reserve a page before VGA to prevent PCI prefetch * into it (errata #56). Usually the page is reserved anyways, * unless you have no PS/2 mouse plugged in. + * + * This functions is deliberately very conservative. Losing + * memory in the bottom megabyte is rarely a problem, as long + * as we have enough memory to install the trampoline. Using + * memory that is in use by the BIOS or by some DMA device + * the BIOS didn't shut down *is* a big problem. */ + +#define BIOS_LOWMEM_KILOBYTES 0x413 +#define LOWMEM_CAP 0x9f000U /* Absolute maximum */ +#define INSANE_CUTOFF 0x20000U /* Less than this = insane */ + void __init reserve_ebda_region(void) { unsigned int lowmem, ebda_addr; - /* To determine the position of the EBDA and the */ - /* end of conventional memory, we need to look at */ - /* the BIOS data area. In a paravirtual environment */ - /* that area is absent. We'll just have to assume */ - /* that the paravirt case can handle memory setup */ - /* correctly, without our help. */ + /* + * To determine the position of the EBDA and the + * end of conventional memory, we need to look at + * the BIOS data area. In a paravirtual environment + * that area is absent. We'll just have to assume + * that the paravirt case can handle memory setup + * correctly, without our help. + */ if (paravirt_enabled()) return; @@ -37,19 +48,23 @@ void __init reserve_ebda_region(void) /* start of EBDA area */ ebda_addr = get_bios_ebda(); - /* Fixup: bios puts an EBDA in the top 64K segment */ - /* of conventional memory, but does not adjust lowmem. */ - if ((lowmem - ebda_addr) <= 0x10000) - lowmem = ebda_addr; + /* + * Note: some old Dells seem to need 4k EBDA without + * reporting so, so just consider the memory above 0x9f000 + * to be off limits (bugzilla 2990). + */ + + /* If the EBDA address is below 128K, assume it is bogus */ + if (ebda_addr < INSANE_CUTOFF) + ebda_addr = LOWMEM_CAP; - /* Fixup: bios does not report an EBDA at all. 
*/ /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ - if ((ebda_addr == 0) && (lowmem >= 0x9f000)) - lowmem = 0x9f000; + /* If lowmem is less than 128K, assume it is bogus */ + if (lowmem < INSANE_CUTOFF) + lowmem = LOWMEM_CAP; - /* Paranoia: should never happen, but... */ - if ((lowmem == 0) || (lowmem >= 0x100000)) - lowmem = 0x9f000; + /* Use the lower of the lowmem and EBDA markers as the cutoff */ + lowmem = min(lowmem, ebda_addr); + lowmem = min(lowmem, LOWMEM_CAP); /* Absolute cap */ /* reserve all memory between lowmem and the 1MB mark */ memblock_x86_reserve_range(lowmem, 0x100000, "* BIOS reserved"); -- cgit v1.1 From f39edfbf6dbf8e29cfbafd67d93fa1e30196701c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 7 Feb 2013 09:44:13 -0800 Subject: x86: Do not leak kernel page mapping locations commit e575a86fdc50d013bf3ad3aa81d9100e8e6cc60d upstream. Without this patch, it is trivial to determine kernel page mappings by examining the error code reported to dmesg[1]. Instead, declare the entire kernel memory space as a violation of a present page. Additionally, since show_unhandled_signals is enabled by default, switch branch hinting to the more realistic expectation, and unobfuscate the setting of the PF_PROT bit to improve readability. [1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/ Reported-by: Dan Rosenberg Suggested-by: Brad Spengler Signed-off-by: Kees Cook Acked-by: H. Peter Anvin Cc: Paul E. McKenney Cc: Frederic Weisbecker Cc: Eric W. Biederman Cc: Linus Torvalds Cc: Andrew Morton Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20130207174413.GA12485@www.outflux.net Signed-off-by: Ingo Molnar Signed-off-by: CAI Qian Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/fault.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2dbf6bf..3b2ad91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -720,12 +720,15 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, if (is_errata100(regs, address)) return; - if (unlikely(show_unhandled_signals)) + /* Kernel addresses are always protection faults: */ + if (address >= TASK_SIZE) + error_code |= PF_PROT; + + if (likely(show_unhandled_signals)) show_signal_msg(regs, error_code, address, tsk); - /* Kernel addresses are always protection faults: */ tsk->thread.cr2 = address; - tsk->thread.error_code = error_code | (address >= TASK_SIZE); + tsk->thread.error_code = error_code; tsk->thread.trap_no = 14; force_sig_info_fault(SIGSEGV, si_code, address, tsk, 0); -- cgit v1.1 From 2212f47b734e5b9461b5c3f555dc653ea7aa212f Mon Sep 17 00:00:00 2001 From: Stoney Wang Date: Thu, 7 Feb 2013 10:53:02 -0800 Subject: x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems commit cb214ede7657db458fd0b2a25ea0b28dbf900ebc upstream. When an HP ProLiant DL980 G7 Server boots a regular kernel, there will be intermittent lost interrupts which could result in a hang or (in extreme cases) data loss. The reason is that this system only supports x2apic physical mode, while the kernel boots with a logical-cluster default setting. This bug can be worked around by specifying the "x2apic_phys" or "nox2apic" boot option, but we want to handle this system without requiring manual workarounds. The BIOS sets ACPI_FADT_APIC_PHYSICAL in the FADT table. As all apicids are smaller than 255, the BIOS needs to hand control to the OS in xapic mode, according to the x2apic spec, chapter 2.9.
Current code handles x2apic when the BIOS hands over with xapic mode enabled: when the user specifies x2apic_phys, or the FADT indicates PHYSICAL: 1. During the MADT OEM check, the apic driver is first set to the xapic logical or xapic phys driver. 2. enable_IR_x2apic() will enable x2apic_mode. 3. If the user specifies x2apic_phys on the boot line, x2apic_phys_probe() will install the correct x2apic phys driver and use x2apic phys mode. Otherwise it will skip the driver and let x2apic_cluster_probe() take over and install the x2apic cluster driver (the wrong one), even though the FADT indicates PHYSICAL, because x2apic_phys_probe() does not check the FADT PHYSICAL flag. Add a check of x2apic_fadt_phys() in x2apic_phys_probe() to fix the problem. Signed-off-by: Stoney Wang [ updated the changelog and simplified the code ] Signed-off-by: Yinghai Lu Signed-off-by: Zhang Lin-Bao [ make a patch specially for 3.0.66] Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/apic/x2apic_phys.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c index f5373df..db4f704 100644 --- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -20,12 +20,19 @@ static int set_x2apic_phys_mode(char *arg) } early_param("x2apic_phys", set_x2apic_phys_mode); +static bool x2apic_fadt_phys(void) +{ + if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) && + (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) { + printk(KERN_DEBUG "System requires x2apic physical mode\n"); + return true; + } + return false; +} + static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id) { - if (x2apic_phys) - return x2apic_enabled(); - else - return 0; + return x2apic_enabled() && (x2apic_phys || x2apic_fadt_phys()); } static void @@ -108,7 +115,7 @@ static void init_x2apic_ldr(void) static int x2apic_phys_probe(void) { - if (x2apic_mode && x2apic_phys) + if (x2apic_mode && (x2apic_phys || x2apic_fadt_phys())) return 1; return apic == &apic_x2apic_phys; -- cgit v1.1 From d81d788db85abd39fd7753e2482f748c48de202a Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Mon, 4 Mar 2013 06:09:07 +0800 Subject: s390/kvm: Fix store status for ACRS/FPRS fix In 3.0.67, commit 58c9ce6fad8e00d9726447f939fe7e78e2aec891 (s390/kvm: Fix store status for ACRS/FPRS), upstream commit 15bc8d8457875f495c59d933b05770ba88d1eacb, added a call to save_access_regs to save ACRS. But we do not have the ACRS in kvm_run in 3.0 yet, so this results in: arch/s390/kvm/kvm-s390.c: In function 'kvm_s390_vcpu_store_status': arch/s390/kvm/kvm-s390.c:593: error: 'struct kvm_run' has no member named 's' Fix it by saving guest_acrs, which is where the ACRS are in 3.0.
Signed-off-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/kvm-s390.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 25ab200..f9804b7 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -590,7 +590,7 @@ int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr) * it into the save area */ save_fp_regs(&vcpu->arch.guest_fpregs); - save_access_regs(vcpu->run->s.regs.acrs); + save_access_regs(vcpu->arch.guest_acrs); if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs), vcpu->arch.guest_fpregs.fprs, 128, prefix)) -- cgit v1.1 From 95a2b9b9ce9db0856f0603f394a628e1360f79ae Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 25 Feb 2013 16:09:12 +0000 Subject: ARM: VFP: fix emulation of second VFP instruction MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5e4ba617c1b584b2e376f31a63bd4e734109318a upstream. Martin Storsjö reports that the sequence: ee312ac1 vsub.f32 s4, s3, s2 ee702ac0 vsub.f32 s5, s1, s0 e59f0028 ldr r0, [pc, #40] ee111a90 vmov r1, s3 on Raspberry Pi (implementor 41 architecture 1 part 20 variant b rev 5) where s3 is a denormal and s2 is zero results in incorrect behaviour - the instruction "vsub.f32 s5, s1, s0" is not executed: VFP: bounce: trigger ee111a90 fpexc d0000780 VFP: emulate: INST=0xee312ac1 SCR=0x00000000 ... As we can see, the instruction triggering the exception is the "vmov" instruction, and we emulate the "vsub.f32 s4, s3, s2" but fail to properly take account of the FPEXC_FP2V flag in FPEXC. This is because the test for the second instruction register being valid is bogus, and will always skip emulation of the second instruction. Reported-by: Martin Storsjö Tested-by: Martin Storsjö Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/vfp/vfpmodule.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index ce18802..e9c8f53 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -369,7 +369,7 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs) * If there isn't a second FP instruction, exit now. Note that * the FPEXC.FP2V bit is valid only if FPEXC.EX is 1. */ - if (fpexc ^ (FPEXC_EX | FPEXC_FP2V)) + if ((fpexc & (FPEXC_EX | FPEXC_FP2V)) != (FPEXC_EX | FPEXC_FP2V)) goto exit; /* -- cgit v1.1 From 8b0b58069148d9540e70af1d4963b2fe515efe89 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 25 Feb 2013 16:10:42 +0000 Subject: ARM: fix scheduling while atomic warning in alignment handling code commit b255188f90e2bade1bd11a986dd1ca4861869f4d upstream. 
Paolo Pisati reports that IPv6 triggers this warning: BUG: scheduling while atomic: swapper/0/0/0x40000100 Modules linked in: [] (unwind_backtrace+0x0/0xf0) from [] (__schedule_bug+0x48/0x5c) [] (__schedule_bug+0x48/0x5c) from [] (__schedule+0x700/0x740) [] (__schedule+0x700/0x740) from [] (__cond_resched+0x24/0x34) [] (__cond_resched+0x24/0x34) from [] (_cond_resched+0x3c/0x44) [] (_cond_resched+0x3c/0x44) from [] (do_alignment+0x178/0x78c) [] (do_alignment+0x178/0x78c) from [] (do_DataAbort+0x34/0x98) [] (do_DataAbort+0x34/0x98) from [] (__dabt_svc+0x40/0x60) Exception stack(0xc0763d70 to 0xc0763db8) 3d60: e97e805e e97e806e 2c000000 11000000 3d80: ea86bb00 0000002c 00000011 e97e807e c076d2a8 e97e805e e97e806e 0000002c 3da0: 3d000000 c0763dbc c04b98fc c02a8490 00000113 ffffffff [] (__dabt_svc+0x40/0x60) from [] (__csum_ipv6_magic+0x8/0xc8) Fix this by using probe_kernel_address() instead of __get_user(). Reported-by: Paolo Pisati Tested-by: Paolo Pisati Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/mm/alignment.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'arch') diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c index 724ba3b..1aa3a70 100644 --- a/arch/arm/mm/alignment.c +++ b/arch/arm/mm/alignment.c @@ -721,7 +721,6 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs) unsigned long instr = 0, instrptr; int (*handler)(unsigned long addr, unsigned long instr, struct pt_regs *regs); unsigned int type; - mm_segment_t fs; unsigned int fault; u16 tinstr = 0; int isize = 4; @@ -729,16 +728,15 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs) instrptr = instruction_pointer(regs); - fs = get_fs(); - set_fs(KERNEL_DS); if (thumb_mode(regs)) { - fault = __get_user(tinstr, (u16 *)(instrptr & ~1)); + u16 *ptr = (u16 *)(instrptr & ~1); + fault = probe_kernel_address(ptr, tinstr); if (!fault) { if (cpu_architecture() >= CPU_ARCH_ARMv7 && IS_T32(tinstr)) { /* Thumb-2 32-bit */ u16 tinst2 = 0; - fault = __get_user(tinst2, (u16 *)(instrptr+2)); + fault = probe_kernel_address(ptr + 1, tinst2); instr = (tinstr << 16) | tinst2; thumb2_32b = 1; } else { @@ -747,8 +745,7 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs) } } } else - fault = __get_user(instr, (u32 *)instrptr); - set_fs(fs); + fault = probe_kernel_address(instrptr, instr); if (fault) { type = TYPE_FAULT; -- cgit v1.1 From 99c817e4571710eca014345ac44c5ba41e77a853 Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Wed, 13 Mar 2013 09:55:02 +1100 Subject: powerpc: Fix cputable entry for 970MP rev 1.0 commit d63ac5f6cf31c8a83170a9509b350c1489a7262b upstream. Commit 44ae3ab3358e962039c36ad4ae461ae9fb29596c forgot to update the entry for the 970MP rev 1.0 processor when moving some CPU feature bits to the MMU feature bit mask. This breaks booting on some rare G5 models using that chip revision.
Reported-by: Phileas Fogg Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/cputable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 9fb9332..421c9a0 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -268,7 +268,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .cpu_features = CPU_FTRS_PPC970, .cpu_user_features = COMMON_USER_POWER4 | PPC_FEATURE_HAS_ALTIVEC_COMP, - .mmu_features = MMU_FTR_HPTE_TABLE, + .mmu_features = MMU_FTRS_PPC970, .icache_bsize = 128, .dcache_bsize = 128, .num_pmcs = 8, -- cgit v1.1 From 87a42f27adef5e88b8907edbc168de1380e7129e Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Fri, 15 Mar 2013 14:26:07 +0100 Subject: perf,x86: fix kernel crash with PEBS/BTS after suspend/resume commit 1d9d8639c063caf6efc2447f5f26aa637f844ff6 upstream. This patch fixes a kernel crash when using precise sampling (PEBS) after a suspend/resume. It turns out the CPU notifier code is not invoked on CPU0 (the BP). Therefore, the DS_AREA (used by PEBS) is not restored properly by the kernel and keeps its power-on/resume value of 0, causing any PEBS measurement to crash when running on CPU0. The workaround is to add a hook in the actual resume code to restore the DS Area MSR value. It is invoked for all CPUs. So for all but CPU0, the DS_AREA will be restored twice, but this is harmless. Reported-by: Linus Torvalds Signed-off-by: Stephane Eranian Signed-off-by: Linus Torvalds Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/perf_event_intel_ds.c | 8 ++++++++ arch/x86/power/cpu.c | 2 ++ 2 files changed, 10 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c index d812fe2..cf82ee5 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c @@ -754,6 +754,14 @@ static void intel_ds_init(void) } } +void perf_restore_debug_store(void) +{ + if (!x86_pmu.bts && !x86_pmu.pebs) + return; + + init_debug_store_on_cpu(smp_processor_id()); +} + #else /* CONFIG_CPU_SUP_INTEL */ static void reserve_ds_buffers(void) diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 87bb35e..0ea8bd2 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -10,6 +10,7 @@ #include #include +#include #include #include @@ -224,6 +225,7 @@ static void __restore_processor_state(struct saved_context *ctxt) do_fpu_end(); mtrr_bp_restore(); + perf_restore_debug_store(); } /* Needed by apm.c */ -- cgit v1.1 From fe204aa40cedb6c34c5865be223da8f77d6a1545 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 17 Mar 2013 15:44:43 -0700 Subject: perf,x86: fix wrmsr_on_cpu() warning on suspend/resume commit 2a6e06b2aed6995af401dcd4feb5e79a0c7ea554 upstream. Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after suspend/resume") fixed a crash when doing PEBS performance profiling after resuming, but in using init_debug_store_on_cpu() to restore the DS_AREA MSR it also resulted in a new WARN_ON() triggering. init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU cross-calls to do the MSR update. That is not really valid at the early resume stage, and the warning is quite reasonable.
Now, it all happens to _work_, for the simple reason that smp_call_function_single() ends up just doing the call directly on the CPU when the CPU number matches, but we really should just do the wrmsr() directly instead. This duplicates the wrmsr() logic, but hopefully we can just remove the wrmsr_on_cpu() version eventually. Reported-and-tested-by: Parag Warudkar Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/perf_event_intel_ds.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c index cf82ee5..c81d1f8 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c @@ -756,10 +756,12 @@ static void intel_ds_init(void) void perf_restore_debug_store(void) { + struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds); + if (!x86_pmu.bts && !x86_pmu.pebs) return; - init_debug_store_on_cpu(smp_processor_id()); + wrmsrl(MSR_IA32_DS_AREA, (unsigned long)ds); } #else /* CONFIG_CPU_SUP_INTEL */ -- cgit v1.1 From 2932ef21c24f5f248b869a92c1604e531750df17 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 4 Mar 2013 14:14:11 +0100 Subject: s390/mm: fix flush_tlb_kernel_range() commit f6a70a07079518280022286a1dceb797d12e1edf upstream. Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm() with &init_mm as argument. __tlb_flush_mm(), however, will only flush TLBs for the passed-in mm if its mm_cpumask is not empty. For the init_mm, however, its mm_cpumask never has any bits set, which in turn means that our flush_tlb_kernel_range() implementation doesn't work at all. This can be easily verified with a vmalloc/vfree loop which allocates a page, writes to it and then frees the page again. A crash will follow almost instantly. To fix this, remove the cpumask_empty() check in __tlb_flush_mm(), since there shouldn't be too many mms with a zero mm_cpumask, besides the init_mm of course. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/tlbflush.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/include/asm/tlbflush.h b/arch/s390/include/asm/tlbflush.h index b7a4f2e..d7862ad 100644 --- a/arch/s390/include/asm/tlbflush.h +++ b/arch/s390/include/asm/tlbflush.h @@ -73,8 +73,6 @@ static inline void __tlb_flush_idte(unsigned long asce) static inline void __tlb_flush_mm(struct mm_struct * mm) { - if (unlikely(cpumask_empty(mm_cpumask(mm)))) - return; /* * If the machine has IDTE we prefer to do a per mm flush * on all cpus instead of doing a local flush if the mm -- cgit v1.1 From 84bde6521f50d67ecdb52777da3901430470bd5d Mon Sep 17 00:00:00 2001 From: CQ Tang Date: Mon, 18 Mar 2013 11:02:21 -0400 Subject: x86-64: Fix the failure case in copy_user_handle_tail() commit 66db3feb486c01349f767b98ebb10b0c3d2d021b upstream. The pointer "to" in copy_user_handle_tail() will have been incremented before a failure is noted, which causes us to skip a byte in the failure case. Only do the increment when assured there is no failure. Signed-off-by: CQ Tang Link: http://lkml.kernel.org/r/20130318150221.8439.993.stgit@phlsvslse11.ph.intel.com Signed-off-by: Mike Marciniszyn Signed-off-by: H.
Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/lib/usercopy_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c index b7c2849..554b7b5 100644 --- a/arch/x86/lib/usercopy_64.c +++ b/arch/x86/lib/usercopy_64.c @@ -169,10 +169,10 @@ copy_user_handle_tail(char *to, char *from, unsigned len, unsigned zerorest) char c; unsigned zero_len; - for (; len; --len) { + for (; len; --len, to++) { if (__get_user_nocheck(c, from++, sizeof(char))) break; - if (__put_user_nocheck(c, to++, sizeof(char))) + if (__put_user_nocheck(c, to, sizeof(char))) break; } -- cgit v1.1 From 1c190534db77a019936d95a1826a55bf34d7ed23 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sun, 25 Nov 2012 22:24:19 -0500 Subject: signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorer Vaguely based on upstream commit 574c4866e33d 'consolidate kernel-side struct sigaction declarations'. flush_signal_handlers() needs to know whether sigaction::sa_restorer is defined, not whether SA_RESTORER is defined. Define the __ARCH_HAS_SA_RESTORER macro to indicate this. Signed-off-by: Ben Hutchings Cc: Al Viro Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/signal.h | 1 + arch/avr32/include/asm/signal.h | 1 + arch/cris/include/asm/signal.h | 1 + arch/h8300/include/asm/signal.h | 1 + arch/m32r/include/asm/signal.h | 1 + arch/m68k/include/asm/signal.h | 1 + arch/mn10300/include/asm/signal.h | 1 + arch/powerpc/include/asm/signal.h | 1 + arch/s390/include/asm/signal.h | 1 + arch/sparc/include/asm/signal.h | 1 + arch/x86/include/asm/signal.h | 2 ++ arch/xtensa/include/asm/signal.h | 1 + 12 files changed, 13 insertions(+) (limited to 'arch') diff --git a/arch/arm/include/asm/signal.h b/arch/arm/include/asm/signal.h index 43ba0fb..559ee24 100644 --- a/arch/arm/include/asm/signal.h +++ b/arch/arm/include/asm/signal.h @@ -127,6 +127,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/avr32/include/asm/signal.h b/arch/avr32/include/asm/signal.h index 8790dfc..e6952a0 100644 --- a/arch/avr32/include/asm/signal.h +++ b/arch/avr32/include/asm/signal.h @@ -128,6 +128,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/cris/include/asm/signal.h b/arch/cris/include/asm/signal.h index ea6af9a..057fea2 100644 --- a/arch/cris/include/asm/signal.h +++ b/arch/cris/include/asm/signal.h @@ -122,6 +122,7 @@ struct sigaction { void (*sa_restorer)(void); sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/h8300/include/asm/signal.h b/arch/h8300/include/asm/signal.h index fd8b66e..8695707 100644 --- a/arch/h8300/include/asm/signal.h +++ b/arch/h8300/include/asm/signal.h @@ -121,6 +121,7 @@ struct sigaction { void (*sa_restorer)(void); sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/m32r/include/asm/signal.h b/arch/m32r/include/asm/signal.h index b2eeb0d..802d561 100644 --- a/arch/m32r/include/asm/signal.h +++ b/arch/m32r/include/asm/signal.h @@ -123,6 +123,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for 
extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/m68k/include/asm/signal.h b/arch/m68k/include/asm/signal.h index 0b6b0e5..ee80858 100644 --- a/arch/m68k/include/asm/signal.h +++ b/arch/m68k/include/asm/signal.h @@ -119,6 +119,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/mn10300/include/asm/signal.h b/arch/mn10300/include/asm/signal.h index 1865d72..eecaa76 100644 --- a/arch/mn10300/include/asm/signal.h +++ b/arch/mn10300/include/asm/signal.h @@ -131,6 +131,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/powerpc/include/asm/signal.h b/arch/powerpc/include/asm/signal.h index 3eb13be..ec63a0a 100644 --- a/arch/powerpc/include/asm/signal.h +++ b/arch/powerpc/include/asm/signal.h @@ -109,6 +109,7 @@ struct sigaction { __sigrestore_t sa_restorer; sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/s390/include/asm/signal.h b/arch/s390/include/asm/signal.h index cdf5cb2..c872626 100644 --- a/arch/s390/include/asm/signal.h +++ b/arch/s390/include/asm/signal.h @@ -131,6 +131,7 @@ struct sigaction { void (*sa_restorer)(void); sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; diff --git a/arch/sparc/include/asm/signal.h b/arch/sparc/include/asm/signal.h index e49b828..4929431 100644 --- a/arch/sparc/include/asm/signal.h +++ b/arch/sparc/include/asm/signal.h @@ -191,6 +191,7 @@ struct __old_sigaction { unsigned long sa_flags; void (*sa_restorer)(void); /* not used by Linux/SPARC yet */ }; +#define __ARCH_HAS_SA_RESTORER typedef struct sigaltstack { void __user *ss_sp; diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h index 598457c..6cbc795 100644 --- a/arch/x86/include/asm/signal.h +++ b/arch/x86/include/asm/signal.h @@ -125,6 +125,8 @@ typedef unsigned long sigset_t; extern void do_notify_resume(struct pt_regs *, void *, __u32); # endif /* __KERNEL__ */ +#define __ARCH_HAS_SA_RESTORER + #ifdef __i386__ # ifdef __KERNEL__ struct old_sigaction { diff --git a/arch/xtensa/include/asm/signal.h b/arch/xtensa/include/asm/signal.h index 633ba73..75edf8a 100644 --- a/arch/xtensa/include/asm/signal.h +++ b/arch/xtensa/include/asm/signal.h @@ -133,6 +133,7 @@ struct sigaction { void (*sa_restorer)(void); sigset_t sa_mask; /* mask last for extensibility */ }; +#define __ARCH_HAS_SA_RESTORER struct k_sigaction { struct sigaction sa; -- cgit v1.1 From d104388ff9bdb5ec76d5337cd94f9ed4bbf73fbc Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Tue, 19 Mar 2013 12:36:46 +0100 Subject: KVM: Clean up error handling during VCPU creation commit d780592b99d7d8a5ff905f6bacca519d4a342c76 upstream. So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if it fails. Move this confusing responsibility back into the hands of kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected; all other archs cannot fail.
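For illustration only (not part of the patch; a simplified sketch of the generic KVM code of this era, with a made-up function name): after the change, cleanup on setup failure is owned by the generic caller rather than by the arch hook:

	static int create_vcpu_sketch(struct kvm *kvm, u32 id)
	{
		struct kvm_vcpu *vcpu;
		int r;

		vcpu = kvm_arch_vcpu_create(kvm, id);
		if (IS_ERR(vcpu))
			return PTR_ERR(vcpu);

		r = kvm_arch_vcpu_setup(vcpu);
		if (r) {
			/* the arch hook no longer frees the vcpu on failure */
			kvm_arch_vcpu_destroy(vcpu);
			return r;
		}
		return 0;
	}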
Signed-off-by: Jan Kiszka Signed-off-by: Avi Kivity Signed-off-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fbb0936..681eab7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6116,12 +6116,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) if (r == 0) r = kvm_mmu_setup(vcpu); vcpu_put(vcpu); - if (r < 0) - goto free_vcpu; - return 0; -free_vcpu: - kvm_x86_ops->vcpu_free(vcpu); return r; } -- cgit v1.1 From 0072625c351588b8fde9e6f46fb60ba2e521fb47 Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Tue, 19 Mar 2013 12:36:51 +0100 Subject: KVM: x86: Prevent starting PIT timers in the absence of irqchip support commit 0924ab2cfa98b1ece26c033d696651fd62896c69 upstream. User space may create the PIT and forget about setting up the irqchips. In that case, firing PIT IRQs will crash the host: BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 IP: [] kvm_set_irq+0x30/0x170 [kvm] ... Call Trace: [] pit_do_work+0x51/0xd0 [kvm] [] process_one_work+0x111/0x4d0 [] worker_thread+0x152/0x340 [] kthread+0x7e/0x90 [] kernel_thread_helper+0x4/0x10 Prevent this by checking the irqchip mode before starting a timer. We can't deny creating the PIT if the irqchips aren't set up yet as current userland expects this order to work. Signed-off-by: Jan Kiszka Signed-off-by: Marcelo Tosatti Signed-off-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/i8254.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index efad723..43e04d1 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -338,11 +338,15 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) return HRTIMER_NORESTART; } -static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) +static void create_pit_timer(struct kvm *kvm, u32 val, int is_period) { + struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state; struct kvm_timer *pt = &ps->pit_timer; s64 interval; + if (!irqchip_in_kernel(kvm)) + return; + interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ); pr_debug("create pit timer, interval is %llu nsec\n", interval); @@ -394,13 +398,13 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val) /* FIXME: enhance mode 4 precision */ case 4: if (!(ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)) { - create_pit_timer(ps, val, 0); + create_pit_timer(kvm, val, 0); } break; case 2: case 3: if (!(ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)){ - create_pit_timer(ps, val, 1); + create_pit_timer(kvm, val, 1); } break; default: -- cgit v1.1 From 8868daebc1b6240d07d5c6428f8bc8631b2bed42 Mon Sep 17 00:00:00 2001 From: Avi Kivity Date: Tue, 19 Mar 2013 12:36:55 +0100 Subject: KVM: Ensure all vcpus are consistent with in-kernel irqchip settings commit 3e515705a1f46beb1c942bb8043c16f8ac7b1e9e upstream. If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long-winded because vcpu->arch.apic is created without kvm->lock held. Based on an earlier patch by Michael Ellerman.
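For illustration only (not part of the patch): the ordering rule this enforces on userspace, sketched with the raw KVM ioctls (error handling omitted):

	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	/* must come before any vcpu is created ... */
	ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0);

	/* ... so that every vcpu gets an in-kernel apic */
	int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0);

Creating a vcpu first and then calling KVM_CREATE_IRQCHIP now fails with -EINVAL instead of leaving irqchip_in_kernel() and vcpu->arch.apic inconsistent.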
Signed-off-by: Michael Ellerman Signed-off-by: Avi Kivity Signed-off-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/ia64/kvm/kvm-ia64.c | 5 +++++ arch/x86/kvm/x86.c | 8 ++++++++ 2 files changed, 13 insertions(+) (limited to 'arch') diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 8213efe..a874213 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1168,6 +1168,11 @@ out: #define PALE_RESET_ENTRY 0x80000000ffffffb0UL +bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) +{ + return irqchip_in_kernel(vcpu->kcm) == (vcpu->arch.apic != NULL); +} + int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) { struct kvm_vcpu *v; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 681eab7..024ee68 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3410,6 +3410,9 @@ long kvm_arch_vm_ioctl(struct file *filp, r = -EEXIST; if (kvm->arch.vpic) goto create_irqchip_unlock; + r = -EINVAL; + if (atomic_read(&kvm->online_vcpus)) + goto create_irqchip_unlock; r = -ENOMEM; vpic = kvm_create_pic(kvm); if (vpic) { @@ -6189,6 +6192,11 @@ void kvm_arch_check_processor_compat(void *rtn) kvm_x86_ops->check_processor_compatibility(rtn); } +bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu) +{ + return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL); +} + int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) { struct page *page; -- cgit v1.1 From 956fc762ae9fb5f8cf6cd456f508ad431a4653b7 Mon Sep 17 00:00:00 2001 From: Petr Matousek Date: Tue, 19 Mar 2013 12:36:59 +0100 Subject: KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream. On hosts without the XSAVE support unprivileged local user can trigger oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN ioctl. invalid opcode: 0000 [#2] SMP Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables ... Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae EIP: 0060:[] EFLAGS: 00210246 CPU: 0 EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0 task.ti=d7c62000) Stack: 00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000 ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0 c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80 Call Trace: [] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm] ... [] ? syscall_call+0x7/0xb Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74 1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01 d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89 EIP: [] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP 0068:d7c63e70 QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID and then sets them later. So guest's X86_FEATURE_XSAVE should be masked out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with X86_FEATURE_XSAVE even on hosts that do not support it, might be susceptible to this attack from inside the guest as well. Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support. 
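For illustration only (not part of the patch): the trigger described above, sketched against the KVM vcpu ioctls (error handling omitted):

	struct kvm_sregs sregs;

	ioctl(vcpu_fd, KVM_GET_SREGS, &sregs);
	sregs.cr4 |= X86_CR4_OSXSAVE;		/* host lacks XSAVE */
	ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);	/* now fails with -EINVAL */
	ioctl(vcpu_fd, KVM_RUN, 0);		/* previously oopsed the host */

With the guest_cpuid_has_xsave() check in kvm_arch_vcpu_ioctl_set_sregs(), the bogus CR4 value is rejected before it can reach the hardware.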
Signed-off-by: Petr Matousek Signed-off-by: Marcelo Tosatti Signed-off-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 024ee68..e329dc5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -575,6 +575,9 @@ static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; + if (!cpu_has_xsave) + return 0; + best = kvm_find_cpuid_entry(vcpu, 1, 0); return best && (best->ecx & bit(X86_FEATURE_XSAVE)); } @@ -5854,6 +5857,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, int pending_vec, max_bits, idx; struct desc_ptr dt; + if (!guest_cpuid_has_xsave(vcpu) && (sregs->cr4 & X86_CR4_OSXSAVE)) + return -EINVAL; + dt.size = sregs->idt.limit; dt.address = sregs->idt.base; kvm_x86_ops->set_idt(vcpu, &dt); -- cgit v1.1 From 31f516f1f359bed25b6f6ebe5752326145303b3c Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Tue, 26 Mar 2013 22:48:23 +0100 Subject: iommu/amd: Make sure dma_ops are set for hotplug devices commit c2a2876e863356b092967ea62bebdb4dd663af80 upstream. There is a bug introduced with commit 27c2127 that causes devices which are hot unplugged and then hot-replugged to not have per-device dma_ops set. This causes these devices to not function correctly. Fixed with this patch. Reported-by: Andreas Degert Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/amd_iommu.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/amd_iommu.c b/arch/x86/kernel/amd_iommu.c index bfd75ff..d9302b7 100644 --- a/arch/x86/kernel/amd_iommu.c +++ b/arch/x86/kernel/amd_iommu.c @@ -53,6 +53,8 @@ static struct protection_domain *pt_domain; static struct iommu_ops amd_iommu_ops; +static struct dma_map_ops amd_iommu_dma_ops; + /* * general struct to manage commands send to an IOMMU */ @@ -1778,18 +1780,20 @@ static int device_change_notifier(struct notifier_block *nb, domain = domain_for_device(dev); - /* allocate a protection domain if a device is added */ dma_domain = find_protection_domain(devid); - if (dma_domain) - goto out; - dma_domain = dma_ops_domain_alloc(); - if (!dma_domain) - goto out; - dma_domain->target_dev = devid; + if (!dma_domain) { + /* allocate a protection domain if a device is added */ + dma_domain = dma_ops_domain_alloc(); + if (!dma_domain) + goto out; + dma_domain->target_dev = devid; + + spin_lock_irqsave(&iommu_pd_list_lock, flags); + list_add_tail(&dma_domain->list, &iommu_pd_list); + spin_unlock_irqrestore(&iommu_pd_list_lock, flags); + } - spin_lock_irqsave(&iommu_pd_list_lock, flags); - list_add_tail(&dma_domain->list, &iommu_pd_list); - spin_unlock_irqrestore(&iommu_pd_list_lock, flags); + dev->archdata.dma_ops = &amd_iommu_dma_ops; break; case BUS_NOTIFY_DEL_DEVICE: -- cgit v1.1 From 48631b65db235d68acbde42a1cb6804afbfd283e Mon Sep 17 00:00:00 2001 From: Jay Estabrook Date: Sun, 7 Apr 2013 21:36:09 +1200 Subject: alpha: Add irongate_io to PCI bus resources commit aa8b4be3ac049c8b1df2a87e4d1d902ccfc1f7a9 upstream. Fixes a NULL pointer dereference at boot on UP1500. 
Reviewed-and-Tested-by: Matt Turner Signed-off-by: Jay Estabrook Signed-off-by: Matt Turner Signed-off-by: Michael Cree Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/alpha/kernel/sys_nautilus.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/alpha/kernel/sys_nautilus.c b/arch/alpha/kernel/sys_nautilus.c index 99c0f46..dc616b3 100644 --- a/arch/alpha/kernel/sys_nautilus.c +++ b/arch/alpha/kernel/sys_nautilus.c @@ -189,6 +189,10 @@ nautilus_machine_check(unsigned long vector, unsigned long la_ptr) extern void free_reserved_mem(void *, void *); extern void pcibios_claim_one_bus(struct pci_bus *); +static struct resource irongate_io = { + .name = "Irongate PCI IO", + .flags = IORESOURCE_IO, +}; static struct resource irongate_mem = { .name = "Irongate PCI MEM", .flags = IORESOURCE_MEM, @@ -210,6 +214,7 @@ nautilus_init_pci(void) irongate = pci_get_bus_and_slot(0, 0); bus->self = irongate; + bus->resource[0] = &irongate_io; bus->resource[1] = &irongate_mem; pci_bus_size_bridges(bus); -- cgit v1.1 From 8a7adba6f5b486e00f03d88d185a25ec4c1b6175 Mon Sep 17 00:00:00 2001 From: Michael Wolf Date: Fri, 5 Apr 2013 10:41:40 +0000 Subject: powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being performed before the ANDCOND test commit 9fb2640159f9d4f5a2a9d60e490482d4cbecafdb upstream. Some versions of pHyp will perform the adjunct partition test before the ANDCOND test. The result of this is that H_RESOURCE can be returned and cause the BUG_ON condition to occur. The HPTE is not removed. So add a check for H_RESOURCE, it is ok if this HPTE is not removed as pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a specific HPTE to remove. So it is ok to just move on to the next slot and try again. Signed-off-by: Michael Wolf Signed-off-by: Stephen Rothwell Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/pseries/lpar.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 81e30d9..2e0b2a7 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -377,7 +377,13 @@ static long pSeries_lpar_hpte_remove(unsigned long hpte_group) (0x1UL << 4), &dummy1, &dummy2); if (lpar_rc == H_SUCCESS) return i; - BUG_ON(lpar_rc != H_NOT_FOUND); + + /* + * The test for adjunct partition is performed before the + * ANDCOND test. H_RESOURCE may be returned, so we need to + * check for that as well. + */ + BUG_ON(lpar_rc != H_NOT_FOUND && lpar_rc != H_RESOURCE); slot_offset++; slot_offset &= 0x7; -- cgit v1.1 From ab82a79e3cb3c52e635620a65a016eddbf9db144 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Wed, 30 Jan 2013 16:56:16 -0800 Subject: x86-32, mm: Rip out x86_32 NUMA remapping code commit f03574f2d5b2d6229dcdf2d322848065f72953c7 upstream. This code was an optimization for 32-bit NUMA systems. It has probably been the cause of a number of subtle bugs over the years, although the conditions to excite them would have been hard to trigger. Essentially, we remap part of the kernel linear mapping area, and then sometimes part of that area gets freed back in to the bootmem allocator. If those pages get used by kernel data structures (say mem_map[] or a dentry), there's no big deal. But, if anyone ever tried to use the linear mapping for these pages _and_ cared about their physical address, bad things happen. 
For instance, say you passed __GFP_ZERO to the page allocator and then happened to get handed one of these pages, it zero the remapped page, but it would make a pte to the _old_ page. There are probably a hundred other ways that it could screw with things. We don't need to hang on to performance optimizations for these old boxes any more. All my 32-bit NUMA systems are long dead and buried, and I probably had access to more than most people. This code is causing real things to break today: https://lkml.org/lkml/2013/1/9/376 I looked in to actually fixing this, but it requires surgery to way too much brittle code, as well as stuff like per_cpu_ptr_to_phys(). [ hpa: Cc: this for -stable, since it is a memory corruption issue. However, an alternative is to simply mark NUMA as depends BROKEN rather than EXPERIMENTAL in the X86_32 subclause... ] Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/x86/Kconfig | 4 ---- arch/x86/mm/numa.c | 3 --- arch/x86/mm/numa_internal.h | 6 ------ 3 files changed, 13 deletions(-) (limited to 'arch') diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a0e9bda..90bf314 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1219,10 +1219,6 @@ config HAVE_ARCH_BOOTMEM def_bool y depends on X86_32 && NUMA -config HAVE_ARCH_ALLOC_REMAP - def_bool y - depends on X86_32 && NUMA - config ARCH_HAVE_MEMORY_PRESENT def_bool y depends on X86_32 && DISCONTIGMEM diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index f5510d8..469ccae 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -207,9 +207,6 @@ static void __init setup_node_data(int nid, u64 start, u64 end) if (end && (end - start) < NODE_MIN_SIZE) return; - /* initialize remap allocator before aligning to ZONE_ALIGN */ - init_alloc_remap(nid, start, end); - start = roundup(start, ZONE_ALIGN); printk(KERN_INFO "Initmem setup node %d %016Lx-%016Lx\n", diff --git a/arch/x86/mm/numa_internal.h b/arch/x86/mm/numa_internal.h index 7178c3a..ad86ec9 100644 --- a/arch/x86/mm/numa_internal.h +++ b/arch/x86/mm/numa_internal.h @@ -21,12 +21,6 @@ void __init numa_reset_distance(void); void __init x86_numa_init(void); -#ifdef CONFIG_X86_64 -static inline void init_alloc_remap(int nid, u64 start, u64 end) { } -#else -void __init init_alloc_remap(int nid, u64 start, u64 end); -#endif - #ifdef CONFIG_NUMA_EMU void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt); -- cgit v1.1 From facbcede9edd28f9f3290a83fdb6ea4b781ffcd6 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Wed, 30 Jan 2013 16:56:16 -0800 Subject: x86-32, mm: Rip out x86_32 NUMA remapping code commit f03574f2d5b2d6229dcdf2d322848065f72953c7 upstream. [was already included in 3.0, but I missed the patch hunk for arch/x86/mm/numa_32.c - gregkh] This code was an optimization for 32-bit NUMA systems. It has probably been the cause of a number of subtle bugs over the years, although the conditions to excite them would have been hard to trigger. Essentially, we remap part of the kernel linear mapping area, and then sometimes part of that area gets freed back in to the bootmem allocator. If those pages get used by kernel data structures (say mem_map[] or a dentry), there's no big deal. But, if anyone ever tried to use the linear mapping for these pages _and_ cared about their physical address, bad things happen. 
For instance, say you passed __GFP_ZERO to the page allocator and then happened to get handed one of these pages, it zero the remapped page, but it would make a pte to the _old_ page. There are probably a hundred other ways that it could screw with things. We don't need to hang on to performance optimizations for these old boxes any more. All my 32-bit NUMA systems are long dead and buried, and I probably had access to more than most people. This code is causing real things to break today: https://lkml.org/lkml/2013/1/9/376 I looked in to actually fixing this, but it requires surgery to way too much brittle code, as well as stuff like per_cpu_ptr_to_phys(). [ hpa: Cc: this for -stable, since it is a memory corruption issue. However, an alternative is to simply mark NUMA as depends BROKEN rather than EXPERIMENTAL in the X86_32 subclause... ] Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com Signed-off-by: H. Peter Anvin Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/numa_32.c | 161 -------------------------------------------------- 1 file changed, 161 deletions(-) (limited to 'arch') diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c index 849a975..025d469 100644 --- a/arch/x86/mm/numa_32.c +++ b/arch/x86/mm/numa_32.c @@ -73,167 +73,6 @@ unsigned long node_memmap_size_bytes(int nid, unsigned long start_pfn, extern unsigned long highend_pfn, highstart_pfn; -#define LARGE_PAGE_BYTES (PTRS_PER_PTE * PAGE_SIZE) - -static void *node_remap_start_vaddr[MAX_NUMNODES]; -void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags); - -/* - * Remap memory allocator - */ -static unsigned long node_remap_start_pfn[MAX_NUMNODES]; -static void *node_remap_end_vaddr[MAX_NUMNODES]; -static void *node_remap_alloc_vaddr[MAX_NUMNODES]; - -/** - * alloc_remap - Allocate remapped memory - * @nid: NUMA node to allocate memory from - * @size: The size of allocation - * - * Allocate @size bytes from the remap area of NUMA node @nid. The - * size of the remap area is predetermined by init_alloc_remap() and - * only the callers considered there should call this function. For - * more info, please read the comment on top of init_alloc_remap(). - * - * The caller must be ready to handle allocation failure from this - * function and fall back to regular memory allocator in such cases. - * - * CONTEXT: - * Single CPU early boot context. - * - * RETURNS: - * Pointer to the allocated memory on success, %NULL on failure. 
- */ -void *alloc_remap(int nid, unsigned long size) -{ - void *allocation = node_remap_alloc_vaddr[nid]; - - size = ALIGN(size, L1_CACHE_BYTES); - - if (!allocation || (allocation + size) > node_remap_end_vaddr[nid]) - return NULL; - - node_remap_alloc_vaddr[nid] += size; - memset(allocation, 0, size); - - return allocation; -} - -#ifdef CONFIG_HIBERNATION -/** - * resume_map_numa_kva - add KVA mapping to the temporary page tables created - * during resume from hibernation - * @pgd_base - temporary resume page directory - */ -void resume_map_numa_kva(pgd_t *pgd_base) -{ - int node; - - for_each_online_node(node) { - unsigned long start_va, start_pfn, nr_pages, pfn; - - start_va = (unsigned long)node_remap_start_vaddr[node]; - start_pfn = node_remap_start_pfn[node]; - nr_pages = (node_remap_end_vaddr[node] - - node_remap_start_vaddr[node]) >> PAGE_SHIFT; - - printk(KERN_DEBUG "%s: node %d\n", __func__, node); - - for (pfn = 0; pfn < nr_pages; pfn += PTRS_PER_PTE) { - unsigned long vaddr = start_va + (pfn << PAGE_SHIFT); - pgd_t *pgd = pgd_base + pgd_index(vaddr); - pud_t *pud = pud_offset(pgd, vaddr); - pmd_t *pmd = pmd_offset(pud, vaddr); - - set_pmd(pmd, pfn_pmd(start_pfn + pfn, - PAGE_KERNEL_LARGE_EXEC)); - - printk(KERN_DEBUG "%s: %08lx -> pfn %08lx\n", - __func__, vaddr, start_pfn + pfn); - } - } -} -#endif - -/** - * init_alloc_remap - Initialize remap allocator for a NUMA node - * @nid: NUMA node to initizlie remap allocator for - * - * NUMA nodes may end up without any lowmem. As allocating pgdat and - * memmap on a different node with lowmem is inefficient, a special - * remap allocator is implemented which can be used by alloc_remap(). - * - * For each node, the amount of memory which will be necessary for - * pgdat and memmap is calculated and two memory areas of the size are - * allocated - one in the node and the other in lowmem; then, the area - * in the node is remapped to the lowmem area. - * - * As pgdat and memmap must be allocated in lowmem anyway, this - * doesn't waste lowmem address space; however, the actual lowmem - * which gets remapped over is wasted. The amount shouldn't be - * problematic on machines this feature will be used. - * - * Initialization failure isn't fatal. alloc_remap() is used - * opportunistically and the callers will fall back to other memory - * allocation mechanisms on failure. - */ -void __init init_alloc_remap(int nid, u64 start, u64 end) -{ - unsigned long start_pfn = start >> PAGE_SHIFT; - unsigned long end_pfn = end >> PAGE_SHIFT; - unsigned long size, pfn; - u64 node_pa, remap_pa; - void *remap_va; - - /* - * The acpi/srat node info can show hot-add memroy zones where - * memory could be added but not currently present. 
- */ - printk(KERN_DEBUG "node %d pfn: [%lx - %lx]\n", - nid, start_pfn, end_pfn); - - /* calculate the necessary space aligned to large page size */ - size = node_memmap_size_bytes(nid, start_pfn, end_pfn); - size += ALIGN(sizeof(pg_data_t), PAGE_SIZE); - size = ALIGN(size, LARGE_PAGE_BYTES); - - /* allocate node memory and the lowmem remap area */ - node_pa = memblock_find_in_range(start, end, size, LARGE_PAGE_BYTES); - if (node_pa == MEMBLOCK_ERROR) { - pr_warning("remap_alloc: failed to allocate %lu bytes for node %d\n", - size, nid); - return; - } - memblock_x86_reserve_range(node_pa, node_pa + size, "KVA RAM"); - - remap_pa = memblock_find_in_range(min_low_pfn << PAGE_SHIFT, - max_low_pfn << PAGE_SHIFT, - size, LARGE_PAGE_BYTES); - if (remap_pa == MEMBLOCK_ERROR) { - pr_warning("remap_alloc: failed to allocate %lu bytes remap area for node %d\n", - size, nid); - memblock_x86_free_range(node_pa, node_pa + size); - return; - } - memblock_x86_reserve_range(remap_pa, remap_pa + size, "KVA PG"); - remap_va = phys_to_virt(remap_pa); - - /* perform actual remap */ - for (pfn = 0; pfn < size >> PAGE_SHIFT; pfn += PTRS_PER_PTE) - set_pmd_pfn((unsigned long)remap_va + (pfn << PAGE_SHIFT), - (node_pa >> PAGE_SHIFT) + pfn, - PAGE_KERNEL_LARGE); - - /* initialize remap allocator parameters */ - node_remap_start_pfn[nid] = node_pa >> PAGE_SHIFT; - node_remap_start_vaddr[nid] = remap_va; - node_remap_end_vaddr[nid] = remap_va + size; - node_remap_alloc_vaddr[nid] = remap_va; - - printk(KERN_DEBUG "remap_alloc: node %d [%08llx-%08llx) -> [%p-%p)\n", - nid, node_pa, node_pa + size, remap_va, remap_va + size); -} - void __init initmem_init(void) { x86_numa_init(); -- cgit v1.1 From cfe9f98bf529186fa6365127f089ea69dafb84d5 Mon Sep 17 00:00:00 2001 From: Samu Kallio Date: Sat, 23 Mar 2013 09:36:35 -0400 Subject: x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates commit 1160c2779b826c6f5c08e5cc542de58fd1f667d5 upstream. In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops when lazy MMU updates are enabled, because set_pgd effects are being deferred. One instance of this problem is during process mm cleanup with memory cgroups enabled. The chain of events is as follows: - zap_pte_range enables lazy MMU updates - zap_pte_range eventually calls mem_cgroup_charge_statistics, which accesses the vmalloc'd mem_cgroup per-cpu stat area - vmalloc_fault is triggered which tries to sync the corresponding PGD entry with set_pgd, but the update is deferred - vmalloc_fault oopses due to a mismatch in the PUD entries The OOPs usually looks as so: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/fault.c:396! invalid opcode: 0000 [#1] SMP .. snip .. CPU 1 Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1 RIP: e030:[] [] vmalloc_fault+0x11f/0x208 .. snip .. Call Trace: [] do_page_fault+0x399/0x4b0 [] ? xen_mc_extend_args+0xec/0x110 [] page_fault+0x25/0x30 [] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50 [] __mem_cgroup_uncharge_common+0xd8/0x350 [] mem_cgroup_uncharge_page+0x57/0x60 [] page_remove_rmap+0xe0/0x150 [] ? vm_normal_page+0x1a/0x80 [] unmap_single_vma+0x531/0x870 [] unmap_vmas+0x52/0xa0 [] ? pte_mfn_to_pfn+0x72/0x100 [] exit_mmap+0x98/0x170 [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [] mmput+0x83/0xf0 [] exit_mm+0x104/0x130 [] do_exit+0x15a/0x8c0 [] do_group_exit+0x3f/0xa0 [] sys_exit_group+0x17/0x20 [] system_call_fastpath+0x16/0x1b Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the changes visible to the consistency checks. 
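As background (not part of the patch): under paravirt, MMU updates issued inside a lazy-MMU region may be queued by the back end rather than applied immediately, which is why the set_pgd() in vmalloc_fault() needs an explicit flush. A sketch of the window:

    arch_enter_lazy_mmu_mode();    /* e.g. entered by zap_pte_range() */
    ...
    set_pgd(pgd, *pgd_ref);        /* queued by the hypervisor back end */
    arch_flush_lazy_mmu_mode();    /* force the queued update out now;
                                      this is what the fix below adds  */
    ...
    arch_leave_lazy_mmu_mode();    /* would otherwise flush only here  */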
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737 Tested-by: Josh Boyer Reported-and-Tested-by: Krishna Raman Signed-off-by: Samu Kallio Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com Tested-by: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/fault.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 3b2ad91..7653f14 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -376,10 +376,12 @@ static noinline __kprobes int vmalloc_fault(unsigned long address) if (pgd_none(*pgd_ref)) return -1; - if (pgd_none(*pgd)) + if (pgd_none(*pgd)) { set_pgd(pgd, *pgd_ref); - else + arch_flush_lazy_mmu_mode(); + } else { BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref)); + } /* * Below here mismatches are bugs because these lower tables -- cgit v1.1 From b1cf3728932d0e6beb0a09812cbc71618939069a Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky Date: Sat, 23 Mar 2013 09:36:36 -0400 Subject: x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal commit 511ba86e1d386f671084b5d0e6f110bb30b8eeb2 upstream. Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: Boris Ostrovsky Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com Tested-by: Josh Boyer Tested-by: Konrad Rzeszutek Wilk Acked-by: Borislav Petkov Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/paravirt.h | 5 ++++- arch/x86/include/asm/paravirt_types.h | 2 ++ arch/x86/kernel/paravirt.c | 25 +++++++++++++------------ arch/x86/lguest/boot.c | 1 + arch/x86/xen/mmu.c | 1 + 5 files changed, 21 insertions(+), 13 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index ebbc4d8..2fdfe31 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -731,7 +731,10 @@ static inline void arch_leave_lazy_mmu_mode(void) PVOP_VCALL0(pv_mmu_ops.lazy_mode.leave); } -void arch_flush_lazy_mmu_mode(void); +static inline void arch_flush_lazy_mmu_mode(void) +{ + PVOP_VCALL0(pv_mmu_ops.lazy_mode.flush); +} static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, phys_addr_t phys, pgprot_t flags) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 8288509..4b67ec9 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -85,6 +85,7 @@ struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. 
*/ void (*enter)(void); void (*leave)(void); + void (*flush)(void); }; struct pv_time_ops { @@ -673,6 +674,7 @@ void paravirt_end_context_switch(struct task_struct *next); void paravirt_enter_lazy_mmu(void); void paravirt_leave_lazy_mmu(void); +void paravirt_flush_lazy_mmu(void); void _paravirt_nop(void); u32 _paravirt_ident_32(u32); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 869e1ae..704faba 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -253,6 +253,18 @@ void paravirt_leave_lazy_mmu(void) leave_lazy(PARAVIRT_LAZY_MMU); } +void paravirt_flush_lazy_mmu(void) +{ + preempt_disable(); + + if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) { + arch_leave_lazy_mmu_mode(); + arch_enter_lazy_mmu_mode(); + } + + preempt_enable(); +} + void paravirt_start_context_switch(struct task_struct *prev) { BUG_ON(preemptible()); @@ -282,18 +294,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void) return percpu_read(paravirt_lazy_mode); } -void arch_flush_lazy_mmu_mode(void) -{ - preempt_disable(); - - if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) { - arch_leave_lazy_mmu_mode(); - arch_enter_lazy_mmu_mode(); - } - - preempt_enable(); -} - struct pv_info pv_info = { .name = "bare hardware", .paravirt_enabled = 0, @@ -462,6 +462,7 @@ struct pv_mmu_ops pv_mmu_ops = { .lazy_mode = { .enter = paravirt_nop, .leave = paravirt_nop, + .flush = paravirt_nop, }, .set_fixmap = native_set_fixmap, diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c index db832fd..2d45247 100644 --- a/arch/x86/lguest/boot.c +++ b/arch/x86/lguest/boot.c @@ -1309,6 +1309,7 @@ __init void lguest_init(void) pv_mmu_ops.read_cr3 = lguest_read_cr3; pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu; pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mmu_mode; + pv_mmu_ops.lazy_mode.flush = paravirt_flush_lazy_mmu; pv_mmu_ops.pte_update = lguest_pte_update; pv_mmu_ops.pte_update_defer = lguest_pte_update; diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index d957dce..a0aed70 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -2011,6 +2011,7 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = { .lazy_mode = { .enter = paravirt_enter_lazy_mmu, .leave = xen_leave_lazy_mmu, + .flush = paravirt_flush_lazy_mmu, }, .set_fixmap = xen_set_fixmap, -- cgit v1.1 From d7709255affba50d2ff4087d28308e03d1154afa Mon Sep 17 00:00:00 2001 From: Andy Honig Date: Mon, 11 Mar 2013 09:34:52 -0700 Subject: KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) commit c300aa64ddf57d9c5d9c898a64b36877345dd4a9 upstream. If the guest sets the GPA of the time_page so that the request to update the time straddles a page then KVM will write onto an incorrect page. The write is done byusing kmap atomic to get a pointer to the page for the time structure and then performing a memcpy to that page starting at an offset that the guest controls. Well behaved guests always provide a 32-byte aligned address, however a malicious guest could use this to corrupt host kernel memory. Tested: Tested against kvmclock unit test. 
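To make the overflow concrete (illustrative arithmetic, not patch text): struct pvclock_vcpu_time_info is 32 bytes, and the offset within the page is guest controlled, so any offset above PAGE_SIZE - 32 makes the copy run off the single page mapped by kmap_atomic. Sketch of the pre-fix behaviour, using the names from the diff below:

    offset = data & ~(PAGE_MASK | 1);             /* guest picks 0..4094 */
    shared_kaddr = kmap_atomic(time_page);        /* maps exactly one page */
    memcpy(shared_kaddr + offset, &hv_clock, 32); /* offset > 4064 (with
                                                     4 KiB pages) writes past
                                                     the mapping into
                                                     unrelated host memory */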
Signed-off-by: Andrew Honig Signed-off-by: Marcelo Tosatti Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e329dc5..e525b9e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1539,6 +1539,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) /* ...but clean it before doing the actual write */ vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); + /* Check that the address is 32-byte aligned. */ + if (vcpu->arch.time_offset & + (sizeof(struct pvclock_vcpu_time_info) - 1)) + break; + vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); -- cgit v1.1 From df0ed3450c217a1cd571c0d4efa4dc6c458894a9 Mon Sep 17 00:00:00 2001 From: Andy Honig Date: Wed, 20 Feb 2013 14:48:10 -0800 Subject: KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) commit 0b79459b482e85cb7426aa7da683a9f2c97aeae1 upstream. There is a potential use after free issue with the handling of MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable memory such as frame buffers then KVM might continue to write to that address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins the page in memory so it's unlikely to cause an issue, but if the user space component re-purposes the memory previously used for the guest, then the guest will be able to corrupt that memory. Tested: Tested against kvmclock unit test Signed-off-by: Andrew Honig Signed-off-by: Marcelo Tosatti Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kvm/x86.c | 39 ++++++++++++++------------------------- 2 files changed, 16 insertions(+), 27 deletions(-) (limited to 'arch') diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d2ac8e2..1eb45de 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -391,8 +391,8 @@ struct kvm_vcpu_arch { gpa_t time; struct pvclock_vcpu_time_info hv_clock; unsigned int hw_tsc_khz; - unsigned int time_offset; - struct page *time_page; + struct gfn_to_hva_cache pv_time; + bool pv_time_enabled; u64 last_guest_tsc; u64 last_kernel_ns; u64 last_tsc_nsec; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e525b9e..b2d5baf 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1073,7 +1073,6 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) { unsigned long flags; struct kvm_vcpu_arch *vcpu = &v->arch; - void *shared_kaddr; unsigned long this_tsc_khz; s64 kernel_ns, max_kernel_ns; u64 tsc_timestamp; @@ -1109,7 +1108,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) local_irq_restore(flags); - if (!vcpu->time_page) + if (!vcpu->pv_time_enabled) return 0; /* @@ -1167,14 +1166,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) */ vcpu->hv_clock.version += 2; - shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); - - memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, - sizeof(vcpu->hv_clock)); - - kunmap_atomic(shared_kaddr, KM_USER0); - - mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); return 0; } @@ -1464,10 +1458,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data) static void kvmclock_reset(struct kvm_vcpu *vcpu) { - if (vcpu->arch.time_page) { - kvm_release_page_dirty(vcpu->arch.time_page); - vcpu->arch.time_page = NULL; - 
} + vcpu->arch.pv_time_enabled = false; } int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) @@ -1527,6 +1518,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) break; case MSR_KVM_SYSTEM_TIME_NEW: case MSR_KVM_SYSTEM_TIME: { + u64 gpa_offset; kvmclock_reset(vcpu); vcpu->arch.time = data; @@ -1536,21 +1528,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) if (!(data & 1)) break; - /* ...but clean it before doing the actual write */ - vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); + gpa_offset = data & ~(PAGE_MASK | 1); /* Check that the address is 32-byte aligned. */ - if (vcpu->arch.time_offset & - (sizeof(struct pvclock_vcpu_time_info) - 1)) + if (gpa_offset & (sizeof(struct pvclock_vcpu_time_info) - 1)) break; - vcpu->arch.time_page = - gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT); - - if (is_error_page(vcpu->arch.time_page)) { - kvm_release_page_clean(vcpu->arch.time_page); - vcpu->arch.time_page = NULL; - } + if (kvm_gfn_to_hva_cache_init(vcpu->kvm, + &vcpu->arch.pv_time, data & ~1ULL)) + vcpu->arch.pv_time_enabled = false; + else + vcpu->arch.pv_time_enabled = true; break; } case MSR_KVM_ASYNC_PF_EN: @@ -6257,6 +6245,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) if (!zalloc_cpumask_var(&vcpu->arch.wbinvd_dirty_mask, GFP_KERNEL)) goto fail_free_mce_banks; + vcpu->arch.pv_time_enabled = false; kvm_async_pf_hash_reset(vcpu); return 0; -- cgit v1.1 From d715cdddb8cdf1c17bf1c5ff8fcc9852cd6ba79e Mon Sep 17 00:00:00 2001 From: Andrew Honig Date: Fri, 29 Mar 2013 09:35:21 -0700 Subject: KVM: Allow cross page reads and writes from cached translations. commit 8f964525a121f2ff2df948dac908dcc65be21b5b upstream. This patch adds support for kvm_gfn_to_hva_cache_init functions for reads and writes that will cross a page. If the range falls within the same memslot, then this will be a fast operation. If the range is split between two memslots, then the slower kvm_read_guest and kvm_write_guest are used. Tested: Test against kvm_clock unit tests. Signed-off-by: Andrew Honig Signed-off-by: Gleb Natapov Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b2d5baf..15e79a6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1448,7 +1448,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data) return 0; } - if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa)) + if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa, + sizeof(u32))) return 1; vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS); @@ -1530,12 +1531,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) gpa_offset = data & ~(PAGE_MASK | 1); - /* Check that the address is 32-byte aligned. */ - if (gpa_offset & (sizeof(struct pvclock_vcpu_time_info) - 1)) - break; - if (kvm_gfn_to_hva_cache_init(vcpu->kvm, - &vcpu->arch.pv_time, data & ~1ULL)) + &vcpu->arch.pv_time, data & ~1ULL, + sizeof(struct pvclock_vcpu_time_info))) vcpu->arch.pv_time_enabled = false; else vcpu->arch.pv_time_enabled = true; -- cgit v1.1 From cef72624c31364e7020450571393a4d5a0e44b34 Mon Sep 17 00:00:00 2001 From: Illia Ragozin Date: Wed, 10 Apr 2013 19:43:34 +0100 Subject: ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon commit cd272d1ea71583170e95dde02c76166c7f9017e6 upstream. 
On Feroceon the L2 cache becomes non-coherent with the CPU when the L1 caches are disabled. Thus the L2 needs to be invalidated after both L1 caches are disabled. On kexec, before starting the code that relocates the kernel, the L1 caches are disabled in cpu_proc_fin (cpu_v7_proc_fin for Feroceon), but the L2 cache is never invalidated afterwards, because inv_all is not set in cache-feroceon-l2.c. As a result, kernel relocation and decompression may have (and usually have) errors. Setting the function pointer enables L2 invalidation and fixes the issue. Signed-off-by: Illia Ragozin Acked-by: Jason Cooper Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/mm/cache-feroceon-l2.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c index e0b0e7a..09f8851 100644 --- a/arch/arm/mm/cache-feroceon-l2.c +++ b/arch/arm/mm/cache-feroceon-l2.c @@ -342,6 +342,7 @@ void __init feroceon_l2_init(int __l2_wt_override) outer_cache.inv_range = feroceon_l2_inv_range; outer_cache.clean_range = feroceon_l2_clean_range; outer_cache.flush_range = feroceon_l2_flush_range; + outer_cache.inv_all = l2_inv_all; enable_l2(); -- cgit v1.1 From 9758b79c56ae6dc93f660928a0d389ba45e530ed Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Fri, 19 Apr 2013 17:26:26 -0400 Subject: sparc64: Fix race in TLB batch processing. [ Commits f36391d2790d04993f48da6a45810033a2cdf847 and f0af97070acbad5d6a361f485828223a4faaa0ee upstream. ] As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize on the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchronize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to() which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously. 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations. 2) Do TLB batch cross calls instead via: smp_call_function_many() tlb_pending_func() __flush_tlb_pending() 3) Batch only in lazy mmu sequences: a) Add 'active' member to struct tlb_batch b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE c) Set 'active' in arch_enter_lazy_mmu_mode() d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode() e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear. 4) Add infrastructure for synchronous TLB page flushes. a) Implement __flush_tlb_page and per-cpu variants, patch as needed. b) Likewise for xcall_flush_tlb_page. c) Implement smp_flush_tlb_page() to invoke the cross-call. d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP 5) It turns out that singleton batches are very common, 2 out of every 3 batch flushes have only a single entry in them. The batch flush waiting is very expensive, both because of the poll on sibling cpu completion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference.
Therefore, in flush_tlb_pending(), if there is only one entry in the batch perform a completely asynchronous global_flush_tlb_page() instead. Reported-by: Dave Kleikamp Signed-off-by: David S. Miller Acked-by: Dave Kleikamp Signed-off-by: Greg Kroah-Hartman --- arch/sparc/include/asm/pgtable_64.h | 1 + arch/sparc/include/asm/system_64.h | 3 +- arch/sparc/include/asm/tlbflush_64.h | 37 +++++++++-- arch/sparc/kernel/smp_64.c | 41 ++++++++++-- arch/sparc/mm/tlb.c | 39 ++++++++++-- arch/sparc/mm/tsb.c | 57 ++++++++++++----- arch/sparc/mm/ultra.S | 119 ++++++++++++++++++++++++++++------- 7 files changed, 242 insertions(+), 55 deletions(-) (limited to 'arch') diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 9822628..ba63d08 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -774,6 +774,7 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma, return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot); } +#include #include /* We provide our own get_unmapped_area to cope with VA holes and diff --git a/arch/sparc/include/asm/system_64.h b/arch/sparc/include/asm/system_64.h index 10bcabc..f856c7f 100644 --- a/arch/sparc/include/asm/system_64.h +++ b/arch/sparc/include/asm/system_64.h @@ -140,8 +140,7 @@ do { \ * and 2 stores in this critical code path. -DaveM */ #define switch_to(prev, next, last) \ -do { flush_tlb_pending(); \ - save_and_clear_fpu(); \ +do { save_and_clear_fpu(); \ /* If you are tempted to conditionalize the following */ \ /* so that ASI is only written if it changes, think again. */ \ __asm__ __volatile__("wr %%g0, %0, %%asi" \ diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/tlbflush_64.h index 2ef4634..f0d6a97 100644 --- a/arch/sparc/include/asm/tlbflush_64.h +++ b/arch/sparc/include/asm/tlbflush_64.h @@ -11,24 +11,40 @@ struct tlb_batch { struct mm_struct *mm; unsigned long tlb_nr; + unsigned long active; unsigned long vaddrs[TLB_BATCH_NR]; }; extern void flush_tsb_kernel_range(unsigned long start, unsigned long end); extern void flush_tsb_user(struct tlb_batch *tb); +extern void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr); /* TLB flush operations. */ -extern void flush_tlb_pending(void); +static inline void flush_tlb_mm(struct mm_struct *mm) +{ +} + +static inline void flush_tlb_page(struct vm_area_struct *vma, + unsigned long vmaddr) +{ +} + +static inline void flush_tlb_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ +} + +#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE -#define flush_tlb_range(vma,start,end) \ - do { (void)(start); flush_tlb_pending(); } while (0) -#define flush_tlb_page(vma,addr) flush_tlb_pending() -#define flush_tlb_mm(mm) flush_tlb_pending() +extern void flush_tlb_pending(void); +extern void arch_enter_lazy_mmu_mode(void); +extern void arch_leave_lazy_mmu_mode(void); +#define arch_flush_lazy_mmu_mode() do {} while (0) /* Local cpu only. 
*/ extern void __flush_tlb_all(void); - +extern void __flush_tlb_page(unsigned long context, unsigned long vaddr); extern void __flush_tlb_kernel_range(unsigned long start, unsigned long end); #ifndef CONFIG_SMP @@ -38,15 +54,24 @@ do { flush_tsb_kernel_range(start,end); \ __flush_tlb_kernel_range(start,end); \ } while (0) +static inline void global_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr) +{ + __flush_tlb_page(CTX_HWBITS(mm->context), vaddr); +} + #else /* CONFIG_SMP */ extern void smp_flush_tlb_kernel_range(unsigned long start, unsigned long end); +extern void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr); #define flush_tlb_kernel_range(start, end) \ do { flush_tsb_kernel_range(start,end); \ smp_flush_tlb_kernel_range(start, end); \ } while (0) +#define global_flush_tlb_page(mm, vaddr) \ + smp_flush_tlb_page(mm, vaddr) + #endif /* ! CONFIG_SMP */ #endif /* _SPARC64_TLBFLUSH_H */ diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c index 99cb172..e82c18e 100644 --- a/arch/sparc/kernel/smp_64.c +++ b/arch/sparc/kernel/smp_64.c @@ -856,7 +856,7 @@ void smp_tsb_sync(struct mm_struct *mm) } extern unsigned long xcall_flush_tlb_mm; -extern unsigned long xcall_flush_tlb_pending; +extern unsigned long xcall_flush_tlb_page; extern unsigned long xcall_flush_tlb_kernel_range; extern unsigned long xcall_fetch_glob_regs; extern unsigned long xcall_receive_signal; @@ -1070,23 +1070,56 @@ local_flush_and_out: put_cpu(); } +struct tlb_pending_info { + unsigned long ctx; + unsigned long nr; + unsigned long *vaddrs; +}; + +static void tlb_pending_func(void *info) +{ + struct tlb_pending_info *t = info; + + __flush_tlb_pending(t->ctx, t->nr, t->vaddrs); +} + void smp_flush_tlb_pending(struct mm_struct *mm, unsigned long nr, unsigned long *vaddrs) { u32 ctx = CTX_HWBITS(mm->context); + struct tlb_pending_info info; int cpu = get_cpu(); + info.ctx = ctx; + info.nr = nr; + info.vaddrs = vaddrs; + if (mm == current->mm && atomic_read(&mm->mm_users) == 1) cpumask_copy(mm_cpumask(mm), cpumask_of(cpu)); else - smp_cross_call_masked(&xcall_flush_tlb_pending, - ctx, nr, (unsigned long) vaddrs, - mm_cpumask(mm)); + smp_call_function_many(mm_cpumask(mm), tlb_pending_func, + &info, 1); __flush_tlb_pending(ctx, nr, vaddrs); put_cpu(); } +void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr) +{ + unsigned long context = CTX_HWBITS(mm->context); + int cpu = get_cpu(); + + if (mm == current->mm && atomic_read(&mm->mm_users) == 1) + cpumask_copy(mm_cpumask(mm), cpumask_of(cpu)); + else + smp_cross_call_masked(&xcall_flush_tlb_page, + context, vaddr, 0, + mm_cpumask(mm)); + __flush_tlb_page(context, vaddr); + + put_cpu(); +} + void smp_flush_tlb_kernel_range(unsigned long start, unsigned long end) { start &= PAGE_MASK; diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index b1f279c..afd021e 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -24,11 +24,17 @@ static DEFINE_PER_CPU(struct tlb_batch, tlb_batch); void flush_tlb_pending(void) { struct tlb_batch *tb = &get_cpu_var(tlb_batch); + struct mm_struct *mm = tb->mm; - if (tb->tlb_nr) { - flush_tsb_user(tb); + if (!tb->tlb_nr) + goto out; - if (CTX_VALID(tb->mm->context)) { + flush_tsb_user(tb); + + if (CTX_VALID(mm->context)) { + if (tb->tlb_nr == 1) { + global_flush_tlb_page(mm, tb->vaddrs[0]); + } else { #ifdef CONFIG_SMP smp_flush_tlb_pending(tb->mm, tb->tlb_nr, &tb->vaddrs[0]); @@ -37,12 +43,30 @@ void flush_tlb_pending(void) tb->tlb_nr, &tb->vaddrs[0]); #endif } - tb->tlb_nr = 0; } + 
tb->tlb_nr = 0; + +out: put_cpu_var(tlb_batch); } +void arch_enter_lazy_mmu_mode(void) +{ + struct tlb_batch *tb = &__get_cpu_var(tlb_batch); + + tb->active = 1; +} + +void arch_leave_lazy_mmu_mode(void) +{ + struct tlb_batch *tb = &__get_cpu_var(tlb_batch); + + if (tb->tlb_nr) + flush_tlb_pending(); + tb->active = 0; +} + void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr, pte_t *ptep, pte_t orig, int fullmm) { @@ -90,6 +114,12 @@ no_cache_flush: nr = 0; } + if (!tb->active) { + global_flush_tlb_page(mm, vaddr); + flush_tsb_user_page(mm, vaddr); + goto out; + } + if (nr == 0) tb->mm = mm; @@ -98,5 +128,6 @@ no_cache_flush: if (nr >= TLB_BATCH_NR) flush_tlb_pending(); +out: put_cpu_var(tlb_batch); } diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c index a5f51b2..cb16ff3 100644 --- a/arch/sparc/mm/tsb.c +++ b/arch/sparc/mm/tsb.c @@ -8,11 +8,10 @@ #include #include #include -#include -#include -#include #include +#include #include +#include #include extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES]; @@ -47,23 +46,27 @@ void flush_tsb_kernel_range(unsigned long start, unsigned long end) } } -static void __flush_tsb_one(struct tlb_batch *tb, unsigned long hash_shift, - unsigned long tsb, unsigned long nentries) +static void __flush_tsb_one_entry(unsigned long tsb, unsigned long v, + unsigned long hash_shift, + unsigned long nentries) { - unsigned long i; + unsigned long tag, ent, hash; - for (i = 0; i < tb->tlb_nr; i++) { - unsigned long v = tb->vaddrs[i]; - unsigned long tag, ent, hash; + v &= ~0x1UL; + hash = tsb_hash(v, hash_shift, nentries); + ent = tsb + (hash * sizeof(struct tsb)); + tag = (v >> 22UL); - v &= ~0x1UL; + tsb_flush(ent, tag); +} - hash = tsb_hash(v, hash_shift, nentries); - ent = tsb + (hash * sizeof(struct tsb)); - tag = (v >> 22UL); +static void __flush_tsb_one(struct tlb_batch *tb, unsigned long hash_shift, + unsigned long tsb, unsigned long nentries) +{ + unsigned long i; - tsb_flush(ent, tag); - } + for (i = 0; i < tb->tlb_nr; i++) + __flush_tsb_one_entry(tsb, tb->vaddrs[i], hash_shift, nentries); } void flush_tsb_user(struct tlb_batch *tb) @@ -91,6 +94,30 @@ void flush_tsb_user(struct tlb_batch *tb) spin_unlock_irqrestore(&mm->context.lock, flags); } +void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr) +{ + unsigned long nentries, base, flags; + + spin_lock_irqsave(&mm->context.lock, flags); + + base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb; + nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries; + if (tlb_type == cheetah_plus || tlb_type == hypervisor) + base = __pa(base); + __flush_tsb_one_entry(base, vaddr, PAGE_SHIFT, nentries); + +#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE) + if (mm->context.tsb_block[MM_TSB_HUGE].tsb) { + base = (unsigned long) mm->context.tsb_block[MM_TSB_HUGE].tsb; + nentries = mm->context.tsb_block[MM_TSB_HUGE].tsb_nentries; + if (tlb_type == cheetah_plus || tlb_type == hypervisor) + base = __pa(base); + __flush_tsb_one_entry(base, vaddr, HPAGE_SHIFT, nentries); + } +#endif + spin_unlock_irqrestore(&mm->context.lock, flags); +} + #if defined(CONFIG_SPARC64_PAGE_SIZE_8KB) #define HV_PGSZ_IDX_BASE HV_PGSZ_IDX_8K #define HV_PGSZ_MASK_BASE HV_PGSZ_MASK_8K diff --git a/arch/sparc/mm/ultra.S b/arch/sparc/mm/ultra.S index 874162a..dd10caa 100644 --- a/arch/sparc/mm/ultra.S +++ b/arch/sparc/mm/ultra.S @@ -53,6 +53,33 @@ __flush_tlb_mm: /* 18 insns */ nop .align 32 + .globl __flush_tlb_page +__flush_tlb_page: /* 22 insns */ + /* %o0 = context, %o1 = vaddr */ + rdpr 
%pstate, %g7 + andn %g7, PSTATE_IE, %g2 + wrpr %g2, %pstate + mov SECONDARY_CONTEXT, %o4 + ldxa [%o4] ASI_DMMU, %g2 + stxa %o0, [%o4] ASI_DMMU + andcc %o1, 1, %g0 + andn %o1, 1, %o3 + be,pn %icc, 1f + or %o3, 0x10, %o3 + stxa %g0, [%o3] ASI_IMMU_DEMAP +1: stxa %g0, [%o3] ASI_DMMU_DEMAP + membar #Sync + stxa %g2, [%o4] ASI_DMMU + sethi %hi(KERNBASE), %o4 + flush %o4 + retl + wrpr %g7, 0x0, %pstate + nop + nop + nop + nop + + .align 32 .globl __flush_tlb_pending __flush_tlb_pending: /* 26 insns */ /* %o0 = context, %o1 = nr, %o2 = vaddrs[] */ @@ -203,6 +230,31 @@ __cheetah_flush_tlb_mm: /* 19 insns */ retl wrpr %g7, 0x0, %pstate +__cheetah_flush_tlb_page: /* 22 insns */ + /* %o0 = context, %o1 = vaddr */ + rdpr %pstate, %g7 + andn %g7, PSTATE_IE, %g2 + wrpr %g2, 0x0, %pstate + wrpr %g0, 1, %tl + mov PRIMARY_CONTEXT, %o4 + ldxa [%o4] ASI_DMMU, %g2 + srlx %g2, CTX_PGSZ1_NUC_SHIFT, %o3 + sllx %o3, CTX_PGSZ1_NUC_SHIFT, %o3 + or %o0, %o3, %o0 /* Preserve nucleus page size fields */ + stxa %o0, [%o4] ASI_DMMU + andcc %o1, 1, %g0 + be,pn %icc, 1f + andn %o1, 1, %o3 + stxa %g0, [%o3] ASI_IMMU_DEMAP +1: stxa %g0, [%o3] ASI_DMMU_DEMAP + membar #Sync + stxa %g2, [%o4] ASI_DMMU + sethi %hi(KERNBASE), %o4 + flush %o4 + wrpr %g0, 0, %tl + retl + wrpr %g7, 0x0, %pstate + __cheetah_flush_tlb_pending: /* 27 insns */ /* %o0 = context, %o1 = nr, %o2 = vaddrs[] */ rdpr %pstate, %g7 @@ -269,6 +321,20 @@ __hypervisor_flush_tlb_mm: /* 10 insns */ retl nop +__hypervisor_flush_tlb_page: /* 11 insns */ + /* %o0 = context, %o1 = vaddr */ + mov %o0, %g2 + mov %o1, %o0 /* ARG0: vaddr + IMMU-bit */ + mov %g2, %o1 /* ARG1: mmu context */ + mov HV_MMU_ALL, %o2 /* ARG2: flags */ + srlx %o0, PAGE_SHIFT, %o0 + sllx %o0, PAGE_SHIFT, %o0 + ta HV_MMU_UNMAP_ADDR_TRAP + brnz,pn %o0, __hypervisor_tlb_tl0_error + mov HV_MMU_UNMAP_ADDR_TRAP, %o1 + retl + nop + __hypervisor_flush_tlb_pending: /* 16 insns */ /* %o0 = context, %o1 = nr, %o2 = vaddrs[] */ sllx %o1, 3, %g1 @@ -339,6 +405,13 @@ cheetah_patch_cachetlbops: call tlb_patch_one mov 19, %o2 + sethi %hi(__flush_tlb_page), %o0 + or %o0, %lo(__flush_tlb_page), %o0 + sethi %hi(__cheetah_flush_tlb_page), %o1 + or %o1, %lo(__cheetah_flush_tlb_page), %o1 + call tlb_patch_one + mov 22, %o2 + sethi %hi(__flush_tlb_pending), %o0 or %o0, %lo(__flush_tlb_pending), %o0 sethi %hi(__cheetah_flush_tlb_pending), %o1 @@ -397,10 +470,9 @@ xcall_flush_tlb_mm: /* 21 insns */ nop nop - .globl xcall_flush_tlb_pending -xcall_flush_tlb_pending: /* 21 insns */ - /* %g5=context, %g1=nr, %g7=vaddrs[] */ - sllx %g1, 3, %g1 + .globl xcall_flush_tlb_page +xcall_flush_tlb_page: /* 17 insns */ + /* %g5=context, %g1=vaddr */ mov PRIMARY_CONTEXT, %g4 ldxa [%g4] ASI_DMMU, %g2 srlx %g2, CTX_PGSZ1_NUC_SHIFT, %g4 @@ -408,20 +480,16 @@ xcall_flush_tlb_pending: /* 21 insns */ or %g5, %g4, %g5 mov PRIMARY_CONTEXT, %g4 stxa %g5, [%g4] ASI_DMMU -1: sub %g1, (1 << 3), %g1 - ldx [%g7 + %g1], %g5 - andcc %g5, 0x1, %g0 + andcc %g1, 0x1, %g0 be,pn %icc, 2f - - andn %g5, 0x1, %g5 + andn %g1, 0x1, %g5 stxa %g0, [%g5] ASI_IMMU_DEMAP 2: stxa %g0, [%g5] ASI_DMMU_DEMAP membar #Sync - brnz,pt %g1, 1b - nop stxa %g2, [%g4] ASI_DMMU retry nop + nop .globl xcall_flush_tlb_kernel_range xcall_flush_tlb_kernel_range: /* 25 insns */ @@ -596,15 +664,13 @@ __hypervisor_xcall_flush_tlb_mm: /* 21 insns */ membar #Sync retry - .globl __hypervisor_xcall_flush_tlb_pending -__hypervisor_xcall_flush_tlb_pending: /* 21 insns */ - /* %g5=ctx, %g1=nr, %g7=vaddrs[], %g2,%g3,%g4,g6=scratch */ - sllx %g1, 3, %g1 + .globl __hypervisor_xcall_flush_tlb_page 
+__hypervisor_xcall_flush_tlb_page: /* 17 insns */ + /* %g5=ctx, %g1=vaddr */ mov %o0, %g2 mov %o1, %g3 mov %o2, %g4 -1: sub %g1, (1 << 3), %g1 - ldx [%g7 + %g1], %o0 /* ARG0: virtual address */ + mov %g1, %o0 /* ARG0: virtual address */ mov %g5, %o1 /* ARG1: mmu context */ mov HV_MMU_ALL, %o2 /* ARG2: flags */ srlx %o0, PAGE_SHIFT, %o0 @@ -613,8 +679,6 @@ __hypervisor_xcall_flush_tlb_pending: /* 21 insns */ mov HV_MMU_UNMAP_ADDR_TRAP, %g6 brnz,a,pn %o0, __hypervisor_tlb_xcall_error mov %o0, %g5 - brnz,pt %g1, 1b - nop mov %g2, %o0 mov %g3, %o1 mov %g4, %o2 @@ -697,6 +761,13 @@ hypervisor_patch_cachetlbops: call tlb_patch_one mov 10, %o2 + sethi %hi(__flush_tlb_page), %o0 + or %o0, %lo(__flush_tlb_page), %o0 + sethi %hi(__hypervisor_flush_tlb_page), %o1 + or %o1, %lo(__hypervisor_flush_tlb_page), %o1 + call tlb_patch_one + mov 11, %o2 + sethi %hi(__flush_tlb_pending), %o0 or %o0, %lo(__flush_tlb_pending), %o0 sethi %hi(__hypervisor_flush_tlb_pending), %o1 @@ -728,12 +799,12 @@ hypervisor_patch_cachetlbops: call tlb_patch_one mov 21, %o2 - sethi %hi(xcall_flush_tlb_pending), %o0 - or %o0, %lo(xcall_flush_tlb_pending), %o0 - sethi %hi(__hypervisor_xcall_flush_tlb_pending), %o1 - or %o1, %lo(__hypervisor_xcall_flush_tlb_pending), %o1 + sethi %hi(xcall_flush_tlb_page), %o0 + or %o0, %lo(xcall_flush_tlb_page), %o0 + sethi %hi(__hypervisor_xcall_flush_tlb_page), %o1 + or %o1, %lo(__hypervisor_xcall_flush_tlb_page), %o1 call tlb_patch_one - mov 21, %o2 + mov 17, %o2 sethi %hi(xcall_flush_tlb_kernel_range), %o0 or %o0, %lo(xcall_flush_tlb_kernel_range), %o0 -- cgit v1.1 From 7a0db699f49f9045484cf256316689cd6668f949 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Tue, 27 Dec 2011 21:46:53 +0100 Subject: sparc32: support atomic64_t commit aea1181b0bd0a09c54546399768f359d1e198e45 upstream, Needed to compile ext4 for sparc32 since commit 503f4bdcc078e7abee273a85ce322de81b18a224 There is no-one that really require atomic64_t support on sparc32. But several drivers fails to build without proper atomic64 support. And for an allyesconfig build for sparc32 this is annoying. Include the generic atomic64_t support for sparc32. This has a text footprint cost: $size vmlinux (before atomic64_t support) text data bss dec hex filename 3578860 134260 108781 3821901 3a514d vmlinux $size vmlinux (after atomic64_t support) text data bss dec hex filename 3579892 130684 108781 3819357 3a475d vmlinux text increase (3579892 - 3578860) = 1032 bytes data decreases - but I fail to explain why! I have rebuild twice to check my numbers. Signed-off-by: Sam Ravnborg Signed-off-by: David S. 
Miller Signed-off-by: Andreas Larsson Signed-off-by: Greg Kroah-Hartman --- arch/sparc/Kconfig | 1 + arch/sparc/include/asm/atomic_32.h | 2 ++ 2 files changed, 3 insertions(+) (limited to 'arch') diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 9e70257..bc31e5e 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -31,6 +31,7 @@ config SPARC config SPARC32 def_bool !64BIT + select GENERIC_ATOMIC64 config SPARC64 def_bool 64BIT diff --git a/arch/sparc/include/asm/atomic_32.h b/arch/sparc/include/asm/atomic_32.h index 7ae128b..98f223a 100644 --- a/arch/sparc/include/asm/atomic_32.h +++ b/arch/sparc/include/asm/atomic_32.h @@ -15,6 +15,8 @@ #ifdef __KERNEL__ +#include + #include #define ATOMIC_INIT(i) { (i) } -- cgit v1.1 From 74f31cf3c186bc0189ad560fadfc03dd9aa2f806 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Wed, 24 Apr 2013 00:30:09 +0000 Subject: powerpc: Add isync to copy_and_flush commit 29ce3c5073057991217916abc25628e906911757 upstream. In __after_prom_start we copy the kernel down to zero in two calls to copy_and_flush. After the first call (copy from 0 to copy_to_here:) we jump to the newly copied code soon after. Unfortunately there's no isync between the copy of this code and the jump to it. Hence it's possible that stale instructions could still be in the icache or pipeline before we branch to it. We've seen this on real machines and it's results in no console output after: calling quiesce... returning from prom_init The below adds an isync to ensure that the copy and flushing has completed before any branching to the new instructions occurs. Signed-off-by: Michael Neuling Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/head_64.S | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index e8befef..a5031c3 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -492,6 +492,7 @@ _GLOBAL(copy_and_flush) sync addi r5,r5,8 addi r6,r6,8 + isync blr .align 8 -- cgit v1.1 From ea70316c0e035731348f6be194e6332388944029 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Tue, 23 Apr 2013 15:13:14 +0000 Subject: powerpc/spufs: Initialise inode->i_ino in spufs_new_inode() commit 6747e83235caecd30b186d1282e4eba7679f81b7 upstream. In commit 85fe402 (fs: do not assign default i_ino in new_inode), the initialisation of i_ino was removed from new_inode() and pushed down into the callers. However spufs_new_inode() was not updated. This exhibits as no files appearing in /spu, because all our dirents have a zero inode, which readdir() seems to dislike. Signed-off-by: Michael Ellerman Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/cell/spufs/inode.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/powerpc/platforms/cell/spufs/inode.c b/arch/powerpc/platforms/cell/spufs/inode.c index 856e9c3..6786f9d 100644 --- a/arch/powerpc/platforms/cell/spufs/inode.c +++ b/arch/powerpc/platforms/cell/spufs/inode.c @@ -100,6 +100,7 @@ spufs_new_inode(struct super_block *sb, int mode) if (!inode) goto out; + inode->i_ino = get_next_ino(); inode->i_mode = mode; inode->i_uid = current_fsuid(); inode->i_gid = current_fsgid(); -- cgit v1.1 From f7cfcd277732f50bbdaf56880546faddbb2a73ba Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 16 Apr 2013 15:18:00 -0400 Subject: xen/time: Fix kasprintf splat when allocating timer%d IRQ line. 
commit 7918c92ae9638eb8a6ec18e2b4a0de84557cccc8 upstream. When we online the CPU, we get this splat: smpboot: Booting Node 0 Processor 1 APIC 0x2 installing Xen timer for CPU 1 BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1 Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1 Call Trace: [] __might_sleep+0xda/0x100 [] __kmalloc_track_caller+0x1e7/0x2c0 [] ? kasprintf+0x38/0x40 [] kvasprintf+0x5b/0x90 [] kasprintf+0x38/0x40 [] xen_setup_timer+0x30/0xb0 [] xen_hvm_setup_cpu_clockevents+0x1f/0x30 [] start_secondary+0x19c/0x1a8 The solution to that is to use kasprintf in the CPU hotplug path that 'online's the CPU. That is, do it in xen_hvm_cpu_notify, and remove the call in xen_hvm_setup_cpu_clockevents. Unfortunately the latter is not a good idea, as the bootup path does not use xen_hvm_cpu_notify, so we would end up never allocating timer%d interrupt lines when booting. As such add the check for atomic() to continue. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/enlighten.c | 5 ++++- arch/x86/xen/time.c | 6 +++++- 2 files changed, 9 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 9f808af..063ce1f 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1365,8 +1365,11 @@ static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self, switch (action) { case CPU_UP_PREPARE: xen_vcpu_setup(cpu); - if (xen_have_vector_callback) + if (xen_have_vector_callback) { xen_init_lock_cpu(cpu); + if (xen_feature(XENFEAT_hvm_safe_pvclock)) + xen_setup_timer(cpu); + } break; default: break; diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 5158c50..4b0fb29 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -482,7 +482,11 @@ static void xen_hvm_setup_cpu_clockevents(void) { int cpu = smp_processor_id(); xen_setup_runstate_info(cpu); - xen_setup_timer(cpu); + /* + * xen_setup_timer(cpu) - snprintf is bad in atomic context. Hence + * doing it in xen_hvm_cpu_notify (which gets called by smp_init during + * early bootup and also during CPU hotplug events). + */ xen_setup_cpu_clockevents(); } -- cgit v1.1 From c7baad48c3986e9949a7d42a41dd5081e2177044 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Wed, 20 Mar 2013 10:30:15 -0700 Subject: Fix initialization of CMCI/CMCP interrupts commit d303e9e98fce56cdb3c6f2ac92f626fc2bd51c77 upstream. Back in 2010 during a revamp of the irq code some initializations were moved from ia64_mca_init() to ia64_mca_late_init() in commit c75f2aa13f5b268aba369b5dc566088b5194377c Cannot use register_percpu_irq() from ia64_mca_init() But this was hideously wrong. First of all these initializations are now done far too late. Specifically after all the other cpus have been brought up and initialized their own CMC vectors from smp_callin(). Also ia64_mca_late_init() may be called from any cpu so the line: ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */ is generally not executed on the BSP, and so the CMC vector isn't set up at all on that processor. Make use of the arch_early_irq_init() hook to get this code executed at just the right moment: not too early, not too late.
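A rough sketch of the boot ordering that makes this hook the right moment (reconstructed from the description above, not from the diff):

    /*
     * start_kernel()
     *   early_irq_init()
     *     arch_early_irq_init()      -> now runs ia64_mca_irq_init()
     *   ...
     *   rest_init() ... smp_init()   -> APs run smp_callin(), whose CMC
     *                                   vector setup finds valid handlers
     *   ...
     * late_initcall: ia64_mca_late_init()  (the old, too-late location)
     */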
Reported-by: Fred Hartnett Tested-by: Fred Hartnett Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman --- arch/ia64/include/asm/mca.h | 1 + arch/ia64/kernel/irq.c | 8 ++++++++ arch/ia64/kernel/mca.c | 37 ++++++++++++++++++++++++------------- 3 files changed, 33 insertions(+), 13 deletions(-) (limited to 'arch') diff --git a/arch/ia64/include/asm/mca.h b/arch/ia64/include/asm/mca.h index 43f96ab..8c70961 100644 --- a/arch/ia64/include/asm/mca.h +++ b/arch/ia64/include/asm/mca.h @@ -143,6 +143,7 @@ extern unsigned long __per_cpu_mca[NR_CPUS]; extern int cpe_vector; extern int ia64_cpe_irq; extern void ia64_mca_init(void); +extern void ia64_mca_irq_init(void); extern void ia64_mca_cpu_init(void *); extern void ia64_os_mca_dispatch(void); extern void ia64_os_mca_dispatch_end(void); diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c index ad69606..f2c41828 100644 --- a/arch/ia64/kernel/irq.c +++ b/arch/ia64/kernel/irq.c @@ -23,6 +23,8 @@ #include #include +#include + /* * 'what should we do if we get a hw irq event on an illegal vector'. * each architecture has to answer this themselves. @@ -83,6 +85,12 @@ bool is_affinity_mask_valid(const struct cpumask *cpumask) #endif /* CONFIG_SMP */ +int __init arch_early_irq_init(void) +{ + ia64_mca_irq_init(); + return 0; +} + #ifdef CONFIG_HOTPLUG_CPU unsigned int vectors_in_migration[NR_IRQS]; diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index 84fb405..9b97303 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -2071,22 +2071,16 @@ ia64_mca_init(void) printk(KERN_INFO "MCA related initialization done\n"); } + /* - * ia64_mca_late_init - * - * Opportunity to setup things that require initialization later - * than ia64_mca_init. Setup a timer to poll for CPEs if the - * platform doesn't support an interrupt driven mechanism. - * - * Inputs : None - * Outputs : Status + * These pieces cannot be done in ia64_mca_init() because it is called before + * early_irq_init() which would wipe out our percpu irq registrations. But we + * cannot leave them until ia64_mca_late_init() because by then all the other + * processors have been brought online and have set their own CMC vectors to + * point at a non-existant action. Called from arch_early_irq_init(). */ -static int __init -ia64_mca_late_init(void) +void __init ia64_mca_irq_init(void) { - if (!mca_init) - return 0; - /* * Configure the CMCI/P vector and handler. Interrupts for CMC are * per-processor, so AP CMC interrupts are setup in smp_callin() (smpboot.c). @@ -2105,6 +2099,23 @@ ia64_mca_late_init(void) /* Setup the CPEI/P handler */ register_percpu_irq(IA64_CPEP_VECTOR, &mca_cpep_irqaction); #endif +} + +/* + * ia64_mca_late_init + * + * Opportunity to setup things that require initialization later + * than ia64_mca_init. Setup a timer to poll for CPEs if the + * platform doesn't support an interrupt driven mechanism. + * + * Inputs : None + * Outputs : Status + */ +static int __init +ia64_mca_late_init(void) +{ + if (!mca_init) + return 0; register_hotcpu_notifier(&mca_cpu_notifier); -- cgit v1.1 From 40b1161af55b80168e0188e9e34ee39b3dd8e2ed Mon Sep 17 00:00:00 2001 From: Stephan Schreiber Date: Tue, 19 Mar 2013 15:22:27 -0700 Subject: Wrong asm register contraints in the futex implementation commit 136f39ddc53db3bcee2befbe323a56d4fbf06da8 upstream. The Linux Kernel contains some inline assembly source code which has wrong asm register constraints in arch/ia64/include/asm/futex.h. 
I observed this on Kernel 3.2.23 but it is also true on the most recent Kernel 3.9-rc1. File arch/ia64/include/asm/futex.h: static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newval) { if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) return -EFAULT; { register unsigned long r8 __asm ("r8"); unsigned long prev; __asm__ __volatile__( " mf;; \n" " mov %0=r0 \n" " mov ar.ccv=%4;; \n" "[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n" " .xdata4 \"__ex_table\", 1b-., 2f-. \n" "[2:]" : "=r" (r8), "=r" (prev) : "r" (uaddr), "r" (newval), "rO" ((long) (unsigned) oldval) : "memory"); *uval = prev; return r8; } } The list of output registers is : "=r" (r8), "=r" (prev) The constraint "=r" means that the GCC has to maintain that these vars are in registers and contain valid info when the program flow leaves the assembly block (output registers). But "=r" also means that GCC can put them in registers that are used as input registers. Input registers are uaddr, newval, oldval on the example. The second assembly instruction " mov %0=r0 \n" is the first one which writes to a register; it sets %0 to 0. %0 means the first register operand; it is r8 here. (The r0 is read-only and always 0 on the Itanium; it can be used if an immediate zero value is needed.) This instruction might overwrite one of the other registers which are still needed. Whether it really happens depends on how GCC decides what registers it uses and how it optimizes the code. The objdump utility can give us disassembly. The futex_atomic_cmpxchg_inatomic() function is inline, so we have to look for a module that uses the funtion. This is the cmpxchg_futex_value_locked() function in kernel/futex.c: static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr, u32 uval, u32 newval) { int ret; pagefault_disable(); ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval); pagefault_enable(); return ret; } Now the disassembly. 
At first from the Kernel package 3.2.23 which has been compiled with GCC 4.4, remeber this Kernel seemed to work: objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o 0000000000000230 : 230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;; 236: 80 40 0d 00 42 00 adds r8=40,r3 23c: 00 00 04 00 nop.i 0x0;; 240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];; 246: 90 08 28 00 42 00 adds r9=1,r10 24c: 00 00 04 00 nop.i 0x0;; 250: 09 00 00 00 01 00 [MMI] nop.m 0x0 256: 00 48 20 20 23 00 st4 [r8]=r9 25c: 00 00 04 00 nop.i 0x0;; 260: 08 10 80 06 00 21 [MMI] adds r2=32,r3 266: 00 00 00 02 00 00 nop.m 0x0 26c: 02 08 f1 52 extr.u r16=r33,0,61 270: 05 40 88 00 08 e0 [MLX] addp4 r8=r34,r0 276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;; 27c: f1 f7 ff 65 280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2] 286: 00 00 00 02 00 c0 nop.m 0x0 28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;; 290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14 296: 00 00 00 02 00 40 nop.m 0x0 29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33 2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0 2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0 2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0 2b0: 0a 00 00 00 22 00 [MMI] mf;; 2b6: 80 00 00 00 42 00 mov r8=r0 2bc: 00 00 04 00 nop.i 0x0 2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;; 2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv 2cc: 00 00 04 00 nop.i 0x0;; 2d0: 10 00 84 40 90 11 [MIB] st4 [r32]=r33 2d6: 00 00 00 02 00 00 nop.i 0x0 2dc: 20 00 00 40 br.few 2f0 2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14 2e6: 00 00 00 02 00 00 nop.m 0x0 2ec: 00 00 04 00 nop.i 0x0;; 2f0: 0b 58 20 1a 19 21 [MMI] adds r11=3208,r13;; 2f6: 20 01 2c 20 20 00 ld4 r18=[r11] 2fc: 00 00 04 00 nop.i 0x0;; 300: 0b 88 fc 25 3f 23 [MMI] adds r17=-1,r18;; 306: 00 88 2c 20 23 00 st4 [r11]=r17 30c: 00 00 04 00 nop.i 0x0;; 310: 11 00 00 00 01 00 [MIB] nop.m 0x0 316: 00 00 00 02 00 80 nop.i 0x0 31c: 08 00 84 00 br.ret.sptk.many b0;; The lines 2b0: 0a 00 00 00 22 00 [MMI] mf;; 2b6: 80 00 00 00 42 00 mov r8=r0 2bc: 00 00 04 00 nop.i 0x0 2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;; 2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv 2cc: 00 00 04 00 nop.i 0x0;; are the instructions of the assembly block. The line 2b6: 80 00 00 00 42 00 mov r8=r0 sets the r8 register to 0 and after that 2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;; prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This is wrong. What happened here is what I explained above: An input register is overwritten which is still needed. The register operand constraints in futex.h are wrong. (The problem doesn't occur when the Kernel is compiled with GCC 4.6.) The attached patch fixes the register operand constraints in futex.h. The code after patching of it: static inline int futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, u32 oldval, u32 newval) { if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32))) return -EFAULT; { register unsigned long r8 __asm ("r8") = 0; unsigned long prev; __asm__ __volatile__( " mf;; \n" " mov ar.ccv=%4;; \n" "[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n" " .xdata4 \"__ex_table\", 1b-., 2f-. \n" "[2:]" : "+r" (r8), "=&r" (prev) : "r" (uaddr), "r" (newval), "rO" ((long) (unsigned) oldval) : "memory"); *uval = prev; return r8; } } I also initialized the 'r8' var with the C programming language. The _asm qualifier on the definition of the 'r8' var forces GCC to use the r8 processor register for it. I don't believe that we should use inline assembly for zeroing out a local variable. 
The constraint is "+r" (r8) what means that it is both an input register and an output register. Note that the page fault handler will modify the r8 register which will be the return value of the function. The real fix is "=&r" (prev) The & means that GCC must not use any of the input registers to place this output register in. Patched the Kernel 3.2.23 and compiled it with GCC4.4: 0000000000000230 : 230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;; 236: 80 40 0d 00 42 00 adds r8=40,r3 23c: 00 00 04 00 nop.i 0x0;; 240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];; 246: 90 08 28 00 42 00 adds r9=1,r10 24c: 00 00 04 00 nop.i 0x0;; 250: 09 00 00 00 01 00 [MMI] nop.m 0x0 256: 00 48 20 20 23 00 st4 [r8]=r9 25c: 00 00 04 00 nop.i 0x0;; 260: 08 10 80 06 00 21 [MMI] adds r2=32,r3 266: 20 12 01 10 40 00 addp4 r34=r34,r0 26c: 02 08 f1 52 extr.u r16=r33,0,61 270: 05 40 00 00 00 e1 [MLX] mov r8=r0 276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;; 27c: f1 f7 ff 65 280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2] 286: 00 00 00 02 00 c0 nop.m 0x0 28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;; 290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14 296: 00 00 00 02 00 40 nop.m 0x0 29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33 2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0 2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0 2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0 2b0: 0b 00 00 00 22 00 [MMI] mf;; 2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34 2bc: 00 00 04 00 nop.i 0x0;; 2c0: 09 58 8c 42 11 10 [MMI] cmpxchg4.acq r11=[r33],r35,ar.ccv 2c6: 00 00 00 02 00 00 nop.m 0x0 2cc: 00 00 04 00 nop.i 0x0;; 2d0: 10 00 2c 40 90 11 [MIB] st4 [r32]=r11 2d6: 00 00 00 02 00 00 nop.i 0x0 2dc: 20 00 00 40 br.few 2f0 2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14 2e6: 00 00 00 02 00 00 nop.m 0x0 2ec: 00 00 04 00 nop.i 0x0;; 2f0: 0b 88 20 1a 19 21 [MMI] adds r17=3208,r13;; 2f6: 30 01 44 20 20 00 ld4 r19=[r17] 2fc: 00 00 04 00 nop.i 0x0;; 300: 0b 90 fc 27 3f 23 [MMI] adds r18=-1,r19;; 306: 00 90 44 20 23 00 st4 [r17]=r18 30c: 00 00 04 00 nop.i 0x0;; 310: 11 00 00 00 01 00 [MIB] nop.m 0x0 316: 00 00 00 02 00 80 nop.i 0x0 31c: 08 00 84 00 br.ret.sptk.many b0;; Much better. There is a 270: 05 40 00 00 00 e1 [MLX] mov r8=r0 which was generated by C code r8 = 0. Below 2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34 what means that oldval is no longer overwritten. This is Debian bug#702641 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641). The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions. Signed-off-by: Stephan Schreiber Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman --- arch/ia64/include/asm/futex.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/ia64/include/asm/futex.h b/arch/ia64/include/asm/futex.h index 21ab376..1bd14d5 100644 --- a/arch/ia64/include/asm/futex.h +++ b/arch/ia64/include/asm/futex.h @@ -107,16 +107,15 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, return -EFAULT; { - register unsigned long r8 __asm ("r8"); + register unsigned long r8 __asm ("r8") = 0; unsigned long prev; __asm__ __volatile__( " mf;; \n" - " mov %0=r0 \n" " mov ar.ccv=%4;; \n" "[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n" " .xdata4 \"__ex_table\", 1b-., 2f-. 
\n" "[2:]" - : "=r" (r8), "=r" (prev) + : "+r" (r8), "=&r" (prev) : "r" (uaddr), "r" (newval), "rO" ((long) (unsigned) oldval) : "memory"); -- cgit v1.1 From 2f0441ee08f711413b85c8c3a75734913fc6bca9 Mon Sep 17 00:00:00 2001 From: Stephan Schreiber Date: Tue, 19 Mar 2013 15:27:12 -0700 Subject: Wrong asm register contraints in the kvm implementation commit de53e9caa4c6149ef4a78c2f83d7f5b655848767 upstream. The Linux Kernel contains some inline assembly source code which has wrong asm register constraints in arch/ia64/kvm/vtlb.c. I observed this on Kernel 3.2.35 but it is also true on the most recent Kernel 3.9-rc1. File arch/ia64/kvm/vtlb.c: u64 guest_vhpt_lookup(u64 iha, u64 *pte) { u64 ret; struct thash_data *data; data = __vtr_lookup(current_vcpu, iha, D_TLB); if (data != NULL) thash_vhpt_insert(current_vcpu, data->page_flags, data->itir, iha, D_TLB); asm volatile ( "rsm psr.ic|psr.i;;" "srlz.d;;" "ld8.s r9=[%1];;" "tnat.nz p6,p7=r9;;" "(p6) mov %0=1;" "(p6) mov r9=r0;" "(p7) extr.u r9=r9,0,53;;" "(p7) mov %0=r0;" "(p7) st8 [%2]=r9;;" "ssm psr.ic;;" "srlz.d;;" "ssm psr.i;;" "srlz.d;;" : "=r"(ret) : "r"(iha), "r"(pte):"memory"); return ret; } The list of output registers is : "=r"(ret) : "r"(iha), "r"(pte):"memory"); The constraint "=r" means that the GCC has to maintain that these vars are in registers and contain valid info when the program flow leaves the assembly block (output registers). But "=r" also means that GCC can put them in registers that are used as input registers. Input registers are iha, pte on the example. If the predicate p7 is true, the 8th assembly instruction "(p7) mov %0=r0;" is the first one which writes to a register which is maintained by the register constraints; it sets %0. %0 means the first register operand; it is ret here. This instruction might overwrite the %2 register (pte) which is needed by the next instruction: "(p7) st8 [%2]=r9;;" Whether it really happens depends on how GCC decides what registers it uses and how it optimizes the code. The attached patch fixes the register operand constraints in arch/ia64/kvm/vtlb.c. The register constraints should be : "=&r"(ret) : "r"(iha), "r"(pte):"memory"); The & means that GCC must not use any of the input registers to place this output register in. This is Debian bug#702639 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639). The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions. Signed-off-by: Stephan Schreiber Signed-off-by: Tony Luck Signed-off-by: Greg Kroah-Hartman --- arch/ia64/kvm/vtlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/ia64/kvm/vtlb.c b/arch/ia64/kvm/vtlb.c index 4332f7e..a7869f8 100644 --- a/arch/ia64/kvm/vtlb.c +++ b/arch/ia64/kvm/vtlb.c @@ -256,7 +256,7 @@ u64 guest_vhpt_lookup(u64 iha, u64 *pte) "srlz.d;;" "ssm psr.i;;" "srlz.d;;" - : "=r"(ret) : "r"(iha), "r"(pte):"memory"); + : "=&r"(ret) : "r"(iha), "r"(pte) : "memory"); return ret; } -- cgit v1.1 From e34eca4c2d2f1783c94fad22d72ebb304c3f0728 Mon Sep 17 00:00:00 2001 From: Li Fei Date: Fri, 26 Apr 2013 20:50:11 +0800 Subject: x86: Eliminate irq_mis_count counted in arch_irq_stat commit f7b0e1055574ce06ab53391263b4e205bf38daf3 upstream. With the current implementation, kstat_cpu(cpu).irqs_sum is also increased in case of irq_mis_count increment. So there is no need to count irq_mis_count in arch_irq_stat, otherwise irq_mis_count will be counted twice in the sum of /proc/stat. 
Reported-by: Liu Chuansheng Signed-off-by: Li Fei Acked-by: Liu Chuansheng Cc: tomoki.sekiyama.qu@hitachi.com Cc: joe@perches.com Link: http://lkml.kernel.org/r/1366980611.32469.7.camel@fli24-HP-Compaq-8100-Elite-CMT-PC Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/irq.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 6c0802e..a669961 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -159,10 +159,6 @@ u64 arch_irq_stat_cpu(unsigned int cpu) u64 arch_irq_stat(void) { u64 sum = atomic_read(&irq_err_count); - -#ifdef CONFIG_X86_IO_APIC - sum += atomic_read(&irq_mis_count); -#endif return sum; } -- cgit v1.1 From 2232c3d8b3d44591be5e7426b368858da68048ac Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 17 Apr 2013 08:46:19 -0700 Subject: s390: move dummy io_remap_pfn_range() to asm/pgtable.h commit 4f2e29031e6c67802e7370292dd050fd62f337ee upstream. Commit b4cbb197c7e7 ("vm: add vm_iomap_memory() helper function") added a helper function wrapper around io_remap_pfn_range(), and every other architecture defined it in . The s390 choice of may make sense, but is not very convenient for this case, and gratuitous differences like that cause unexpected errors like this: mm/memory.c: In function 'vm_iomap_memory': mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration] Glory be the kbuild test robot who noticed this, bisected it, and reported it to the guilty parties (ie me). Signed-off-by: Linus Torvalds Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/pgtable.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch') diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index 801fbe1..4e15253 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -67,6 +67,10 @@ static inline int is_zero_pfn(unsigned long pfn) #define my_zero_pfn(addr) page_to_pfn(ZERO_PAGE(addr)) +/* TODO: s390 cannot support io_remap_pfn_range... */ +#define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \ + remap_pfn_range(vma, vaddr, pfn, size, prot) + #endif /* !__ASSEMBLY__ */ /* -- cgit v1.1 From 5013bcf5cbd6969b8873b964e0d7aaab430cf643 Mon Sep 17 00:00:00 2001 From: Vaidyanathan Srinivasan Date: Fri, 22 Mar 2013 05:49:35 +0000 Subject: powerpc: fix numa distance for form0 device tree commit 7122beeee7bc1757682049780179d7c216dd1c83 upstream. The following commit breaks numa distance setup for old powerpc systems that use form0 encoding in device tree. commit 41eab6f88f24124df89e38067b3766b7bef06ddb powerpc/numa: Use form 1 affinity to setup node distance Device tree node /rtas/ibm,associativity-reference-points would index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or form1 encoding detected by ibm,architecture-vec-5 property. All modern systems use form1 and current kernel code is correct. However, on older systems with form0 encoding, the numa distance will get hard coded as LOCAL_DISTANCE for all nodes. This causes task scheduling anomaly since scheduler will skip building numa level domain (topmost domain with all cpus) if all numa distances are same. (value of 'level' in sched_init_numa() will remain 0) Prior to the above commit: ((from) == (to) ? 
LOCAL_DISTANCE : REMOTE_DISTANCE) Restoring compatible behavior with this patch for old powerpc systems with device tree where numa distance are encoded as form0. Signed-off-by: Vaidyanathan Srinivasan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/mm/numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 2c1ae7a..97042c6 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -221,7 +221,7 @@ int __node_distance(int a, int b) int distance = LOCAL_DISTANCE; if (!form1_affinity) - return distance; + return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE); for (i = 0; i < distance_ref_points_depth; i++) { if (distance_lookup_table[a][i] == distance_lookup_table[b][i]) -- cgit v1.1 From dadd72be605e99445bedfacce8d07a85ac84eb41 Mon Sep 17 00:00:00 2001 From: Jerry Hoemann Date: Tue, 30 Apr 2013 15:15:55 -0600 Subject: x86/mm: account for PGDIR_SIZE alignment Patch for -stable. Function find_early_table_space removed upstream. Fixes panic in alloc_low_page due to pgt_buf overflow during init_memory_mapping. find_early_table_space sizes pgt_buf based upon the size of the memory being mapped, but it does not take into account the alignment of the memory. When the region being mapped spans a 512GB (PGDIR_SIZE) alignment, a panic from alloc_low_pages occurs. kernel_physical_mapping_init takes into account PGDIR_SIZE alignment. This causes an extra call to alloc_low_page to be made. This extra call isn't accounted for by find_early_table_space and causes a kernel panic. Change is to take into account PGDIR_SIZE alignment in find_early_table_space. Signed-off-by: Jerry Hoemann Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'arch') diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index c22c423..96c4577 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -44,11 +44,15 @@ static void __init find_early_table_space(struct map_range *mr, int nr_range) int i; unsigned long puds = 0, pmds = 0, ptes = 0, tables; unsigned long start = 0, good_end; + unsigned long pgd_extra = 0; phys_addr_t base; for (i = 0; i < nr_range; i++) { unsigned long range, extra; + if ((mr[i].end >> PGDIR_SHIFT) - (mr[i].start >> PGDIR_SHIFT)) + pgd_extra++; + range = mr[i].end - mr[i].start; puds += (range + PUD_SIZE - 1) >> PUD_SHIFT; @@ -73,6 +77,7 @@ static void __init find_early_table_space(struct map_range *mr, int nr_range) tables = roundup(puds * sizeof(pud_t), PAGE_SIZE); tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE); tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE); + tables += (pgd_extra * PAGE_SIZE); #ifdef CONFIG_X86_32 /* for fixmap */ -- cgit v1.1 From 1183e651202ae381263309f6f122e0bf1234b5e2 Mon Sep 17 00:00:00 2001 From: Andre Przywara Date: Wed, 31 Oct 2012 17:20:50 +0100 Subject: Revert "x86, amd: Disable way access filter on Piledriver CPUs" it is duplicated Revert 5e3fe67e02c53e5a5fcf0e2b0d91dd93f757d50b which is commit 2bbf0a1427c377350f001fbc6260995334739ad7 upstream. Willy pointed out that I messed up and applied this one twice to the 3.0-stable tree, so revert the second instance of it. Reported by: Willy Tarreau Cc: Andre Przywara Cc: H. 
Peter Anvin Cc: CAI Qian Signed-off-by: Greg Kroah-Hartman reverted: --- arch/x86/kernel/cpu/amd.c | 14 -------------- 1 file changed, 14 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index a93741d..3f4b6da 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -568,20 +568,6 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c) } } - /* - * The way access filter has a performance penalty on some workloads. - * Disable it on the affected CPUs. - */ - if ((c->x86 == 0x15) && - (c->x86_model >= 0x02) && (c->x86_model < 0x20)) { - u64 val; - - if (!rdmsrl_safe(0xc0011021, &val) && !(val & 0x1E)) { - val |= 0x1E; - checking_wrmsrl(0xc0011021, val); - } - } - cpu_detect_cache_sizes(c); /* Multi core CPU? */ -- cgit v1.1 From e171327c07f33c79dab763e08feb7b0ad24dfe71 Mon Sep 17 00:00:00 2001 From: Gleb Natapov Date: Wed, 8 May 2013 18:38:44 +0300 Subject: KVM: VMX: fix halt emulation while emulating invalid guest state commit 8d76c49e9ffeee839bc0b7a3278a23f99101263e upstream. The invalid guest state emulation loop does not check halt_request, which causes a 100% cpu loop while the guest is in halt and in an invalid state, but the more serious issue is that this leaves halt_request set, so a random instruction emulated by a vm86 #GP exit can be interpreted as a halt, which causes a guest hang. Fix both problems by handling halt_request in the emulation loop. Reported-by: Tomas Papan Tested-by: Tomas Papan Reviewed-by: Paolo Bonzini Signed-off-by: Gleb Natapov Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'arch') diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2ad060a..be1d830 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3836,6 +3836,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) if (err != EMULATE_DONE) return 0; + if (vcpu->arch.halt_request) { + vcpu->arch.halt_request = 0; + ret = kvm_emulate_halt(vcpu); + goto out; + } + if (signal_pending(current)) goto out; if (need_resched()) -- cgit v1.1 From f3fb49dfccbf7bb7005bd57bb7385c3ee80d8c52 Mon Sep 17 00:00:00 2001 From: Aaro Koskinen Date: Wed, 8 May 2013 16:48:00 -0700 Subject: ARM: OMAP: RX-51: change probe order of touchscreen and panel SPI devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e65f131a14726e5f1b880a528271a52428e5b3a5 upstream. Commit 9fdca9df (spi: omap2-mcspi: convert to module_platform_driver) broke the SPI display/panel driver probe on RX-51/N900. The exact cause is not fully understood, but it seems to be related to the probe order. SPI communication to the panel driver (spi1.2) fails unless the touchscreen (spi1.0) has been probed/initialized before. When the omap2-mcspi driver was converted to a platform driver, it resulted in the devices being probed immediately after the board registers them, in the order they are listed in the board file. Fix the issue by moving the touchscreen before the panel in the SPI device list.
The patch fixes the following failure: [ 1.260955] acx565akm spi1.2: invalid display ID [ 1.265899] panel-acx565akm display0: acx_panel_probe panel detect error [ 1.273071] omapdss CORE error: driver probe failed: -19 Tested-by: Sebastian Reichel Signed-off-by: Aaro Koskinen Cc: Pali Rohár Cc: Joni Lapilainen Cc: Tomi Valkeinen Cc: Felipe Balbi Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-omap2/board-rx51-peripherals.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-omap2/board-rx51-peripherals.c b/arch/arm/mach-omap2/board-rx51-peripherals.c index c565971..9a1e1f7 100644 --- a/arch/arm/mach-omap2/board-rx51-peripherals.c +++ b/arch/arm/mach-omap2/board-rx51-peripherals.c @@ -56,11 +56,11 @@ #define RX51_USB_TRANSCEIVER_RST_GPIO 67 -/* list all spi devices here */ +/* List all SPI devices here. Note that the list/probe order seems to matter! */ enum { RX51_SPI_WL1251, - RX51_SPI_MIPID, /* LCD panel */ RX51_SPI_TSC2005, /* Touch Controller */ + RX51_SPI_MIPID, /* LCD panel */ }; static struct wl12xx_platform_data wl1251_pdata; -- cgit v1.1 From a3b5b07e0d750c300d771a0a8e5ad24898bcdd9b Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Sun, 5 May 2013 09:30:09 -0400 Subject: xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. commit 7f1fc268c47491fd5e63548f6415fc8604e13003 upstream. If a user did: echo 0 > /sys/devices/system/cpu/cpu1/online echo 1 > /sys/devices/system/cpu/cpu1/online we would (this is a build with DEBUG enabled) get to: smpboot: ++++++++++++++++++++=_---CPU UP 1 .. snip.. smpboot: Stack at about ffff880074c0ff44 smpboot: CPU1: has booted. and hang. The RCU mechanism would kick in and try to IPI the CPU1 but the IPIs (and all other interrupts) would never arrive at the CPU1. At first glance, at least. A bit of digging in the hypervisor trace shows that (using xenanalyze): [vla] d4v1 vec 243 injecting 0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043163639 --|x d4v1 vmentry cycles 1468 ] 0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043164913 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting 0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3 ] 0.043165526 --|x d4v1 vmentry cycles 1472 ] 0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254 0.043166800 --|x d4v1 inj_virq vec 243 real [vla] d4v1 vec 243 injecting there is a pending event (subsequent debugging shows it is the IPI from the VCPU0 when smpboot.c on VCPU1 has done "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is interrupted with the callback IPI (0xf3 aka 243) which ends up calling __xen_evtchn_do_upcall. The __xen_evtchn_do_upcall seems to do *something* but not acknowledge the pending events. And the moment the guest does a 'cli' (that is the ffffffff81673254 in the log above) the hypervisor is invoked again to inject the IPI (0xf3) to tell the guest it has pending interrupts. This repeats itself forever. The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup we set each per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info to register per-CPU structures (xen_vcpu_setup). This is used to allow events for more than 32 VCPUs and for performance optimization reasons. When the user performs the VCPU hotplug we end up calling xen_vcpu_setup once more.
We make the hypercall which returns -EINVAL as it does not allow multiple registration calls (and already has re-assigned where the events are being set). We pick the fallback case and set per_cpu(xen_vcpu, cpu) to point to the shared_info->vcpu_info[vcpu] (which is a good fallback during bootup). However the hypervisor is still setting events in the register per-cpu structure (per_cpu(xen_vcpu_info, cpu)). As such when the events are set by the hypervisor (such as timer one), and when we iterate in __xen_evtchn_do_upcall we end up reading stale events from the shared_info->vcpu_info[vcpu] instead of the per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the events that the hypervisor has set and the hypervisor keeps on reminding us to ack the events which we never do. The fix is simple. Don't on the second time when xen_vcpu_setup is called over-write the per_cpu(xen_vcpu, cpu) if it points to per_cpu(xen_vcpu_info). Acked-by: Stefano Stabellini Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/enlighten.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'arch') diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 063ce1f..e11efbd 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -129,6 +129,21 @@ static void xen_vcpu_setup(int cpu) BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info); + /* + * This path is called twice on PVHVM - first during bootup via + * smp_init -> xen_hvm_cpu_notify, and then if the VCPU is being + * hotplugged: cpu_up -> xen_hvm_cpu_notify. + * As we can only do the VCPUOP_register_vcpu_info once lets + * not over-write its result. + * + * For PV it is called during restore (xen_vcpu_restore) and bootup + * (xen_setup_vcpu_info_placement). The hotplug mechanism does not + * use this function. + */ + if (xen_hvm_domain()) { + if (per_cpu(xen_vcpu, cpu) == &per_cpu(xen_vcpu_info, cpu)) + return; + } if (cpu < MAX_VIRT_CPUS) per_cpu(xen_vcpu,cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu]; -- cgit v1.1 From e9a91cb47886388540eaf68f981e7a3d4b04a27c Mon Sep 17 00:00:00 2001 From: Hans-Christian Egtvedt Date: Mon, 13 May 2013 22:22:10 +0200 Subject: avr32: fix relocation check for signed 18-bit offset commit e68c636d88db3fda74e664ecb1a213ae0d50a7d8 upstream. Caught by static code analysis by David. Reported-by: David Binderman Signed-off-by: Hans-Christian Egtvedt Signed-off-by: Greg Kroah-Hartman --- arch/avr32/kernel/module.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/avr32/kernel/module.c b/arch/avr32/kernel/module.c index a727f54..9c266ab 100644 --- a/arch/avr32/kernel/module.c +++ b/arch/avr32/kernel/module.c @@ -271,7 +271,7 @@ int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab, break; case R_AVR32_GOT18SW: if ((relocation & 0xfffe0003) != 0 - && (relocation & 0xfffc0003) != 0xffff0000) + && (relocation & 0xfffc0000) != 0xfffc0000) return reloc_overflow(module, "R_AVR32_GOT18SW", relocation); relocation >>= 2; -- cgit v1.1 From 8a3e6d89936003e13011ab01dacdf96c66a0e465 Mon Sep 17 00:00:00 2001 From: Gregory CLEMENT Date: Sun, 19 May 2013 22:12:43 +0200 Subject: ARM: plat-orion: Fix num_resources and id for ge10 and ge11 commit 2b8b2797142c7951e635c6eec5d1705ee9bc45c5 upstream. 
When platform data were moved from arch/arm/mach-mv78xx0/common.c to arch/arm/plat-orion/common.c with the commit "7e3819d ARM: orion: Consolidate ethernet platform data", a few typos were made in the gigabit Ethernet interfaces ge10 and ge11. This commit writes back their initial values, which allows these interfaces to be used again. Signed-off-by: Gregory CLEMENT Acked-by: Andrew Lunn Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman --- arch/arm/plat-orion/common.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c index 11dce87..214b002 100644 --- a/arch/arm/plat-orion/common.c +++ b/arch/arm/plat-orion/common.c @@ -343,7 +343,7 @@ static struct resource orion_ge10_shared_resources[] = { static struct platform_device orion_ge10_shared = { .name = MV643XX_ETH_SHARED_NAME, - .id = 1, + .id = 2, .dev = { .platform_data = &orion_ge10_shared_data, }, @@ -358,8 +358,8 @@ static struct resource orion_ge10_resources[] = { static struct platform_device orion_ge10 = { .name = MV643XX_ETH_NAME, - .id = 1, - .num_resources = 2, + .id = 2, + .num_resources = 1, .resource = orion_ge10_resources, .dev = { .coherent_dma_mask = DMA_BIT_MASK(32), @@ -397,7 +397,7 @@ static struct resource orion_ge11_shared_resources[] = { static struct platform_device orion_ge11_shared = { .name = MV643XX_ETH_SHARED_NAME, - .id = 1, + .id = 3, .dev = { .platform_data = &orion_ge11_shared_data, }, @@ -412,8 +412,8 @@ static struct resource orion_ge11_resources[] = { static struct platform_device orion_ge11 = { .name = MV643XX_ETH_NAME, - .id = 1, - .num_resources = 2, + .id = 3, + .num_resources = 1, .resource = orion_ge11_resources, .dev = { .coherent_dma_mask = DMA_BIT_MASK(32), -- cgit v1.1 From 891694374dbdf88b12f41fa412ead40a4d255071 Mon Sep 17 00:00:00 2001 From: Martin Michlmayr Date: Sun, 21 Apr 2013 17:14:00 +0100 Subject: Kirkwood: Enable PCIe port 1 on QNAP TS-11x/TS-21x commit 99e11334dcb846f9b76fb808196c7f47aa83abb3 upstream. Enable KW_PCIE1 on QNAP TS-11x/TS-21x devices as newer revisions (rev 1.3) have a USB 3.0 chip from Etron on PCIe port 1. Thanks to Marek Vasut for identifying this issue! Signed-off-by: Martin Michlmayr Tested-by: Marek Vasut Acked-by: Andrew Lunn Signed-off-by: Jason Cooper Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-kirkwood/ts219-setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/arm/mach-kirkwood/ts219-setup.c b/arch/arm/mach-kirkwood/ts219-setup.c index 68f32f2..eb1a7ba 100644 --- a/arch/arm/mach-kirkwood/ts219-setup.c +++ b/arch/arm/mach-kirkwood/ts219-setup.c @@ -124,7 +124,7 @@ static void __init qnap_ts219_init(void) static int __init ts219_pci_init(void) { if (machine_is_ts219()) - kirkwood_pcie_init(KW_PCIE0); + kirkwood_pcie_init(KW_PCIE1 | KW_PCIE0); return 0; } -- cgit v1.1 From 9392bf7c8a7fd63c1ff1dbba237d67b95dae5cf9 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Tue, 7 Feb 2012 01:22:47 +0100 Subject: um: Serve io_remap_pfn_range() commit 4d94d6d030adfdea4837694d293ec6918d133ab2 upstream. In some places io_remap_pfn_range() is needed. UML has to serve it like all other archs do.
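A typical user is a driver mmap() handler; a hypothetical sketch (mydrv_mmap and MYDRV_MMIO_BASE are made up for illustration) of the kind of code that now also builds on UML:

/* Hypothetical driver mmap() handler relying on io_remap_pfn_range(). */
static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long pfn = MYDRV_MMIO_BASE >> PAGE_SHIFT;	/* made-up MMIO base */

	/* On UML this now simply expands to remap_pfn_range(). */
	return io_remap_pfn_range(vma, vma->vm_start, pfn,
				  vma->vm_end - vma->vm_start,
				  vma->vm_page_prot);
}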
Signed-off-by: Richard Weinberger Tested-by: Antoine Martin Signed-off-by: Greg Kroah-Hartman --- arch/um/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index aa365c5..5888f1b 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -69,6 +69,8 @@ extern unsigned long end_iomem; #define PAGE_KERNEL __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED) #define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) +#define io_remap_pfn_range remap_pfn_range + /* * The i386 can't do page protection for execute, and considers that the same * are read. -- cgit v1.1 From 0ffdfdbe55c84906dd65627f069619bec54e5422 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 5 Jun 2013 11:47:18 -0700 Subject: x86: Fix typo in kexec register clearing commit c8a22d19dd238ede87aa0ac4f7dbea8da039b9c1 upstream. Fixes a typo in register clearing code. Thanks to PaX Team for fixing this originally, and James Troup for pointing it out. Signed-off-by: Kees Cook Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net Cc: PaX Team Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/relocate_kernel_64.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S index 7a6f3b3..f2bb9c9 100644 --- a/arch/x86/kernel/relocate_kernel_64.S +++ b/arch/x86/kernel/relocate_kernel_64.S @@ -160,7 +160,7 @@ identity_mapped: xorq %rbp, %rbp xorq %r8, %r8 xorq %r9, %r9 - xorq %r10, %r9 + xorq %r10, %r10 xorq %r11, %r11 xorq %r12, %r12 xorq %r13, %r13 -- cgit v1.1 From a0631b300bac987a591ae485d8a19a08aa57b4d2 Mon Sep 17 00:00:00 2001 From: Chris Metcalf Date: Sat, 15 Jun 2013 16:47:47 -0400 Subject: tilepro: work around module link error with gcc 4.7 commit 3cb3f839d306443f3d1e79b0bde1a2ad2c12b555 upstream. gcc 4.7.x is emitting calls to __ffsdi2 where previously it used to inline the appropriate ctz instructions. While this needs to be fixed in gcc, it's also easy to avoid having it cause build failures when building with those compilers by exporting __ffsdi2 to modules. Signed-off-by: Chris Metcalf Signed-off-by: Greg Kroah-Hartman --- arch/tile/lib/exports.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch') diff --git a/arch/tile/lib/exports.c b/arch/tile/lib/exports.c index 49284fa..0996ef7 100644 --- a/arch/tile/lib/exports.c +++ b/arch/tile/lib/exports.c @@ -89,4 +89,6 @@ uint64_t __ashrdi3(uint64_t, unsigned int); EXPORT_SYMBOL(__ashrdi3); uint64_t __ashldi3(uint64_t, unsigned int); EXPORT_SYMBOL(__ashldi3); +int __ffsdi2(uint64_t); +EXPORT_SYMBOL(__ffsdi2); #endif -- cgit v1.1 From 1819a873d94cd7abeb94f235175052f72fe6fa2c Mon Sep 17 00:00:00 2001 From: "Zhanghaoyu (A)" Date: Fri, 14 Jun 2013 07:36:13 +0000 Subject: KVM: x86: remove vcpu's CPL check in host-invoked XCR set commit 764bcbc5a6d7a2f3e75c9f0e4caa984e2926e346 upstream. __kvm_set_xcr function does the CPL check when set xcr. __kvm_set_xcr is called in two flows, one is invoked by guest, call stack shown as below, handle_xsetbv(or xsetbv_interception) kvm_set_xcr __kvm_set_xcr the other one is invoked by host, for example during system reset: kvm_arch_vcpu_ioctl kvm_vcpu_ioctl_x86_set_xcrs __kvm_set_xcr The former does need the CPL check, but the latter does not. Signed-off-by: Zhang Haoyu [Tweaks to commit message. 
- Paolo] Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 15e79a6..34afae8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -548,8 +548,6 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) if (index != XCR_XFEATURE_ENABLED_MASK) return 1; xcr0 = xcr; - if (kvm_x86_ops->get_cpl(vcpu) != 0) - return 1; if (!(xcr0 & XSTATE_FP)) return 1; if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE)) @@ -563,7 +561,8 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) { - if (__kvm_set_xcr(vcpu, index, xcr)) { + if (kvm_x86_ops->get_cpl(vcpu) != 0 || + __kvm_set_xcr(vcpu, index, xcr)) { kvm_inject_gp(vcpu, 0); return 1; } -- cgit v1.1 From a55f7be46db3b6cdeb23d99bbd915aa285521de2 Mon Sep 17 00:00:00 2001 From: Laszlo Ersek Date: Tue, 18 Oct 2011 22:42:59 +0200 Subject: xen/time: remove blocked time accounting from xen "clockchip" commit 0b0c002c340e78173789f8afaa508070d838cf3d upstream. ... because the "clock_event_device framework" already accounts for idle time through the "event_handler" function pointer in xen_timer_interrupt(). The patch is intended as the completion of [1]. It should fix the double idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to stolen time accounting (the removed code seems to be isolated). The approach may be completely misguided. [1] https://lkml.org/lkml/2011/10/6/10 [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html John took the time to retest this patch on top of v3.10 and reported: "idle time is correctly incremented for pv and hvm for the normal case, nohz=off and nohz=idle." so lets put this patch in. 
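The double accounting can be illustrated with made-up numbers:

/* Illustration only, numbers are invented. For one second of wall-clock
 * idle at HZ=1000: */
u64 framework_idle = 1000;	/* ticks accounted via the clockevent event_handler */
u64 blocked_ticks  = 1000;	/* the same second, re-derived from RUNSTATE_blocked */
/* Before this patch, /proc/stat idle advanced by both, i.e. 2000 ticks. */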
Signed-off-by: Laszlo Ersek Signed-off-by: John Haxby Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/time.c | 17 ++--------------- 1 file changed, 2 insertions(+), 15 deletions(-) (limited to 'arch') diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 4b0fb29..19568a0 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -36,9 +36,8 @@ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate); /* snapshots of runstate info */ static DEFINE_PER_CPU(struct vcpu_runstate_info, xen_runstate_snapshot); -/* unused ns of stolen and blocked time */ +/* unused ns of stolen time */ static DEFINE_PER_CPU(u64, xen_residual_stolen); -static DEFINE_PER_CPU(u64, xen_residual_blocked); /* return an consistent snapshot of 64-bit time/counter value */ static u64 get64(const u64 *p) @@ -115,7 +114,7 @@ static void do_stolen_accounting(void) { struct vcpu_runstate_info state; struct vcpu_runstate_info *snap; - s64 blocked, runnable, offline, stolen; + s64 runnable, offline, stolen; cputime_t ticks; get_runstate_snapshot(&state); @@ -125,7 +124,6 @@ static void do_stolen_accounting(void) snap = &__get_cpu_var(xen_runstate_snapshot); /* work out how much time the VCPU has not been runn*ing* */ - blocked = state.time[RUNSTATE_blocked] - snap->time[RUNSTATE_blocked]; runnable = state.time[RUNSTATE_runnable] - snap->time[RUNSTATE_runnable]; offline = state.time[RUNSTATE_offline] - snap->time[RUNSTATE_offline]; @@ -141,17 +139,6 @@ static void do_stolen_accounting(void) ticks = iter_div_u64_rem(stolen, NS_PER_TICK, &stolen); __this_cpu_write(xen_residual_stolen, stolen); account_steal_ticks(ticks); - - /* Add the appropriate number of ticks of blocked time, - including any left-overs from last time. */ - blocked += __this_cpu_read(xen_residual_blocked); - - if (blocked < 0) - blocked = 0; - - ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked); - __this_cpu_write(xen_residual_blocked, blocked); - account_idle_ticks(ticks); } /* Get the TSC speed from Xen */ -- cgit v1.1 From cd8bca6fe4862f5af7244a5f5e4b08788ccaff11 Mon Sep 17 00:00:00 2001 From: Jed Davis Date: Thu, 20 Jun 2013 10:16:29 +0100 Subject: ARM: 7765/1: perf: Record the user-mode PC in the call chain. commit c5f927a6f62196226915f12194c9d0df4e2210d7 upstream. With this change, we no longer lose the innermost entry in the user-mode part of the call chain. See also the x86 port, which includes the ip. It's possible to partially work around this problem by post-processing the data to use the PERF_SAMPLE_IP value, but this works only if the CPU wasn't in the kernel when the sample was taken. Signed-off-by: Jed Davis Signed-off-by: Will Deacon Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/kernel/perf_event.c | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c index 2b5b142..75373a9 100644 --- a/arch/arm/kernel/perf_event.c +++ b/arch/arm/kernel/perf_event.c @@ -741,6 +741,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs) struct frame_tail __user *tail; + perf_callchain_store(entry, regs->ARM_pc); tail = (struct frame_tail __user *)regs->ARM_fp - 1; while ((entry->nr < PERF_MAX_STACK_DEPTH) && -- cgit v1.1 From 00c218981b362e8dbfd624ebf0c874bb1bc9df04 Mon Sep 17 00:00:00 2001 From: Olivier DANET Date: Wed, 10 Jul 2013 13:56:10 -0700 Subject: sparc32: vm_area_struct access for old Sun SPARCs. 
upstream commit 961246b4ed8da3bcf4ee1eb9147f341013553e3c Commit e4c6bfd2d79d063017ab19a18915f0bc759f32d9 ("mm: rearrange vm_area_struct for fewer cache misses") changed the layout of the vm_area_struct structure, it broke several SPARC32 assembly routines which used numerical constants for accessing the vm_mm field. This patch defines the VMA_VM_MM constant to replace the immediate values. Signed-off-by: Olivier DANET Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/asm-offsets.c | 2 ++ arch/sparc/mm/hypersparc.S | 8 ++++---- arch/sparc/mm/swift.S | 8 ++++---- arch/sparc/mm/tsunami.S | 6 +++--- arch/sparc/mm/viking.S | 10 +++++----- 5 files changed, 18 insertions(+), 16 deletions(-) (limited to 'arch') diff --git a/arch/sparc/kernel/asm-offsets.c b/arch/sparc/kernel/asm-offsets.c index 68f7e11..ce48203 100644 --- a/arch/sparc/kernel/asm-offsets.c +++ b/arch/sparc/kernel/asm-offsets.c @@ -34,6 +34,8 @@ int foo(void) DEFINE(AOFF_task_thread, offsetof(struct task_struct, thread)); BLANK(); DEFINE(AOFF_mm_context, offsetof(struct mm_struct, context)); + BLANK(); + DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm)); /* DEFINE(NUM_USER_SEGMENTS, TASK_SIZE>>28); */ return 0; diff --git a/arch/sparc/mm/hypersparc.S b/arch/sparc/mm/hypersparc.S index 44aad32..969f964 100644 --- a/arch/sparc/mm/hypersparc.S +++ b/arch/sparc/mm/hypersparc.S @@ -74,7 +74,7 @@ hypersparc_flush_cache_mm_out: /* The things we do for performance... */ hypersparc_flush_cache_range: - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 #ifndef CONFIG_SMP ld [%o0 + AOFF_mm_context], %g1 cmp %g1, -1 @@ -163,7 +163,7 @@ hypersparc_flush_cache_range_out: */ /* Verified, my ass... */ hypersparc_flush_cache_page: - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 ld [%o0 + AOFF_mm_context], %g2 #ifndef CONFIG_SMP cmp %g2, -1 @@ -284,7 +284,7 @@ hypersparc_flush_tlb_mm_out: sta %g5, [%g1] ASI_M_MMUREGS hypersparc_flush_tlb_range: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 + AOFF_mm_context], %o3 lda [%g1] ASI_M_MMUREGS, %g5 @@ -307,7 +307,7 @@ hypersparc_flush_tlb_range_out: sta %g5, [%g1] ASI_M_MMUREGS hypersparc_flush_tlb_page: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 + AOFF_mm_context], %o3 andn %o1, (PAGE_SIZE - 1), %o1 diff --git a/arch/sparc/mm/swift.S b/arch/sparc/mm/swift.S index c801c39..5d2b88d 100644 --- a/arch/sparc/mm/swift.S +++ b/arch/sparc/mm/swift.S @@ -105,7 +105,7 @@ swift_flush_cache_mm_out: .globl swift_flush_cache_range swift_flush_cache_range: - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 sub %o2, %o1, %o2 sethi %hi(4096), %o3 cmp %o2, %o3 @@ -116,7 +116,7 @@ swift_flush_cache_range: .globl swift_flush_cache_page swift_flush_cache_page: - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 70: ld [%o0 + AOFF_mm_context], %g2 cmp %g2, -1 @@ -219,7 +219,7 @@ swift_flush_sig_insns: .globl swift_flush_tlb_range .globl swift_flush_tlb_all swift_flush_tlb_range: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 swift_flush_tlb_mm: ld [%o0 + AOFF_mm_context], %g2 cmp %g2, -1 @@ -233,7 +233,7 @@ swift_flush_tlb_all_out: .globl swift_flush_tlb_page swift_flush_tlb_page: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 
+ AOFF_mm_context], %o3 andn %o1, (PAGE_SIZE - 1), %o1 diff --git a/arch/sparc/mm/tsunami.S b/arch/sparc/mm/tsunami.S index 4e55e8f..bf10a34 100644 --- a/arch/sparc/mm/tsunami.S +++ b/arch/sparc/mm/tsunami.S @@ -24,7 +24,7 @@ /* Sliiick... */ tsunami_flush_cache_page: tsunami_flush_cache_range: - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 tsunami_flush_cache_mm: ld [%o0 + AOFF_mm_context], %g2 cmp %g2, -1 @@ -46,7 +46,7 @@ tsunami_flush_sig_insns: /* More slick stuff... */ tsunami_flush_tlb_range: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 tsunami_flush_tlb_mm: ld [%o0 + AOFF_mm_context], %g2 cmp %g2, -1 @@ -65,7 +65,7 @@ tsunami_flush_tlb_out: /* This one can be done in a fine grained manner... */ tsunami_flush_tlb_page: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 + AOFF_mm_context], %o3 andn %o1, (PAGE_SIZE - 1), %o1 diff --git a/arch/sparc/mm/viking.S b/arch/sparc/mm/viking.S index 6dfcc13..a516372 100644 --- a/arch/sparc/mm/viking.S +++ b/arch/sparc/mm/viking.S @@ -109,7 +109,7 @@ viking_mxcc_flush_page: viking_flush_cache_page: viking_flush_cache_range: #ifndef CONFIG_SMP - ld [%o0 + 0x0], %o0 /* XXX vma->vm_mm, GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 #endif viking_flush_cache_mm: #ifndef CONFIG_SMP @@ -149,7 +149,7 @@ viking_flush_tlb_mm: #endif viking_flush_tlb_range: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 + AOFF_mm_context], %o3 lda [%g1] ASI_M_MMUREGS, %g5 @@ -174,7 +174,7 @@ viking_flush_tlb_range: #endif viking_flush_tlb_page: - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 mov SRMMU_CTX_REG, %g1 ld [%o0 + AOFF_mm_context], %o3 lda [%g1] ASI_M_MMUREGS, %g5 @@ -240,7 +240,7 @@ sun4dsmp_flush_tlb_range: tst %g5 bne 3f mov SRMMU_CTX_REG, %g1 - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 ld [%o0 + AOFF_mm_context], %o3 lda [%g1] ASI_M_MMUREGS, %g5 sethi %hi(~((1 << SRMMU_PGDIR_SHIFT) - 1)), %o4 @@ -266,7 +266,7 @@ sun4dsmp_flush_tlb_page: tst %g5 bne 2f mov SRMMU_CTX_REG, %g1 - ld [%o0 + 0x00], %o0 /* XXX vma->vm_mm GROSS XXX */ + ld [%o0 + VMA_VM_MM], %o0 ld [%o0 + AOFF_mm_context], %o3 lda [%g1] ASI_M_MMUREGS, %g5 and %o1, PAGE_MASK, %o1 -- cgit v1.1 From b37c61632db280b4e831dc2431a73ca045bc7e42 Mon Sep 17 00:00:00 2001 From: bob picco Date: Tue, 11 Jun 2013 14:54:51 -0400 Subject: sparc64 address-congruence property Upstream commit 771a37ff4d80b80db3b0df3e7696f14b298c67b7 The Machine Description (MD) property "address-congruence-offset" is optional. According to the MD specification the value is assumed 0UL when not present. This caused early boot failure on T5. Signed-off-by: Bob Picco CC: sparclinux@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/mm/init_64.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 6ff4d78..b4989f9 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -1071,7 +1071,14 @@ static int __init grab_mblocks(struct mdesc_handle *md) m->size = *val; val = mdesc_get_property(md, node, "address-congruence-offset", NULL); - m->offset = *val; + + /* The address-congruence-offset property is optional. + * Explicity zero it be identifty this. 
+ */ + if (val) + m->offset = *val; + else + m->offset = 0UL; numadbg("MBLOCK[%d]: base[%llx] size[%llx] offset[%llx]\n", count - 1, m->base, m->size, m->offset); -- cgit v1.1 From 519d018ae15412bd501598872300d4c883197b44 Mon Sep 17 00:00:00 2001 From: Dave Kleikamp Date: Tue, 18 Jun 2013 09:05:36 -0500 Subject: sparc: tsb must be flushed before tlb upstream commit 23a01138efe216f8084cfaa74b0b90dd4b097441 This fixes a race where a cpu may re-load a tlb from a stale tsb right after it has been flushed by a remote function call. I still see some instability when stressing the system with parallel kernel builds while creating memory pressure by writing to /proc/sys/vm/nr_hugepages, but this patch improves the stability significantly. Signed-off-by: Dave Kleikamp Acked-by: Bob Picco Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/mm/tlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index afd021e..072f553 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -115,8 +115,8 @@ no_cache_flush: } if (!tb->active) { - global_flush_tlb_page(mm, vaddr); flush_tsb_user_page(mm, vaddr); + global_flush_tlb_page(mm, vaddr); goto out; } -- cgit v1.1 From 2a20b17ba9f0636e757ecbdbd79d460ff1fde0d0 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Mon, 15 Jul 2013 14:04:50 +1000 Subject: powerpc/modules: Module CRC relocation fix causes perf issues commit 0e0ed6406e61434d3f38fb58aa8464ec4722b77e upstream. Module CRCs are implemented as absolute symbols that get resolved by a linker script. We build an intermediate .o that contains an unresolved symbol for each CRC. genksysms parses this .o, calculates the CRCs and writes a linker script that "resolves" the symbols to the calculated CRC. Unfortunately the ppc64 relocatable kernel sees these CRCs as symbols that need relocating and relocates them at boot. Commit d4703aef (module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y) added a hook to reverse the bogus relocations. Part of this patch created a symbol at 0x0: # head -2 /proc/kallsyms 0000000000000000 T reloc_start c000000000000000 T .__start This reloc_start symbol is causing lots of confusion to perf. It thinks reloc_start is a massive function that stretches from 0x0 to 0xc000000000000000 and we get various cryptic errors out of perf, including: problem incrementing symbol count, skipping event This patch removes the reloc_start linker script label and instead defines it as PHYSICAL_START. We also need to wrap it with CONFIG_PPC64 because the ppc32 kernel can set a non zero PHYSICAL_START at compile time and we wouldn't want to subtract it from the CRCs in that case. 
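For context, the consumer of ARCH_RELOCATES_KCRCTAB lives in kernel/module.c; a rough sketch (simplified, details vary by kernel version) of how reloc_start is used when comparing CRCs:

/* kernel/module.c (sketch): undo the boot-time relocation that was applied
 * to the absolute CRC symbols of the in-kernel exports (crc_owner == NULL). */
static unsigned long maybe_relocated(unsigned long crc,
				     const struct module *crc_owner)
{
#ifdef ARCH_RELOCATES_KCRCTAB
	if (crc_owner == NULL)
		return crc - (unsigned long)reloc_start; /* now PHYSICAL_START on ppc64 */
#endif
	return crc;
}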
Signed-off-by: Anton Blanchard Acked-by: Rusty Russell Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/module.h | 5 ++--- arch/powerpc/kernel/vmlinux.lds.S | 3 --- 2 files changed, 2 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h index 0192a4e..80de64b 100644 --- a/arch/powerpc/include/asm/module.h +++ b/arch/powerpc/include/asm/module.h @@ -87,10 +87,9 @@ struct exception_table_entry; void sort_ex_table(struct exception_table_entry *start, struct exception_table_entry *finish); -#ifdef CONFIG_MODVERSIONS +#if defined(CONFIG_MODVERSIONS) && defined(CONFIG_PPC64) #define ARCH_RELOCATES_KCRCTAB - -extern const unsigned long reloc_start[]; +#define reloc_start PHYSICAL_START #endif #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_MODULE_H */ diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 920276c..3e8fe4b 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -38,9 +38,6 @@ jiffies = jiffies_64 + 4; #endif SECTIONS { - . = 0; - reloc_start = .; - . = KERNELBASE; /* -- cgit v1.1 From 9f65bf026312945f8dfd76a2c6573dd0d81488ed Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Fri, 26 Jul 2013 09:11:56 -0700 Subject: x86, fpu: correct the asm constraints for fxsave, unbreak mxcsr.daz MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit eaa5a990191d204ba0f9d35dbe5505ec2cdd1460 upstream. GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c: memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); asm volatile("fxsave %0" : : "m" (fx_scratch)); mask = fx_scratch.mxcsr_mask; if (mask == 0) mask = 0x0000ffbf; to memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); asm volatile("fxsave %0" : : "m" (fx_scratch)); mask = 0x0000ffbf; since asm statement doesn’t say it will update fx_scratch. As the result, the DAZ bit will be cleared. This patch fixes it. This bug dates back to at least kernel 2.6.12. Signed-off-by: H. Peter Anvin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/i387.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index 12aff25..f7183ec 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -51,7 +51,7 @@ void __cpuinit mxcsr_feature_mask_init(void) clts(); if (cpu_has_fxsr) { memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct)); - asm volatile("fxsave %0" : : "m" (fx_scratch)); + asm volatile("fxsave %0" : "+m" (fx_scratch)); mask = fx_scratch.mxcsr_mask; if (mask == 0) mask = 0x0000ffbf; -- cgit v1.1 From bd874f70e245977197782bc0e03c658f3e93573b Mon Sep 17 00:00:00 2001 From: Jesper Nilsson Date: Mon, 24 Oct 2011 11:19:25 +0200 Subject: CRIS: Add _sdata to vmlinux.lds.S commit 473e162eea465e60578edb93341752e7f1c1dacc upstream. Fixes link error: LD vmlinux kernel/built-in.o: In function `core_kernel_data': (.text+0x13e44): undefined reference to `_sdata' Signed-off-by: Jesper Nilsson Cc: Guenter Roeck Cc: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- arch/cris/kernel/vmlinux.lds.S | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S index a6990cb..a68b983 100644 --- a/arch/cris/kernel/vmlinux.lds.S +++ b/arch/cris/kernel/vmlinux.lds.S @@ -52,6 +52,7 @@ SECTIONS EXCEPTION_TABLE(4) + _sdata = .; RODATA . 
= ALIGN (4); -- cgit v1.1 From ec982038bd3b0620090e80075be2b5bb5dd26872 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sat, 19 May 2012 11:54:11 +0200 Subject: sparc32: add ucmpdi2 commit de36e66d5fa52bc6e2dacd95c701a1762b5308a7 upstream. Based on copy from microblaze add ucmpdi2 implementation. This fixes build of niu driver which failed with: drivers/built-in.o: In function `niu_get_nfc': niu.c:(.text+0x91494): undefined reference to `__ucmpdi2' This driver will never be used on a sparc32 system, but patch added to fix build breakage with all*config builds. Signed-off-by: Sam Ravnborg Signed-off-by: David S. Miller Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/sparc/lib/Makefile | 2 +- arch/sparc/lib/ucmpdi2.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 arch/sparc/lib/ucmpdi2.c (limited to 'arch') diff --git a/arch/sparc/lib/Makefile b/arch/sparc/lib/Makefile index a3fc437..f6f5f38 100644 --- a/arch/sparc/lib/Makefile +++ b/arch/sparc/lib/Makefile @@ -15,7 +15,7 @@ lib-$(CONFIG_SPARC32) += divdi3.o udivdi3.o lib-$(CONFIG_SPARC32) += copy_user.o locks.o lib-y += atomic_$(BITS).o lib-$(CONFIG_SPARC32) += lshrdi3.o ashldi3.o -lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o +lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o ucmpdi2.o lib-$(CONFIG_SPARC64) += copy_page.o clear_page.o bzero.o lib-$(CONFIG_SPARC64) += csum_copy.o csum_copy_from_user.o csum_copy_to_user.o diff --git a/arch/sparc/lib/ucmpdi2.c b/arch/sparc/lib/ucmpdi2.c new file mode 100644 index 0000000..1e06ed5 --- /dev/null +++ b/arch/sparc/lib/ucmpdi2.c @@ -0,0 +1,19 @@ +#include +#include "libgcc.h" + +word_type __ucmpdi2(unsigned long long a, unsigned long long b) +{ + const DWunion au = {.ll = a}; + const DWunion bu = {.ll = b}; + + if ((unsigned int) au.s.high < (unsigned int) bu.s.high) + return 0; + else if ((unsigned int) au.s.high > (unsigned int) bu.s.high) + return 2; + if ((unsigned int) au.s.low < (unsigned int) bu.s.low) + return 0; + else if ((unsigned int) au.s.low > (unsigned int) bu.s.low) + return 2; + return 1; +} +EXPORT_SYMBOL(__ucmpdi2); -- cgit v1.1 From 3a2f18948f8e7ef5b90c654c09e237027e1e0645 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 19 May 2012 15:27:01 -0700 Subject: sparc32: Add ucmpdi2.o to obj-y instead of lib-y. commit 74c7b28953d4eaa6a479c187aeafcfc0280da5e8 upstream. Otherwise if no references exist in the static kernel image, we won't export the symbol properly to modules. Signed-off-by: David S. 
Miller Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/sparc/lib/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/sparc/lib/Makefile b/arch/sparc/lib/Makefile index f6f5f38..4961516 100644 --- a/arch/sparc/lib/Makefile +++ b/arch/sparc/lib/Makefile @@ -15,7 +15,7 @@ lib-$(CONFIG_SPARC32) += divdi3.o udivdi3.o lib-$(CONFIG_SPARC32) += copy_user.o locks.o lib-y += atomic_$(BITS).o lib-$(CONFIG_SPARC32) += lshrdi3.o ashldi3.o -lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o ucmpdi2.o +lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o lib-$(CONFIG_SPARC64) += copy_page.o clear_page.o bzero.o lib-$(CONFIG_SPARC64) += csum_copy.o csum_copy_from_user.o csum_copy_to_user.o @@ -40,7 +40,7 @@ lib-$(CONFIG_SPARC64) += copy_in_user.o user_fixup.o memmove.o lib-$(CONFIG_SPARC64) += mcount.o ipcsum.o xor.o hweight.o ffs.o obj-y += iomap.o -obj-$(CONFIG_SPARC32) += atomic32.o +obj-$(CONFIG_SPARC32) += atomic32.o ucmpdi2.o obj-y += ksyms.o obj-$(CONFIG_SPARC64) += PeeCeeI.o obj-y += usercopy.o -- cgit v1.1 From fc1e43e5cbee9f14ee940044d0e4e722370009d2 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Thu, 30 Jun 2011 13:55:27 +0000 Subject: powerpc: Use -mtraceback=no commit af9719c3062dfe216a0c3de3fa52be6d22b4456c upstream. gcc 4.7 will be more strict about parsing the -mtraceback option: gcc: error: unrecognized argument in option '-mtraceback=none' gcc: note: valid arguments to '-mtraceback=' are: full no part gcc used to do a 2 char compare so both "no" and "none" would match. Switching to -mtraceback=no should work everywhere. Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index b7212b6..f1b5251 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -67,7 +67,7 @@ LDFLAGS_vmlinux-yy := -Bstatic LDFLAGS_vmlinux-$(CONFIG_PPC64)$(CONFIG_RELOCATABLE) := -pie LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-yy) -CFLAGS-$(CONFIG_PPC64) := -mminimal-toc -mtraceback=none -mcall-aixdesc +CFLAGS-$(CONFIG_PPC64) := -mminimal-toc -mtraceback=no -mcall-aixdesc CFLAGS-$(CONFIG_PPC32) := -ffixed-r2 -mmultiple KBUILD_CPPFLAGS += -Iarch/$(ARCH) KBUILD_AFLAGS += -Iarch/$(ARCH) -- cgit v1.1 From 3fa539e24c5d7077791a1d6bd8bb28bf86bef932 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 26 Jul 2013 00:08:25 +0200 Subject: m68k/atari: ARAnyM - Fix NatFeat module support commit e8184e10f89736a23ea6eea8e24cd524c5c513d2 upstream. As pointed out by Andreas Schwab, pointers passed to ARAnyM NatFeat calls should be physical addresses, not virtual addresses. Fortunately on Atari, physical and virtual kernel addresses are the same, as far as normal kernel memory is concerned, so this usually worked fine without conversion. But for modules, pointers to literal strings are located in vmalloc()ed memory. Depending on the version of ARAnyM, this causes the nf_get_id() call to just fail, or worse, crash ARAnyM itself with e.g. Gotcha! Illegal memory access. Atari PC = $968c This is a big issue for distro kernels, which want to have all drivers as loadable modules in an initrd. Add a wrapper for nf_get_id() that copies the literal to the stack to work around this issue.
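The wrapper below bounces a string that may live in vmalloc() space through an on-stack buffer; the same pattern reduces to this standalone C sketch (backend_lookup() is a hypothetical stand-in for the real nf_get_id2() trap):

#include <stdio.h>
#include <string.h>

static long backend_lookup(const char *name)	/* stand-in for nf_get_id2() */
{
	return name[0] ? 1 : 0;			/* dummy feature id */
}

static long lookup_feature(const char *feature_name)
{
	char name_copy[32];

	/* feature_name may sit in vmalloc()ed memory; the stack copy is
	 * guaranteed to be in normally mapped memory. */
	if (strlen(feature_name) >= sizeof(name_copy))
		return 0;			/* too long: fail, as the fix does */
	strcpy(name_copy, feature_name);
	return backend_lookup(name_copy);
}

int main(void)
{
	printf("%ld\n", lookup_feature("NF_VERSION"));
	return 0;
}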
Reported-by: Thorsten Glaser Signed-off-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- arch/m68k/emu/natfeat.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/m68k/emu/natfeat.c b/arch/m68k/emu/natfeat.c index 2291a7d..fa277ae 100644 --- a/arch/m68k/emu/natfeat.c +++ b/arch/m68k/emu/natfeat.c @@ -18,9 +18,11 @@ #include #include +extern long nf_get_id2(const char *feature_name); + asm("\n" -" .global nf_get_id,nf_call\n" -"nf_get_id:\n" +" .global nf_get_id2,nf_call\n" +"nf_get_id2:\n" " .short 0x7300\n" " rts\n" "nf_call:\n" @@ -29,12 +31,25 @@ asm("\n" "1: moveq.l #0,%d0\n" " rts\n" " .section __ex_table,\"a\"\n" -" .long nf_get_id,1b\n" +" .long nf_get_id2,1b\n" " .long nf_call,1b\n" " .previous"); -EXPORT_SYMBOL_GPL(nf_get_id); EXPORT_SYMBOL_GPL(nf_call); +long nf_get_id(const char *feature_name) +{ + /* feature_name may be in vmalloc()ed memory, so make a copy */ + char name_copy[32]; + size_t n; + + n = strlcpy(name_copy, feature_name, sizeof(name_copy)); + if (n >= sizeof(name_copy)) + return 0; + + return nf_get_id2(name_copy); +} +EXPORT_SYMBOL_GPL(nf_get_id); + void nfprint(const char *fmt, ...) { static char buf[256]; -- cgit v1.1 From 0e69b54fa8b48e3cdc1a78f77beab3af763a33a1 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 6 Sep 2011 07:45:46 +0100 Subject: ARM: 7080/1: l2x0: make sure I&D are not locked down on init commit bac7e6ecf60933b68af910eb4c83a775a8b20b19 upstream. Fighting unfixed U-Boots and other beasts that may leave the cache in a locked-down state when starting the kernel, we make sure to disable all cache lock-down when initializing the l2x0 so we are in a known state. Reviewed-by: Santosh Shilimkar Reported-by: Jan Rinze Cc: Srinidhi Kasagar Cc: Rabin Vincent Cc: Adrian Bunk Cc: Rob Herring Cc: Catalin Marinas Cc: Will Deacon Tested-by: Robert Marklund Signed-off-by: Linus Walleij Signed-off-by: Russell King Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/hardware/cache-l2x0.h | 9 +++++++-- arch/arm/mm/cache-l2x0.c | 21 +++++++++++++++++++++ 2 files changed, 28 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/include/asm/hardware/cache-l2x0.h b/arch/arm/include/asm/hardware/cache-l2x0.h index bfa706f..99a6ed7 100644 --- a/arch/arm/include/asm/hardware/cache-l2x0.h +++ b/arch/arm/include/asm/hardware/cache-l2x0.h @@ -45,8 +45,13 @@ #define L2X0_CLEAN_INV_LINE_PA 0x7F0 #define L2X0_CLEAN_INV_LINE_IDX 0x7F8 #define L2X0_CLEAN_INV_WAY 0x7FC -#define L2X0_LOCKDOWN_WAY_D 0x900 -#define L2X0_LOCKDOWN_WAY_I 0x904 +/* + * The lockdown registers repeat 8 times for L310, the L210 has only one + * D and one I lockdown register at 0x0900 and 0x0904.
+ */ +#define L2X0_LOCKDOWN_WAY_D_BASE 0x900 +#define L2X0_LOCKDOWN_WAY_I_BASE 0x904 +#define L2X0_LOCKDOWN_STRIDE 0x08 #define L2X0_TEST_OPERATION 0xF00 #define L2X0_LINE_DATA 0xF10 #define L2X0_LINE_TAG 0xF30 diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c index 44c0867..9ecfdb5 100644 --- a/arch/arm/mm/cache-l2x0.c +++ b/arch/arm/mm/cache-l2x0.c @@ -277,6 +277,25 @@ static void l2x0_disable(void) spin_unlock_irqrestore(&l2x0_lock, flags); } +static void __init l2x0_unlock(__u32 cache_id) +{ + int lockregs; + int i; + + if (cache_id == L2X0_CACHE_ID_PART_L310) + lockregs = 8; + else + /* L210 and unknown types */ + lockregs = 1; + + for (i = 0; i < lockregs; i++) { + writel_relaxed(0x0, l2x0_base + L2X0_LOCKDOWN_WAY_D_BASE + + i * L2X0_LOCKDOWN_STRIDE); + writel_relaxed(0x0, l2x0_base + L2X0_LOCKDOWN_WAY_I_BASE + + i * L2X0_LOCKDOWN_STRIDE); + } +} + void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask) { __u32 aux; @@ -328,6 +347,8 @@ void __init l2x0_init(void __iomem *base, __u32 aux_val, __u32 aux_mask) * accessing the below registers will fault. */ if (!(readl_relaxed(l2x0_base + L2X0_CTRL) & 1)) { + /* Make sure that I&D is not locked down when starting */ + l2x0_unlock(cache_id); /* l2x0 controller is disabled */ writel_relaxed(aux, l2x0_base + L2X0_AUX_CTRL); -- cgit v1.1 From c4e462a085dd8279af22493ad0858d73e0bcafe1 Mon Sep 17 00:00:00 2001 From: Andreas Schwab Date: Fri, 9 Aug 2013 15:14:08 +0200 Subject: m68k: Truncate base in do_div() commit ea077b1b96e073eac5c3c5590529e964767fc5f7 upstream. Explicitly truncate the second operand of do_div() to 32 bits to guard against bogus code calling it with a 64-bit divisor. [Thorsten] After upgrading from 3.2 to 3.10, mounting a btrfs volume fails with: btrfs: setting nodatacow, compression disabled btrfs: enabling auto recovery btrfs: disk space caching is enabled *** ZERO DIVIDE *** FORMAT=2 Current process id is 722 BAD KERNEL TRAP: 00000000 Modules linked in: evdev mac_hid ext4 crc16 jbd2 mbcache btrfs xor lzo_compress zlib_deflate raid6_pq crc32c libcrc32c PC: [<319535b2>] __btrfs_map_block+0x11c/0x119a [btrfs] SR: 2000 SP: 30c1fab4 a2: 30f0faf0 d0: 00000000 d1: 00001000 d2: 00000000 d3: 00000000 d4: 00010000 d5: 00000000 a0: 3085c72c a1: 3085c72c Process mount (pid: 722, task=30f0faf0) Frame format=2 instr addr=319535ae Stack from 30c1faec: 00000000 00000020 00000000 00001000 00000000 01401000 30253928 300ffc00 00a843ac 3026f640 00000000 00010000 0009e250 00d106c0 00011220 00000000 00001000 301c6830 0009e32a 000000ff 00000009 3085c72c 00000000 00000000 30c1fd14 00000000 00000020 00000000 30c1fd14 0009e26c 00000020 00000003 00000000 0009dd8a 300b0b6c 30253928 00a843ac 00001000 00000000 00000000 0000a008 3194e76a 30253928 00a843ac 00001000 00000000 00000000 00000002 Call Trace: [<00001000>] kernel_pg_dir+0x0/0x1000 [...] Code: 222e ff74 2a2e ff5c 2c2e ff60 4c45 1402 <2d40> ff64 2d41 ff68 2205 4c2e 1800 ff68 4c04 0800 2041 d1c0 2206 4c2e 1400 ff68 [Geert] As diagnosed by Andreas, fs/btrfs/volumes.c:__btrfs_map_block() calls do_div(stripe_nr, stripe_len); with stripe_len u64, while do_div() assumes the divisor is a 32-bit number. Due to the lack of truncation in the m68k-specific implementation of do_div(), the division is performed using the upper 32-bit word of stripe_len, which is zero. This was introduced by commit 53b381b3abeb86f12787a6c40fee9b2f71edc23b ("Btrfs: RAID5 and RAID6"), which changed the divisor from map->stripe_len (struct map_lookup.stripe_len is int) to a 64-bit temporary. 
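The contract do_div() must honour can be shown portably; this is not the m68k asm, just a sketch of the fixed semantics, with a deliberately bogus 64-bit divisor like the one btrfs passed in:

#include <stdint.h>
#include <stdio.h>

static uint32_t do_div32(uint64_t *n, uint64_t base)
{
	uint32_t b = (uint32_t)base;	/* the fix: truncate once, up front */
	uint32_t rem = (uint32_t)(*n % b);

	*n /= b;
	return rem;
}

int main(void)
{
	uint64_t n = 1000;
	uint64_t stripe_len = 0x100000000ULL | 64;	/* upper word set */
	uint32_t rem = do_div32(&n, stripe_len);	/* divides by 64, not 0 */

	printf("q=%llu r=%u\n", (unsigned long long)n, rem);
	return 0;
}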
Reported-by: Thorsten Glaser Signed-off-by: Andreas Schwab Tested-by: Thorsten Glaser Signed-off-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- arch/m68k/include/asm/div64.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/m68k/include/asm/div64.h b/arch/m68k/include/asm/div64.h index edb6614..7558032 100644 --- a/arch/m68k/include/asm/div64.h +++ b/arch/m68k/include/asm/div64.h @@ -13,16 +13,17 @@ unsigned long long n64; \ } __n; \ unsigned long __rem, __upper; \ + unsigned long __base = (base); \ \ __n.n64 = (n); \ if ((__upper = __n.n32[0])) { \ asm ("divul.l %2,%1:%0" \ - : "=d" (__n.n32[0]), "=d" (__upper) \ - : "d" (base), "0" (__n.n32[0])); \ + : "=d" (__n.n32[0]), "=d" (__upper) \ + : "d" (__base), "0" (__n.n32[0])); \ } \ asm ("divu.l %2,%1:%0" \ - : "=d" (__n.n32[1]), "=d" (__rem) \ - : "d" (base), "1" (__upper), "0" (__n.n32[1])); \ + : "=d" (__n.n32[1]), "=d" (__rem) \ + : "d" (__base), "1" (__upper), "0" (__n.n32[1])); \ (n) = __n.n64; \ __rem; \ }) -- cgit v1.1 From 850cc18d180176194633acba57ff6bd443086ad9 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 17 Jul 2012 15:48:02 -0700 Subject: m32r: consistently use "suffix-$(...)" commit df12aef6a19bb2d69859a94936bda0e6ccaf3327 upstream. Commit a556bec9955c ("m32r: fix arch/m32r/boot/compressed/Makefile") changed "$(suffix_y)" to "$(suffix-y)", but didn't update any location where "suffix_y" is set, causing: make[5]: *** No rule to make target `arch/m32r/boot/compressed/vmlinux.bin.', needed by `arch/m32r/boot/compressed/piggy.o'. Stop. make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2 make[3]: *** [zImage] Error 2 Correct the other locations to fix this. Signed-off-by: Geert Uytterhoeven Cc: Hirokazu Takata Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/m32r/boot/compressed/Makefile | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/m32r/boot/compressed/Makefile b/arch/m32r/boot/compressed/Makefile index 177716b..01729c2 100644 --- a/arch/m32r/boot/compressed/Makefile +++ b/arch/m32r/boot/compressed/Makefile @@ -43,9 +43,9 @@ endif OBJCOPYFLAGS += -R .empty_zero_page -suffix_$(CONFIG_KERNEL_GZIP) = gz -suffix_$(CONFIG_KERNEL_BZIP2) = bz2 -suffix_$(CONFIG_KERNEL_LZMA) = lzma +suffix-$(CONFIG_KERNEL_GZIP) = gz +suffix-$(CONFIG_KERNEL_BZIP2) = bz2 +suffix-$(CONFIG_KERNEL_LZMA) = lzma $(obj)/piggy.o: $(obj)/vmlinux.scr $(obj)/vmlinux.bin.$(suffix-y) FORCE $(call if_changed,ld) -- cgit v1.1 From 4cfa1966cc4cec7ea37d572eca9f930b09dc3cf2 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 17 Jul 2012 15:48:04 -0700 Subject: m32r: add memcpy() for CONFIG_KERNEL_GZIP=y commit a8abbca6617e1caa2344d2d38d0a35f3e5928b79 upstream. Fix the m32r link error: LD arch/m32r/boot/compressed/vmlinux arch/m32r/boot/compressed/misc.o: In function `zlib_updatewindow': misc.c:(.text+0x190): undefined reference to `memcpy' misc.c:(.text+0x190): relocation truncated to fit: R_M32R_26_PLTREL against undefined symbol `memcpy' make[5]: *** [arch/m32r/boot/compressed/vmlinux] Error 1 by adding our own implementation of memcpy(). 
Signed-off-by: Geert Uytterhoeven Cc: Hirokazu Takata Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/m32r/boot/compressed/misc.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'arch') diff --git a/arch/m32r/boot/compressed/misc.c b/arch/m32r/boot/compressed/misc.c index 370d608..3147aa2 100644 --- a/arch/m32r/boot/compressed/misc.c +++ b/arch/m32r/boot/compressed/misc.c @@ -39,6 +39,16 @@ static void *memset(void *s, int c, size_t n) #endif #ifdef CONFIG_KERNEL_GZIP +void *memcpy(void *dest, const void *src, size_t n) +{ + char *d = dest; + const char *s = src; + while (n--) + *d++ = *s++; + + return dest; +} + #define BOOT_HEAP_SIZE 0x10000 #include "../../../../lib/decompress_inflate.c" #endif -- cgit v1.1 From 40318c990ef6005a3c9933253e7b63a4f5e06c3a Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 17 Jul 2012 15:48:05 -0700 Subject: m32r: make memset() global for CONFIG_KERNEL_BZIP2=y commit 9a75c6e5240f7edc5955e8da5b94bde6f96070b3 upstream. Fix the m32r compile error: arch/m32r/boot/compressed/misc.c:31:14: error: static declaration of 'memset' follows non-static declaration make[5]: *** [arch/m32r/boot/compressed/misc.o] Error 1 make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2 by removing the static keyword. Signed-off-by: Geert Uytterhoeven Cc: Hirokazu Takata Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/m32r/boot/compressed/misc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/m32r/boot/compressed/misc.c b/arch/m32r/boot/compressed/misc.c index 3147aa2..28a0952 100644 --- a/arch/m32r/boot/compressed/misc.c +++ b/arch/m32r/boot/compressed/misc.c @@ -28,7 +28,7 @@ static unsigned long free_mem_ptr; static unsigned long free_mem_end_ptr; #ifdef CONFIG_KERNEL_BZIP2 -static void *memset(void *s, int c, size_t n) +void *memset(void *s, int c, size_t n) { char *ss = s; -- cgit v1.1 From 2ccddb4d6101ea65c3f716ca6546c4d82b767bdf Mon Sep 17 00:00:00 2001 From: Dominik Dingel Date: Fri, 26 Jul 2013 15:04:00 +0200 Subject: KVM: s390: move kvm_guest_enter,exit closer to sie commit 2b29a9fdcb92bfc6b6f4c412d71505869de61a56 upstream. Any uaccess between guest_enter and guest_exit could trigger a page fault; the page fault handler would then handle it as a guest fault and translate a user address as a guest address.
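Schematically, the constraint is that nothing which may fault on user memory sits inside the enter/exit window; in this toy model guest_enter()/guest_exit() just flip the context flag and run_guest() stands in for sie64a():

#include <stdbool.h>
#include <stdio.h>

static bool in_guest_mode;	/* models the guest-context flag */

static void guest_enter(void) { in_guest_mode = true; }
static void guest_exit(void)  { in_guest_mode = false; }

static int run_guest(void)	/* stand-in for sie64a() */
{
	return 0;		/* 0 = clean exit */
}

static void touch_user_memory(const char *what)
{
	/* A page fault taken here must be treated as a *user* fault. */
	printf("uaccess for %s, in_guest_mode=%d\n", what, in_guest_mode);
}

int main(void)
{
	int rc;

	touch_user_memory("setup");	/* safe: flag not set yet */
	guest_enter();
	rc = run_guest();		/* only the world switch in here */
	guest_exit();
	touch_user_memory("handle rc");	/* safe again */
	return rc;
}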
Signed-off-by: Dominik Dingel Signed-off-by: Christian Borntraeger Signed-off-by: Paolo Bonzini [bwh: Backported to 3.2: adjust context and add the rc variable] Signed-off-by: Ben Hutchings Reviewed-by: Dominik Dingel Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/kvm-s390.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) (limited to 'arch') diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index f9804b7..1e88eef 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -445,6 +445,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, static void __vcpu_run(struct kvm_vcpu *vcpu) { + int rc; + memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16); if (need_resched()) @@ -455,21 +457,24 @@ static void __vcpu_run(struct kvm_vcpu *vcpu) kvm_s390_deliver_pending_interrupts(vcpu); + VCPU_EVENT(vcpu, 6, "entering sie flags %x", + atomic_read(&vcpu->arch.sie_block->cpuflags)); + vcpu->arch.sie_block->icptcode = 0; local_irq_disable(); kvm_guest_enter(); local_irq_enable(); - VCPU_EVENT(vcpu, 6, "entering sie flags %x", - atomic_read(&vcpu->arch.sie_block->cpuflags)); - if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) { + rc = sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs); + local_irq_disable(); + kvm_guest_exit(); + local_irq_enable(); + + if (rc) { VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction"); kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); } VCPU_EVENT(vcpu, 6, "exit sie icptcode %d", vcpu->arch.sie_block->icptcode); - local_irq_disable(); - kvm_guest_exit(); - local_irq_enable(); memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16); } -- cgit v1.1 From 7b900d1daf22341794f5fd7a0ec1fe97966b8590 Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Wed, 7 Aug 2013 02:01:19 +1000 Subject: powerpc: Handle unaligned ldbrx/stdbrx commit 230aef7a6a23b6166bd4003bfff5af23c9bd381f upstream. Normally when we haven't implemented an alignment handler for a load or store instruction the process will be terminated. The alignment handler uses the DSISR (or a pseudo one) to locate the right handler. Unfortunately ldbrx and stdbrx overlap lfs and stfs so we incorrectly think ldbrx is an lfs and stdbrx is an stfs. This bug is particularly nasty - instead of terminating the process we apply an incorrect fixup and continue on. With more and more overlapping instructions we should stop creating a pseudo DSISR and index using the instruction directly, but for now add a special case to catch ldbrx/stdbrx. 
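A standalone sketch of that special case, decoding the X-form extended opcode directly (the instruction words are built synthetically here; only the opcode arithmetic mirrors the patch):

#include <stdint.h>
#include <stdio.h>

#define PRIMARY(i)	((i) >> 26)		/* primary opcode */
#define XOP(i)		(((i) >> 1) & 0x3ff)	/* X-form extended opcode */

static const char *classify(uint32_t instr)
{
	if (PRIMARY(instr) == 31) {		/* X-form instructions */
		if (XOP(instr) == 532)
			return "ldbrx: 8-byte byte-reversed load";
		if (XOP(instr) == 660)
			return "stdbrx: 8-byte byte-reversed store";
	}
	return "leave to the table-driven DSISR path";
}

int main(void)
{
	uint32_t ldbrx = (31u << 26) | (532u << 1);	/* synthetic encodings */
	uint32_t stdbrx = (31u << 26) | (660u << 1);

	printf("%s\n%s\n", classify(ldbrx), classify(stdbrx));
	return 0;
}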
Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/align.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'arch') diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 8184ee9..3fcbae0 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/powerpc/kernel/align.c @@ -764,6 +764,16 @@ int fix_alignment(struct pt_regs *regs) nb = aligninfo[instr].len; flags = aligninfo[instr].flags; + /* ldbrx/stdbrx overlap lfs/stfs in the DSISR unfortunately */ + if (IS_XFORM(instruction) && ((instruction >> 1) & 0x3ff) == 532) { + nb = 8; + flags = LD+SW; + } else if (IS_XFORM(instruction) && + ((instruction >> 1) & 0x3ff) == 660) { + nb = 8; + flags = ST+SW; + } + /* Byteswap little endian loads and stores */ swiz = 0; if (regs->msr & MSR_LE) { -- cgit v1.1 From 4595b6def019ab1324b3948dfbaa959963a132e8 Mon Sep 17 00:00:00 2001 From: Peter Maydell Date: Thu, 22 Aug 2013 17:47:50 +0100 Subject: ARM: PCI: versatile: Fix SMAP register offsets commit 99f2b130370b904ca5300079243fdbcafa2c708b upstream. The SMAP register offsets in the versatile PCI controller code were all off by four. (This didn't have any observable bad effects because on this board PHYS_OFFSET is zero, and (a) writing zero to the flags register at offset 0x10 has no effect and (b) the reset value of the SMAP register is zero anyway, so failing to write SMAP2 didn't matter.) Signed-off-by: Peter Maydell Reviewed-by: Linus Walleij Signed-off-by: Kevin Hilman Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-versatile/pci.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/arm/mach-versatile/pci.c b/arch/arm/mach-versatile/pci.c index 13c7e5f..3f47259 100644 --- a/arch/arm/mach-versatile/pci.c +++ b/arch/arm/mach-versatile/pci.c @@ -43,9 +43,9 @@ #define PCI_IMAP0 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x0) #define PCI_IMAP1 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x4) #define PCI_IMAP2 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x8) -#define PCI_SMAP0 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x10) -#define PCI_SMAP1 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x14) -#define PCI_SMAP2 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x18) +#define PCI_SMAP0 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x14) +#define PCI_SMAP1 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x18) +#define PCI_SMAP2 __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0x1c) #define PCI_SELFID __IO_ADDRESS(VERSATILE_PCI_CORE_BASE+0xc) #define DEVICE_ID_OFFSET 0x00 -- cgit v1.1 From 58f5bc0c124fb8338e91c8a4110ad64259632dd9 Mon Sep 17 00:00:00 2001 From: Masoud Sharbiani Date: Fri, 20 Sep 2013 15:59:07 -0700 Subject: x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically commit 4f0acd31c31f03ba42494c8baf6c0465150e2621 upstream. Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time. 
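The quirk needs two table entries because some units report a different vendor string; simplified to exact string compares (the kernel's DMI_MATCH is a substring match), the lookup amounts to:

#include <stdio.h>
#include <string.h>

struct dmi_id { const char *vendor; const char *product; };

static const struct dmi_id pci_reboot_quirks[] = {
	{ "Dell Inc.", "C6100" },
	{ "Dell",      "C6100" },	/* units with the short vendor string */
};

static int needs_pci_reboot(const char *vendor, const char *product)
{
	unsigned int i;

	for (i = 0; i < sizeof(pci_reboot_quirks) / sizeof(pci_reboot_quirks[0]); i++)
		if (!strcmp(vendor, pci_reboot_quirks[i].vendor) &&
		    !strcmp(product, pci_reboot_quirks[i].product))
			return 1;
	return 0;
}

int main(void)
{
	printf("%d %d\n", needs_pci_reboot("Dell", "C6100"),
	       needs_pci_reboot("HP", "DL380"));
	return 0;
}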
Signed-off-by: Masoud Sharbiani Signed-off-by: Vinson Lee Cc: Robin Holt Cc: Russell King Cc: Guan Xuetao Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/reboot.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'arch') diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 89d6877..282c98f 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -460,6 +460,22 @@ static struct dmi_system_id __initdata pci_reboot_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Precision M6600"), }, }, + { /* Handle problems with rebooting on the Dell PowerEdge C6100. */ + .callback = set_pci_reboot, + .ident = "Dell PowerEdge C6100", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "C6100"), + }, + }, + { /* Some C6100 machines were shipped with vendor being 'Dell'. */ + .callback = set_pci_reboot, + .ident = "Dell PowerEdge C6100", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell"), + DMI_MATCH(DMI_PRODUCT_NAME, "C6100"), + }, + }, { } }; -- cgit v1.1 From e8cf7dd6baa2ac1817ab4a8ef92f2b6791254870 Mon Sep 17 00:00:00 2001 From: Josh Boyer Date: Thu, 18 Apr 2013 07:51:34 -0700 Subject: x86, efi: Don't map Boot Services on i386 commit 700870119f49084da004ab588ea2b799689efaf7 upstream. Add patch to fix 32bit EFI service mapping (rhbz 726701) Multiple people are reporting hitting the following WARNING on i386, WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440() Modules linked in: Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95 Call Trace: [] warn_slowpath_common+0x5f/0x80 [] ? __ioremap_caller+0x3d3/0x440 [] ? __ioremap_caller+0x3d3/0x440 [] warn_slowpath_null+0x1d/0x20 [] __ioremap_caller+0x3d3/0x440 [] ? get_usage_chars+0xfb/0x110 [] ? vprintk_emit+0x147/0x480 [] ? efi_enter_virtual_mode+0x1e4/0x3de [] ioremap_cache+0x1a/0x20 [] ? efi_enter_virtual_mode+0x1e4/0x3de [] efi_enter_virtual_mode+0x1e4/0x3de [] start_kernel+0x286/0x2f4 [] ? repair_env_string+0x51/0x51 [] i386_start_kernel+0x12c/0x12f Due to the workaround described in commit 916f676f8 ("x86, efi: Retain boot service code until after switching to virtual mode") EFI Boot Service regions are mapped for a period during boot. Unfortunately, with the limited size of the i386 direct kernel map it's possible that some of the Boot Service regions will not be directly accessible, which causes them to be ioremap()'d, triggering the above warning as the regions are marked as E820_RAM in the e820 memmap. There are currently only two situations where we need to map EFI Boot Service regions, 1. To workaround the firmware bug described in 916f676f8 2. To access the ACPI BGRT image but since we haven't seen an i386 implementation that requires either, this simple fix should suffice for now. [ Added to changelog - Matt ] Reported-by: Bryan O'Donoghue Acked-by: Tom Zanussi Acked-by: Darren Hart Cc: Josh Triplett Cc: Matthew Garrett Cc: H. 
Peter Anvin Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Greg Kroah-Hartman Signed-off-by: Josh Boyer Signed-off-by: Matt Fleming Signed-off-by: Greg Kroah-Hartman --- arch/x86/platform/efi/efi.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 899e393..86272f0 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -588,10 +588,13 @@ void __init efi_enter_virtual_mode(void) for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) { md = p; - if (!(md->attribute & EFI_MEMORY_RUNTIME) && - md->type != EFI_BOOT_SERVICES_CODE && - md->type != EFI_BOOT_SERVICES_DATA) - continue; + if (!(md->attribute & EFI_MEMORY_RUNTIME)) { +#ifdef CONFIG_X86_64 + if (md->type != EFI_BOOT_SERVICES_CODE && + md->type != EFI_BOOT_SERVICES_DATA) +#endif + continue; + } size = md->num_pages << EFI_PAGE_SHIFT; end = md->phys_addr + size; -- cgit v1.1 From 4067bddb238b1f8d91add21ea38ae2cd32c1acac Mon Sep 17 00:00:00 2001 From: Nishanth Aravamudan Date: Tue, 1 Oct 2013 14:04:53 -0700 Subject: powerpc/iommu: Use GFP_KERNEL instead of GFP_ATOMIC in iommu_init_table() commit 1cf389df090194a0976dc867b7fffe99d9d490cb upstream. Under heavy (DLPAR?) stress, we tripped this panic() in arch/powerpc/kernel/iommu.c::iommu_init_table(): page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); Before the panic() we got a page allocation failure for an order-2 allocation. There appears to be memory free, but perhaps not in the ATOMIC context. I looked through all the call-sites of iommu_init_table() and didn't see any obvious reason to need an ATOMIC allocation. Most call-sites in fact have an explicit GFP_KERNEL allocation shortly before the call to iommu_init_table(), indicating we are not in an atomic context. There is some indirection for some paths, but I didn't see any locks indicating that GFP_KERNEL is inappropriate. With this change under the same conditions, we have not been able to reproduce the panic. Signed-off-by: Nishanth Aravamudan Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 961bb03..795e807 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -495,7 +495,7 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid) /* number of bytes needed for the bitmap */ sz = (tbl->it_size + 7) >> 3; - page = alloc_pages_node(nid, GFP_ATOMIC, get_order(sz)); + page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); tbl->it_map = page_address(page); -- cgit v1.1 From 46779b3c9f75cb80573a1ceb82b16b831bfb349c Mon Sep 17 00:00:00 2001 From: Prarit Bhargava Date: Mon, 23 Sep 2013 09:33:36 -0400 Subject: powerpc/vio: Fix modalias_show return values commit e82b89a6f19bae73fb064d1b3dd91fcefbb478f4 upstream. modalias_show() should return an empty string on error, not -ENODEV. 
This causes the following false and annoying error: > find /sys/devices -name modalias -print0 | xargs -0 cat >/dev/null cat: /sys/devices/vio/4000/modalias: No such device cat: /sys/devices/vio/4001/modalias: No such device cat: /sys/devices/vio/4002/modalias: No such device cat: /sys/devices/vio/4004/modalias: No such device cat: /sys/devices/vio/modalias: No such device Signed-off-by: Prarit Bhargava Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/vio.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/kernel/vio.c index 1b695fd..c9f2ac8 100644 --- a/arch/powerpc/kernel/vio.c +++ b/arch/powerpc/kernel/vio.c @@ -1345,11 +1345,15 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, const char *cp; dn = dev->of_node; - if (!dn) - return -ENODEV; + if (!dn) { + strcat(buf, "\n"); + return strlen(buf); + } cp = of_get_property(dn, "compatible", NULL); - if (!cp) - return -ENODEV; + if (!cp) { + strcat(buf, "\n"); + return strlen(buf); + } return sprintf(buf, "vio:T%sS%s\n", vio_dev->type, cp); } -- cgit v1.1 From a821af3f7d73022d45550200241e6e671127ec81 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 1 Oct 2013 16:54:05 +1000 Subject: powerpc: Fix parameter clobber in csum_partial_copy_generic() commit d9813c3681a36774b254c0cdc9cce53c9e22c756 upstream. csum_partial_copy_generic() uses register r7 to adjust the remaining bytes to process. Unfortunately, r7 also holds a parameter, namely the address of the flag to set in case of access exceptions while reading the source buffer. Lacking a quantum implementation of PowerPC, this commit instead uses register r9 to do the adjusting, leaving r7's pointer uncorrupted. Signed-off-by: Paul E. McKenney Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/lib/checksum_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S index 18245af..afa2eba 100644 --- a/arch/powerpc/lib/checksum_64.S +++ b/arch/powerpc/lib/checksum_64.S @@ -272,8 +272,8 @@ _GLOBAL(csum_partial_copy_generic) rldicl. r6,r3,64-1,64-2 /* r6 = (r3 & 0x3) >> 1 */ beq .Lcopy_aligned - li r7,4 - sub r6,r7,r6 + li r9,4 + sub r6,r9,r6 mtctr r6 1: -- cgit v1.1 From 8107520ccf6a1f88d2139ba99e831ca8eeca8a77 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Fri, 2 Aug 2013 19:23:18 +0400 Subject: sparc64: Fix ITLB handler of null page [ Upstream commit 1c2696cdaad84580545a2e9c0879ff597880b1a9 ] 1) Use kvmap_itlb_longpath instead of kvmap_dtlb_longpath. 2) Handle page #0 only, don't handle page #1: bleu -> blu. (KERNBASE is 0x400000, so page #1 does not exist either, but anything is possible in the future; fix it now to avoid problems later.) 3) Remove the unused kvmap_itlb_nonlinear. Signed-off-by: Kirill Tkhai CC: David Miller Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/ktlb.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/sparc/kernel/ktlb.S b/arch/sparc/kernel/ktlb.S index 79f3103..7c00735 100644 --- a/arch/sparc/kernel/ktlb.S +++ b/arch/sparc/kernel/ktlb.S @@ -25,11 +25,10 @@ kvmap_itlb: */ kvmap_itlb_4v: -kvmap_itlb_nonlinear: /* Catch kernel NULL pointer calls.
*/ sethi %hi(PAGE_SIZE), %g5 cmp %g4, %g5 - bleu,pn %xcc, kvmap_dtlb_longpath + blu,pn %xcc, kvmap_itlb_longpath nop KERN_TSB_LOOKUP_TL1(%g4, %g6, %g5, %g1, %g2, %g3, kvmap_itlb_load) -- cgit v1.1 From ca0bd2082f83ccf6abbb2db2e4475bb81b415118 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Mon, 12 Aug 2013 16:02:24 +0400 Subject: sparc64: Remove RWSEM export leftovers [ Upstream commit 61d9b9355b0d427bd1e732bd54628ff9103e496f ] The functions __down_read __down_read_trylock __down_write __down_write_trylock __up_read __up_write __downgrade_write are implemented inline, so remove corresponding EXPORT_SYMBOLs (They lead to compile errors on RT kernel). Signed-off-by: Kirill Tkhai CC: David Miller Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/lib/ksyms.c | 9 --------- 1 file changed, 9 deletions(-) (limited to 'arch') diff --git a/arch/sparc/lib/ksyms.c b/arch/sparc/lib/ksyms.c index 1b30bb3..fbb8005 100644 --- a/arch/sparc/lib/ksyms.c +++ b/arch/sparc/lib/ksyms.c @@ -131,15 +131,6 @@ EXPORT_SYMBOL(___copy_from_user); EXPORT_SYMBOL(___copy_in_user); EXPORT_SYMBOL(__clear_user); -/* RW semaphores */ -EXPORT_SYMBOL(__down_read); -EXPORT_SYMBOL(__down_read_trylock); -EXPORT_SYMBOL(__down_write); -EXPORT_SYMBOL(__down_write_trylock); -EXPORT_SYMBOL(__up_read); -EXPORT_SYMBOL(__up_write); -EXPORT_SYMBOL(__downgrade_write); - /* Atomic counter implementation. */ EXPORT_SYMBOL(atomic_add); EXPORT_SYMBOL(atomic_add_ret); -- cgit v1.1 From e6114d1d56548014e6f5323d8c71e9de61486786 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Thu, 22 Aug 2013 16:38:46 -0700 Subject: sparc64: Fix off by one in trampoline TLB mapping installation loop. [ Upstream commit 63d499662aeec1864ec36d042aca8184ea6a938e ] Reported-by: Kirill Tkhai Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/trampoline_64.S | 2 -- 1 file changed, 2 deletions(-) (limited to 'arch') diff --git a/arch/sparc/kernel/trampoline_64.S b/arch/sparc/kernel/trampoline_64.S index da1b781..8fa84a3 100644 --- a/arch/sparc/kernel/trampoline_64.S +++ b/arch/sparc/kernel/trampoline_64.S @@ -131,7 +131,6 @@ startup_continue: clr %l5 sethi %hi(num_kernel_image_mappings), %l6 lduw [%l6 + %lo(num_kernel_image_mappings)], %l6 - add %l6, 1, %l6 mov 15, %l7 BRANCH_IF_ANY_CHEETAH(g1,g5,2f) @@ -224,7 +223,6 @@ niagara_lock_tlb: clr %l5 sethi %hi(num_kernel_image_mappings), %l6 lduw [%l6 + %lo(num_kernel_image_mappings)], %l6 - add %l6, 1, %l6 1: mov HV_FAST_MMU_MAP_PERM_ADDR, %o5 -- cgit v1.1 From ee0ab40d6810a03cbd74715889dad558c5f9f02d Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Fri, 26 Jul 2013 17:21:12 +0400 Subject: sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall [ Upstream commit ab2abda6377723e0d5fbbfe5f5aa16a5523344d1 ] (From v1 to v2: changed comment) On the way linux_sparc_syscall32->linux_syscall_trace32->goto 2f, register %o5 doesn't clear its second 32-bit. Fix that. Signed-off-by: Kirill Tkhai CC: David Miller Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/syscalls.S | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'arch') diff --git a/arch/sparc/kernel/syscalls.S b/arch/sparc/kernel/syscalls.S index 7f5f65d..817187d 100644 --- a/arch/sparc/kernel/syscalls.S +++ b/arch/sparc/kernel/syscalls.S @@ -147,7 +147,7 @@ linux_syscall_trace32: srl %i4, 0, %o4 srl %i1, 0, %o1 srl %i2, 0, %o2 - ba,pt %xcc, 2f + ba,pt %xcc, 5f srl %i3, 0, %o3 linux_syscall_trace: @@ -177,13 +177,13 @@ linux_sparc_syscall32: srl %i1, 0, %o1 ! IEU0 Group ldx [%g6 + TI_FLAGS], %l0 ! Load - srl %i5, 0, %o5 ! IEU1 + srl %i3, 0, %o3 ! IEU0 srl %i2, 0, %o2 ! IEU0 Group andcc %l0, (_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT|_TIF_SYSCALL_TRACEPOINT), %g0 bne,pn %icc, linux_syscall_trace32 ! CTI mov %i0, %l5 ! IEU1 - call %l7 ! CTI Group brk forced - srl %i3, 0, %o3 ! IEU0 +5: call %l7 ! CTI Group brk forced + srl %i5, 0, %o5 ! IEU1 ba,a,pt %xcc, 3f /* Linux native system calls enter here... */ -- cgit v1.1 From 5391cb09f10c98af52458b4fd6e331a6465797f7 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Fri, 26 Jul 2013 01:17:15 +0400 Subject: sparc32: Fix exit flag passed from traced sys_sigreturn [ Upstream commit 7a3b0f89e3fea680f93932691ca41a68eee7ab5e ] Pass 1 in %o1 to indicate that syscall_trace accounts exit. Signed-off-by: Kirill Tkhai CC: David Miller Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/entry.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/sparc/kernel/entry.S b/arch/sparc/kernel/entry.S index f445e98..cfabc3d 100644 --- a/arch/sparc/kernel/entry.S +++ b/arch/sparc/kernel/entry.S @@ -1177,7 +1177,7 @@ sys_sigreturn: nop call syscall_trace - nop + mov 1, %o1 1: /* We don't want to muck with user registers like a -- cgit v1.1 From a9f1434b8e47776e2b6d42a5556516209f5ba3ae Mon Sep 17 00:00:00 2001 From: Chris Metcalf Date: Thu, 26 Sep 2013 13:24:53 -0400 Subject: tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT commit f862eefec0b68e099a9fa58d3761ffb10bad97e1 upstream. It turns out the kernel relies on barrier() to force a reload of the percpu offset value. Since we can't easily modify the definition of barrier() to include "tp" as an output register, we instead provide a definition of __my_cpu_offset as extended assembly that includes a fake stack read to hazard against barrier(), forcing gcc to know that it must reread "tp" and recompute anything based on "tp" after a barrier. This fixes observed hangs in the slub allocator when we are looping on a percpu cmpxchg_double. A similar fix for ARMv7 was made in June in change 509eb76ebf97. 
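The fake-dependency trick can be sketched outside the kernel; this gcc-specific, x86-64-flavoured toy (r15 standing in for tile's "tp") shows how an unused "m" input lets barrier()'s "memory" clobber force the register read to be redone:

#include <stdio.h>

register unsigned long tp __asm__("r15");	/* stand-in for tile's tp */

#define barrier() __asm__ __volatile__("" ::: "memory")

static inline unsigned long my_cpu_offset(void)
{
	unsigned long v;
	static unsigned long hazard;

	/* Copy the reserved register; the "m"(hazard) input gives the asm
	 * a (fake) memory dependency, so a barrier() with a "memory"
	 * clobber forces the copy to be redone instead of letting gcc
	 * reuse a value computed before the barrier. */
	__asm__("mov %1, %0" : "=r"(v) : "r"(tp), "m"(hazard));
	return v;
}

int main(void)
{
	tp = 0x1234;
	printf("%lx\n", my_cpu_offset());
	return 0;
}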
Signed-off-by: Chris Metcalf Signed-off-by: Greg Kroah-Hartman --- arch/tile/include/asm/percpu.h | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h index 63294f5..4f7ae39 100644 --- a/arch/tile/include/asm/percpu.h +++ b/arch/tile/include/asm/percpu.h @@ -15,9 +15,37 @@ #ifndef _ASM_TILE_PERCPU_H #define _ASM_TILE_PERCPU_H -register unsigned long __my_cpu_offset __asm__("tp"); -#define __my_cpu_offset __my_cpu_offset -#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp)) +register unsigned long my_cpu_offset_reg asm("tp"); + +#ifdef CONFIG_PREEMPT +/* + * For full preemption, we can't just use the register variable + * directly, since we need barrier() to hazard against it, causing the + * compiler to reload anything computed from a previous "tp" value. + * But we also don't want to use volatile asm, since we'd like the + * compiler to be able to cache the value across multiple percpu reads. + * So we use a fake stack read as a hazard against barrier(). + * The 'U' constraint is like 'm' but disallows postincrement. + */ +static inline unsigned long __my_cpu_offset(void) +{ + unsigned long tp; + register unsigned long *sp asm("sp"); + asm("move %0, tp" : "=r" (tp) : "U" (*sp)); + return tp; +} +#define __my_cpu_offset __my_cpu_offset() +#else +/* + * We don't need to hazard against barrier() since "tp" doesn't ever + * change with PREEMPT_NONE, and with PREEMPT_VOLUNTARY it only + * changes at function call points, at which we are already re-reading + * the value of "tp" due to "my_cpu_offset_reg" being a global variable. + */ +#define __my_cpu_offset my_cpu_offset_reg +#endif + +#define set_my_cpu_offset(tp) (my_cpu_offset_reg = (tp)) #include -- cgit v1.1 From ac008905d50badfe8b695fa3f1eef20ac352e3e6 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 1 Oct 2013 21:54:46 +0200 Subject: parisc: fix interruption handler to respect pagefault_disable() commit 59b33f148cc08fb33cbe823fca1e34f7f023765e upstream. Running an "echo t > /proc/sysrq-trigger" crashes the parisc kernel. The problem is, that in print_worker_info() we try to read the workqueue info via the probe_kernel_read() functions which use pagefault_disable() to avoid crashes like this: probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq)); probe_kernel_read(&wq, &pwq->wq, sizeof(wq)); probe_kernel_read(name, wq->name, sizeof(name) - 1); The problem here is, that the first probe_kernel_read(&pwq) might return zero in pwq and as such the following probe_kernel_reads() try to access contents of the page zero which is read protected and generate a kernel segfault. With this patch we fix the interruption handler to call parisc_terminate() directly only if pagefault_disable() was not called (in which case preempt_count()==0). Otherwise we hand over to the pagefault handler which will try to look up the faulting address in the fixup tables. Signed-off-by: Helge Deller Signed-off-by: John David Anglin Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/kernel/traps.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c index 8b58bf0..0acc27b 100644 --- a/arch/parisc/kernel/traps.c +++ b/arch/parisc/kernel/traps.c @@ -811,14 +811,14 @@ void notrace handle_interruption(int code, struct pt_regs *regs) else { /* - * The kernel should never fault on its own address space. 
+ * The kernel should never fault on its own address space, + * unless pagefault_disable() was called before. */ - if (fault_space == 0) + if (fault_space == 0 && !in_atomic()) { pdc_chassis_send_status(PDC_CHASSIS_DIRECT_PANIC); parisc_terminate("Kernel Fault", regs, code, fault_address); - } } -- cgit v1.1