aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel
Commit message (Collapse)AuthorAgeFilesLines
* x86, hypervisor: add missing <linux/module.h>H. Peter Anvin2010-05-092-0/+2
| | | | | | | | | | | | | | | EXPORT_SYMBOL() needs <linux/module.h> to be included; fixes modular builds of the VMware balloon driver, and any future modular drivers which depends on the hypervisor. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Dmitry Torokhov <dtor@vmware.com> LKML-Reference: <4BE49778.6060800@zytor.com>
* x86, hypervisor: Export the x86_hyper* symbolsH. Peter Anvin2010-05-093-1/+3
| | | | | | | | | | | | | Export x86_hyper and the related specific structures, allowing for hypervisor identification by modules. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Dmitry Torokhov <dtor@vmware.com> LKML-Reference: <4BE49778.6060800@zytor.com>
* Merge commit 'v2.6.34-rc6' into x86/cpuH. Peter Anvin2010-05-0864-103/+280
|\
| * Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2010-04-283-3/+24
| |\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86: Disable large pages on CPUs with Atom erratum AAE44 x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero x86, mrst: Conditionally register cpu hotplug notifier for apbt
| | * x86: Disable large pages on CPUs with Atom erratum AAE44H. Peter Anvin2010-04-231-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Atom erratum AAE44/AAF40/AAG38/AAH41: "If software clears the PS (page size) bit in a present PDE (page directory entry), that will cause linear addresses mapped through this PDE to use 4-KByte pages instead of using a large page after old TLB entries are invalidated. Due to this erratum, if a code fetch uses this PDE before the TLB entry for the large page is invalidated then it may fetch from a different physical address than specified by either the old large page translation or the new 4-KByte page translation. This erratum may also cause speculative code fetches from incorrect addresses." [http://download.intel.com/design/processor/specupdt/319536.pdf] Where as commit 211b3d03c7400f48a781977a50104c9d12f4e229 seems to workaround errata AAH41 (mixed 4K TLBs) it reduces the window of opportunity for the bug to occur and does not totally remove it. This patch disables mixed 4K/4MB page tables totally avoiding the page splitting and not tripping this processor issue. This is based on an original patch by Colin King. Originally-by: Colin Ian King <colin.king@canonical.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com> Cc: <stable@kernel.org>
| | * x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzeroH. Peter Anvin2010-04-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we do a thread switch, we clear the outgoing FS/GS base if the corresponding selector is nonzero. This is taken by __switch_to() as an entry invariant; it does not verify that it is true on entry. However, copy_thread() doesn't enforce this constraint, which can result in inconsistent results after fork(). Make copy_thread() match the behavior of __switch_to(). Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <4BD1E061.8030605@zytor.com> Cc: <stable@kernel.org>
| | * x86, mrst: Conditionally register cpu hotplug notifier for apbtJacob Pan2010-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | APB timer is used on Moorestown platforms but not on a standard PC. If APB timer code is compiled in but not initialized at run-time due to lack of FW reported SFI table, kernel would panic when the non-boot CPUs are offlined and notifier is called. https://bugzilla.kernel.org/show_bug.cgi?id=15786 This patch ensures CPU hotplug notifier for APB timer is only registered when the APBT timer block is initialized. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1271701423-1162-1-git-send-email-jacob.jun.pan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | VMware Balloon driverDmitry Torokhov2010-04-241-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a standalone version of VMware Balloon driver. Ballooning is a technique that allows hypervisor dynamically limit the amount of memory available to the guest (with guest cooperation). In the overcommit scenario, when hypervisor set detects that it needs to shuffle some memory, it instructs the driver to allocate certain number of pages, and the underlying memory gets returned to the hypervisor. Later hypervisor may return memory to the guest by reattaching memory to the pageframes and instructing the driver to "deflate" balloon. We are submitting a standalone driver because KVM maintainer (Avi Kivity) expressed opinion (rightly) that our transport does not fit well into virtqueue paradigm and thus it does not make much sense to integrate with virtio. There were also some concerns whether current ballooning technique is the right thing. If there appears a better framework to achieve this we are prepared to evaluate and switch to using it, but in the meantime we'd like to get this driver upstream. We want to get the driver accepted in distributions so that users do not have to deal with an out-of-tree module and many distributions have "upstream first" requirement. The driver has been shipping for a number of years and users running on VMware platform will have it installed as part of VMware Tools even if it will not come from a distribution, thus there should not be additional risk in pulling the driver into mainline. The driver will only activate if host is VMware so everyone else should not be affected at all. Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Cc: Avi Kivity <avi@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-04-201-2/+6
| |\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix unsafe frame rewinding with hot regs fetching
| | * perf: Fix unsafe frame rewinding with hot regs fetchingFrederic Weisbecker2010-04-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we fetch the hot regs and rewind to the nth caller, it might happen that we dereference a frame pointer outside the kernel stack boundaries, like in this example: perf_trace_sched_switch+0xd5/0x120 schedule+0x6b5/0x860 retint_careful+0xd/0x21 Since we directly dereference a userspace frame pointer here while rewinding behind retint_careful, this may end up in a crash. Fix this by simply using probe_kernel_address() when we rewind the frame pointer. This issue will have a much more proper fix in the next version of the perf_arch_fetch_caller_regs() API that will only need to rewind to the first caller. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Miller <davem@davemloft.net> Cc: Archs <linux-arch@vger.kernel.org>
| * | Merge branch 'iommu/fixes' of ↵Ingo Molnar2010-04-135-28/+64
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
| | * | x86/gart: Disable GART explicitly before initializationJoerg Roedel2010-04-072-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we boot into a crash-kernel the gart might still be enabled and its caches might be dirty. This can result in undefined behavior later. Fix it by explicitly disabling the gart hardware before initialization and flushing the caches after enablement. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | * | Merge branch 'amd-iommu/fixes' into iommu/fixesJoerg Roedel2010-04-073-27/+47
| | |\ \
| | | * | x86/amd-iommu: use for_each_pci_devChris Wright2010-04-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace open coded version with for_each_pci_dev Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | Revert "x86: disable IOMMUs on kernel crash"Chris Wright2010-04-071-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This effectively reverts commit 61d047be99757fd9b0af900d7abce9a13a337488. Disabling the IOMMU can potetially allow DMA transactions to complete without being translated. Leave it enabled, and allow crash kernel to do the IOMMU reinitialization properly. Cc: stable@kernel.org Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: warn when issuing command to uninitialized cmd bufferChris Wright2010-04-072-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To catch future potential issues we can add a warning whenever we issue a command before the command buffer is fully initialized. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: enable iommu before attaching devicesChris Wright2010-04-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hit another kdump problem as reported by Neil Horman. When initializaing the IOMMU, we attach devices to their domains before the IOMMU is fully (re)initialized. Attaching a device will issue some important invalidations. In the context of the newly kexec'd kdump kernel, the IOMMU may have stale cached data from the original kernel. Because we do the attach too early, the invalidation commands are placed in the new command buffer before the IOMMU is updated w/ that buffer. This leaves the stale entries in the kdump context and can renders device unusable. Simply enable the IOMMU before we do the attach. Cc: stable@kernel.org Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: Use helper function to destroy domainJoerg Roedel2010-03-081-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the amd_iommu_domain_destroy the protection_domain_free function is partly reimplemented. The 'partly' is the bug here because the domain is not deleted from the domain list. This results in use-after-free errors and data-corruption. Fix it by just using protection_domain_free instead. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: Report errors in acpi parsing functions upstreamJoerg Roedel2010-03-011-11/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since acpi_table_parse ignores the return values of the parsing function this patch introduces a workaround and reports these errors upstream via a global variable. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: Pt mode fix for domain_destroyChris Wright2010-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a guest is shutdown, assigned devices are not properly returned to the pt domain. This can leave the device using stale cached IOMMU data, and result in a non-functional device after it's re-bound to the host driver. For example, I see this upon rebinding: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0000 address=0x000000007e2a8000 flags=0x0050] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0000 address=0x000000007e2a8040 flags=0x0050] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0000 address=0x000000007e2a8080 flags=0x0050] AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0000 address=0x000000007e2a80c0 flags=0x0050] 0000:02:00.0: eth2: Detected Hardware Unit Hang: ... The amd_iommu_destroy_domain() function calls do_detach() which doesn't reattach the pt domain to the device. Use __detach_device() instead. Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: Protect IOMMU-API map/unmap pathJoerg Roedel2010-03-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a mutex to lock page table updates in the IOMMU-API path. We can't use the spin_lock here because this patch might sleep. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | | * | x86/amd-iommu: Remove double NULL check in check_deviceJulia Lawall2010-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev was tested just above, so drop the second test. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| * | | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-04-071-0/+1
| |\ \ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Enable Nehalem-EX support perf kmem: Fix breakage introduced by 5a0e3ad slab.h script
| | * | | perf, x86: Enable Nehalem-EX supportVince Weaver2010-04-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to Intel Software Devel Manual Volume 3B, the Nehalem-EX PMU is just like regular Nehalem (except for the uncore support, which is completely different). Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2010-04-075-9/+44
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards x86: Increase CONFIG_NODES_SHIFT max to 10 ibft, x86: Change reserve_ibft_region() to find_ibft_region() x86, hpet: Fix bug in RTC emulation x86, hpet: Erratum workaround for read after write of HPET comparator bootmem, x86: Fix 32bit numa system without RAM on node 0 nobootmem, x86: Fix 32bit numa system without RAM on node 0 x86: Handle overlapping mptables x86: Make e820_remove_range to handle all covered case x86-32, resume: do a global tlb flush in S4 resume
| | * | | x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boardsSuresh Siddha2010-04-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jan Grossmann reported kernel boot panic while booting SMP kernel on his system with a single core cpu. SMP kernels call enable_IR_x2apic() from native_smp_prepare_cpus() and on platforms where the kernel doesn't find SMP configuration we ended up again calling enable_IR_x2apic() from the APIC_init_uniprocessor() call in the smp_sanity_check(). Thus leading to kernel panic. Don't call enable_IR_x2apic() and default_setup_apic_routing() from APIC_init_uniprocessor() in CONFIG_SMP case. NOTE: this kind of non-idempotent and assymetric initialization sequence is rather fragile and unclean, we'll clean that up in v2.6.35. This is the minimal fix for v2.6.34. Reported-by: Jan.Grossmann@kielnet.net Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <jbarnes@virtuousgeek.org> Cc: <david.woodhouse@intel.com> Cc: <weidong.han@intel.com> Cc: <youquan.song@intel.com> Cc: <Jan.Grossmann@kielnet.net> Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x] LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | ibft, x86: Change reserve_ibft_region() to find_ibft_region()Yinghai Lu2010-04-011-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows arch code could decide the way to reserve the ibft. And we should reserve ibft as early as possible, instead of BOOTMEM stage, in case the table is in RAM range and is not reserved by BIOS (this will often be the case.) Move to just after find_smp_config(). Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore. -v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <konrad@darnok.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4BB510FB.80601@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Jones <pjones@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> CC: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | x86, hpet: Fix bug in RTC emulationAlok Kataria2010-04-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We think there exists a bug in the HPET code that emulates the RTC. In the normal case, when the RTC frequency is set, the rtc driver tells the hpet code about it here: int hpet_set_periodic_freq(unsigned long freq) { uint64_t clc; if (!is_hpet_enabled()) return 0; if (freq <= DEFAULT_RTC_INT_FREQ) hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq; else { clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC; do_div(clc, freq); clc >>= hpet_clockevent.shift; hpet_pie_delta = (unsigned long) clc; } return 1; } If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then hpet_pie_limit (a static) is set to non-zero. Then, on every one-shot HPET interrupt, hpet_rtc_timer_reinit is called to compute the next timeout. Well, that function has this logic: if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit) delta = hpet_default_delta; else delta = hpet_pie_delta; Since hpet_pie_limit is not 0, hpet_default_delta is used. That corresponds to 64Hz. Now, if you set a different rtc frequency, you'll take the else path through hpet_set_periodic_freq, but unfortunately no one resets hpet_pie_limit back to 0. Boom....now you are stuck with 64Hz RTC interrupts forever. The patch below just resets the hpet_pie_limit value when requested freq is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <201003112200.o2BM0Hre012875@imap1.linux-foundation.org> Signed-off-by: Daniel Hecht <dhecht@vmware.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | * | | x86, hpet: Erratum workaround for read after write of HPET comparatorPallipadi, Venkatesh2010-04-011-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote: > Hello, > > Again, on the Intel DP55KG board: > > # uname -a > Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux > > [ 1.237600] ------------[ cut here ]------------ > [ 1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80() > [ 1.238221] Hardware name: > [ 1.238504] hpet: compare register read back failed. > [ 1.238793] Modules linked in: > [ 1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1 > [ 1.239605] Call Trace: > [ 1.239886] <IRQ> [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0 > [ 1.240409] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.240699] [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50 > [ 1.240992] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.241281] [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80 > [ 1.241573] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.241859] [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100 > [ 1.246533] [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30 > [ 1.246826] [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0 > [ 1.247118] [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160 > [ 1.247407] [<ffffffff81029f55>] ? handle_irq+0x15/0x20 > [ 1.247689] [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0 > [ 1.247976] [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa > [ 1.248262] <EOI> [<ffffffff8102f277>] ? mwait_idle+0x57/0x80 > [ 1.248796] [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0 > [ 1.249080] ---[ end trace db7f668fb6fef4e1 ]--- > > Is this something Intel has to fix or is it a bug in the kernel? This is a chipset erratum. Thomas: You mentioned we can retain this check only for known-buggy and hpet debug kind of options. But here is the simple workaround patch for this particular erratum. Some chipsets have a erratum due to which read immediately following a write of HPET comparator returns old comparator value instead of most recently written value. Erratum 15 in "Intel I/O Controller Hub 9 (ICH9) Family Specification Update" (http://www.intel.com/assets/pdf/specupdate/316973.pdf) Workaround for the errata is to read the comparator twice if the first one fails. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Cc: <stable@kernel.org>
| | * | | x86: Handle overlapping mptablesAndi Kleen2010-04-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We found a system where the MP table MPC and MPF structures overlap. That doesn't really matter because the mptable is not used anyways with ACPI, but it leads to a panic in the early allocator due to the overlapping reservations in 2.6.33. Earlier kernels handled this without problems. Simply change these reservations to reserve_early_overlap_ok to avoid the panic. Reported-by: Thomas Renninger <trenn@suse.de> Tested-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100329074111.GA22821@basil.fritz.box> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>
| | * | | x86: Make e820_remove_range to handle all covered caseYinghai Lu2010-03-311-4/+20
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rusty found on lguest with trim_bios_range, max_pfn is not right anymore, and looks e820_remove_range does not work right. [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] LGUEST: 0000000000000000 - 0000000004000000 (usable) [ 0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS! [ 0.000000] DMI not present or invalid. [ 0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000 [ 0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000 root cause is: the e820_remove_range doesn't handle the all covered case. e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...) produces a bogus range as a result. Make it match e820_update_range() by handling that case too. Reported-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <4BB18E55.6090903@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | Merge branch 'master' into export-slabhTejun Heo2010-04-059-53/+111
| |\ \ \
| | * | | perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernelsTorok Edwin2010-04-022-5/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mackerras <paulus@samba.org> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf, x86: Fix AMD hotplug & constraint initializationPeter Zijlstra2010-04-022-36/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3f6da39 ("perf: Rework and fix the arch CPU-hotplug hooks") moved the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP however amd_nb_id() doesn't work yet on prepare so it would simply bail basically reverting to a state where we do not properly track node wide constraints - causing weird perf results. Fix up the AMD NorthBridge initialization code by allocating from CPU_UP_PREPARE and installing it from CPU_STARTING once we have the proper nb_id. It also properly deals with the allocation failing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ robustify using amd_has_nb() ] Signed-off-by: Stephane Eranian <eranian@google.com> LKML-Reference: <1269353485.5109.48.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | x86: Move notify_cpu_starting() callback to a later stagePeter Zijlstra2010-04-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because we need to have cpu identification things done by the time we run CPU_STARTING notifiers. ( This init ordering will be relied on by the next fix. ) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269353485.5109.48.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | Merge branch 'perf/urgent' of ↵Ingo Molnar2010-04-022-3/+1
| | |\ \ \ | | | |/ / | | |/| | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
| | | * | x86,kgdb: Always initialize the hw breakpoint attributeJason Wessel2010-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is required to call hw_breakpoint_init() on an attr before using it in any other calls. This fixes the problem where kgdb will sometimes fail to initialize on x86_64. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: 2.6.33 <stable@kernel.org> LKML-Reference: <1269975907-27602-1-git-send-email-jason.wessel@windriver.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | | * | perf: Use hot regs with software sched switch/migrate eventsFrederic Weisbecker2010-04-011-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scheduler's task migration events don't work because they always pass NULL regs perf_sw_event(). The event hence gets filtered in perf_swevent_add(). Scheduler's context switches events use task_pt_regs() to get the context when the event occured which is a wrong thing to do as this won't give us the place in the kernel where we went to sleep but the place where we left userspace. The result is even more wrong if we switch from a kernel thread. Use the hot regs snapshot for both events as they belong to the non-interrupt/exception based events family. Unlike page faults or so that provide the regs matching the exact origin of the event, we need to save the current context. This makes the task migration event working and fix the context switch callchains and origin ip. Example: perf record -a -e cs Before: 10.91% ksoftirqd/0 0 [k] 0000000000000000 | --- (nil) perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_swevent_ctx_event do_perf_sw_event __perf_sw_event perf_event_task_sched_out schedule run_ksoftirqd kthread kernel_thread_helper After: 23.77% hald-addon-stor [kernel.kallsyms] [k] schedule | --- schedule | |--60.00%-- schedule_timeout | wait_for_common | wait_for_completion | blk_execute_rq | scsi_execute | scsi_execute_req | sr_test_unit_ready | | | |--66.67%-- sr_media_change | | media_changed | | cdrom_media_changed | | sr_block_media_changed | | check_disk_change | | cdrom_open v2: Always build perf_arch_fetch_caller_regs() now that software events need that too. They don't need it from modules, unlike trace events, so we keep the EXPORT_SYMBOL in trace_event_perf.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
| * | | | include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-3052-15/+40
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* | | | x86: Clean up the hypervisor layerH. Peter Anvin2010-05-073-65/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the hypervisor layer and the hypervisor drivers, using an ops structure instead of an enumeration with if statements. The identity of the hypervisor, if needed, can be tested by testing the pointer value in x86_hyper. The MS-HyperV private state is moved into a normal global variable (it's per-system state, not per-CPU state). Being a normal bss variable, it will be left at all zero on non-HyperV platforms, and so can generally be tested for HyperV-specific features without additional qualification. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Greg KH <greg@kroah.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Ky Srinivasan <ksrinivasan@novell.com> LKML-Reference: <4BE49778.6060800@zytor.com>
* | | | x86, HyperV: fix up the license to mshyperv.cGreg Kroah-Hartman2010-05-071-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should have been GPLv2 only, we cut and pasted from the wrong file originally, sorry. Also removed some unneeded boilerplate license code, we all know where to find the GPLv2, and that there's no warranty as that is implicit from the license. Cc: Ky Srinivasan <ksrinivasan@novell.com> Cc: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> LKML-Reference: <20100507235541.GA15448@kroah.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86: Detect running on a Microsoft HyperV systemKy Srinivasan2010-05-063-4/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch integrates HyperV detection within the framework currently used by VmWare. With this patch, we can avoid having to replicate the HyperV detection code in each of the Microsoft HyperV drivers. Reworked and tweaked by Greg K-H to build properly. Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com> LKML-Reference: <20100506190841.GA1605@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "K.Prasad" <prasad@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hank Janssen <hjanssen@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cpu: Make APERF/MPERF a normal table-driven flagH. Peter Anvin2010-05-031-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | APERF/MPERF can be handled via the table like all the other scattered CPU flags. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
* | | | x86, cacheinfo: Disable index in all four subcachesBorislav Petkov2010-04-221-17/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When disabling an L3 cache index, make sure we disable that index in all four subcaches of the L3. Clarify nomenclature while at it, wrt to disable slots versus disable index and rename accordingly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cacheinfo: Make L3 cache info per nodeBorislav Petkov2010-04-221-14/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we're allocating L3 cache info and calculating indices for each online cpu which is clearly superfluous. Instead, we need to do this per-node as is each L3 cache. No functional change, only per-cpu memory savings. -v2: Allocate L3 cache descriptors array dynamically. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cacheinfo: Reorganize AMD L3 cache structureBorislav Petkov2010-04-221-21/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a struct representing L3 cache attributes (subcache sizes and indices count) and move the respective members out of _cpuid4_info. Also, stash the struct pci_dev ptr into the struct simplifying the code even more. There should be no functionality change resulting from this patch except slightly slimming the _cpuid4_info per-cpu vars. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cacheinfo: Turn off L3 cache index disable feature in virtualized ↵Frank Arnold2010-04-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | environments When running a quest kernel on xen we get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df PGD 0 Oops: 0000 [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 0, comm: swapper Tainted: G W 2.6.34-rc3 #1 /HVM domU RIP: 0010:[<ffffffff8142f2fb>] [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x 2ca/0x3df RSP: 0018:ffff880002203e08 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060 RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000 RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38 R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0 R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68 FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020) Stack: ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30 <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70 <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b Call Trace: <IRQ> [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c [<ffffffff81059140>] ? mod_timer+0x23/0x25 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20 <EOI> [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd [<ffffffff810114eb>] ? default_idle+0x36/0x53 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4 [<ffffffff81423a9a>] rest_init+0x7e/0x80 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107 Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb RIP [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df RSP <ffff880002203e08> CR2: 0000000000000038 ---[ end trace a7919e7f17c0a726 ]--- The L3 cache index disable feature of AMD CPUs has to be disabled if the kernel is running as guest on top of a hypervisor because northbridge devices are not available to the guest. Currently, this fixes a boot crash on top of Xen. In the future this will become an issue on KVM as well. Check if northbridge devices are present and do not enable the feature if there are none. Signed-off-by: Frank Arnold <frank.arnold@amd.com> LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cacheinfo: Unify AMD L3 cache index disable checkingBorislav Petkov2010-04-221-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3 cache index disable. Concentrate the family, model, stepping checking at one place and enable the feature implicitly on upcoming Fam10h models. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | powernow-k8: Fix frequency reportingMark Langsdorf2010-04-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With F10, model 10, all valid frequencies are in the ACPI _PST table. Cc: <stable@kernel.org> # 33.x 32.x Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | | x86, cpufreq: Add APERF/MPERF support for AMD processorsMark Langsdorf2010-04-095-44/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with model 10 of Family 0x10, AMD processors may have support for APERF/MPERF. Add support for identifying it and using it within cpufreq. Move the APERF/MPERF functions out of the acpi-cpufreq code and into their own file so they can easily be shared. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> LKML-Reference: <20100401141956.GA1930@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>