aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/hrtimer.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge remote-tracking branch 'korg/linux-3.0.y' into cm-13.0rogersb112015-11-101-35/+38
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: crypto/algapi.c drivers/gpu/drm/i915/i915_debugfs.c drivers/gpu/drm/i915/intel_display.c drivers/video/fbmem.c include/linux/nls.h kernel/cgroup.c kernel/signal.c kernel/timeconst.pl net/ipv4/ping.c Change-Id: I1f532925d1743df74d66bcdd6fc92f05c72ee0dd
| * hrtimers: Move SMP function call to thread contextThomas Gleixner2013-07-281-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5ec2481b7b47a4005bb446d176e5d0257400c77d upstream. smp_call_function_* must not be called from softirq context. But clock_was_set() which calls on_each_cpu() is called from softirq context to implement a delayed clock_was_set() for the timer interrupt handler. Though that almost never gets invoked. A recent change in the resume code uses the softirq based delayed clock_was_set to support Xens resume mechanism. linux-next contains a new warning which warns if smp_call_function_* is called from softirq context which gets triggered by that Xen change. Fix this by moving the delayed clock_was_set() call to a work context. Reported-and-tested-by: Artem Savkov <artem.savkov@gmail.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com>, Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: John Stultz <john.stultz@linaro.org> Cc: xen-devel@lists.xen.org Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimer: Add expiry time overflow check in hrtimer_interruptPrarit Bhargava2013-05-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8f294b5a139ee4b75e890ad5b443c93d1e558a8b upstream. The settimeofday01 test in the LTP testsuite effectively does gettimeofday(current time); settimeofday(Jan 1, 1970 + 100 seconds); settimeofday(current time); This test causes a stack trace to be displayed on the console during the setting of timeofday to Jan 1, 1970 + 100 seconds: [ 131.066751] ------------[ cut here ]------------ [ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140() [ 131.104935] Hardware name: Dinar [ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod [ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6 [ 131.182248] Call Trace: [ 131.184684] <IRQ> [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0 [ 131.191312] [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20 [ 131.197131] [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140 [ 131.203721] [<ffffffff810bb584>] tick_program_event+0x24/0x30 [ 131.209534] [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230 [ 131.215437] [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130 [ 131.221509] [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99 [ 131.227839] [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80 [ 131.233816] <EOI> [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120 [ 131.240267] [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0 [ 131.246252] [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0 [ 131.252238] [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20 [ 131.257877] [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260 [ 131.263692] [<ffffffff8101c42f>] cpu_idle+0xaf/0x120 [ 131.268727] [<ffffffff815f8971>] start_secondary+0x255/0x257 [ 131.274449] ---[ end trace 1151a50552231615 ]--- When we change the system time to a low value like this, the value of timekeeper->offs_real will be a negative value. It seems that the WARN occurs because an hrtimer has been started in the time between the releasing of the timekeeper lock and the IPI call (via a call to on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result is that a REALTIME_CLOCK timer has been added with softexpires = expires = KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the clock base's offset (which was set to timekeeper->offs_real in do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which was KTIME_MAX): KTIME_MAX - (a negative value) = overflow A simple check for an overflow can resolve this problem. Using KTIME_MAX instead of the overflow value will result in the hrtimer function being run, and the reprogramming of the timer after that. Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> [jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimer: Fix ktime_add_ns() overflow on 32bit architecturesDavid Engraf2013-05-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 51fd36f3fad8447c487137ae26b9d0b3ce77bb25 upstream. One can trigger an overflow when using ktime_add_ns() on a 32bit architecture not supporting CONFIG_KTIME_SCALAR. When passing a very high value for u64 nsec, e.g. 7881299347898368000 the do_div() function converts this value to seconds (7881299347) which is still to high to pass to the ktime_set() function as long. The result in is a negative value. The problem on my system occurs in the tick-sched.c, tick_nohz_stop_sched_tick() when time_delta is set to timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is valid, thus ktime_add_ns() is called with a too large value resulting in a negative expire value. This leads to an endless loop in the ticker code: time_delta: 7881299347898368000 expires = ktime_add_ns(last_update, time_delta) expires: negative value This fix caps the value to KTIME_MAX. This error doesn't occurs on 64bit or architectures supporting CONFIG_KTIME_SCALAR (e.g. ARM, x86-32). Signed-off-by: David Engraf <david.engraf@sysgo.com> [jstultz: Minor tweaks to commit message & header] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimer: Don't reinitialize a cpu_base lock on CPU_UPMichael Bohan2013-04-251-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 84cc8fd2fe65866e49d70b38b3fdf7219dd92fe0 upstream. The current code makes the assumption that a cpu_base lock won't be held if the CPU corresponding to that cpu_base is offline, which isn't always true. If a hrtimer is not queued, then it will not be migrated by migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's cpu_base may still point to a CPU which has subsequently gone offline if the timer wasn't enqueued at the time the CPU went down. Normally this wouldn't be a problem, but a cpu_base's lock is blindly reinitialized each time a CPU is brought up. If a CPU is brought online during the period that another thread is performing a hrtimer operation on a stale hrtimer, then the lock will be reinitialized under its feet, and a SPIN_BUG() like the following will be observed: <0>[ 28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0 <0>[ 28.087078] lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1 <4>[ 42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) <4>[ 42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) <4>[ 42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) <4>[ 42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28) <4>[ 42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) <4>[ 42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) <4>[ 42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0) <4>[ 42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0) <4>[ 42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434) As an example, this particular crash occurred when hrtimer_start() was executed on CPU #0. The code locked the hrtimer's current cpu_base corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's cpu_base to an optimal CPU which was online. In this case, it selected the cpu_base corresponding to CPU #3. Before it could proceed, CPU #1 came online and reinitialized the spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock which was reinitialized. When CPU #0 finally ended up unlocking the old cpu_base corresponding to CPU #1 so that it could switch to CPU #3, we hit this SPIN_BUG() above while in switch_hrtimer_base(). CPU #0 CPU #1 ---- ---- ... <offline> hrtimer_start() lock_hrtimer_base(base #1) ... init_hrtimers_cpu() switch_hrtimer_base() ... ... raw_spin_lock_init(&cpu_base->lock) raw_spin_unlock(&cpu_base->lock) ... <spin_bug> Solve this by statically initializing the lock. Signed-off-by: Michael Bohan <mbohan@codeaurora.org> Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimer: Prevent hrtimer_enqueue_reprogram raceLeonid Shatz2013-02-281-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b22affe0aef429d657bc6505aacb1c569340ddd2 upstream. hrtimer_enqueue_reprogram contains a race which could result in timer.base switch during unlock/lock sequence. hrtimer_enqueue_reprogram is releasing the lock protecting the timer base for calling raise_softirq_irqsoff() due to a lock ordering issue versus rq->lock. If during that time another CPU calls __hrtimer_start_range_ns() on the same hrtimer, the timer base might switch, before the current CPU can lock base->lock again and therefor the unlock_timer_base() call will unlock the wrong lock. [ tglx: Added comment and massaged changelog ] Signed-off-by: Leonid Shatz <leonid.shatz@ravellosystems.com> Signed-off-by: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge remote-tracking branch 'kernelorg/linux-3.0.y' into 3_0_64Andrew Dodd2013-02-271-16/+37
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/Kconfig arch/arm/include/asm/hwcap.h arch/arm/kernel/smp.c arch/arm/plat-samsung/adc.c drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_drv.h drivers/mmc/core/sd.c drivers/net/tun.c drivers/net/usb/usbnet.c drivers/regulator/max8997.c drivers/usb/core/hub.c drivers/usb/host/xhci.h drivers/usb/serial/qcserial.c fs/jbd2/transaction.c include/linux/migrate.h kernel/sys.c kernel/time/timekeeping.c lib/genalloc.c mm/memory-failure.c mm/memory_hotplug.c mm/mempolicy.c mm/page_alloc.c mm/vmalloc.c mm/vmscan.c mm/vmstat.c scripts/Kbuild.include Change-Id: I91e2d85c07320c7ccfc04cf98a448e89bed6ade6
| * hrtimer: Update hrtimer base offsets each hrtimer_interruptJohn Stultz2012-07-191-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of 5baefd6d84163443215f4a99f6a20f054ef11236 The update of the hrtimer base offsets on all cpus cannot be made atomically from the timekeeper.lock held and interrupt disabled region as smp function calls are not allowed there. clock_was_set(), which enforces the update on all cpus, is called either from preemptible process context in case of do_settimeofday() or from the softirq context when the offset modification happened in the timer interrupt itself due to a leap second. In both cases there is a race window for an hrtimer interrupt between dropping timekeeper lock, enabling interrupts and clock_was_set() issuing the updates. Any interrupt which arrives in that window will see the new time but operate on stale offsets. So we need to make sure that an hrtimer interrupt always sees a consistent state of time and offsets. ktime_get_update_offsets() allows us to get the current monotonic time and update the per cpu hrtimer base offsets from hrtimer_interrupt() to capture a consistent state of monotonic time and the offsets. The function replaces the existing ktime_get() calls in hrtimer_interrupt(). The overhead of the new function vs. ktime_get() is minimal as it just adds two store operations. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to any hrtimer expiration and guarantees that timers are not expired early. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimers: Move lock held region in hrtimer_interrupt()Thomas Gleixner2012-07-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40 We need to update the base offsets from this code and we need to do that under base->lock. Move the lock held region around the ktime_get() calls. The ktime_get() calls are going to be replaced with a function which gets the time and the offsets atomically. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * hrtimer: Provide clock_was_set_delayed()John Stultz2012-07-191-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of f55a6faa384304c89cfef162768e88374d3312cb clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | merge opensource jb u5codeworkx2012-09-221-0/+3
|/ | | | Change-Id: I1aaec157aa196f3448eff8636134fce89a814cf2
* hrtimer: Fix extra wakeups from __remove_hrtimer()Jeff Ohlstein2011-12-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 27c9cd7e601632b3794e1c3344d37b86917ffb43 upstream. __remove_hrtimer() attempts to reprogram the clockevent device when the timer being removed is the next to expire. However, __remove_hrtimer() reprograms the clockevent *before* removing the timer from the timerqueue and thus when hrtimer_force_reprogram() finds the next timer to expire it finds the timer we're trying to remove. This is especially noticeable when the system switches to NOHz mode and the system tick is removed. The timer tick is removed from the system but the clockevent is programmed to wakeup in another HZ anyway. Silence the extra wakeup by removing the timer from the timerqueue before calling hrtimer_force_reprogram() so that we actually program the clockevent for the next timer to expire. This was broken by 998adc3 "hrtimers: Convert hrtimers to use timerlist infrastructure". Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org> Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* hrtimers: Fix typo causing erratic timersThomas Gleixner2011-05-251-1/+1
| | | | | | | | | | | | | | | | | commit 9ec2690758a5 ("timerfd: Manage cancelable timers in timerfd") introduced a CONFIG_HIGHRES_TIMERS (should be CONFIG_HIGH_RES_TIMERS) typo, which caused applications depending on CLOCK_REALTIME timers to become sluggy due to the fact that the time base of the realtime timers was not updated when the wall clock time was set. This causes anything from 100% CPU use for some applications to odd delays and hickups. Reported-bisected-and-tested-by: Anca Emanuel <anca.emanuel@gmail.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Fatfingered-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hrtimers: Reorder clock basesThomas Gleixner2011-05-231-6/+6
| | | | | | | | | | | | | | | | | The ordering of the clock bases is historical due to the CLOCK_REALTIME and CLOCK_MONOTONIC constants. Now the hrtimer bases have their own enumeration due to the gap between CLOCK_MONOTONIC and CLOCK_BOOTTIME. So we can be more clever as most timers end up on the CLOCK_MONOTONIC base due to the virtue of POSIX declaring that relative CLOCK_REALTIME timers are not affected by time changes. In desktop environments this is slowly changing as applications switch to absolute timers, but I've observed empty CLOCK_REALTIME bases often enough. There is no performance penalty or overhead when CLOCK_REALTIME timers are active, but in case they are not we don't skip over a full cache line. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
* hrtimers: Avoid touching inactive timer basesThomas Gleixner2011-05-231-11/+18
| | | | | | | | Instead of iterating over all possible timer bases avoid it by marking the active bases in the cpu base. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
* timerfd: Manage cancelable timers in timerfdThomas Gleixner2011-05-231-62/+32
| | | | | | | | | | | | | Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the timer interrupt. Yes, I did not think about it, because the solution was so elegant. I didn't like the extra list in timerfd when it was proposed some time ago, but with a rcu based list the list walk it's less horrible than the original global lock, which was held over the list iteration. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
* timerfd: Allow timers to be cancelled when clock was setThomas Gleixner2011-05-021-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some applications must be aware of clock realtime being set backward. A simple example is a clock applet which arms a timer for the next minute display. If clock realtime is set backward then the applet displays a stale time for the amount of time which the clock was set backwards. Due to that applications poll the time because we don't have an interface. Extend the timerfd interface by adding a flag which puts the timer onto a different internal realtime clock. All timers on this clock are expired whenever the clock was set. The timerfd core records the monotonic offset when the timer is created. When the timer is armed, then the current offset is compared to the previous recorded offset. When it has changed, then timerfd_settime returns -ECANCELED. When a timer is read the offset is compared and if it changed -ECANCELED returned to user space. Periodic timers are not rearmed in the cancelation case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Chris Friesen <chris.friesen@genband.com> Tested-by: Kay Sievers <kay.sievers@vrfy.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davide Libenzi <davidel@xmailserver.org> Reviewed-by: Alexander Shishkin <virtuoso@slind.org> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimers: Prepare for cancel on clock was set timersThomas Gleixner2011-05-021-64/+61
| | | | | | | | Make clock_was_set() unconditional and rename hres_timers_resume to hrtimers_resume. This is a preparatory patch for hrtimers which are cancelled when clock realtime was set. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimer: Make lookup table constMike Frysinger2011-05-021-1/+1
| | | | | | Signed-off-by: Mike Frysinger <vapier@gentoo.org> Link: http://lkml.kernel.org/r/%3C1304364267-14489-1-git-send-email-vapier%40gentoo.org%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimer: Initialize CLOCK_ID to HRTIMER_BASE table staticallyThomas Gleixner2011-04-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sedat and Bruno reported RCU stalls which turned out to be caused by the following; sched_init() calls init_rt_bandwidth() which calls hrtimer_init() _BEFORE_ hrtimers_init() is called. While not entirely correct this worked because hrtimer_init() only accessed statically initialized data (hrtimer_bases.clock_base[CLOCK_MONOTONIC]) Commit e06383db9 (hrtimers: extend hrtimer base code to handle more then 2 clockids) added an indirection to the hrtimer_bases.clock_base lookup to avoid gap handling in the hot path. The table which is used for the translataion from CLOCK_ID to HRTIMER_BASE index is initialized at runtime in hrtimers_init(). So the early call of the scheduler code translates CLOCK_MONOTONIC to HRTIMER_BASE_REALTIME. Thus the rt_bandwith timer ends up on CLOCK_REALTIME. If the timer is armed and the wall clock time is set (e.g. ntpdate in the early boot process - which also gives the problem deterministic behaviour i.e. magic recovery after N hours), then the timer ends up with an expiry time far into the future. That breaks the RT throttler mechanism as rt runtime is accumulated and never cleared, so the rt throttler detects a false cpu hog condition and blocks all RT tasks until the timer finally expires. That in turn stalls the RCU thread of TINYRCU which leads to an huge amount of RCU callbacks piling up. Make the translation table statically initialized, so we are back to the status of <= 2.6.39. Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: John stultz <johnstul@us.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104282353140.3005%40ionos%3E Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2011-03-151-39/+45
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (62 commits) posix-clocks: Check write permissions in posix syscalls hrtimer: Remove empty hrtimer_init_hres_timer() hrtimer: Update hrtimer->state documentation hrtimer: Update base[CLOCK_BOOTTIME].offset correctly timers: Export CLOCK_BOOTTIME via the posix timers interface timers: Add CLOCK_BOOTTIME hrtimer base time: Extend get_xtime_and_monotonic_offset() to also return sleep time: Introduce get_monotonic_boottime and ktime_get_boottime hrtimers: extend hrtimer base code to handle more then 2 clockids ntp: Remove redundant and incorrect parameter check mn10300: Switch do_timer() to xtimer_update() posix clocks: Introduce dynamic clocks posix-timers: Cleanup namespace posix-timers: Add support for fd based clocks x86: Add clock_adjtime for x86 posix-timers: Introduce a syscall for clock tuning. time: Splitout compat timex accessors ntp: Add ADJ_SETOFFSET mode bit time: Introduce timekeeping_inject_offset posix-timer: Update comment ... Fix up new system-call-related conflicts in arch/x86/ia32/ia32entry.S arch/x86/include/asm/unistd_32.h arch/x86/include/asm/unistd_64.h arch/x86/kernel/syscall_table_32.S (name_to_handle_at()/open_by_handle_at() vs clock_adjtime()), and some due to movement of get_jiffies_64() in: kernel/time.c
| * hrtimer: Remove empty hrtimer_init_hres_timer()Thomas Gleixner2011-03-101-10/+0
| | | | | | | | | | | | Leftover from earlier implementation. All empty, remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * hrtimer: Update base[CLOCK_BOOTTIME].offset correctlyThomas Gleixner2011-03-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | We calculate the current time of each clock base by adding an offset to clock_monotonic. The offset for the clock bases is set in retrigger_next_event() which is called when we switch a cpu to highres mode or when the clock was set. Add the missing update for clock boottime. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>
| * timers: Add CLOCK_BOOTTIME hrtimer baseJohn Stultz2011-02-211-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CLOCK_MONOTONIC stops while the system is in suspend. This is because to applications system suspend is invisible. However, there is a growing set of applications that are wanting to be suspend-aware, but do not want to deal with the complications of CLOCK_REALTIME (which might jump around if settimeofday is called). For these applications, I propose a new clockid: CLOCK_BOOTTIME. CLOCK_BOOTTIME is idential to CLOCK_MONOTONIC, except it also includes any time spent in suspend. This patch add hrtimer base for CLOCK_BOOTTIME, using get_monotonic_boottime/ktime_get_boottime, to allow in kernel users to set timers against. CC: Jamie Lokier <jamie@shareable.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alexander Shishkin <virtuoso@slind.org> CC: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * time: Extend get_xtime_and_monotonic_offset() to also return sleepJohn Stultz2011-02-211-4/+5
| | | | | | | | | | | | | | | | | | | | | | Extend get_xtime_and_monotonic_offset to get_xtime_and_monotonic_and_sleep_offset(). CC: Jamie Lokier <jamie@shareable.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alexander Shishkin <virtuoso@slind.org> CC: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * hrtimers: extend hrtimer base code to handle more then 2 clockidsJohn Stultz2011-02-211-13/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hrtimer code is written mainly with CLOCK_REALTIME and CLOCK_MONOTONIC in mind. These are clockids 0 and 1 resepctively. However, if we are to introduce any new hrtimer bases, using new clockids, we have to skip the cputimers (clockids 2,3) as well as other clockids that may not impelement timers. This patch adds a little bit of indirection between the clockid and the base, so that we can extend the base by one when we add a new clockid at number 7 or so. CC: Jamie Lokier <jamie@shareable.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alexander Shishkin <virtuoso@slind.org> CC: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * time: Provide get_xtime_and_monotonic_offset()Torben Hohn2011-01-311-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hrtimer code accesses timekeeping variables under xtime_lock. Provide a sensible accessor function and use it. [ tglx: Removed the conditionals, unused variable, fixed codingstyle and massaged changelog ] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: johnstul@us.ibm.com Cc: yong.zhang0@gmail.com Cc: hch@infradead.org LKML-Reference: <20110127145905.23248.30458.stgit@localhost> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | debugobjects: Add hint for better object identificationStanislaw Gruszka2011-03-081-0/+6
|/ | | | | | | | | | | | | | | | | | | In complex subsystems like mac80211 structures can contain several timers and work structs, so identifying a specific instance from the call trace and object type output of debugobjects can be hard. Allow the subsystems which support debugobjects to provide a hint function. This function returns a pointer to a kernel address (preferrably the objects callback function) which is printed along with the debugobjects type. Add hint methods for timer_list, work_struct and hrtimer. [ tglx: Massaged changelog, made it compile ] Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <20110307085809.GA9334@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'for-next' of ↵Linus Torvalds2011-01-131-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
| * hrtimer: fix a typo in commentNamhyung Kim2010-12-221-1/+1
| | | | | | | | | | Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge branch 'for-2.6.38' of ↵Linus Torvalds2011-01-071-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
| * | core: Replace __get_cpu_var with __this_cpu_read if not used for an address.Christoph Lameter2010-12-171-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() can be replaced with this_cpu_read and will then use a single read instruction with implied address calculation to access the correct per cpu instance. However, the address of a per cpu variable passed to __this_cpu_read() cannot be determined (since it's an implied address conversion through segment prefixes). Therefore apply this only to uses of __get_cpu_var where the address of the variable is not used. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hughd@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | Merge branch 'timers-for-linus' of ↵Linus Torvalds2011-01-061-56/+27
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: MAINTAINERS: Update timer related entries timers: Use this_cpu_read timerqueue: Make timerqueue_getnext() static inline hrtimer: fix timerqueue conversion flub hrtimers: Convert hrtimers to use timerlist infrastructure timers: Fixup allmodconfig build issue timers: Rename timerlist infrastructure to timerqueue timers: Introduce timerlist infrastructure. hrtimer: Remove stale comment on curr_timer timer: Warn when del_timer_sync() is called in hardirq context timer: Del_timer_sync() can be used in softirq context timer: Make try_to_del_timer_sync() the same on SMP and UP posix-timers: Annotate lock_timer() timer: Permit statically-declared work with deferrable timers time: Use ARRAY_SIZE macro in timecompare.c timer: Initialize the field slack of timer_list timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS time: Compensate for rounding on odd-frequency clocksources Fix up trivial conflict in MAINTAINERS
| * hrtimer: fix timerqueue conversion flubJohn Stultz2010-12-101-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In converting the hrtimers to timerqueue, I missed a spot in hrtimer_run_queues where we loop running timers. We end up not pulling the new next value out and instead just use the last next value, causing boot time hangs in some cases. The proper fix is to pull timerqueue_getnext each iteration instead of using a local next value. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * hrtimers: Convert hrtimers to use timerlist infrastructureJohn Stultz2010-12-101-56/+30
| | | | | | | | | | | | | | | | | | | | | | Converts the hrtimer code to use the new timerlist infrastructure Signed-off-by: John Stultz <john.stultz@linaro.org> LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Thomas Gleixner <tglx@linutronix.de> CC: Richard Cochran <richardcochran@gmail.com>
* | hrtimer: Preserve timer state in remove_hrtimer()Salman Qazi2010-10-141-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The race is described as follows: CPU X CPU Y remove_hrtimer // state & QUEUED == 0 timer->state = CALLBACK unlock timer base timer->f(n) //very long hrtimer_start lock timer base remove_hrtimer // no effect hrtimer_enqueue timer->state = CALLBACK | QUEUED unlock timer base hrtimer_start lock timer base remove_hrtimer mode = INACTIVE // CALLBACK bit lost! switch_hrtimer_base CALLBACK bit not set: timer->base changes to a different CPU. lock this CPU's timer base The bug was introduced with commit ca109491f (hrtimer: removing all ur callback modes) in 2.6.29 [ tglx: Feed new state via local variable and add a comment. ] Signed-off-by: Salman Qazi <sqazi@google.com> Cc: akpm@linux-foundation.org Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20101012142351.8485.21823.stgit@dungbeetle.mtv.corp.google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
* | gcc-4.6: kernel/*: Fix unused but set warningsAndi Kleen2010-09-051-2/+1
|/ | | | | | | | | | No real bugs I believe, just some dead code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: andi@firstfloor.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'timers-timekeeping-for-linus' of ↵Linus Torvalds2010-08-061-5/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um: Fix read_persistent_clock fallout kgdb: Do not access xtime directly powerpc: Clean up obsolete code relating to decrementer and timebase powerpc: Rework VDSO gettimeofday to prevent time going backwards clocksource: Add __clocksource_updatefreq_hz/khz methods x86: Convert common clocksources to use clocksource_register_hz/khz timekeeping: Make xtime and wall_to_monotonic static hrtimer: Cleanup direct access to wall_to_monotonic um: Convert to use read_persistent_clock timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset powerpc: Cleanup xtime usage powerpc: Simplify update_vsyscall time: Kill off CONFIG_GENERIC_TIME time: Implement timespec_add x86: Fix vtime/file timestamp inconsistencies Trivial conflicts in Documentation/feature-removal-schedule.txt Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as per Thomas' earlier merge commit 47916be4e28c ("Merge branch 'powerpc.cherry-picks' into timers/clocksource")
| * hrtimer: Cleanup direct access to wall_to_monotonicJohn Stultz2010-07-271-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | Provides an accessor function to replace hrtimer.c's direct access of wall_to_monotonic. This will allow wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | sched: Change nohz idle load balancing logic to push modelVenkatesh Pallipadi2010-06-091-6/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the new push model, all idle CPUs indeed go into nohz mode. There is still the concept of idle load balancer (performing the load balancing on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz balancer when any of the nohz CPUs need idle load balancing. The kickee CPU does the idle load balancing on behalf of all idle CPUs instead of the normal idle balance. This addresses the below two problems with the current nohz ilb logic: * the idle load balancer continued to have periodic ticks during idle and wokeup frequently, even though it did not have any rebalancing to do on behalf of any of the idle CPUs. * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic wakeup can result in a periodic additional interrupt on a CPU doing the timer broadcast. Also currently we are migrating the unpinned timers from an idle to the cpu doing idle load balancing (when all the cpus in the system are idle, there is no idle load balancing cpu and timers get added to the same idle cpu where the request was made. So the existing optimization works only on semi idle system). And In semi idle system, we no longer have periodic ticks on the idle load balancer CPU. Using that cpu will add more delays to the timers than intended (as that cpu's timer base may not be uptodate wrt jiffies etc). This was causing mysterious slowdowns during boot etc. For now, in the semi idle case, use the nearest busy cpu for migrating timers from an idle cpu. This is good for power-savings anyway. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: Avoid double seqlockStanislaw Gruszka2010-05-261-1/+1
| | | | | | | | | | | hrtimer_get_softirq_time() has it's own xtime lock protection, so it's safe to use plain __current_kernel_time() and avoid the double seqlock loop. Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> LKML-Reference: <20100525214912.GA1934@r2bh72.net.upc.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIMECarsten Emde2010-04-061-26/+41
| | | | | | | | | | | | | | | | | | The current version of schedule_hrtimeout() always uses the monotonic clock. Some system calls such as mq_timedsend() and mq_timedreceive(), however, require the use of the wall clock due to the definition of the system call. This patch provides the infrastructure to use schedule_hrtimeout() with a CLOCK_REALTIME timer. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Tested-by: Pradyumna Sampath <pradysam@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Veen <arjan@infradead.org> LKML-Reference: <20100402204331.167439615@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimers: Convert to raw_spinlocksThomas Gleixner2009-12-141-25/+25
| | | | | | | | | Convert locks which cannot be sleeping locks in preempt-rt to raw_spinlocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: move timer stats helper functions to hrtimer.cHeiko Carstens2009-12-101-4/+20
| | | | | | | | | | | | There is no reason to make timer_stats_hrtimer_set_start_info and friends visible to the rest of the kernel. So move all of them to hrtimer.c. Also make timer_stats_hrtimer_set_start_info a static inline function so it gets inlined and we avoid another function call. Based on a patch by Thomas Gleixner. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <20091210095629.GC4144@osiris.boeblingen.de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* hrtimer: Tune hrtimer_interrupt hang logicThomas Gleixner2009-12-101-40/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The hrtimer_interrupt hang logic adjusts min_delta_ns based on the execution time of the hrtimer callbacks. This is error-prone for virtual machines, where a guest vcpu can be scheduled out during the execution of the callbacks (and the callbacks themselves can do operations that translate to blocking operations in the hypervisor), which in can lead to large min_delta_ns rendering the system unusable. Replace the current heuristics with something more reliable. Allow the interrupt code to try 3 times to catch up with the lost time. If that fails use the total time spent in the interrupt handler to defer the next timer interrupt so the system can catch up with other things which got delayed. Limit that deferment to 100ms. The retry events and the maximum time spent in the interrupt handler are recorded and exposed via /proc/timer_list Inspired by a patch from Marcelo. Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org
* Merge branch 'timers-for-linus' of ↵Linus Torvalds2009-12-081-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers, init: Limit the number of per cpu calibration bootup messages posix-cpu-timers: optimize and document timer_create callback clockevents: Add missing include to pacify sparse x86: vmiclock: Fix printk format x86: Fix printk format due to variable type change sparc: fix printk for change of variable type clocksource/events: Fix fallout of generic code changes nohz: Allow 32-bit machines to sleep for more than 2.15 seconds nohz: Track last do_timer() cpu nohz: Prevent clocksource wrapping during idle nohz: Type cast printk argument mips: Use generic mult/shift factor calculation for clocks clocksource: Provide a generic mult/shift factor calculation clockevents: Use u32 for mult and shift factors nohz: Introduce arch_needs_cpu nohz: Reuse ktime in sub-functions of tick_check_idle. time: Remove xtime_cache time: Implement logarithmic time accumulation
| * nohz: Allow 32-bit machines to sleep for more than 2.15 secondsJon Hunter2009-11-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the dynamic tick code, "max_delta_ns" (member of the "clock_event_device" structure) represents the maximum sleep time that can occur between timer events in nanoseconds. The variable, "max_delta_ns", is defined as an unsigned long which is a 32-bit integer for 32-bit machines and a 64-bit integer for 64-bit machines (if -m64 option is used for gcc). The value of max_delta_ns is set by calling the function "clockevent_delta2ns()" which returns a maximum value of LONG_MAX. For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in nanoseconds this equates to ~2.15 seconds. Hence, the maximum sleep time for a 32-bit machine is ~2.15 seconds, where as for a 64-bit machine it will be many years. This patch changes the type of max_delta_ns to be "u64" instead of "unsigned long" so that this variable is a 64-bit type for both 32-bit and 64-bit machines. It also changes the maximum value returned by clockevent_delta2ns() to KTIME_MAX. Hence this allows a 32-bit machine to sleep for longer than ~2.15 seconds. Please note that this patch also changes "min_delta_ns" to be "u64" too and although this is unnecessary, it makes the patch simpler as it avoids to fixup all callers of clockevent_delta2ns(). [ tglx: changed "unsigned long long" to u64 as we use this data type through out the time code ] Signed-off-by: Jon Hunter <jon-hunter@ti.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2009-10-051-2/+0
|\ \ | |/ |/| | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Remove overly verbose "switch to high res mode" message
| * hrtimer: Remove overly verbose "switch to high res mode" messageRoland Dreier2009-09-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | On big systems, printing <number of CPUs> copies of Switched to high resolution mode on CPU nnn clutters up the kernel log for minimal gain. Just get rid of them. Signed-off-by: Roland Dreier <rolandd@cisco.com> LKML-Reference: <ada1vlw126s.fsf_-_@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'timers-for-linus' of ↵Linus Torvalds2009-09-271-18/+35
|\ \ | |/ |/| | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: Eliminate needless reprogramming of clock events device