aboutsummaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* sg_write()/bsg_write() is not fit to be called under KERNEL_DSAl Viro2017-01-081-0/+3
| | | | | | | | | | | Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Change-Id: I0d2f3b1ed4e763c559ecec98af32767360985e91 Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* drivers: video: Add bounds checking in fb_cmap_to_userSteve Pfetsch2016-12-131-1/+1
| | | | | | | | Verify that unsigned int value will not become negative before cast to signed int. Bug: 31651010 Change-Id: I548a200f678762042617f11100b6966a405a3920
* fbcmap: Remove unnecessary condition checkPing Li2016-12-131-2/+2
| | | | | | | | In fb_set_user_cmap function, cmap->start variable is an unsigned integer, it doesn't need a condition check with the sign. Change-Id: I355ddb7edcc085ee52e4054833b687640376eee3 Signed-off-by: Ping Li <pingli@codeaurora.org>
* fbmem: Check failure of FBIOPUTCMAP ioctlPawan Kumar2016-12-132-9/+25
| | | | | | | | | On FBIOPUTCMAP ioctl failure deallocate fb cmap. Put null check for cmap red, green, blue component. Change-Id: I10468ee30d0e76c256cf3d7a6ffe14db7fd4511b Signed-off-by: Pawan Kumar <pavaku@codeaurora.org>
* smdk4412-kernel : do not ignore COB wifi chipsRGIB2016-12-091-0/+6
| | | | Change-Id: Ic3edddb6d0909df89a0d9551394891fa1fac707a
* HID: core: prevent out-of-bound readingsBenjamin Tissoires2016-11-111-0/+3
| | | | | | | | | | | | | Plugging a Logitech DJ receiver with KASAN activated raises a bunch of out-of-bound readings. The fields are allocated up to MAX_USAGE, meaning that potentially, we do not have enough fields to fit the incoming values. Add checks and silence KASAN. Change-Id: I11d44957b450a3eda258c05f9e833c71a079e83c Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* BACKPORT: tty: Prevent ldisc drivers from re-using stale tty fieldsPeter Hurley2016-11-111-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (cherry picked from commit dd42bf1197144ede075a9d4793123f7689e164bc) Line discipline drivers may mistakenly misuse ldisc-related fields when initializing. For example, a failure to initialize tty->receive_room in the N_GIGASET_M101 line discipline was recently found and fixed [1]. Now, the N_X25 line discipline has been discovered accessing the previous line discipline's already-freed private data [2]. Harden the ldisc interface against misuse by initializing revelant tty fields before instancing the new line discipline. [1] commit fd98e9419d8d622a4de91f76b306af6aa627aa9c Author: Tilman Schmidt <tilman@imap.cc> Date: Tue Jul 14 00:37:13 2015 +0200 isdn/gigaset: reset tty->receive_room when attaching ser_gigaset [2] Report from Sasha Levin <sasha.levin@oracle.com> [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Cc: Tilman Schmidt <tilman@imap.cc> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Ibed6feadfb9706d478f93feec3b240aecfc64af3 Bug: 30951112
* sg: Fix double-free when drives detach during SG_IOCalvin Owens2016-11-111-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In sg_common_write(), we free the block request and return -ENODEV if the device is detached in the middle of the SG_IO ioctl(). Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we end up freeing rq->cmd in the already free rq object, and then free the object itself out from under the current user. This ends up corrupting random memory via the list_head on the rq object. The most common crash trace I saw is this: ------------[ cut here ]------------ kernel BUG at block/blk-core.c:1420! Call Trace: [<ffffffff81281eab>] blk_put_request+0x5b/0x80 [<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg] [<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg] [<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70 [<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg] [<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg] [<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520 [<ffffffff81258967>] ? file_has_perm+0x97/0xb0 [<ffffffff811714a1>] SyS_ioctl+0x91/0xb0 [<ffffffff81602afb>] tracesys+0xdd/0xe2 RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0 The solution is straightforward: just set srp->rq to NULL in the failure branch so that sg_finish_rem_req() doesn't attempt to re-free it. Additionally, since sg_rq_end_io() will never be called on the object when this happens, we need to free memory backing ->cmd if it isn't embedded in the object itself. KASAN was extremely helpful in finding the root cause of this bug. Change-Id: I8c2389a4e2e1b5f753a47f8af60502a761b891b5 Signed-off-by: Calvin Owens <calvinowens@fb.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* binder: prevent kptr leak by using %pK format specifierNick Desaulniers2016-10-191-1/+1
| | | | | | | Works in conjunction with kptr_restrict. Bug: 30143283 Change-Id: I2b3ce22f4e206e74614d51453a1d59b7080ab05a
* power: max17042_battery: Set type to UNKNOWNZhao Wei Liew2016-08-171-1/+1
| | | | | | | | | | | This is a fuelgauge driver, not an actual battery driver. Setting its type to 'Battery' will confuse healthd, causing healthd to pick this driver instead of the actual battery driver for reading battery stats. Issue-Id: NIGHTLIES-3279 Change-Id: Ia45e74599d391a90cb526aa07a2525b64c3eec96
* motor: max77693: expose min/max/default/threshold in sysfsSimon Shields2016-08-021-0/+58
| | | | | | based off a similar patch for klte by Kevin Haggerty Change-Id: If2b4f1f2c0310fc0a6c3fe49fd680973dce28ef5
* HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commandsScott Bauer2016-06-301-5/+5
| | | | | | | | | | | | This patch validates the num_values parameter from userland during the HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter leading to a heap overflow. Change-Id: I10866ee01c7ba430eab2b5cc3356c9519c7f9730 Cc: stable@vger.kernel.org Signed-off-by: Scott Bauer <sbauer@plzdonthack.me> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* zram: fix possible use after free in zcomp_create()Luis Henriques2016-06-131-5/+7
| | | | | | | | | | | | | | | | | | | | zcomp_create() verifies the success of zcomp_strm_{multi,single}_create() through comp->stream, which can potentially be pointing to memory that was freed if these functions returned an error. While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic 'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create() could return other error codes. Function documentation updated accordingly. Change-Id: I84334ce1929c8212aa70387781ef0a6b0af50fa5 Fixes: beca3ec71fe5 ("zram: add multi stream functionality") Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove ENHANCED_LMK_ROUTINE added by SamsungAndreas Blaesius2016-06-131-98/+0
| | | | Change-Id: I2e26fbcd06541536258313f4f5753ca87ab46d9c
* lowmemorykiller: fixes for new oom_score_adjEmanuele2016-06-131-5/+5
| | | | | Change-Id: I34c547039d02366649206395fe3fb3f363fc900e Signed-off-by: Emanuele Scarlata <scarlataemanuele@gmail.com>
* lowmemorykiller: maintain LMK rbtree with signal->adj_nodeHong-Mei Li2016-06-131-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we maintain LMK rbtree with task->adj_node. However, when handling oom_score_adj change case, we may del/add a non-leader task to the RB tree, which is not as expected. This patch we maintain the LMK rbtree with task->signal->adj_node. Since signal_struct is shared between main task and threads, we can avoid non-leader thread adding to tree. Change-Id: I3ba9e740e03ab04c25497a1cc2c870f051bd5b07 Signed-off-by: Hong-Mei Li <a21834@motorola.com> Reviewed-on: http://gerrit.mot.com/754225 SME-Granted: SME Approvals Granted SLTApproved: Slta Waiver <sltawvr@motorola.com> Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Zhi-Ming Yuan <a14194@motorola.com> Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com> (cherry picked from commit b40634023f9152c6232de9acb80108e0af7e4075) Signed-off-by: Abdul Salam <salamab@motorola.com> Reviewed-on: http://gerrit.mot.com/766107 Reviewed-by: Sudharsan Yettapu <sudharsan.yettapu@motorola.com> Reviewed-by: Ravikumar Vembu <raviv@motorola.com> (cherry picked from commit f3abd37ce3b4d36ae05cfc1c5cd10e5a3f584e7f) Reviewed-on: http://gerrit.mot.com/768302
* drivers:lmk: Fix null pointer issueHong-Mei Li2016-06-131-1/+1
| | | | | | | | | | | | | | | | On some race, the tsk that lmk is using may be deleted from the RB tree by other thread, and rb_next would return a NULL if we use this tsk to get next. For this case, we need to skip this round of shrink and wait for the next turn. Otherwise, tsk would trigger NULL pointer panic. Change-Id: I37f4bd2827f8a0a28f29192dd71532d1c252f986 Signed-off-by: Hong-Mei Li <a21834@motorola.com> Reviewed-on: http://gerrit.mot.com/729556 SLTApproved: Slta Waiver <sltawvr@motorola.com> SME-Granted: SME Approvals Granted Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com>
* drivers:lmk: Fix double delete issueHong-Mei Li2016-06-131-1/+4
| | | | | | | | | | | | | | | | | | someone may change a process's oom_score_adj by proc fs, even though the process has exited. In that case, the task was deleted from the rb tree already, and the redundant deleting would trigger rb_erase panic finally. In this patch, we make sure to clear the node after deteting and check its empty status before rb_erase. Change-Id: I7628c7d21011099e796b7d366cbc142f96bb8aab Signed-off-by: Hong-Mei Li <a21834@motorola.com> Reviewed-on: http://gerrit.mot.com/725306 SLTApproved: Slta Waiver <sltawvr@motorola.com> SME-Granted: SME Approvals Granted Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Sheng-Zhe Zhao <a18689@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com>
* staging:android:lmk: read rb tree root with spinlockYi-wei Zhao2016-06-131-1/+3
| | | | | | | | | | | | | | there is racing condition: after reading rb tree root, it might be changed by other tasks before adding new node. it can lead to rb tree corruption. This patch is to avoid this race condition. Change-Id: Id86bfd133488ad4ee12cd83c9bf1d1c12ef5598f Signed-off-by: Yi-wei Zhao <gbjc64@motorola.com> Reviewed-on: http://gerrit.mot.com/715645 Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Sheng-Zhe Zhao <a18689@motorola.com> SLTApproved: Christopher Fries <cfries@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com>
* staging: android: lowmemorykiller: select a new task to killYi-wei Zhao2016-06-131-0/+5
| | | | | | | | | | | | | | | | | | | | | Under certain circumstances, a process may take time to handle a SIGKILL. When lowmemkiller is called again shortly after, it would pick the same process to kill over and over, so that we cann't get free memory for long time. Solution is to check fatal_signal_pending() on the selected task, and if it's already pending, select a new task to kill. Cherry-pick 5e3358093351e5d48e21250e31896b855542f22c Reviewed-on: http://gerrit.pcs.mot.com/479831 Change-Id: I53445114451ffaba293f3c7174fb0b01ed0d34b6 Signed-off-by: Tianshui Shi <kfp634@motorola.com> Reviewed-on: http://gerrit.pcs.mot.com/505410 Tested-by: Jira Key <JIRAKEY@motorola.com> Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com> Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com> Reviewed-by: Jeffrey Carlyle <jeff.carlyle@motorola.com> (cherry picked from commit da093001caf06ed2296b4f79c84cc48ce713eac6)
* staging: android: lowmemorykiller: implement task's adj rbtreeHong-Mei Li2016-06-132-0/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on the current LMK implementation, LMK has to scan all processes to select the correct task to kill during low memory. The basic idea for the optimization is to : queue all tasks with oom_score_adj priority, and then LMK just selects the proper task from the queue(rbtree) to kill. performance improvement: the current implementation: average time to find a task to kill : 1004us the optimized implementation: average time to find a task to kill: 43us Change-Id: I4dbbdd5673314dbbdabb71c3eff0dc229ce4ea91 Signed-off-by: Hong-Mei Li <a21834@motorola.com> Reviewed-on: http://gerrit.pcs.mot.com/548917 SLT-Approved: Slta Waiver <sltawvr@motorola.com> Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Yi-Wei Zhao <gbjc64@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com> Signed-off-by: D. Andrei Măceș <dmaces@nd.edu> Conflicts: drivers/staging/android/Kconfig drivers/staging/android/lowmemorykiller.c fs/proc/base.c mm/oom_kill.c Conflicts: drivers/staging/android/lowmemorykiller.c mm/oom_kill.c Conflicts: mm/oom_kill.c Conflicts: drivers/staging/android/lowmemorykiller.c mm/oom_kill.c
* staging: android: lowmemorykiller: fix build breakage on kernel 3.0Ziyann2016-06-131-1/+1
|
* lowmemorykiller: make default lowmemorykiller debug message usefulColin Cross2016-06-131-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | lowmemorykiller debug messages are inscrutable and mostly useful for debugging the lowmemorykiller, not explaining why a process was killed. Make the messages more useful by prefixing them with "lowmemorykiller: " and explaining in more readable terms what was killed, who it was killed for, and why it was killed. The messages now look like: [ 76.997631] lowmemorykiller: Killing 'droid.gallery3d' (2172), adj 1000, [ 76.997635] to free 27436kB on behalf of 'kswapd0' (29) because [ 76.997638] cache 122624kB is below limit 122880kB for oom_score_adj 1000 [ 76.997641] Free memory is -53356kB above reserved A negative number for free memory above reserved means some of the reserved memory has been used and is being regenerated by kswapd, which is likely what called the shrinkers. Change-Id: I1fe983381e73e124b90aa5d91cb66e55eaca390f Signed-off-by: Colin Cross <ccross@android.com> Conflicts: drivers/staging/android/lowmemorykiller.c
* staging: android: lowmemorykiller: Change default debug_level to 1Arve Hjønnevåg2016-06-131-1/+1
| | | | | | | | | | The select...to kill messages are not very useful when not debugging the lowmemorykiller itself. After the change to check TIF_MEMDIE instead of using a task notifer this message can also get very noisy. Change-Id: Ice171c25801d6faa454b885a23b24b002423b754 Signed-off-by: Arve Hjønnevåg <arve@android.com>
* staging: android: lowmemorykiller: Don't count reserved free memoryArve Hjønnevåg2016-06-131-1/+2
| | | | | | | | | | | | The amount of reserved memory varies between devices. Subtract it here to reduce the amount of devices specific tuning needed for the minfree values. Change-Id: I466ae8b18f5972f6f6d8b5a7d8c4ae69660de53a Signed-off-by: Arve Hjønnevåg <arve@android.com> Conflicts: drivers/staging/android/lowmemorykiller.c
* staging: android: lowmemorykiller: Add config option to support oom_adj valuesArve Hjønnevåg2016-06-132-0/+94
| | | | | | | | | | | | | The conversion to use oom_score_adj instead of the deprecated oom_adj values breaks existing user-space code. Add a config option to convert oom_adj values written to oom_score_adj values if they appear to be valid oom_adj values. Change-Id: I68308125059b802ee2991feefb07e9703bc48549 Signed-off-by: Arve Hjønnevåg <arve@android.com> Conflicts: drivers/staging/android/Kconfig
* Staging: android: lowmemorykiller.cGreg Kroah-Hartman2016-06-131-1/+1
| | | | | | | Fix compiler warning about the type of the module parameter. Cc: San Mehat <san@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: android, lowmemorykiller: convert to use oom_score_adjDavid Rientjes2016-06-131-22/+22
| | | | | | | | | | /proc/pid/oom_adj is deprecated and will be removed in August 2012 according to Documentation/feature-removal-schedule.txt. Convert its usage in the lowmemorykiller to use the new interface, oom_score_adj, instead. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: android/lowmemorykiller: Do not kill kernel threadsAnton Vorontsov2016-06-131-0/+3
| | | | | | | | | | LMK should not try killing kernel threads. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: android/lowmemorykiller: No need for task->signal checkAnton Vorontsov2016-06-131-7/+1
| | | | | | | | | | task->signal == NULL is not possible, so no need for these checks. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: android/lowmemorykiller: Better mm handlingAnton Vorontsov2016-06-131-7/+9
| | | | | | | | | | | | | | LMK should not directly check for task->mm. The reason is that the process' threads may exit or detach its mm via use_mm(), but other threads may still have a valid mm. To catch this we use find_lock_task_mm(), which walks up all threads and returns an appropriate task (with lock held). Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: android/lowmemorykiller: Don't grab tasklist_lockAnton Vorontsov2016-06-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound as a great idea. For LMK, it is sufficient to surround tasks list traverse with rcu_read_{,un}lock(). >From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: drivers/staging/android/lowmemorykiller.c
* Staging: android: fixed 80 characters warnings in lowmemorykiller.cMarco Navarra2016-06-131-6/+6
| | | | | | | This patch fixes some 80 chatacters limit warnings in the lowmemorykiller.c file Signed-off-by: Marco Navarra <fromenglish@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* zram: default to LZ4 compression if LZ4 is enabledZiyan2016-06-121-0/+4
| | | | Change-Id: I4ac454ceb60f960c30c9719d0aa63e4bda7dfbd5
* mm: implement WasActive page flag (for improving cleancache)Dan Magenheimer2016-06-121-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Feedback welcome if there is a different/better way to do this without using a page flag!) Since about 2.6.27, the page replacement algorithm maintains an "active" bit to help decide which pages are most eligible to reclaim, see http://linux-mm.org/PageReplacementDesign This "active' information is also useful to cleancache but is lost by the time that cleancache has the opportunity to preserve the pageful of data. This patch adds a new page flag "WasActive" to retain the state. The flag may possibly be useful elsewhere. It is up to each cleancache backend to utilize the bit as it desires. The matching patch for zcache is included here for clarification/discussion purposes, though it will need to go through GregKH and the staging tree. The patch resolves issues reported with cleancache which occur especially during streaming workloads on older processors, see https://lkml.org/lkml/2011/8/17/351 Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Conflicts: include/linux/page-flags.h Change-Id: I0fcb2302a7b9c5e66db005229f679baee90f262f Conflicts: include/linux/page-flags.h
* staging: zcache: fix cleancache race condition with shrinkerSeth Jennings2016-06-121-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6d7d9798ad5c97ee4e911dd070dc12dc5ae55bd0 upstream. This patch fixes a race condition that results in memory corruption when using cleancache. The race exists between the zcache shrinker handler, shrink_zcache_memory() and cleancache_get_page(). In most cases, the shrinker will both evict a zbpg from its buddy list and flush it from tmem before a cleancache_get_page() occurs on that page. A subsequent cleancache_get_page() will fail in the tmem layer. In the rare case that two occur together and the cleancache_get_page() path gets through the tmem layer before the shrinker path can flush tmem, zbud_decompress() does a check to see if the zbpg is a "zombie", i.e. not on a buddy list, which means the shrinker is in the process of reclaiming it. If the zbpg is a zombie, zbud_decompress() returns -EINVAL. However, this return code is being ignored by the caller, zcache_pampd_get_data_and_free(), which results in the caller of cleancache_get_page() thinking that the page has been properly retrieved when it has not. This patch modifies zcache_pampd_get_data_and_free() to convey the failure up the stack so that the caller of cleancache_get_page() knows the page retrieval failed. This needs to be applied to stable trees as well. zcache-main.c was named zcache.c before v3.1, so I'm not sure how you want to handle trees earlier than that. Change-Id: I618a2488d788c15b3e8d74d2831cc5d83ca71abc Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: swap: don't delay swap free for fast swap devicesVinayak Menon2016-06-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are couple of issues with swapcache usage when ZRAM is used as swap device. 1) Kernel does a swap readahead which can be around 6 to 8 pages depending on total ram, which is not required for zram since accesses are fast. 2) Kernel delays the freeing up of swapcache expecting a later hit, which again is useless in the case of zram. 3) This is not related to swapcache, but zram usage itself. As mentioned in (2) kernel delays freeing of swapcache, but along with that it delays zram compressed page free also. i.e. there can be 2 copies, though one is compressed. This patch addresses these issues using two new flags QUEUE_FLAG_FAST and SWP_FAST, to indicate that accesses to the device will be fast and cheap, and instructs the swap layer to free up swap space agressively, and not to do read ahead. Change-Id: I5d2d5176a5f9420300bb2f843f6ecbdb25ea80e4 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: D. Andrei Măceș <dmaces@nd.edu> Conflicts: include/linux/blkdev.h include/linux/swap.h mm/swap_state.c mm/swapfile.c Conflicts: include/linux/blkdev.h
* zcache: Use zs_get_total_size_pagesnadlabak2016-06-121-1/+1
| | | | Change-Id: I800331317eda3ffa33b12314fc1641f3d2ca4db2
* zsmalloc: change return value unit of zs_get_total_size_bytesMinchan Kim2016-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zs_get_total_size_bytes returns a amount of memory zsmalloc consumed with *byte unit* but zsmalloc operates *page unit* rather than byte unit so let's change the API so benefit we could get is that reduce unnecessary overhead (ie, change page unit with byte unit) in zsmalloc. Since return type is pages, "zs_get_total_pages" is better than "zs_get_total_size_bytes". Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: <juno.choi@lge.com> Cc: <seungho1.park@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: David Horner <ds2horner@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Conflicts: mm/zsmalloc.c Change-Id: If5697d7b7f8ebaab3b58c1f9f84de747eb909ca3
* staging: zsmalloc: add mapping modesSeth Jennings2016-06-121-3/+3
| | | | | | | | | | | | | | | | | | | This patch improves mapping performance in zsmalloc by getting usage information from the user in the form of a "mapping mode" and using it to avoid unnecessary copying for objects that span pages. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: drivers/staging/zram/zram_drv.c drivers/staging/zsmalloc/zsmalloc-main.c drivers/staging/zsmalloc/zsmalloc.h drivers/staging/zsmalloc/zsmalloc_int.h Change-Id: I0b7a97e21eb3b26270bd2949697ef6d14bf7ae27
* zcache: Fix zsmalloc includeEmerson Pinter2016-06-121-1/+1
| | | | Change-Id: I8f0c873f92d8c75388aa59d670da755a4ded873d
* staging: zsmalloc: zsmalloc: use unsigned long instead of void *Minchan Kim2016-06-121-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should use unsigned long as handle instead of void * to avoid any confusion. Without this, users may just treat zs_malloc return value as a pointer and try to deference it. This patch passed compile test(zram, zcache and ramster) and zram is tested on qemu. changelog * from v2 - remove hval pointed out by Nitin - based on next-20120607 * from v1 - change zcache's zv_create return value - baesd on next-20120604 Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: drivers/staging/zram/zram_drv.c drivers/staging/zram/zram_drv.h drivers/staging/zsmalloc/zsmalloc-main.c drivers/staging/zsmalloc/zsmalloc.h Change-Id: I5b5adff5f31e3cf51cfd004df0f11e088d709d41
* zram: fix incorrect stat with failed_readsChao Yu2016-06-122-4/+8
| | | | | | | | | | | | | | | | | Since we allocate a temporary buffer in zram_bvec_read to handle partial page operations in commit 924bd88d703e ("Staging: zram: allow partial page operations"), our ->failed_reads value may be incorrect as we do not increase its value when failing to allocate the temporary buffer. Let's fix this issue and correct the annotation of failed_reads. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: replace global tb_lock with fine grain lockWeijie Yang2016-06-122-33/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we use a rwlock tb_lock to protect concurrent access to the whole zram meta table. However, according to the actual access model, there is only a small chance for upper user to access the same table[index], so the current lock granularity is too big. The idea of optimization is to change the lock granularity from whole meta table to per table entry (table -> table[index]), so that we can protect concurrent access to the same table[index], meanwhile allow the maximum concurrency. With this in mind, several kinds of locks which could be used as a per-entry lock were tested and compared: Test environment: x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04, kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO. iozone test: iozone -t 4 -R -r 16K -s 200M -I +Z (1GB zram with ext4 filesystem, take the average of 10 tests, KB/s) Test base CAS spinlock rwlock bit_spinlock ------------------------------------------------------------------- Initial write 1381094 1425435 1422860 1423075 1421521 Rewrite 1529479 1641199 1668762 1672855 1654910 Read 8468009 11324979 11305569 11117273 10997202 Re-read 8467476 11260914 11248059 11145336 10906486 Reverse Read 6821393 8106334 8282174 8279195 8109186 Stride read 7191093 8994306 9153982 8961224 9004434 Random read 7156353 8957932 9167098 8980465 8940476 Mixed workload 4172747 5680814 5927825 5489578 5972253 Random write 1483044 1605588 1594329 1600453 1596010 Pwrite 1276644 1303108 1311612 1314228 1300960 Pread 4324337 4632869 4618386 4457870 4500166 To enhance the possibility of access the same table[index] concurrently, set zram a small disksize(10MB) and let threads run with large loop count. fio test: fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers --scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4 --filename=/dev/zram0 --name=seq-write --rw=write --stonewall --name=seq-read --rw=read --stonewall --name=seq-readwrite --rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall (10MB zram raw block device, take the average of 10 tests, KB/s) Test base CAS spinlock rwlock bit_spinlock ------------------------------------------------------------- seq-write 933789 999357 1003298 995961 1001958 seq-read 5634130 6577930 6380861 6243912 6230006 seq-rw 1405687 1638117 1640256 1633903 1634459 rand-rw 1386119 1614664 1617211 1609267 1612471 All the optimization methods show a higher performance than the base, however, it is hard to say which method is the most appropriate. On the other hand, zram is mostly used on small embedded system, so we don't want to increase any memory footprint. This patch pick the bit_spinlock method, pack object size and page_flag into an unsigned long table.value, so as to not increase any memory overhead on both 32-bit and 64-bit system. On the third hand, even though different kinds of locks have different performances, we can ignore this difference, because: if zram is used as zram swapfile, the swap subsystem can prevent concurrent access to the same swapslot; if zram is used as zram-blk for set up filesystem on it, the upper filesystem and the page cache also prevent concurrent access of the same block mostly. So we can ignore the different performances among locks. Change-Id: I85b58fc94a794a85be4713988b218ef6e52f2a27 Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: use size_t instead of u16Minchan Kim2016-06-121-1/+1
| | | | | | | | | | | | | | Some architectures (eg, hexagon and PowerPC) could use PAGE_SHIFT of 16 or more. In these cases u16 is not sufficiently large to represent a compressed page's size so use size_t. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: remove unused SECTOR_SIZE defineSergey Senozhatsky2016-06-121-1/+0
| | | | | | | | | | | Drop SECTOR_SIZE define, because it's not used. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: rename struct `table' to `zram_table_entry'Sergey Senozhatsky2016-06-121-2/+2
| | | | | | | | | | | | Andrew Morton has recently noted that `struct table' actually represents table entry and, thus, should be renamed. Rename to `zram_table_entry'. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: avoid lockdep splat by revalidate_diskMinchan Kim2016-06-121-4/+18
| | | | | | | | | | | | | | | | | | | | | | | Sasha reported lockdep warning [1] introduced by [2]. It could be fixed by doing disk revalidation out of the init_lock. It's okay because disk capacity change is protected by init_lock so that revalidate_disk always sees up-to-date value so there is no race. [1] https://lkml.org/lkml/2014/7/3/735 [2] zram: revalidate disk after capacity change Fixes 2e32baea46ce ("zram: revalidate disk after capacity change"). Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Alexander E. Patrakov" <patrakov@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> CC: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: revalidate disk after capacity changeMinchan Kim2016-06-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander reported mkswap on /dev/zram0 is failed if other process is opening the block device file. Step is as follows, 0. Reset the unused zram device. 1. Use a program that opens /dev/zram0 with O_RDWR and sleeps until killed. 2. While that program sleeps, echo the correct value to /sys/block/zram0/disksize. 3. Verify (e.g. in /proc/partitions) that the disk size is applied correctly. It is. 4. While that program still sleeps, attempt to mkswap /dev/zram0. This fails: mkswap: error: swap area needs to be at least 40 KiB When I investigated, the size get by ioctl(fd, BLKGETSIZE64, xxx) on mkswap to get a size of blockdev was zero although zram0 has right size by 2. The reason is zram didn't revalidate disk after changing capacity so that size of blockdev's inode is not uptodate until all of file is close. This patch should fix the BUG. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> Tested-by: Alexander E. Patrakov <patrakov@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: correct offset usage in zram_bio_discardWeijie Yang2016-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to skip the physical block(PAGE_SIZE) which is partially covered by the discard bio, so we check the remaining size and subtract it if there is a need to goto the next physical block. The current offset usage in zram_bio_discard is incorrect, it will cause its upper filesystem breakdown. Consider the following scenario: On some architecture or config, PAGE_SIZE is 64K for example, filesystem is set up on zram disk without PAGE_SIZE aligned, a discard bio leads to a offset = 4K and size=72K, normally, it should not really discard any physical block as it partially cover two physical blocks. However, with the current offset usage, it will discard the second physical block and free its memory, which will cause filesystem breakdown. This patch corrects the offset usage in zram_bio_discard. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>