aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/scsi
Commit message (Collapse)AuthorAgeFilesLines
...
* Fix incorrect memset in bnx2fc_parse_fcp_rspAndi Kleen2013-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | commit 16da05b1158d1bcb31656e636a8736a663b1cf1f upstream. gcc 4.8 warns because the memset only clears sizeof(char *) bytes, not the whole buffer. Use the correct buffer size and clear the whole sense buffer. /backup/lsrc/git/linux-lto-2.6/drivers/scsi/bnx2fc/bnx2fc_io.c: In function 'bnx2fc_parse_fcp_rsp': /backup/lsrc/git/linux-lto-2.6/drivers/scsi/bnx2fc/bnx2fc_io.c:1810:41: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess] memset(sc_cmd->sense_buffer, 0, sizeof(sc_cmd->sense_buffer)); ^ Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* megaraid_sas: fix memory leak if SGL has zero length entriesBjørn Mork2013-07-271-4/+6
| | | | | | | | | | | | | | | | commit 7a6a731bd00ca90d0e250867c3b9c05b5ff0fa49 upstream. commit 98cb7e44 ([SCSI] megaraid_sas: Sanity check user supplied length before passing it to dma_alloc_coherent()) introduced a memory leak. Memory allocated for entries following zero length SGL entries will not be freed. Reference: http://bugs.debian.org/688198 Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* block: do not pass disk names as format stringsKees Cook2013-07-271-1/+1
| | | | | | | | | | | | | | | | | | commit ffc8b30866879ed9ba62bd0a86fecdbd51cd3d19 upstream. Disk names may contain arbitrary strings, so they must not be interpreted as format strings. It seems that only md allows arbitrary strings to be used for disk names, but this could allow for a local memory corruption from uid 0 into ring 0. CVE-2013-2851 Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust device pointer name in nbd.c] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* sd: Fix parsing of 'temporary ' cache mode prefixBen Hutchings2013-07-271-1/+1
| | | | | | | | | | | | | commit 2ee3e26c673e75c05ef8b914f54fadee3d7b9c88 upstream. Commit 39c60a0948cc '[SCSI] sd: fix array cache flushing bug causing performance problems' added temp as a pointer to "temporary " and used sizeof(temp) - 1 as its length. But sizeof(temp) is the size of the pointer, not the size of the string constant. Change temp to a static array so that sizeof() does what was intended. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* sd: fix array cache flushing bug causing performance problemsJames Bottomley2013-07-272-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | commit 39c60a0948cc06139e2fbfe084f83cb7e7deae3b upstream. Some arrays synchronize their full non volatile cache when the sd driver sends a SYNCHRONIZE CACHE command. Unfortunately, they can have Terrabytes of this and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a writeback cache. This leads to massive slowdowns on journalled filesystems. The fix is to allow userspace to turn off the writeback cache setting as a temporary measure (i.e. without doing the MODE SELECT to write it back to the device), so even though the device reported it has a writeback cache, the user, knowing that the cache is non volatile and all they care about is filesystem correctness, can turn that bit off in the kernel and avoid the performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands. The way you do this is add a 'temporary' prefix when performing the usual cache setting operations, so echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type Reported-by: Ric Wheeler <rwheeler@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: fix firmware failure with wrong task attributeSreekanth Reddy2013-07-271-5/+1
| | | | | | | | | | | commit 48ba2efc382f94fae16ca8ca011e5961a81ad1ea upstream. When SCSI command is received with task attribute not set, set it to SIMPLE. Previously it is set to untagged. This causes the firmware to fail the commands. Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix for device scan following host reset could get stuck in a ↵Sreekanth Reddy2013-07-271-3/+115
| | | | | | | | | | | | | | | infinite loop commit 6241f22ca12a26ee149cbe31b27bac97dbdc8bc4 upstream. Modified device scan routine so each configuration page read breaks from the while loop when the ioc_status is not equal to MPI2_IOCSTATUS_SUCCESS. [jejb: checkpatch fixes] Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2; adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix for issue Missing delay not getting set during system bootupReddy, Sreekanth2013-07-273-11/+13
| | | | | | | | | | | | | commit b0df96a0068daee4f9c2189c29b9053eb6e46b17 upstream. Missing delay is not getting set properly. The reason is that it is not defined in the same file from where it is being invoked. The fix is to move the missing delay module parameter from mpt2sas_base.c to mpt2sas_scsh.c. Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* dc395x: uninitialized variable in device_alloc()Dan Carpenter2013-03-201-1/+1
| | | | | | | | | | | | commit 208afec4f3be8c51ad6eebe6611dd6d2ad2fa298 upstream. This bug was introduced back in bitkeeper days in 2003. We use "dcb->dev_mode" before it has been initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Oliver Neukum <oliver@neukum.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* sd: Reshuffle init_sd to avoid crashJoel D. Diaz2013-02-061-5/+8
| | | | | | | | | | | | | | | commit afd5e34b2bb34881d3a789e62486814a49b47faa upstream. scsi_register_driver will register a prep_fn() function, which in turn migh need to use the sd_cdp_pool for DIF. Which hasn't been initialised at this point, leading to a crash. So reshuffle the init_sd() and exit_sd() paths to have the driver registered last. Signed-off-by: Joel D. Diaz <joeldiaz@us.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* efi: Make 'efi_enabled' a function to query EFI facilitiesMatt Fleming2013-02-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 83e68189745ad931c2afd45d8ee3303929233e7f upstream. Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> [bwh: Backported to 3.2: - Adjust context (a lot) - Add efi_is_native() function from commit 5189c2a7c776 ('x86: efi: Turn off efi_enabled after setup on mixed fw/kernel') - Make efi_init() bail out when booted non-native, as it would previously not be called in this case - Drop inapplicable changes to start_kernel()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mvsas: Fix oops when ata commond timeout.Jianpeng Ma2013-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 95ab000388974d8ffef8257306b4be6e8778b768 upstream. Kernel message follows: [ 511.712011] sd 11:0:0:0: [sdf] command ffff8800a4e81400 timed out [ 511.712022] sas: Enter sas_scsi_recover_host busy: 1 failed: 1 [ 511.712024] sas: trying to find task 0xffff8800a4d24c80 [ 511.712026] sas: sas_scsi_find_task: aborting task 0xffff8800a4d24c80 [ 511.712029] drivers/scsi/mvsas/mv_sas.c 1631:mvs_abort_task() mvi=ffff8800b5300000 task=ffff8800a4d24c80 slot=ffff8800b5325038 slot_idx=x0 [ 511.712035] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 [ 511.712040] IP: [<ffffffff815f8c0c>] _raw_spin_lock_irqsave+0xc/0x30 [ 511.712047] PGD 0 [ 511.712049] Oops: 0002 [#1] SMP [ 511.712052] Modules linked in: mvsas libsas scsi_transport_sas raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx [last unloaded: mvsas] [ 511.712062] CPU 3 [ 511.712066] Pid: 7322, comm: scsi_eh_11 Not tainted 3.5.0+ #106 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M. [ 511.712068] RIP: 0010:[<ffffffff815f8c0c>] [<ffffffff815f8c0c>] _raw_spin_lock_irqsave+0xc/0x30 [ 511.712073] RSP: 0018:ffff880098d3bcb0 EFLAGS: 00010086 [ 511.712074] RAX: 0000000000000286 RBX: 0000000000000058 RCX: 00000000000000c3 [ 511.712076] RDX: 0000000000000100 RSI: 0000000000000046 RDI: 0000000000000058 [ 511.712078] RBP: ffff880098d3bcb0 R08: 000000000000000a R09: 0000000000000000 [ 511.712080] R10: 00000000000004e8 R11: 00000000000004e7 R12: ffff8800a4d24c80 [ 511.712082] R13: 0000000000000050 R14: ffff8800b5325038 R15: ffff8800a4eafe00 [ 511.712084] FS: 0000000000000000(0000) GS:ffff8800bdb80000(0000) knlGS:0000000000000000 [ 511.712086] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 511.712088] CR2: 0000000000000058 CR3: 00000000a4ce6000 CR4: 00000000000407e0 [ 511.712090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 511.712091] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 511.712093] Process scsi_eh_11 (pid: 7322, threadinfo ffff880098d3a000, task ffff8800a61dde40) [ 511.712095] Stack: [ 511.712096] ffff880098d3bce0 ffffffff81060683 ffff880000000000 0000000000000000 [ 511.712099] ffff8800a4d24c80 ffff8800b5300000 ffff880098d3bcf0 ffffffffa0076a88 [ 511.712102] ffff880098d3bd50 ffffffffa0079bb5 ffff880000000000 ffff880000000018 [ 511.712106] Call Trace: [ 511.712110] [<ffffffff81060683>] complete+0x23/0x60 [ 511.712115] [<ffffffffa0076a88>] mvs_tmf_timedout+0x18/0x20 [mvsas] [ 511.712119] [<ffffffffa0079bb5>] mvs_slot_complete+0x765/0x7d0 [mvsas] [ 511.712125] [<ffffffffa005a17d>] sas_scsi_recover_host+0x55d/0xdb0 [libsas] [ 511.712128] [<ffffffff8106d600>] ? idle_balance+0xe0/0x130 [ 511.712133] [<ffffffff813b150c>] scsi_error_handler+0xcc/0x470 [ 511.712136] [<ffffffff815f7ad0>] ? __schedule+0x370/0x730 [ 511.712139] [<ffffffff8105f728>] ? __wake_up_common+0x58/0x90 [ 511.712142] [<ffffffff813b1440>] ? scsi_eh_get_sense+0x110/0x110 [ 511.712146] [<ffffffff810571be>] kthread+0x8e/0xa0 [ 511.712150] [<ffffffff816015f4>] kernel_thread_helper+0x4/0x10 [ 511.712153] [<ffffffff81057130>] ? flush_kthread_work+0x120/0x120 [ 511.712156] [<ffffffff816015f0>] ? gs_change+0xb/0xb [ 511.712157] Code: 8a 00 01 00 00 89 d0 f0 66 0f b1 0f 66 39 d0 0f 94 c0 0f b6 c0 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 9c 58 fa ba 00 01 00 00 <f0> 66 0f c1 17 0f b6 ce 38 d1 74 11 0f 1f 84 00 00 00 00 00 f3 [ 511.712191] RIP [<ffffffff815f8c0c>] _raw_spin_lock_irqsave+0xc/0x30 [ 511.712194] RSP <ffff880098d3bcb0> [ 511.712196] CR2: 0000000000000058 [ 511.712198] ---[ end trace a781c7b1e65db92c ]--- Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: gen8plus Smart Array IDsMike Miller2013-01-031-7/+25
| | | | | | | commit fe0c9610bb68dd0aad1017456f5e3c31264d70c2 upstream. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* qla2xxx: Test and clear FCPORT_UPDATE_NEEDED atomically.David Jeffery2013-01-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a394aac88506159e047630fc90dc2242568382d8 upstream. When the qla2xxx driver loses access to multiple, remote ports, there is a race condition which can occur which will keep the request stuck on a scsi request queue indefinitely. This bad state occurred do to a race condition with how the FCPORT_UPDATE_NEEDED bit is set in qla2x00_schedule_rport_del(), and how it is cleared in qla2x00_do_dpc(). The problem port has its drport pointer set, but it has never been processed by the driver to inform the fc transport that the port has been lost. qla2x00_schedule_rport_del() sets drport, and then sets the FCPORT_UPDATE_NEEDED bit. In qla2x00_do_dpc(), the port lists are walked and any drport pointer is handled and the fc transport informed of the port loss, then the FCPORT_UPDATE_NEEDED bit is cleared. This leaves a race where the dpc thread is processing one port removal, another port removal is marked with a call to qla2x00_schedule_rport_del(), and the dpc thread clears the bit for both removals, even though only the first removal was actually handled. Until another event occurs to set FCPORT_UPDATE_NEEDED, the later port removal is never finished and qla2xxx stays in a bad state which causes requests to become stuck on request queues. This patch updates the driver to test and clear FCPORT_UPDATE_NEEDED atomically. This ensures the port state changes are processed and not lost. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mvsas: fix undefined bit shiftXi Wang2013-01-032-13/+3
| | | | | | | | | | | | | | | | | | | | | | | commit beecadea1b8d67f591b13f7099559f32f3fd601d upstream. The macro bit(n) is defined as ((u32)1 << n), and thus it doesn't work with n >= 32, such as in mvs_94xx_assign_reg_set(): if (i >= 32) { mvi->sata_reg_set |= bit(i); ... } The shift ((u32)1 << n) with n >= 32 also leads to undefined behavior. The result varies depending on the architecture. This patch changes bit(n) to do a 64-bit shift. It also simplifies mv_ffc64() using __ffs64(), since invoking ffz() with ~0 is undefined. Signed-off-by: Xi Wang <xi.wang@gmail.com> Acked-by: Xiangliang Yu <yuxiangl@marvell.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* prevent stack buffer overflow in host_resetSasha Levin2013-01-031-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 072f19b4bea31cdd482d79f805413f2f9ac9e233 upstream. store_host_reset() has tried to re-invent the wheel to compare sysfs strings. Unfortunately it did so poorly and never bothered to check the input from userspace before overwriting stack with it, so something simple as: echo "WoopsieWoopsie" > /sys/devices/pseudo_0/adapter0/host0/scsi_host/host0/host_reset would result in: [ 316.310101] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81f5bac7 [ 316.310101] [ 316.320051] Pid: 6655, comm: sh Tainted: G W 3.7.0-rc5-next-20121114-sasha-00016-g5c9d68d-dirty #129 [ 316.320051] Call Trace: [ 316.340058] pps pps0: PPS event at 1352918752.620355751 [ 316.340062] pps pps0: capture assert seq #303 [ 316.320051] [<ffffffff83b3856b>] panic+0xcd/0x1f4 [ 316.320051] [<ffffffff81f5bac7>] ? store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff8110b996>] __stack_chk_fail+0x16/0x20 [ 316.320051] [<ffffffff81f5bac7>] store_host_reset+0xd7/0x100 [ 316.320051] [<ffffffff81e55bb3>] dev_attr_store+0x13/0x30 [ 316.320051] [<ffffffff812f7db1>] sysfs_write_file+0x101/0x170 [ 316.320051] [<ffffffff8127acc8>] vfs_write+0xb8/0x180 [ 316.320051] [<ffffffff8127ae80>] sys_write+0x50/0xa0 [ 316.320051] [<ffffffff83c03418>] tracesys+0xe1/0xe6 Fix this by uninventing whatever was going on there and just use sysfs_streq. Bug introduced by 29443691 ("[SCSI] scsi: Added support for adapter and firmware reset"). [jejb: added necessary const to prevent compile warnings] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* scsi: aha152x: Fix sparse warning and make printing pointer address more ↵Krzysztof Wilczynski2013-01-031-2/+2
| | | | | | | | | | | | | | | | portable. commit b631cf1f899f9d2e449884dbccc34940637c639f upstream. This is to change use of "0x%08x" in favour of "%p" as per ../Documentation/printk-formats.txt, which also takes care about the following warning during compilation time: drivers/scsi/aha152x.c: In function ‘get_command’: drivers/scsi/aha152x.c:2987: warning: cast from pointer to integer of different size Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* isci: copy fis 0x34 response into proper bufferMaciej Patelczyk2012-12-061-1/+1
| | | | | | | | | | | | | | | | commit 49bd665c5407a453736d3232ee58f2906b42e83c upstream. SATA MICROCODE DOWNALOAD fails on isci driver. After receiving Register Device to Host (FIS 0x34) frame Initiator resets phy. In the frame handler routine response (FIS 0x34) was copied into wrong buffer and upper layer did not receive any answer which resulted in timeout and reset. This patch corrects this bug. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* scsi_debug: Fix off-by-one bug when unmapping regionLukas Czerner2012-10-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bc977749e967daa56de1922cf4cb38525631c51c upstream. Currently it is possible to unmap one more block than user requested to due to the off-by-one error in unmap_region(). This is probably due to the fact that the end variable despite its name actually points to the last block to unmap + 1. However in the condition it is handled as the last block of the region to unmap. The bug was not previously spotted probably due to the fact that the region was not zeroed, which has changed with commit be1dd78de5686c062bb3103f9e86d444a10ed783. With that commit we were able to corrupt the ext4 file system on 256M scsi_debug device with LBPRZ enabled using fstrim. Since the 'end' semantic is the same in several functions there this commit just fixes the condition to use the 'end' variable correctly in that context. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust context; unwrap the if-statement] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: dial down lockup detection during firmware flashStephen M. Cameron2012-10-173-5/+37
| | | | | | | | | | | | | | | | | | | | commit e85c59746957fd6e3595d02cf614370056b5816e upstream. Dial back the aggressiveness of the controller lockup detection thread. Currently it will declare the controller to be locked up if it goes for 10 seconds with no interrupts and no change in the heartbeat register. Dial back this to 30 seconds with no heartbeat change, and also snoop the ioctl path and if a firmware flash command is detected, dial it back further to 4 minutes until the firmware flash command completes. The reason for this is that during the firmware flash operation, the controller apparently doesn't update the heartbeat register as frequently as it is supposed to, and we can get a false positive. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* drivers/scsi/atp870u.c: fix bad use of udelayMartin Michlmayr2012-10-171-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0f6d93aa9d96cc9022b51bd10d462b03296be146 upstream. The ACARD driver calls udelay() with a value > 2000, which leads to to the following compilation error on ARM: ERROR: "__bad_udelay" [drivers/scsi/atp870u.ko] undefined! make[1]: *** [__modpost] Error 1 This is because udelay is defined on ARM, roughly speaking, as #define udelay(n) ((n) > 2000 ? __bad_udelay() : \ __const_udelay((n) * ((2199023U*HZ)>>11))) The argument to __const_udelay is the number of jiffies to wait divided by 4, but this does not work unless the multiplication does not overflow, and that is what the build error is designed to prevent. The intended behavior can be achieved by using mdelay to call udelay multiple times in a loop. [jrnieder@gmail.com: adding context] Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* scsi_dh_alua: Enable STPG for unavailable portsBart Van Assche2012-10-171-2/+1
| | | | | | | | | | | | | | | | commit e47f8976d8e573928824a06748f7bc82c58d747f upstream. A quote from SPC-4: "While in the unavailable primary target port asymmetric access state, the device server shall support those of the following commands that it supports while in the active/optimized state: [ ... ] d) SET TARGET PORT GROUPS; [ ... ]". Hence enable sending STPG to a target port group that is in the unavailable state. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* scsi_remove_target: fix softlockup regression on hot removeDan Williams2012-10-171-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bc3f02a795d3b4faa99d37390174be2a75d091bd upstream. John reports: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202] [..] Call Trace: [<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0 [<ffffffff81421de5>] sas_rphy_remove+0x55/0x60 [<ffffffff81421e01>] sas_rphy_delete+0x11/0x20 [<ffffffff81421e35>] sas_port_delete+0x25/0x160 [<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270 ...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race". Don't restart lookup of more stargets in the multi-target case, just arrange to traverse the list once, on the assumption that new targets are always added at the end. There is no guarantee that the target will change state in scsi_target_reap() so we can end up spinning if we restart. Acked-by: Jack Wang <jack_wang@usish.com> LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com> Reported-by: John Drescher <drescherjm@gmail.com> Tested-by: John Drescher <drescherjm@gmail.com> Signed-off-by: Dan Williams <djbw@fb.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* ibmvscsi: Fix host config length field overflowBenjamin Herrenschmidt2012-10-171-0/+3
| | | | | | | | | | | | | commit 225c56960fcafeccc2b6304f96cd3f0dbf42a16a upstream. The length field in the host config packet is only 16-bit long, so passing it 0x10000 (64K which is our standard PAGE_SIZE) doesn't work and result in an empty config from the server. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: Use LUN reset instead of target resetStephen M. Cameron2012-10-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 21e89afd325849eb38adccf382df16cc895911f9 upstream. It turns out Smart Array logical drives do not support target reset and when the target reset fails, the logical drive will be taken off line. Symptoms look like this: hpsa 0000:03:00.0: Abort request on C1:B0:T0:L0 hpsa 0000:03:00.0: resetting device 1:0:0:0 hpsa 0000:03:00.0: cp ffff880037c56000 is reported invalid (probably means target device no longer present) hpsa 0000:03:00.0: resetting device failed. sd 1:0:0:0: Device offlined - not ready after error recovery sd 1:0:0:0: rejecting I/O to offline device EXT3-fs error (device sdb1): read_block_bitmap: LUN reset is supported though, and is what we should be using. Target reset is also disruptive in shared SAS situations, for example, an external MSA1210m which does support target reset attached to Smart Arrays in multiple hosts -- a target reset from one host is disruptive to other hosts as all LUNs on the target will be reset and will abort all outstanding i/os back to all the attached hosts. So we should use LUN reset, not target reset. Tested this with Smart Array logical drives and with tape drives. Not sure how this bug survived since 2009, except it must be very rare for a Smart Array to require more than 30s to complete a request. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* isci: fix isci_pci_probe() generates warning on efi failure pathDan Williams2012-10-172-2/+0
| | | | | | | | | | | | | | | | commit 6d70a74ffd616073a68ae0974d98819bfa8e6da6 upstream. The oem parameter image embedded in the efi variable is at an offset from the start of the variable. However, in the failure path we try to free the 'orom' pointer which is only valid when the paramaters are being read from the legacy option-rom space. Since failure to load the oem parameters is unlikely and we keep the memory around in the success case just defer all de-allocation to devm. Reported-by: Don Morris <don.morris@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: fix handling of protocol errorStephen M. Cameron2012-10-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | commit 256d0eaac87da1e993190846064f339f4c7a63f5 upstream. If a command status of CMD_PROTOCOL_ERR is received, this information should be conveyed to the SCSI mid layer, not dropped on the floor. CMD_PROTOCOL_ERR may be received from the Smart Array for any commands destined for an external RAID controller such as a P2000, or commands destined for tape drives or CD/DVD-ROM drives, if for instance a cable is disconnected. This mostly affects multipath configurations, as disconnecting a cable on a non-multipath configuration is not going to do anything good regardless of whether CMD_PROTOCOL_ERR is handled correctly or not. Not handling CMD_PROTOCOL_ERR correctly in a multipath configaration involving external RAID controllers may cause data corruption, so this is quite a serious bug. This bug should not normally cause a problem for direct attached disk storage. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix for issue - Unable to boot from the drive connected to HBAsreekanth.reddy@lsi.com2012-10-101-0/+7
| | | | | | | | | | | | | commit 10cce6d8b5af0b32bc4254ae4a28423a74c0921c upstream. This patch checks whether HBA is SAS2008 B0 controller. if it is a SAS2008 B0 controller then it use IO-APIC interrupt instead of MSIX, as SAS2008 B0 controller doesn't support MSIX interrupts. [jejb: fix whitespace problems] Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offloadEddie Wai2012-10-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d6532207116307eb7ecbfa7b9e02c53230096a50 upstream. This patch fixes the following kernel panic invoked by uninitialized fields in the chip initialization for the 1G bnx2 iSCSI offload. One of the bits in the chip initialization is being used by the latest firmware to control overflow packets. When this control bit gets enabled erroneously, it would ultimately result in a bad packet placement which would cause the bnx2 driver to dereference a NULL ptr in the placement handler. This can happen under certain stress I/O environment under the Linux iSCSI offload operation. This change only affects Broadcom's 5709 chipset. Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5 Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2 RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5 RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216 RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000 RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220 RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000 R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010 R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558 FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0 Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820) Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520 ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20 0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8 Call Trace: <IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b [<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231 [<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1 [<ffffffff800125a0>] __do_softirq+0x89/0x133 [<ffffffff8005e30c>] call_softirq+0x1c/0x28 [<ffffffff8006d5de>] do_softirq+0x2c/0x7d [<ffffffff8006d46e>] do_IRQ+0xee/0xf7 [<ffffffff8005d625>] ret_from_intr+0x0/0xa <EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341 [<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341 [<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341 [<ffffffff80049560>] cpu_idle+0x95/0xb8 [<ffffffff80078b1c>] start_secondary+0x479/0x488 Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* Fix 'Device not ready' issue on mpt2sasJames Bottomley2012-09-192-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 14216561e164671ce147458653b1fea06a4ada1e upstream. This is a particularly nasty SCSI ATA Translation Layer (SATL) problem. SAT-2 says (section 8.12.2) if the device is in the stopped state as the result of processing a START STOP UNIT command (see 9.11), then the SATL shall terminate the TEST UNIT READY command with CHECK CONDITION status with the sense key set to NOT READY and the additional sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED; mpt2sas internal SATL seems to implement this. The result is very confusing standby behaviour (using hdparm -y). If you suspend a drive and then send another command, usually it wakes up. However, if the next command is a TEST UNIT READY, the SATL sees that the drive is suspended and proceeds to follow the SATL rules for this, returning NOT READY to all subsequent commands. This means that the ordering of TEST UNIT READY is crucial: if you send TUR and then a command, you get a NOT READY to both back. If you send a command and then a TUR, you get GOOD status because the preceeding command woke the drive. This bit us badly because commit 85ef06d1d252f6a2e73b678591ab71caad4667bb Author: Tejun Heo <tj@kernel.org> Date: Fri Jul 1 16:17:47 2011 +0200 block: flush MEDIA_CHANGE from drivers on close(2) Changed our ordering on TEST UNIT READY commands meaning that SATA drives connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas SATL sees the suspend *before* the drives get awoken by the next ATA command) resulting in lots of failed commands. The standard is completely nuts forcing this inconsistent behaviour, but we have to work around it. The fix for this is twofold: 1. Set the allow_restart flag so we wake the drive when we see it has been suspended 2. Return all TEST UNIT READY status directly to the mid layer without any further error handling which prevents us causing error handling which may offline the device just because of a media check TUR. Reported-by: Matthias Prager <linux@matthiasprager.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* megaraid_sas: Move poll_aen_lock initializerKashyap Desai2012-09-191-1/+2
| | | | | | | | | | | | | | commit bd8d6dd43a77bfd2b8fef5b094b9d6095e169dee upstream. The following patch moves the poll_aen_lock initializer from megasas_probe_one() to megasas_init(). This prevents a crash when a user loads the driver and tries to issue a poll() system call on the ioctl interface with no adapters present. Signed-off-by: Kashyap Desai <Kashyap.Desai@lsi.com> Signed-off-by: Adam Radford <aradford@gmail.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix for Driver oops, when loading driver with max_queue_depth ↵sreekanth.reddy@lsi.com2012-09-191-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | command line option to a very small value commit 338b131a3269881c7431234855c93c219b0979b6 upstream. If the specified max_queue_depth setting is less than the expected number of internal commands, then driver will calculate the queue depth size to a negitive number. This negitive number is actually a very large number because variable is unsigned 16bit integer. So, the driver will ask for a very large amount of memory for message frames and resulting into oops as memory allocation routines will not able to handle such a large request. So, in order to limit this kind of oops, The driver need to set the max_queue_depth to a scsi mid layer's can_queue value. Then the overall message frames required for IO is minimum of either (max_queue_depth plus internal commands) or the IOC global credits. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* libsas: fix sas_discover_devices return code handlingDan Williams2012-08-021-27/+12
| | | | | | | | | | | | | | | | | | | | | | | | commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream. commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev() commit 19252de6 [SCSI] libsas: fix wide port hotplug issues The above commits seem to have confused the return value of sas_ex_discover_dev which is non-zero on failure and sas_ex_join_wide_port which just indicates short circuiting discovery on already established ports. The result is random discovery failures depending on configuration. Calls to sas_ex_join_wide_port are the source of the trouble as its return value is errantly assigned to 'res'. Convert it to bool and stop returning its result up the stack. Tested-by: Dan Melnic <dan.melnic@amd.com> Reported-by: Dan Melnic <dan.melnic@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jack Wang <jack_wang@usish.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* libsas: continue revalidationDan Williams2012-08-021-4/+4
| | | | | | | | | | | | | commit 26f2f199ff150d8876b2641c41e60d1c92d2fb81 upstream. Continue running revalidation until no more broadcast devices are discovered. Fixes cases where re-discovery completes too early in a domain with multiple expanders with pending re-discovery events. Servicing BCNs can get backed up behind error recovery. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)Dan Williams2012-08-021-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 57fc2e335fd3c2f898ee73570dc81426c28dc7b4 upstream. Rapid ata hotplug on a libsas controller results in cases where libsas is waiting indefinitely on eh to perform an ata probe. A race exists between scsi_schedule_eh() and scsi_restart_operations() in the case when scsi_restart_operations() issues i/o to other devices in the sas domain. When this happens the host state transitions from SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and ->host_busy is non-zero so we put the eh thread to sleep even though ->host_eh_scheduled is active. Before putting the error handler to sleep we need to check if the host_state needs to return to SHOST_RECOVERY for another trip through eh. Since i/o that is released by scsi_restart_operations has been blocked for at least one eh cycle, this implementation allows those i/o's to run before another eh cycle starts to discourage hung task timeouts. Reported-by: Tom Jackson <thomas.p.jackson@intel.com> Tested-by: Tom Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* fix hot unplug vs async scan raceDan Williams2012-08-022-15/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3b661a92e869ebe2358de8f4b3230ad84f7fce51 upstream. The following crash results from cases where the end_device has been removed before scsi_sysfs_add_sdev has had a chance to run. BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6 ... Call Trace: [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3 [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8125e641>] kobject_add_varg+0x41/0x50 [<ffffffff8125e70b>] kobject_add+0x64/0x66 [<ffffffff8131122b>] device_add+0x12d/0x63a [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56 [<ffffffff8107de15>] ? module_refcount+0x89/0xa0 [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145 ...teach scsi_sysfs_add_devices() to check for deleted devices() before trying to add them, and teach scsi_remove_target() how to remove targets that have not been added via device_add(). Reported-by: Dariusz Majchrzak <dariusz.majchrzak@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* Avoid dangling pointer in scsi_requeue_command()Bart Van Assche2012-08-021-0/+11
| | | | | | | | | | | | | | | | | commit 940f5d47e2f2e1fa00443921a0abf4822335b54d upstream. When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* Fix device removal NULL pointer dereferenceBart Van Assche2012-08-024-36/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 67bd94130015c507011af37858989b199c52e1de upstream. Use blk_queue_dead() to test whether the queue is dead instead of !sdev. Since scsi_prep_fn() may be invoked concurrently with __scsi_remove_device(), keep the queuedata (sdev) pointer in __scsi_remove_device(). This patch fixes a kernel oops that can be triggered by USB device removal. See also http://www.spinics.net/lists/linux-scsi/msg56254.html. Other changes included in this patch: - Swap the blk_cleanup_queue() and kfree() calls in scsi_host_dev_release() to make that code easier to grasp. - Remove the queue dead check from scsi_run_queue() since the queue state can change anyway at any point in that function where the queue lock is not held. - Remove the queue dead check from the start of scsi_request_fn() since it is redundant with the scsi_device_online() check. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* Fix NULL dereferences in scsi_cmd_to_driverMark Rustad2012-08-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 222a806af830fda34ad1f6bc991cd226916de060 upstream. Avoid crashing if the private_data pointer happens to be NULL. This has been seen sometimes when a host reset happens, notably when there are many LUNs: host3: Assigned Port ID 0c1601 scsi host3: libfc: Host reset succeeded on port (0c1601) BUG: unable to handle kernel NULL pointer dereference at 0000000000000350 IP: [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 <snip> Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0) Stack: 000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0 0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80 00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80 Call Trace: [<ffffffff8105b590>] ? lock_timer_base+0x70/0x70 [<ffffffff81352fbe>] scsi_eh_tur+0x3e/0xc0 [<ffffffff81353a36>] scsi_eh_test_devices+0x76/0x170 [<ffffffff81354125>] scsi_eh_host_reset+0x85/0x160 [<ffffffff81354291>] scsi_eh_ready_devs+0x91/0x110 [<ffffffff813543fd>] scsi_unjam_host+0xed/0x1f0 [<ffffffff813546a8>] scsi_error_handler+0x1a8/0x200 [<ffffffff81354500>] ? scsi_unjam_host+0x1f0/0x1f0 [<ffffffff8106ec3e>] kthread+0x9e/0xb0 [<ffffffff81509264>] kernel_thread_helper+0x4/0x10 [<ffffffff8106eba0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81509260>] ? gs_change+0x13/0x13 Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00 <48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c RIP [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 RSP <ffff88030920dc50> CR2: 0000000000000350 Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* libsas: fix taskfile corruption in sas_ata_qc_fill_rtfDan Williams2012-07-252-7/+7
| | | | | | | | | | | | | | | | | | | | commit 6ef1b512f4e6f936d89aa20be3d97a7ec7c290ac upstream. fill_result_tf() grabs the taskfile flags from the originating qc which sas_ata_qc_fill_rtf() promptly overwrites. The presence of an ata_taskfile in the sata_device makes it tempting to just copy the full contents in sas_ata_qc_fill_rtf(). However, libata really only wants the fis contents and expects the other portions of the taskfile to not be touched by ->qc_fill_rtf. To that end store a fis buffer in the sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf() implementation. Reported-by: Praveen Murali <pmurali@logicube.com> Tested-by: Praveen Murali <pmurali@logicube.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix unsafe using smp_processor_id() in preemptiblenagalakshmi.nandigama@lsi.com2012-06-191-1/+1
| | | | | | | | | | | | | commit a2c658505bf5c75516ee0a79287223e86a2474af upstream. When CONFIG_DEBUG_PREEMPT is enabled, bug is observed in the smp_processor_id(). This is because smp_processor_id() is not called in preempt safe condition. To fix this issue, use raw_smp_processor_id instead of smp_processor_id. Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* fix scsi_wait_scanJames Bottomley2012-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | commit 1ff2f40305772b159a91c19590ee159d3a504afc upstream. Commit c751085943362143f84346d274e0011419c84202 Author: Rafael J. Wysocki <rjw@sisk.pl> Date: Sun Apr 12 20:06:56 2009 +0200 PM/Hibernate: Wait for SCSI devices scan to complete during resume Broke the scsi_wait_scan module in 2.6.30. Apparently debian still uses it so fix it and backport to stable before removing it in 3.6. The breakage is caused because the function template in include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in. That means that in the modular case (which is every distro), the scsi_wait_scan module does a simple async_synchronize_full() instead of waiting for scans. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* Fix dm-multipath starvation when scsi host is busyJun'ichi Nomura2012-06-101-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | commit b7e94a1686c5daef4f649f7f4f839cc294f07710 upstream. block congestion control doesn't have any concept of fairness across multiple queues. This means that if SCSI reports the host as busy in the queue congestion control it can result in an unfair starvation situation in dm-mp if there are multiple multipath devices on the same host. For example: http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html The fix for this is to report only the sdev busy state (and ignore the host busy state) in the block congestion control call back. The host is still congested, but the SCSI subsystem will sort out the congestion in a fair way because it knows the relation between the queues and the host. [jejb: fixed up trailing whitespace] Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: Add IRQF_SHARED back in for the non-MSI(X) interrupt handlerStephen M. Cameron2012-05-311-2/+2
| | | | | | | | | | | | | | | commit 45bcf018d1a4779d592764ef57517c92589d55d7 upstream. IRQF_SHARED is required for older controllers that don't support MSI(X) and which may end up sharing an interrupt. All the controllers hpsa normally supports have MSI(X) capability, but older controllers may be encountered via the hpsa_allow_any=1 module parameter. Also remove deprecated IRQF_DISABLED. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* isci: fix oem parameter validation on single controller skusDan Williams2012-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit fc25f79af321c01a739150ba2c09435cf977a63d upstream. OEM parameters [1] are parsed from the platform option-rom / efi driver. By default the driver was validating the parameters for the dual-controller case, but in single-controller case only the first set of parameters may be valid. Limit the validation to the number of actual controllers detected otherwise the driver may fail to parse the valid parameters leading to driver-load or runtime failures. [1] the platform specific set of phy address, configuration,and analog tuning values [stable v3.0+] Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* mpt2sas: Fix for panic happening because of improper memory allocationnagalakshmi.nandigama@lsi.com2012-05-311-3/+3
| | | | | | | | | | | | | | | | | | | commit e42fafc25fa86c61824e8d4c5e7582316415d24f upstream. The ioc->pfacts member in the IOC structure is getting set to zero following a call to _base_get_ioc_facts due to the memset in that routine. So if the ioc->pfacts was read after a host reset, there would be a NULL pointer dereference. The routine _base_get_ioc_facts is called from context of host reset. The problem in _base_get_ioc_facts is the size of Mpi2IOCFactsReply is 64, whereas the sizeof "struct mpt2sas_facts" is 60, so there is a four byte overflow resulting from the memset. Also, there is memset in _base_get_port_facts using the incorrect structure, it should be "struct mpt2sas_port_facts" instead of Mpi2PortFactsReply. Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* hpsa: Fix problem with MSA2xxx devicesStephen M. Cameron2012-05-311-19/+15
| | | | | | | | | | | | | | | | | | commit 9bc3711cbb67ac620bf09b4a147cbab45b2c36c0 upstream. Upgraded firmware on Smart Array P7xx (and some others) made them show up as SCSI revision 5 devices and this caused the driver to fail to map MSA2xxx logical drives to the correct bus/target/lun. A symptom of this would be that the target ID of the logical drives as presented by the external storage array is ignored, and all such logical drives are assigned to target zero, differentiated only by LUN. Some multipath software reportedly does not deal well with this behavior, failing to recognize different paths to the same device as such. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Scott Teel <scott.teel@hp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* libsas: fix false positive 'device attached' conditionsDan Williams2012-05-111-1/+8
| | | | | | | | | | | | | | | | | | commit 7d1d865181185bdf1316d236b1b4bd02c9020729 upstream. Normalize phy->attached_sas_addr to return a zero-address in the case when device-type == NO_DEVICE or the linkrate is invalid to handle expanders that put non-zero sas addresses in the discovery response: sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device) sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device) Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* libsas: fix sas_find_bcast_phy() in the presence of 'vacant' physThomas Jackson2012-05-111-5/+12
| | | | | | | | | | | | | | commit 1699490db339e2c6b3037ea8e7dcd6b2755b688e upstream. If an expander reports 'PHY VACANT' for a phy index prior to the one that generated a BCN libsas fails rediscovery. Since a vacant phy is defined as a valid phy index that will never have an attached device just continue the search. Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* osd_uld: Bump MAX_OSD_DEVICES from 64 to 1,048,576Boaz Harrosh2012-03-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 41f8ad76362e7aefe3a03949c43e23102dae6e0b upstream. It used to be that minors where 8 bit. But now they are actually 20 bit. So the fix is simplicity itself. I've tested with 300 devices and all user-mode utils work just fine. I have also mechanically added 10,000 to the ida (so devices are /dev/osd10000, /dev/osd10001 ...) and was able to mkfs an exofs filesystem and access osds from user-mode. All the open-osd user-mode code uses the same library to access devices through their symbolic names in /dev/osdX so I'd say it's pretty safe. (Well tested) This patch is very important because some of the systems that will be deploying the 3.2 pnfs-objects code are larger than 64 OSDs and will stop to work properly when reaching that number. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>