aboutsummaryrefslogtreecommitdiffstats
path: root/include/scsi
Commit message (Collapse)AuthorAgeFilesLines
* initial merge with 3.2.72Wolfgang Wiedmeyer2015-10-232-0/+310
|\
| * usb-storage/SCSI: Add broken_fua blacklist flagAlan Stern2014-08-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b14bf2d0c0358140041d1c1805a674376964d0e0 upstream. Some buggy JMicron USB-ATA bridges don't know how to translate the FUA bit in READs or WRITEs. This patch adds an entry in unusual_devs.h and a blacklist flag to tell the sd driver not to use FUA. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Michael Büsch <m@bues.ch> Tested-by: Michael Büsch <m@bues.ch> Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: - Adjust context - Use sd_printk() not sd_first_printk()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| * fix our current target reap infrastructureJames Bottomley2014-07-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e63ed0d7a98014fdfc2cfeb3f6dada313dcabb59 upstream. This patch eliminates the reap_ref and replaces it with a proper kref. On last put of this kref, the target is removed from visibility in sysfs. The final call to scsi_target_reap() for the device is done from __scsi_remove_device() and only if the device was made visible. This ensures that the target disappears as soon as the last device is gone rather than waiting until final release of the device (which is often too long). Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| * ore: Fix wrong math in allocation of per device BIOBoaz Harrosh2014-04-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit aad560b7f63b495f48a7232fd086c5913a676e6f upstream. At IO preparation we calculate the max pages at each device and allocate a BIO per device of that size. The calculation was wrong on some unaligned corner cases offset/length combination and would make prepare return with -ENOMEM. This would be bad for pnfs-objects that would in that case IO through MDS. And fatal for exofs were it would fail writes with EIO. Fix it by doing the proper math, that will work in all cases. (I ran a test with all possible offset/length combinations this time round). Also when reading we do not need to allocate for the parity units since we jump over them. Also lower the max_io_length to take into account the parity pages so not to allocate BIOs bigger than PAGE_SIZE Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| * libsas: fix taskfile corruption in sas_ata_qc_fill_rtfDan Williams2012-07-251-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6ef1b512f4e6f936d89aa20be3d97a7ec7c290ac upstream. fill_result_tf() grabs the taskfile flags from the originating qc which sas_ata_qc_fill_rtf() promptly overwrites. The presence of an ata_taskfile in the sata_device makes it tempting to just copy the full contents in sas_ata_qc_fill_rtf(). However, libata really only wants the fis contents and expects the other portions of the taskfile to not be touched by ->qc_fill_rtf. To that end store a fis buffer in the sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf() implementation. Reported-by: Praveen Murali <pmurali@logicube.com> Tested-by: Praveen Murali <pmurali@logicube.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| * [SCSI] fcoe: fix fcoe in a DCB environment by adding DCB notifiers to set ↵john fastabend2011-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb priority Use DCB notifiers to set the skb priority to allow packets to be steered and tagged correctly over DCB enabled drivers that setup traffic classes. This allows queue_mapping() routines to be removed in these drivers that were previously inspecting the ethertype of every skb to mark FCoE/FIP frames. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds2011-10-289-117/+356
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits) [SCSI] qla4xxx: export address/port of connection (fix udev disk names) [SCSI] ipr: Fix BUG on adapter dump timeout [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer [SCSI] hpsa: change confusing message to be more clear [SCSI] iscsi class: fix vlan configuration [SCSI] qla4xxx: fix data alignment and use nl helpers [SCSI] iscsi class: fix link local mispelling [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA [SCSI] aacraid: use lower snprintf() limit [SCSI] lpfc 8.3.27: Change driver version to 8.3.27 [SCSI] lpfc 8.3.27: T10 additions for SLI4 [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes [SCSI] megaraid_sas: Changelog and version update [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts ...
| | * [SCSI] iscsi class: fix vlan configurationMike Christie2011-10-201-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace was sending the priority/id part of the vlan tag and sysfs was displaying the id in the vlan file. This renames the vlan sysfs file to vlan_id to reflect that it was showing the id and to match the vlan_priority file. This also adds a ISCSI_NET_PARAM_VLAN_TAG iscsi nl command to relfect that we are sending down the vlan/priority part of the tag. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] qla4xxx: fix data alignment and use nl helpersMike Christie2011-10-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has the driver use helpers for a common operation and fixes a issue where if multiple iscsi params are sent they could be sent at offsets that cause unaligned accesses. The nla helpers account for the padding needed to align properly for the driver. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDAMike Christie2011-10-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replaced the iscsi_get_next_target_id with IDA to make target-id allocation efficient for iscsi offload drivers This patch should be applied after Jonathen Cameron Patch "ida : simplified functions for id allocation" Signed-off-by: John Soni Jose <jose0here@gmail.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: fix port->dev_list lockingDan Williams2011-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | port->dev_list maintains a list of devices attached to a given sas root port. It needs to be mutated under a lock as contexts outside of the single-threaded-libsas-workqueue access the list via sas_find_dev_by_rphy(). Fixup locations where the list was being mutated without a lock. This is a follow-up to commit 5911e963 "[SCSI] libsas: remove expander from dev list on error", where Luben noted [1]: > 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(), > sas_unregister_common_dev(), and sas_ex_discover_end_dev() Yes, I can see that and that is very unfortunate. [1]: http://marc.info/?l=linux-scsi&m=131480962006471&w=2 Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] fcoe,libfcoe: Move common code for fcoe_get_lesb to fcoe_transportBhanu Prakash Gollapudi2011-10-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Except for obtaining the netdev from lport, fcoe_get_lesb is the common code for the LLDs. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Acked-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] isci: export phy events via ->lldd_control_phy()Dan Williams2011-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the sas-transport-class to update events for local phys via a new PHY_FUNC_GET_EVENTS command to ->lldd_control_phy(). Fixup drivers that are not prepared for new enum phy_func values, and unify ->lldd_control_phy() error codes. These are the SAS defined phy events that are reported in a smp-report-phy-error-log command: * /sys/class/sas_phy/<phyX>/invalid_dword_count * /sys/class/sas_phy/<phyX>/running_disparity_error_count * /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count * /sys/class/sas_phy/<phyX>/phy_reset_problem_count Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] isci: atapi supportDan Williams2011-10-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on original implementation from Jiangbi Liu and Maciej Trela. ATAPI transfers happen in two-to-three stages. The two stage atapi commands are those that include a dma data transfer. The data transfer portion of these operations is handled by the hardware packet-dma acceleration. The three-stage commands do not have a data transfer and are handled without hardware assistance in raw frame mode. stage1: transmit host-to-device fis to notify the device of an incoming atapi cdb. Upon reception of the pio-setup-fis repost the task_context to perform the dma transfer of the cdb+data (go to stage3), or repost the task_context to transmit the cdb as a raw frame (go to stage 2). stage2: wait for hardware notification of the cdb transmission and then go to stage 3. stage3: wait for the arrival of the terminating device-to-host fis and terminate the command. To keep the implementation simple we only support ATAPI packet-dma protocol (for commands with data) to avoid needing to handle the data transfer manually (like we do for SATA-PIO). This may affect compatibility for a small number of devices (see ATA_HORKAGE_ATAPI_MOD16_DMA). If the data-transfer underruns, or encounters an error the device-to-host fis is expected to arrive in the unsolicited frame queue to pass to libata for disposition. However, in the DONE_UNEXP_FIS (data underrun) case it appears we need to craft a response. In the DONE_REG_ERR case we do receive the UF and propagate it to libsas. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libfc: cache align struct fc_exch fieldsVasu Dev2011-10-021-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cache aligned xid and ex_lock beside removing holes. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libfc: cache align struct fc_fcp_pkt fieldsVasu Dev2011-10-021-28/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-arrange its fields to avoid padding and have better cacheline alignments. Removed not used start_time, end_time and last_pkt_time fields. This all reduced this struct size to 448 from 480 and that also reduced one cacheline on x86_64 beside eliminating 8 pads. However kept logical fields together. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: fix warnings when checking sata/stp protocolDan Williams2011-10-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several sas drivers legitimately check the protocol against the union of SAS_PROTOCOL_SATA and SAS_PROTOCOL_STP. Provide a SAS_PROTOCOL_STP_ALL to silence warnings like: drivers/scsi/pm8001/pm8001_sas.c:438:3: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:798:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:1783:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/mvsas/mv_sas.c:1886:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] drivers/scsi/isci/request.c:3565:2: warning: case value ‘5’ not in enumerated type ‘enum sas_protocol’ [-Wswitch] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: fix try_test_sas_gpio_gp_bit() build errorDan Williams2011-10-021-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user has disabled CONFIG_SCSI_SAS_HOST_SMP then libsas drivers will not be receiving smp-gpio frames and do not need this lookup code. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: Allow expander T-T attachmentsLuben Tuikov2011-10-022-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | Allow expander table-to-table attachments for expanders that support it. Signed-off-by: Luben Tuikov <ltuikov@yahoo.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: sgpio write supportDan Williams2011-09-222-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add SFF-8485 v0.7 / SAS-1 smp-write-gpio register support to libsas. Defer SAS-2 support unless/until it defines an sgpio interface. Minimum implementation needed to get the lights blinking. try_test_sas_gpio_gp_bit() provides a common method to parse the incoming write data (raw bitstream), and the to_sas_gpio_gp_bit() helper routine can be used as a basis for the set/clear operations for the 'read' implementation. Host implementations parse as many bits (ODx.[012]) as are locally supported and report the number of registers successfully written. If the submitted data overruns the internal number of registers available report the write as a success with the number of bytes remaining reported in ->resid_len. Example (assuming an active backplane) set the "identify" pattern for the first 21 devices: smp_write_gpio --count=2 --data=92,49,24,92,24,92,49,24 -t 4 --index=1 /dev/bsg/sas_hostX Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi_dh: Implement match callback functionHannes Reinecke2011-08-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some device handler types are not tied to the vendor/model but rather to a specific capability. Eg ALUA is supported if the 'TPGS' setting in the standard inquiry is set. This patch implements a 'match' callback for device handler which supersedes the original vendor/model lookup and implements the callback for the ALUA handler. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi_dh_alua: Evaluate TPGS setting from inquiry dataHannes Reinecke2011-08-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of issuing a standard inquiry from within the alua device handler we can evaluate the TPGS setting from the existing inquiry data of the sdev and save us the I/O. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi scan: don't fail scans when host is in recoveryMike Christie2011-08-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem is that if we are doing a scsi scan then the device goes into recovery then we will wait for the recovery to complete. It waits because scsi-ml will send inquiries or report luns and the queueing code will have been blocked due to the host not being ready. However, if we are in recovery and then a scan is started the scan will silently fail and some devices will not be added. It is easy to hit the problem where devices do not show up with FC where we are doing tests that disrupt the target controllers. When the controller is disruprted (reboot, or setting firmware, etc), and we cause the dev loss tmo to fire then devices will be removed Then when the problem has been fixed, the rport will be scanned and devices should be added back. But if we cause another disruption before scanning has started then devices will not get added back. If the problem is not started until the scan is started then the devices will be added back. This patch fixes that problem by not failing scans when the host is in recovery. We will let scsi-ml send the IO and let the queueing and scsi error handling deal with it like is done if we went into recovery while scanning. For recovery cases where the host is being torn down then with the patch we will still fail the scan since there is not point in scanning. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi: Added support for adapter and firmware resetVikas Chaudhary2011-08-271-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added new sysfs attr 'host_reset' in scsi_sysfs.c to perform adapter or firmware reset as suggested by Mike Christie here: http://marc.info/?l=linux-scsi&m=127359347111167&w=2 user/application can write "adapter" or "firmware" on this attr and it will call newly added function hook in scsi_host_template to call LDD adapter or firmware reset implementation. Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi_transport_iscsi: Added support to update initiator iscsi portVikas Chaudhary2011-08-271-0/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi_transport_iscsi: Added support to update mtuVikas Chaudhary2011-08-271-0/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] scsi_transport_iscsi: Add conn login, kernel to user, event to ↵Manish Rangankar2011-08-272-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | support offload session login. Offload drivers like qla4xxx will offload the sending of the login/logout pdus still, so this patch adds iscsi_conn_login_event which is used by these types of drivers to notify userspace that the connection has changed state. It also adds a iscsi_is_session_online helper so the lld can query the sessions state field. Signed-off-by: Manish Rangankar <manish.rangankar@qlogic.com> Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: add bsg support to iscsi classMike Christie2011-08-272-1/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds bsg support to the iscsi class. There is only 1 request, the host vendor one, supported. It is expected that this would be used for things like flash updates. This patch is made over this one http://marc.info/?l=linux-scsi&m=131149780020992&w=2 Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: expand vlan supportMike Christie2011-08-271-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add support to set vlan priority and enable/disble a vlan. Patch based on code from Vikas Chaudhary. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrsMike Christie2011-08-272-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's host attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: remove iface param maskMike Christie2011-08-272-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | We can replace the iface param mask with the attr_is_visible callback. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: sysfs group is_visible callout for session attrsMike Christie2011-08-272-39/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and driver's session attrs to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi cls: sysfs group is_visible callout for conn attrsMike Christie2011-08-271-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iscsi class currently does not support writable sysfs attrs for LLD sysfs settings. This patch converts the iscsi class and drivers to use the attribute container sysfs group and the sysfs group's is_visible callout to be able to support readable or writable sysfs attrs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi class: add iface representationMike Christie2011-08-272-2/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A iscsi host can have multiple interfaces. This patch adds a new iface iscsi class for this. It exports the network settings now, and will be extended to also export iscsi initiator port settings like the isid and initiator name for drivers that can support multiple initiator ports. Based on patch from Lalit Chandivade. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] iscsi_transport: add support for net settingsMike Christie2011-08-272-0/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows user space (iscsiadm) to send down network configuration parameters for LLD to set private network configuration on the iSCSI adapters. Based on patch from Lalit Chandivade. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] fcoe: Move common functions to fcoe_transport libraryBhanu Prakash Gollapudi2011-08-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Export fcoe_get_wwn, fcoe_validate_vport_create and fcoe_wwn_to_str so that all LLDs can use these common function. Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| | * [SCSI] libsas: export sas_alloc_task()Dan Williams2011-08-271-24/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now that isci has added a 3rd open coded user of this functionality just share the libsas version. Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| * | ore: RAID5 WriteBoaz Harrosh2011-10-241-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is finally the RAID5 Write support. The bigger part of this patch is not the XOR engine itself, But the read4write logic, which is a complete mini prepare_for_striping reading engine that can read scattered pages of a stripe into cache so it can be used for XOR calculation. That is, if the write was not stripe aligned. The main algorithm behind the XOR engine is the 2 dimensional array: struct __stripe_pages_2d. A drawing might save 1000 words --- __stripe_pages_2d | n = pages_in_stripe_unit; w = group_width - parity; | pages array presented to the XOR lib | | V | __1_page_stripe[0].pages --> [c0][c1]..[cw][c_par] <---| | | __1_page_stripe[1].pages --> [c0][c1]..[cw][c_par] <--- | ... | ... | __1_page_stripe[n].pages --> [c0][c1]..[cw][c_par] ^ | data added columns first then row --- The pages are put on this array columns first. .i.e: p0-of-c0, p1-of-c0, ... pn-of-c0, p0-of-c1, ... So we are doing a corner turn of the pages. Note that pages will zigzag down and left. but are put sequentially in growing order. So when the time comes to XOR the stripe, only the beginning and end of the array need be checked. We scan the array and any NULL spot will be field by pages-to-be-read. The FS that wants to support RAID5 needs to supply an operations-vector that searches a given page in cache, and specifies if the page is uptodate or need reading. All these pages to be read are put on a slave ore_io_state and synchronously read. All the pages of a stripe are read in one IO, using the scatter gather mechanism. In write we constrain our IO to only be incomplete on a single stripe. Meaning either the complete IO is within a single stripe so we might have pages to read from both beginning or end of the strip. Or we have some reading to do at beginning but end at strip boundary. The left over pages are pushed to the next IO by the API already established by previous work, where an IO offset/length combination presented to the ORE might get the length truncated and the user must re-submit the leftover pages. (Both exofs and NFS support this) But any ORE user should make it's best effort to align it's IO before hand and avoid complications. A cached ore_layout->stripe_size member can be used for that calculation. (NOTE: that ORE demands that stripe_size may not be bigger then 32bit) What else? Well read it and tell me. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore: RAID5 readBoaz Harrosh2011-10-241-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the first stage of RAID5 support mainly the skip-over-raid-units when reading. For writes it inserts BLANK units, into where XOR blocks should be calculated and written to. It introduces the new "general raid maths", and the main additional parameters and components needed for raid5. Since at this stage it could corrupt future version that actually do support raid5. The enablement of raid5 mounting and setting of parity-count > 0 is disabled. So the raid5 code will never be used. Mounting of raid5 is only enabled later once the basic XOR write is also in. But if the patch "enable RAID5" is applied this code has been tested to be able to properly read raid5 volumes and is according to standard. Also it has been tested that the new maths still properly supports RAID0 and grouping code just as before. (BTW: I have found more bugs in the pnfs-obj RAID math fixed here) The ore.c file is getting too big, so new ore_raid.[hc] files are added that will include the special raid stuff that are not used in striping and mirrors. In future write support these will get bigger. When adding the ore_raid.c to Kbuild file I was forced to rename ore.ko to libore.ko. Is it possible to keep source file, say ore.c and module file ore.ko the same even if there are multiple files inside ore.ko? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore: Make ore_calc_stripe_info EXPORT_SYMBOLBoaz Harrosh2011-10-241-0/+3
| | | | | | | | | | | | | | | | | | | | | ore_calc_stripe_info is needed by exofs::export.c for the layout calculations. Make it exportable Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore/exofs: Change ore_check_io APIBoaz Harrosh2011-10-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current ore_check_io API receives a residual pointer, to report partial IO. But it is actually not used, because in a multiple devices IO there is never a linearity in the IO failure. On the other hand if every failing device is reported through a received callback measures can be taken to handle only failed devices. One at a time. This will also be needed by the objects-layout-driver for it's error reporting facility. Exofs is not currently using the new information and keeps the old behaviour of failing the complete IO in case of an error. (No partial completion) TODO: Use an ore_check_io callback to set_page_error only the failing pages. And re-dirty write pages. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore/exofs: Define new ore_verify_layoutBoaz Harrosh2011-10-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All users of the ore will need to check if current code supports the given layout. For example RAID5/6 is not currently supported. So move all the checks from exofs/super.c to a new ore_verify_layout() to be used by ore users. Note that any new layout should be passed through the ore_verify_layout() because the ore engine will prepare and verify some internal members of ore_layout, and assumes it's called. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore: Support for partial component tableBoaz Harrosh2011-10-141-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users like the objlayout-driver would like to only pass a partial device table that covers the IO in question. For example exofs divides the file into raid-group-sized chunks and only serves group_width number of devices at a time. The partiality is communicated by setting ore_componets->first_dev and the array covers all logical devices from oc->first_dev upto (oc->first_dev + oc->numdevs) The ore_comp_dev() API receives a logical device index and returns the actual present device in the table. An out-of-range dev_index will BUG. Logical device index is the theoretical device index as if all the devices of a file are present. .i.e: total_devs = group_width * mirror_p1 * group_count 0 <= dev_index < total_devs Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore: cleanup: Embed an ore_striping_info inside ore_io_stateBoaz Harrosh2011-10-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that each ore_io_state covers only a single raid group. A single striping_info math is needed. Embed one inside ore_io_state to cache the calculation results and eliminate an extra call. Also the outer _prepare_for_striping is removed since it does nothing. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore/exofs: Change the type of the devices array (API change)Boaz Harrosh2011-10-041-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the pNFS obj-LD the device table at the layout level needs to point to a device_cache node, where it is possible and likely that many layouts will point to the same device-nodes. In Exofs we have a more orderly structure where we have a single array of devices that repeats twice for a round-robin view of the device table This patch moves to a model that can be used by the pNFS obj-LD where struct ore_components holds an array of ore_dev-pointers. (ore_dev is newly defined and contains a struct osd_dev *od member) Each pointer in the array of pointers will point to a bigger user-defined dev_struct. That can be accessed by use of the container_of macro. In Exofs an __alloc_dev_table() function allocates the ore_dev-pointers array as well as an exofs_dev array, in one allocation and does the addresses dance to set everything pointing correctly. It still keeps the double allocation trick for the inodes round-robin view of the table. The device table is always allocated dynamically, also for the single device case. So it is unconditionally freed at umount. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | ore: Make ore_striping_info and ore_calc_stripe_info publicBoaz Harrosh2011-10-031-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | The struct ore_striping_info will be used later in other structures. And ore_calc_stripe_info as well. Rename them make struct ore_striping_info public. ore_calc_stripe_info is still static, will be made public on first use. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * | exofs: Remove unused data_map member from exofs_sb_infoBoaz Harrosh2011-10-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct pnfs_osd_data_map data_map member of exofs_sb_info was never used after mount. In fact all it's members were duplicated by the ore_layout structure. So just remove the duplicated information. Also removed some stupid, but perfectly supported, restrictions on layout parameters. The case where num_devices is not divisible by mirror_count+1 is perfectly fine since the rotating device view will eventually use all the devices it can get. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com>
| * | exofs: Rename struct ore_components comps => ocBoaz Harrosh2011-10-031-1/+1
| |/ | | | | | | | | | | | | | | | | ore_components already has a comps member so this leads to things like comps->comps which is annoying. the name oc was already used in new code. So rename all old usage of ore_components comps => ore_components oc. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
| * Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds2011-08-061-0/+125
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.open-osd.org/linux-open-osd: ore: Make ore its own module exofs: Rename raid engine from exofs/ios.c => ore exofs: ios: Move to a per inode components & device-table exofs: Move exofs specific osd operations out of ios.c exofs: Add offset/length to exofs_get_io_state exofs: Fix truncate for the raid-groups case exofs: Small cleanup of exofs_fill_super exofs: BUG: Avoid sbi realloc exofs: Remove pnfs-osd private definitions nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
| | * exofs: Rename raid engine from exofs/ios.c => oreBoaz Harrosh2011-08-061-0/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ORE stands for "Objects Raid Engine" This patch is a mechanical rename of everything that was in ios.c and its API declaration to an ore.c and an osd_ore.h header. The ore engine will later be used by the pnfs objects layout driver. * File ios.c => ore.c * Declaration of types and API are moved from exofs.h to a new osd_ore.h * All used types are prefixed by ore_ from their exofs_ name. * Shift includes from exofs.h to osd_ore.h so osd_ore.h is independent, include it from exofs.h. Other than a pure rename there are no other changes. Next patch will move the ore into it's own module and will export the API to be used by exofs and later the layout driver Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>