path: root/drivers/infiniband
Commit message (Author, Age, Files, Lines)
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 (Linus Torvalds, 2008-07-15, 5 files, -334/+380)
|\
| |  * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
| |    [SCSI] scsi_dh: fix kconfig related build errors
| |    [SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
| |    [SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
| |    [SCSI] make struct scsi_{host,target}_type static
| |    [SCSI] fix locking in host use of blk_plug_device()
| |    [SCSI] zfcp: Cleanup external header file
| |    [SCSI] zfcp: Cleanup code in zfcp_erp.c
| |    [SCSI] zfcp: zfcp_fsf cleanup.
| |    [SCSI] zfcp: consolidate sysfs things into one file.
| |    [SCSI] zfcp: Cleanup of code in zfcp_aux.c
| |    [SCSI] zfcp: Cleanup of code in zfcp_scsi.c
| |    [SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
| |    [SCSI] zfcp: Small QDIO cleanups
| |    [SCSI] zfcp: Adapter reopen for large number of unsolicited status
| |    [SCSI] zfcp: Fix error checking for ELS ADISC requests
| |    [SCSI] zfcp: wait until adapter is finished with ERP during auto-port
| |    [SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
| |    [SCSI] sg: Add target reset support
| |    [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
| |    [SCSI] sd: Move scsi_disk() accessor function to sd.h
| |    ...
| * [SCSI] libiscsi, iscsi_tcp, ib_iser: fix setting of can_queue with old tools. (Mike Christie, 2008-07-12, 1 file, -1/+0)
| |  This patch fixes two related bugs.
| |
| |  1. Old tools did not set can_queue/cmds_max. This patch modifies libiscsi so that when we add the host we catch this and set it to the default.
| |
| |  2. iscsi_tcp thought that the scsi command that was passed to the eh functions needed an iscsi_cmd_task allocated for it. It only needed a mgmt task, and now it does not matter since it all comes from the same pool and libiscsi handles this for the drivers. ib_iser had copied iscsi_tcp's code and set can_queue to its max - 1 to handle this. So this patch removes the max - 1, and just sets it to the max.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
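Item 1 above amounts to a fallback applied at host-add time. The following is a minimal userspace sketch of that idea, assuming a hypothetical `fake_shost` struct and a hypothetical default of 128; neither is libiscsi's actual type or value.

```c
#include <assert.h>

/* Hypothetical default; the real libiscsi default may differ. */
#define HYPOTHETICAL_DEF_CAN_QUEUE 128

/* Hypothetical stand-in for a SCSI host: only the queue depth matters here. */
struct fake_shost {
	int can_queue;
};

/*
 * Old userspace tools never set cmds_max, so can_queue arrives as 0.
 * Catch that when the host is added and fall back to the default.
 */
static void fake_host_add_fixup(struct fake_shost *shost)
{
	if (shost->can_queue <= 0)
		shost->can_queue = HYPOTHETICAL_DEF_CAN_QUEUE;
}
```

A value supplied by newer tools passes through untouched; only the unset case is rewritten.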
| * [SCSI] libiscsi, iser, tcp: remove recv_lock (Mike Christie, 2008-07-12, 3 files, -12/+16)
| |  The recv lock was defined so the iscsi layer could block the recv path from processing IO during recovery. It turns out iser just set a lock to that pointer, which was pointless.
| |
| |  We now disconnect the transport connection before doing recovery, so we do not need the recv lock. For iscsi_tcp we still stop the recv path in case older tools are being used.
| |
| |  This patch also has users of iscsi_itt_to_ctask grab the session lock, and has the caller access the task with the lock held or get a ref to it, in case the target is broken and sends a tmf success response and then sends data or a response for the command that was supposed to be affected by the tmf.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi class: Add session initiatorname and ifacename sysfs attrs. (Mike Christie, 2008-07-12, 1 file, -1/+2)
| |  This adds two new attrs used for creating initiator ports and binding sessions to hardware.
| |
| |  The session level initiatorname: Since bnx2i does a scsi_host per host device, we need to add the iface initiator port settings on the session, so we can create multiple initiator ports (each with different inames) per device/scsi_host. The current iname reflects that qla4xxx can have one iname per hba, and we are allocating a host per session for software. The iname on the host will remain so we can export and set the hba level qla4xxx setting.
| |
| |  The ifacename attr: To bind a session to some piece of hardware in userspace we maintain some mappings, but during boot or iscsid restart (iscsid contains the user space part of the driver) we need to be able to figure out which of those host mapping abstractions map to certain sessions. This patch adds an ifacename attr, which userspace can set to identify the host side of the endpoint across pivot_roots and iscsid restarts.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup (Mike Christie, 2008-07-12, 3 files, -62/+43)
| |  This hooks iser into the iscsi endpoint code. Previously it handled the lookup and allocation. This has been made generic so bnx2i and iser can share it. It also allows us to pass iser the leading conn's ep, so we know the ib_device being used and can set it as the scsi_host's parent. That in turn allows scsi-ml to set the dma_mask based on those values.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi class: user device_for_each_child instead of duplicating session list (Mike Christie, 2008-07-12, 1 file, -1/+1)
| |  Currently we duplicate the list of sessions, because we were using whether a session was on the host list to indicate if the session was bound or unbound. We can instead use the target_id and fix up the class so that drivers like bnx2i do not have to manage the target id space.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iser: handle iscsi_cmd_task rename (Mike Christie, 2008-07-12, 4 files, -202/+203)
| |  This handles the iscsi_cmd_task rename and renames the iser cmd task to iser task.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iser: convert ib_iser to support merged tasks (Mike Christie, 2008-07-12, 3 files, -70/+68)
| |  Convert ib_iser to support merged tasks.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] libiscsi, iscsi_tcp, iser: add session cmds array accessor (Mike Christie, 2008-07-12, 2 files, -20/+11)
| |  Currently, to get a ctask from the session cmd array, you have to know to use the itt modifier. To make this easier on LLDs, and so that in the future we can easily kill the session array and use the host shared map instead, this patch adds a nice wrapper to strip the itt into a session->cmds index and return a ctask.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
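The wrapper described above hides the ITT-to-index step from LLDs. A minimal sketch, assuming a hypothetical 12-bit index mask and hypothetical `hypo_*` types; the real libiscsi mask and structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the low bits of an ITT index session->cmds, the upper bits
 * carry other information. The real libiscsi mask may differ. */
#define HYPO_ITT_INDEX_MASK 0x0fffu

struct hypo_task {
	uint32_t itt;
};

struct hypo_session {
	struct hypo_task **cmds; /* stands in for session->cmds */
	size_t cmds_max;
};

/* Strip the ITT down to an array index and return the task stored there,
 * or NULL if the index is out of range. */
static struct hypo_task *hypo_session_itt_to_task(const struct hypo_session *s,
						  uint32_t itt)
{
	size_t i = itt & HYPO_ITT_INDEX_MASK;

	if (i >= s->cmds_max)
		return NULL;
	return s->cmds[i];
}
```

Because callers only see the wrapper, the backing array could later be replaced by the host-wide shared map without touching them, which is the motivation the commit gives.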
| * [SCSI] iser: fix handling of scsi cmnds during recovery. (Mike Christie, 2008-07-12, 3 files, -4/+57)
| |  After the stop_conn callback has returned, the LLD should not touch the scsi cmds. iscsi_tcp and libiscsi use the conn->recv_lock and suspend_rx field to halt recv path processing, but iser does not have any protection.
| |
| |  This patch modifies iser so that userspace can just call the ep_disconnect callback, which will halt all recv IO, before calling the stop_conn callback, so we do not have to worry about the conn->recv_lock and suspend_rx field. iser just needs to stop the send side from accessing the ib conn.
| |
| |  Includes a fixup from Erez to handle the case where the ep poll fails and ep disconnect is called.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi: remove session/conn_data_size from iscsi_transport (Mike Christie, 2008-07-12, 1 file, -13/+3)
| |  This removes the session and conn data_size fields from the iscsi_transport. Just pass in the value, as with host allocation. This patch also makes it so the LLD iscsi_conn data is allocated with the iscsi_cls_conn.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi: add iscsi host helpers (Mike Christie, 2008-07-12, 1 file, -11/+6)
| |  This finishes the host/session unbinding by adding some helpers to add and remove hosts and the session they manage.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi: remove session and host binding in libiscsi (Mike Christie, 2008-07-12, 1 file, -20/+54)
| |  bnx2i allocates a host per netdevice but will use libiscsi, so this unbinds the session from the host in that code. This will also be useful for the iser parent device dma settings fixes.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi class, iscsi drivers: remove unused iscsi_transport attrs (Mike Christie, 2008-07-12, 2 files, -2/+0)
| |  max_cmd_len and max_conn are not really used. max_cmd_len is always 16 and can be set by the LLD. max_conn is always one since we do not support MCS.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
| * [SCSI] iscsi class, iscsi_tcp/iser: add host arg to session creation (Mike Christie, 2008-07-12, 1 file, -0/+1)
| |  iscsi offload (bnx2i and qla4xxx) allocates a scsi host per hba, so the session creation path needs a shost/host_no argument.
| |
| |  Software iscsi/iser will follow the same behavior as before, where it allocates a host per session, but in the future iser will probably look more like bnx2i, where the host's parent is the hardware (the rnic for iser; for bnx2i it is the nic), because it does not use a socket layer like iscsi_tcp does.
| |
| |  Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
| |  Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* | Merge branch 'core/rcu' into core/rcu-for-linus (Ingo Molnar, 2008-07-15, 2 files, -2/+2)
|\ \
| * \ Merge branch 'linus' into core/rcu (Ingo Molnar, 2008-07-11, 1 file, -0/+4)
| |\ \
| | | |  Conflicts:
| | | |
| | | |    include/linux/rculist.h
| | | |    kernel/rcupreempt.c
| | | |
| | | |  Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ \ Merge commit 'v2.6.26-rc8' into core/rcu (Ingo Molnar, 2008-06-26, 1 file, -1/+5)
| |\ \ \
| * \ \ \ Merge branch 'linus' into core/rcu (Ingo Molnar, 2008-06-23, 2 files, -4/+2)
| |\ \ \ \
| * \ \ \ \ Merge branch 'linus' into core/rcu (Ingo Molnar, 2008-06-16, 15 files, -40/+69)
| |\ \ \ \ \
| * | | | | | rcu: split list.h and move rcu-protected lists into rculist.h (Franck Bui-Huu, 2008-05-19, 2 files, -2/+2)
| | | | | | |  Move rcu-protected lists from list.h into a new header file rculist.h.
| | | | | | |
| | | | | | |  This is done because lists are a widely used primitive structure all over the kernel, and it's currently impossible to include other header files in list.h without creating some circular dependencies.
| | | | | | |
| | | | | | |  For example, list.h implements rcu-protected lists and uses rcu_dereference() without including rcupdate.h. It actually compiles because users of rcu_dereference() are macros. Other RCU functions could be used too, but probably aren't because of this.
| | | | | | |
| | | | | | |  Therefore this patch creates rculist.h, which includes rcupdate.h without too many changes/troubles.
| | | | | | |
| | | | | | |  Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
| | | | | | |  Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | | | | |  Acked-by: Josh Triplett <josh@kernel.org>
| | | | | | |  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | | | | |  Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | | IB/mlx4: Use kzalloc() for new QPs so flags are initialized to 0 (Eli Cohen, 2008-07-14, 1 file, -13/+2)
| | | | | | |  Current code uses kmalloc() and then just does a bitwise OR operation on qp->flags in create_qp_common(), which means that qp->flags may potentially have some unintended bits set. This patch uses kzalloc() and avoids further explicit clearing of structure members, which also shrinks the code:
| | | | | | |
| | | | | | |    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-65 (-65)
| | | | | | |    function                   old     new   delta
| | | | | | |    create_qp_common          2024    1959     -65
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
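The bug pattern is OR-ing bits into a field that was never initialized. A userspace analogue, assuming hypothetical flag bits and a hypothetical `hypo_qp` struct; `calloc()` plays the role of `kzalloc()` here, since both return zeroed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bits; the real mlx4 QP flags differ. */
#define HYPO_QP_FLAG_LSO   (1u << 0)
#define HYPO_QP_FLAG_BLOCK (1u << 1)

struct hypo_qp {
	unsigned int flags;
	/* ... other members that a zeroing allocator also clears ... */
};

/* Because calloc() zeroes the struct, OR-ing requested bits into
 * qp->flags cannot pick up stale garbage from a previous allocation. */
static struct hypo_qp *hypo_create_qp(unsigned int requested)
{
	struct hypo_qp *qp = calloc(1, sizeof(*qp));

	if (!qp)
		return NULL;
	if (requested & HYPO_QP_FLAG_LSO)
		qp->flags |= HYPO_QP_FLAG_LSO;
	if (requested & HYPO_QP_FLAG_BLOCK)
		qp->flags |= HYPO_QP_FLAG_BLOCK;
	return qp;
}
```

With `malloc()` in place of `calloc()`, the same code could leave unrelated bits set, which is the situation the commit describes.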
* | | | | | | RDMA/cma: Simplify locking needed for serialization of callbacks (Or Gerlitz, 2008-07-14, 1 file, -56/+50)
| | | | | | |  The RDMA CM has some logic in place to make sure that callbacks on a given CM ID are delivered to the consumer in a serialized manner. Specifically, it has code to protect against a device removal racing with a running callback function.
| | | | | | |
| | | | | | |  This patch simplifies this logic by using a mutex per ID instead of a wait queue and atomic variable. This means that cma_disable_remove() is renamed to the more accurate cma_disable_callback(), and cma_enable_remove() can be removed because it would just become a trivial wrapper around mutex_unlock().
| | | | | | |
| | | | | | |  Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
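The per-ID mutex scheme can be sketched in userspace with pthreads. All `hypo_*` names are hypothetical; the shape of `hypo_disable_callback()` (lock, check the expected state, unlock and fail on mismatch) is modeled on the description of `cma_disable_callback()`, not copied from the kernel.

```c
#include <assert.h>
#include <pthread.h>

enum hypo_state { HYPO_ESTABLISHED, HYPO_DEVICE_REMOVAL };

struct hypo_cm_id {
	pthread_mutex_t handler_mutex; /* one mutex per ID serializes callbacks */
	enum hypo_state state;
	int callbacks_run;
};

/* Take the ID's mutex and check the ID is still in the expected state.
 * On success the caller holds the mutex and must unlock it afterwards. */
static int hypo_disable_callback(struct hypo_cm_id *id, enum hypo_state state)
{
	pthread_mutex_lock(&id->handler_mutex);
	if (id->state != state) {
		pthread_mutex_unlock(&id->handler_mutex);
		return -1;
	}
	return 0;
}

static void hypo_deliver_event(struct hypo_cm_id *id)
{
	if (hypo_disable_callback(id, HYPO_ESTABLISHED))
		return;
	id->callbacks_run++;                 /* consumer callback would run here */
	pthread_mutex_unlock(&id->handler_mutex);
}

/* Device removal takes the same mutex, so it cannot overlap a callback. */
static void hypo_device_removal(struct hypo_cm_id *id)
{
	pthread_mutex_lock(&id->handler_mutex);
	id->state = HYPO_DEVICE_REMOVAL;
	pthread_mutex_unlock(&id->handler_mutex);
}
```

Once removal has run, later events see the wrong state and are dropped without invoking the consumer.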
* | | | | | | RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr (Or Gerlitz, 2008-07-14, 2 files, -3/+6)
| | | | | | |  Keep a pointer to the local (src) netdevice in struct rdma_dev_addr, and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip() in cma_new_conn_id() to reduce some code duplication and also make sure the src_dev member gets set.
| | | | | | |
| | | | | | |  In a high-availability configuration, the netdevice pointer can be used by the RDMA CM to align RDMA sessions to use the same links as the IP stack does under fail-over and route change cases.
| | | | | | |
| | | | | | |  Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/cxgb3: Fixes for zero STag (Steve Wise, 2008-07-14, 4 files, -26/+130)
| | | | | | |  Handling the zero STag in receive work requests requires some extra logic in the driver:
| | | | | | |
| | | | | | |  - Only set the QP_PRIV bit for kernel mode QPs.
| | | | | | |  - Add a zero STag build function for recv wrs. The uP needs a PBL allocated and passed down in the recv WR so it can construct a HW PBL for the zero STag S/G entries. Note: we need to place a few restrictions on zero STag usage because of this:
| | | | | | |      1) All SGEs in a recv WR must either be zero STag or not. No mixing.
| | | | | | |      2) An individual SGE length cannot exceed 128MB for a zero-STag SGE. This should be OK since it's not really practical to allocate such a large chunk of pinned contiguous DMA mapped memory.
| | | | | | |  - Add an optimized non-zero-STag recv wr format for kernel users. This is needed to optimize both zero and non-zero STag cracking in the recv path for kernel users.
| | | | | | |  - Remove the iwch_ prefix from the static build functions.
| | | | | | |  - Bump required FW version.
| | | | | | |
| | | | | | |  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
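The two zero-STag restrictions translate into a simple validation pass over a receive WR's SGE list. A minimal sketch, assuming a hypothetical `hypo_sge` type and that the zero STag is the value 0; it is not the iw_cxgb3 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYPO_ZERO_STAG             0u
#define HYPO_MAX_ZERO_STAG_SGE_LEN (128u << 20) /* 128MB, per restriction 2 */

/* Hypothetical stand-in for a scatter/gather entry. */
struct hypo_sge {
	uint32_t stag;
	uint32_t length;
};

/* Returns 0 if the list obeys both restrictions, -1 otherwise:
 * 1) all SGEs use the zero STag or none do, and
 * 2) no zero-STag SGE is longer than 128MB. */
static int hypo_check_recv_sges(const struct hypo_sge *sg, size_t n)
{
	size_t zero = 0;

	for (size_t i = 0; i < n; i++) {
		if (sg[i].stag != HYPO_ZERO_STAG)
			continue;
		if (sg[i].length > HYPO_MAX_ZERO_STAG_SGE_LEN)
			return -1;
		zero++;
	}
	return (zero == 0 || zero == n) ? 0 : -1;
}
```

Counting zero-STag entries and comparing against 0 and `n` checks the "no mixing" rule in a single pass.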
* | | | | | | RDMA/core: Add local DMA L_Key support (Steve Wise, 2008-07-14, 3 files, -4/+6)
| | | | | | |  - Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0 STag support and IB HCAs to indicate reserved L_Key support.
| | | | | | |  - Add a u32 local_dma_lkey member to struct ib_device. Drivers fill this in with the appropriate local DMA L_Key (if they support it).
| | | | | | |  - Fix up the drivers using this flag.
| | | | | | |
| | | | | | |  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IB/mthca: Fix check of max_send_sge for special QPs (Roland Dreier, 2008-07-14, 1 file, -2/+2)
| | | | | | |  The MLX transport requires two extra gather entries for sends (one for the header and one for the checksum at the end, as the comment says). However, the code checked that max_recv_sge was not too big, instead of checking max_send_sge as it should have. Fix the code to check the correct condition.
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
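The corrected condition reserves the two extra gather entries on the send side. A minimal sketch with a hypothetical capability struct and a plain `dev_max_sg` parameter standing in for the device limit; these are not mthca's actual types.

```c
#include <assert.h>

/* Hypothetical subset of QP capabilities requested by the consumer. */
struct hypo_qp_cap {
	int max_send_sge;
	int max_recv_sge;
};

/* The MLX transport adds two gather entries to every send (header and
 * trailing checksum), so the *send* SGE count must leave room for them.
 * The buggy version tested max_recv_sge here instead. */
static int hypo_check_special_qp_caps(const struct hypo_qp_cap *cap, int dev_max_sg)
{
	if (cap->max_send_sge + 2 > dev_max_sg)
		return -1;
	return 0;
}
```

A large receive SGE count no longer trips the check, while a send SGE count that leaves no room for the two extra entries does.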
* | | | | | | IB/mthca: Use round_jiffies() for catastrophic error polling timer (Roland Dreier, 2008-07-14, 1 file, -1/+1)
| | | | | | |  Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups.
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IB/mthca: Remove "stop" flag for catastrophic error polling timer (Roland Dreier, 2008-07-14, 2 files, -14/+2)
| | | | | | |  Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm.
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Double default RX/TX ring sizes (Eli Cohen, 2008-07-14, 1 file, -2/+2)
| | | | | | |  Increase IPoIB ring sizes to twice their original sizes (RX: 128->256, TX: 64->128) to act as a shock absorber for high traffic peaks. With the current settings, we have seen cases where there are many calls to netif_stop_queue(), which causes degradation in throughput.
| | | | | | |
| | | | | | |  Also, larger receive buffer sizes help IPoIB in CM mode to avoid experiencing RNR NAK conditions due to insufficient receive buffers at the SRQ.
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB/cm: Reduce connected mode TX object size (Eli Cohen, 2008-07-14, 2 files, -7/+12)
| | | | | | |  Since IPoIB connected mode does not support NETIF_F_SG, we only have one DMA mapping per send, so we don't need a mapping[] array. Define a new struct with a single u64 mapping member and use it for the CM tx_ring.
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IB/ipath: Use IEEE OUI for vendor_id reported by ibv_query_device() (Ralph Campbell, 2008-07-14, 2 files, -4/+5)
| | | | | | |  The IB spec for SubnGet(NodeInfo) and query HCA says that the vendor ID field should be the IEEE OUI assigned to the vendor. The ipath driver was returning the PCI vendor ID instead.
| | | | | | |
| | | | | | |  This will affect applications which call ibv_query_device(). The old value was 0x001fc1 or 0x001077; the new value is 0x001175. The vendor ID doesn't appear to be exported via /sys, which should reduce possible compatibility issues. I'm only aware of Open MPI as a major application which depends on this change, and they have made the necessary adjustments.
| | | | | | |
| | | | | | |  Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Use dev_set_mtu() to change mtu (Eli Cohen, 2008-07-14, 2 files, -2/+2)
| | | | | | |  When the driver sets the MTU of the net device outside of its change_mtu method, it should use dev_set_mtu() instead of directly setting the mtu field of struct net_device. Otherwise, functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()).
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
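Why a direct field write is not enough can be shown with a toy device that has one MTU-change listener. Everything here is hypothetical; the real `dev_set_mtu()` also validates the value and calls the driver's change_mtu method, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical net device with a single MTU-change listener. In the kernel,
 * dev_set_mtu() reaches listeners through call_netdevice_notifiers(). */
struct hypo_netdev {
	int mtu;
	void (*mtu_listener)(struct hypo_netdev *dev);
};

static int hypo_mtu_events;

static void hypo_count_mtu_event(struct hypo_netdev *dev)
{
	(void)dev;
	hypo_mtu_events++;
}

/* Analogue of dev_set_mtu(): update the field and notify the listener. */
static void hypo_dev_set_mtu(struct hypo_netdev *dev, int new_mtu)
{
	if (dev->mtu == new_mtu)
		return;
	dev->mtu = new_mtu;
	if (dev->mtu_listener)
		dev->mtu_listener(dev);
}
```

Writing `dev.mtu` directly changes the value silently; only the helper path lets interested parties react.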
* | | | | | | IPoIB: Use rtnl lock/unlock when changing device flags (Eli Cohen, 2008-07-14, 2 files, -3/+10)
| | | | | | |  Use of this lock is required to synchronize changes to the netdevice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode().
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Get rid of ipoib_mcast_detach() wrapper (Roland Dreier, 2008-07-14, 3 files, -17/+3)
| | | | | | |  ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just use the core API in the one place that does a multicast group detach.
| | | | | | |
| | | | | | |    add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
| | | | | | |    function                   old     new   delta
| | | | | | |    ipoib_mcast_leave          357     319     -38
| | | | | | |    ipoib_mcast_detach          67       -     -67
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Only set Q_Key once: after joining broadcast group (Eli Cohen, 2008-07-14, 3 files, -15/+19)
| | | | | | |  The current code will set the Q_Key for any join of a non-sendonly multicast group. The operation involves a modify QP operation, which is fairly heavyweight, and is only really required after the join of the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach() to control when the Q_Key is set.
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
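The extra attach parameter decides whether the costly QP modification happens. A minimal sketch with hypothetical counters standing in for real verbs calls; the Q_Key value used in the usage lines is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: counters stand in for the real verbs calls. */
struct hypo_ipoib {
	unsigned int qkey;
	int modify_qp_calls; /* the heavyweight step */
	int attach_calls;
};

/* Attach with a flag that controls whether the Q_Key is (re)programmed. */
static int hypo_mcast_attach(struct hypo_ipoib *priv, unsigned int qkey,
			     bool set_qkey)
{
	if (set_qkey) {
		priv->qkey = qkey;
		priv->modify_qp_calls++;
	}
	priv->attach_calls++;
	return 0;
}
```

Only the broadcast-group join passes `true`; every other full-member join skips the modify QP step.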
* | | | | | | IPoIB: Remove priv->mcast_mutex (Eli Cohen, 2008-07-14, 3 files, -6/+0)
| | | | | | |  No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer.
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Remove unused IPOIB_MCAST_STARTED code (Eli Cohen, 2008-07-14, 2 files, -9/+0)
| | | | | | |  The IPOIB_MCAST_STARTED flag has not been used at all since commit b3e2749b ("IPoIB: Don't drop multicast sends when they can be queued"), so remove it.
| | | | | | |
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/cxgb3: Set rkey field for new memory windows in iwch_alloc_mw() (Steve Wise, 2008-07-14, 1 file, -0/+1)
| | | | | | |  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/nes: Get rid of ring_doorbell parameter of nes_post_cqp_request() (Roland Dreier, 2008-07-14, 6 files, -23/+19)
| | | | | | |  Every caller of nes_post_cqp_request() passed it NES_CQP_REQUEST_RING_DOORBELL, so just remove that parameter and always ring the doorbell.
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | |  Acked-by: Faisal Latif <flatif@neteffect.com>
* | | | | | | RDMA/cxgb3: Propagate HW page size capabilities (Jon Mason, 2008-07-14, 3 files, -1/+3)
| | | | | | |  cxgb3 does not currently report the page size capabilities, and incorrectly reports them internally. This version changes the bit-shifting to a static value (per Steve's request).
| | | | | | |
| | | | | | |  Signed-off-by: Jon Mason <jon@opengridcomputing.com>
| | | | | | |  Acked-by: Steve Wise <swise@opengridcomputing.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/nes: Encapsulate logic nes_put_cqp_request() (Roland Dreier, 2008-07-14, 4 files, -211/+66)
| | | | | | |  The iw_nes driver repeats the logic
| | | | | | |
| | | | | | |      if (atomic_dec_and_test(&cqp_request->refcount)) {
| | | | | | |          if (cqp_request->dynamic) {
| | | | | | |              kfree(cqp_request);
| | | | | | |          } else {
| | | | | | |              spin_lock_irqsave(&nesdev->cqp.lock, flags);
| | | | | | |              list_add_tail(&cqp_request->list, &nesdev->cqp_avail_reqs);
| | | | | | |              spin_unlock_irqrestore(&nesdev->cqp.lock, flags);
| | | | | | |          }
| | | | | | |      }
| | | | | | |
| | | | | | |  over and over. Wrap this up in functions nes_free_cqp_request() and nes_put_cqp_request() to simplify such code.
| | | | | | |
| | | | | | |  In addition to making the source smaller and more readable, this shrinks the compiled code quite a bit:
| | | | | | |
| | | | | | |    add/remove: 2/0 grow/shrink: 0/13 up/down: 164/-1692 (-1528)
| | | | | | |    function                   old     new   delta
| | | | | | |    nes_free_cqp_request         -     147    +147
| | | | | | |    nes_put_cqp_request          -      17     +17
| | | | | | |    nes_modify_qp             2316    2293     -23
| | | | | | |    nes_hw_modify_qp           737     657     -80
| | | | | | |    nes_dereg_mr               945     860     -85
| | | | | | |    flush_wqes                 501     416     -85
| | | | | | |    nes_manage_apbvt           648     560     -88
| | | | | | |    nes_reg_mr                1117    1026     -91
| | | | | | |    nes_cqp_ce_handler         927     769    -158
| | | | | | |    nes_alloc_mw              1052     884    -168
| | | | | | |    nes_create_qp             5314    5141    -173
| | | | | | |    nes_alloc_fmr             2212    2035    -177
| | | | | | |    nes_destroy_cq            1097     918    -179
| | | | | | |    nes_create_cq             2787    2598    -189
| | | | | | |    nes_dealloc_mw             762     566    -196
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | |  Acked-by: Faisal Latif <flatif@neteffect.com>
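The repeated block splits naturally into a "free" helper and a "put" helper that drops a reference. A userspace analogue, assuming C11 atomics for the refcount, a pthread mutex for the CQP lock, and a singly linked free list in place of `list_head`; all `hypo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct hypo_cqp_request {
	atomic_int refcount;
	bool dynamic;                  /* allocated on demand, not from the pool */
	struct hypo_cqp_request *next; /* free-list link (stands in for list_head) */
};

struct hypo_device {
	pthread_mutex_t lock;           /* stands in for nesdev->cqp.lock */
	struct hypo_cqp_request *avail; /* stands in for nesdev->cqp_avail_reqs */
};

/* Return a request to wherever it came from. */
static void hypo_free_cqp_request(struct hypo_device *dev,
				  struct hypo_cqp_request *req)
{
	if (req->dynamic) {
		free(req);
	} else {
		pthread_mutex_lock(&dev->lock);
		req->next = dev->avail; /* pushed at head; the driver appends at tail */
		dev->avail = req;
		pthread_mutex_unlock(&dev->lock);
	}
}

/* Drop one reference; whoever drops the last one frees the request. */
static void hypo_put_cqp_request(struct hypo_device *dev,
				 struct hypo_cqp_request *req)
{
	/* atomic_fetch_sub() returns the old value, so 1 means we hit zero */
	if (atomic_fetch_sub(&req->refcount, 1) == 1)
		hypo_free_cqp_request(dev, req);
}
```

Each former copy of the open-coded block collapses to a single `hypo_put_cqp_request()` call, which is where the size reduction in the table comes from.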
* | | | | | | IPoIB: Refresh paths instead of flushing them on SM change events (Moni Shoua, 2008-07-14, 4 files, -30/+91)
| | | | | | |  The patch tries to solve the problem of the device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe to trigger a new path record query.
| | | | | | |
| | | | | | |  The solution requires a different and less intrusive handling of the SM change event. For that, the second argument of the flush function changes its meaning from a boolean flag to a level.
| | | | | | |
| | | | | | |  In most cases, SM failover doesn't cause a LID change, so traffic won't stop. In the rare cases of a LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost.
| | | | | | |
| | | | | | |  Signed-off-by: Moni Levy <monil@voltaire.com>
| | | | | | |  Signed-off-by: Moni Shoua <monis@voltaire.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
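Replacing the boolean with a level lets an SM change do a "light" flush that only invalidates paths. A minimal sketch with two hypothetical levels and a hypothetical path struct; the real IPoIB levels and data structures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flush levels replacing the old boolean argument. */
enum hypo_flush_level {
	HYPO_FLUSH_LIGHT, /* SM change: keep paths, mark them for refresh */
	HYPO_FLUSH_FULL,  /* drop paths entirely */
};

struct hypo_path {
	bool present;
	bool valid; /* the new flag: false means "refresh on next ARP probe" */
};

static void hypo_dev_flush(struct hypo_path *paths, size_t n,
			   enum hypo_flush_level level)
{
	for (size_t i = 0; i < n; i++) {
		if (level == HYPO_FLUSH_LIGHT)
			paths[i].valid = false;
		else
			paths[i].present = false;
	}
}

/* On an ARP probe: true if a new path record query should be issued. */
static bool hypo_path_needs_query(const struct hypo_path *p)
{
	return !p->present || !p->valid;
}
```

After a light flush the path entries survive, so traffic can keep flowing on them while the next ARP probe refreshes the record.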
* | | | | | | IB/ehca: Make device table externally visible (Joachim Fenkes, 2008-07-14, 1 file, -0/+1)
| | | | | | |  This gives ehca an autogenerated modalias and therefore enables automatic loading.
| | | | | | |
| | | | | | |  Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: add LRO support (Vladimir Sokolovsky, 2008-07-14, 5 files, -1/+127)
| | | | | | |  Add an "ipoib_use_lro" module parameter to enable LRO and an "ipoib_lro_max_aggr" module parameter to set the max number of packets to be aggregated. Make LRO controllable and LRO statistics accessible through ethtool.
| | | | | | |
| | | | | | |  Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
| | | | | | |  Signed-off-by: Eli Cohen <eli@mellanox.co.il>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | IPoIB: Use multicast loopback blocking if available (Ron Livne, 2008-07-14, 1 file, -1/+4)
| | | | | | |  Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if supported by the underlying device. This gives an improvement of up to 39% in bandwidth when sending multicast packets with IPoIB, and an improvement of 12% in CPU usage.
| | | | | | |
| | | | | | |  Signed-off-by: Ron Livne <ronli@voltaire.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
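"If available" boils down to a capability test before the UD QP is created. A minimal sketch with hypothetical bit names and values; in the kernel the create flag is IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK, and the device capability bit tested is assumed here rather than taken from the patch.

```c
#include <assert.h>

/* Hypothetical bit values; the real flags live in <rdma/ib_verbs.h>. */
#define HYPO_DEV_CAP_BLOCK_MCAST_LOOPBACK   (1u << 0)
#define HYPO_QP_CREATE_BLOCK_MCAST_LOOPBACK (1u << 1)

/* Request loopback blocking only when the device advertises it, so
 * devices without the feature keep working unchanged. */
static unsigned int hypo_ud_qp_create_flags(unsigned int dev_caps)
{
	unsigned int create_flags = 0;

	if (dev_caps & HYPO_DEV_CAP_BLOCK_MCAST_LOOPBACK)
		create_flags |= HYPO_QP_CREATE_BLOCK_MCAST_LOOPBACK;
	return create_flags;
}
```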
* | | | | | | IB/mlx4: Add support for blocking multicast loopback packets (Ron Livne, 2008-07-14, 3 files, -6/+25)
| | | | | | |  Add support for handling the IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK flag by using the per-multicast group loopback blocking feature of mlx4 hardware.
| | | | | | |
| | | | | | |  Signed-off-by: Ron Livne <ronli@voltaire.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/cxgb3: Add support for protocol statistics (Steve Wise, 2008-07-14, 1 file, -4/+58)
| | | | | | |  - Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low level driver to obtain the protocol MIB from the RNIC hardware.
| | | | | | |  - Add a new iw_cxgb3 provider method to get the MIB from the low level driver.
| | | | | | |
| | | | | | |  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | | | | | RDMA/core: Add iWARP protocol statistics attributes in sysfs (Steve Wise, 2008-07-14, 1 file, -0/+120)
| | | | | | |  This patch adds a sysfs attribute group called "proto_stats" under /sys/class/infiniband/$device/ and populates this group with protocol statistics if they exist for a given device. Currently, only iWARP stats are defined, but the code is designed to allow InfiniBand protocol stats if they become available. These stats are per-device and, more importantly, -not- per port.
| | | | | | |
| | | | | | |  Details:
| | | | | | |
| | | | | | |  - Add union rdma_protocol_stats in ib_verbs.h. This union allows defining transport-specific stats. Currently only iWARP stats are defined.
| | | | | | |  - Add struct iw_protocol_stats to define the current set of iWARP protocol stats.
| | | | | | |  - Add a new ib_device method called get_proto_stats() to return protocol statistics.
| | | | | | |  - Add logic in core/sysfs.c to create iWARP protocol stats attributes if the device is an RNIC and has a get_proto_stats() method.
| | | | | | |
| | | | | | |  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
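The design in the details list (a transport-keyed union, a driver method, and a creation rule) can be mirrored in a few lines. The types below are hypothetical, trimmed-down mirrors; the two counters are modeled on MIB-style names and are not the full real struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of an iWARP stats struct. */
struct hypo_iw_protocol_stats {
	uint64_t ipInReceives;
	uint64_t tcpInSegs;
};

/* Transport-specific stats live in a union; only iWARP is defined for now. */
union hypo_rdma_protocol_stats {
	struct hypo_iw_protocol_stats iw;
};

enum hypo_node_type { HYPO_NODE_CA, HYPO_NODE_RNIC };

struct hypo_ib_device {
	enum hypo_node_type node_type;
	int (*get_proto_stats)(struct hypo_ib_device *dev,
			       union hypo_rdma_protocol_stats *stats);
};

/* Mirror of the sysfs rule: create the iWARP "proto_stats" group only for
 * an RNIC that implements get_proto_stats(). */
static bool hypo_want_iw_proto_stats(const struct hypo_ib_device *dev)
{
	return dev->node_type == HYPO_NODE_RNIC && dev->get_proto_stats != NULL;
}

/* Hypothetical driver method filling in fixed demo values. */
static int hypo_fake_get_stats(struct hypo_ib_device *dev,
			       union hypo_rdma_protocol_stats *stats)
{
	(void)dev;
	stats->iw.ipInReceives = 10;
	stats->iw.tcpInSegs = 7;
	return 0;
}
```

Adding InfiniBand stats later would mean adding a union member and a second creation rule, without changing the method signature.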
* | | | | | | IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() (Roland Dreier, 2008-07-14, 1 file, -16/+47)
| | | | | | |  For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests.
| | | | | | |
| | | | | | |  Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx().
| | | | | | |
| | | | | | |  Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>.
| | | | | | |
| | | | | | |  Signed-off-by: Roland Dreier <rolandd@cisco.com>
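The shape of the fix is that the builder writes into caller-owned storage instead of one shared WR/SGL. A minimal sketch with hypothetical stand-ins for `ib_recv_wr` and `ib_sge`; it shows the ownership change only, not the actual posting path.

```c
#include <assert.h>
#include <stdint.h>

#define HYPO_MAX_SGE 2

/* Hypothetical stand-ins for struct ib_sge / struct ib_recv_wr. */
struct hypo_sge {
	uint64_t addr;
	uint32_t length;
};

struct hypo_recv_wr {
	uint64_t wr_id;
	struct hypo_sge *sg_list;
	int num_sge;
};

/* The caller supplies the WR and SGE storage, so two unsynchronized
 * callers can no longer scribble over one shared copy. */
static void hypo_build_recv_wr(struct hypo_recv_wr *wr, struct hypo_sge *sge,
			       uint64_t wr_id, const uint64_t *addrs,
			       const uint32_t *lens, int num_sge)
{
	assert(num_sge <= HYPO_MAX_SGE);
	for (int i = 0; i < num_sge; i++) {
		sge[i].addr = addrs[i];
		sge[i].length = lens[i];
	}
	wr->wr_id = wr_id;
	wr->sg_list = sge;
	wr->num_sge = num_sge;
}
```

In the completion path the storage can stay per-connection, while the init path uses its own local copies, matching the split the commit describes.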