aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/dev.c
Commit message (Collapse)AuthorAgeFilesLines
* net: check queue_index from sock is valid for deviceTom Herbert2010-11-011-1/+1
| | | | | | | | | | | | | In dev_pick_tx recompute the queue index if the value stored in the socket is greater than or equal to the number of real queues for the device. The saved index in the sock structure is not guaranteed to be appropriate for the egress device (this could happen on a route change or in presence of tunnelling). The result of the queue index being bad would be to return a bogus queue (crash could prersumably follow). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: NETIF_F_HW_CSUM does not imply FCoE CRC offloadBen Hutchings2010-10-271-3/+3
| | | | | | | | | | NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit checksum with the checksum of an arbitrary part of the packet data, whereas the FCoE CRC is something entirely different. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org [2.6.32+] Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix some corner cases in dev_can_checksum()Ben Hutchings2010-10-271-10/+6
| | | | | | | | | | | | | | | dev_can_checksum() incorrectly returns true in these cases: 1. The skb has both out-of-band and in-band VLAN tags and the device supports checksum offload for the encapsulated protocol but only with one layer of encapsulation. 2. The skb has a VLAN tag and the device supports generic checksumming but not in conjunction with VLAN encapsulation. Rearrange the VLAN tag checks to avoid these. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rps: add __rcu annotationsEric Dumazet2010-10-251-6/+6
| | | | | | | | | | | | Add __rcu annotations to : (struct netdev_rx_queue)->rps_map (struct netdev_rx_queue)->rps_flow_table struct rps_sock_flow_table *rps_sock_flow_table; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: ip6_ptr rcu annotationsEric Dumazet2010-10-251-1/+1
| | | | | | | | | (struct net_device)->ip6_ptr is rcu protected : add __rcu annotation and proper rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Increase xmit RECURSION_LIMIT to 10.David S. Miller2010-10-251-1/+1
| | | | | | | | Three is definitely too low, and we know from reports that GRE tunnels stacked as deeply as 37 levels cause stack overflows, so pick some reasonable value between those two. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2010-10-231-234/+363
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David
| * Merge branch 'master' of ↵David S. Miller2010-10-211-1/+1
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c
| | * net/core: Allow tagged VLAN packets to flow through VETH devices.Ben Greear2010-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are VLANs on a VETH device, the packets being transmitted through the VETH device may be 4 bytes bigger than MTU. A check in dev_forward_skb did not take this into account and so dropped these packets. This patch is needed at least as far back as 2.6.34.7 and should be considered for -stable. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | napi: unexport napi_reuse_skbstephen hemminger2010-10-211-2/+1
| | | | | | | | | | | | | | | | | | | | | The function napi_reuse_skb is only used inside core. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vlan: Centralize handling of hardware acceleration.Jesse Gross2010-10-211-32/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each driver that is capable of vlan hardware acceleration must be aware of the vlan groups that are configured and then pass the stripped tag to a specialized receive function. This is different from other types of hardware offload in that it places a significant amount of knowledge in the driver itself rather keeping it in the networking core. This makes vlan offloading function more similarly to other forms of offloading (such as checksum offloading or TSO) by doing the following: * On receive, stripped vlans are passed directly to the network core, without attempting to check for vlan groups or reconstructing the header if no group * vlans are made less special by folding the logic into the main receive routines * On transmit, the device layer will add the vlan header in software if the hardware doesn't support it, instead of spreading that logic out in upper layers, such as bonding. There are a number of advantages to this: * Fixes all bugs with drivers incorrectly dropping vlan headers at once. * Avoids having to disable VLAN acceleration when in promiscuous mode (good for bridging since it always puts devices in promiscuous mode). * Keeps VLAN tag separate until given to ultimate consumer, which avoids needing to do header reconstruction as in tg3 unless absolutely necessary. * Consolidates common code in core networking. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vlan: Enable software emulation for vlan accleration.Jesse Gross2010-10-211-3/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently users of hardware vlan accleration need to know whether the device supports it before generating packets. However, vlan acceleration will soon be available in a more flexible manner so knowing ahead of time becomes much more difficult. This adds a software fallback path for vlan packets on devices without the necessary offloading support, similar to other types of hardware accleration. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: allocate tx queues in register_netdeviceTom Herbert2010-10-201-53/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces netif_alloc_netdev_queues which is called from register_device instead of alloc_netdev_mq. This makes TX queue allocation symmetric with RX allocation. Also, queue locks allocation is done in netdev_init_one_queue. Change set_real_num_tx_queues to fail if requested number < 1 or greater than number of allocated queues. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: cleanups in RX queue allocationTom Herbert2010-10-201-19/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up in RX queue allocation. In netif_set_real_num_rx_queues return error on attempt to set zero queues, or requested number is greater than number of allocated queues. In netif_alloc_rx_queues, do BUG_ON if queue_count is zero. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fail alloc_netdev_mq if queue count < 1Tom Herbert2010-10-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | In alloc_netdev_mq fail if requested queue_count < 1. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: percpu net_device refcountEric Dumazet2010-10-121-7/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We tried very hard to remove all possible dev_hold()/dev_put() pairs in network stack, using RCU conversions. There is still an unavoidable device refcount change for every dst we create/destroy, and this can slow down some workloads (routers or some app servers, mmap af_packet) We can switch to a percpu refcount implementation, now dynamic per_cpu infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes per device. On x86, dev_hold(dev) code : before lock incl 0x280(%ebx) after: movl 0x260(%ebx),%eax incl fs:(%eax) Stress bench : (Sending 160.000.000 UDP frames, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_TRIE) Before: real 1m1.662s user 0m14.373s sys 12m55.960s After: real 0m51.179s user 0m15.329s sys 10m15.942s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Fix rxq ref countingTom Herbert2010-10-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rx->count reference is used to track reference counts to the number of rx-queue kobjects created for the device. This patch eliminates initialization of the counter in netif_alloc_rx_queues and instead increments the counter each time a kobject is created. This is now symmetric with the decrement that is done when an object is released. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Update kernel-doc for netif_set_real_num_rx_queues()Ben Hutchings2010-10-081-2/+2
| | | | | | | | | | | | | | | | | | | | | Synchronise the comment with the preceding implementation change. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: netif_set_real_num_rx_queues may cap num_rx_queues at init timeJohn Fastabend2010-10-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not set num_rx_queues in netif_set_real_num_rx_queues() some drivers will increase the real_num_rx_queues later due to a feature changes or available interrupts increasing. By setting num_rx_queues here this ends up creating a cap on the number of rx queues available. For example the ixgbe driver sets the max number of queues it intends to use ever then sets the current number in use with the netif_set_num_{rx|tx}_queues calls. With the current implementation the number of rx queues gets limited so when a feature such as DCB or FCoE is enabled the queues are no longer available. kobjects will only be allocated for real_num_rx_queues so the waste in memory is minimal. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: add a core netdev->rx_dropped counterEric Dumazet2010-10-051-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In various situations, a device provides a packet to our stack and we drop it before it enters protocol stack : - softnet backlog full (accounted in /proc/net/softnet_stat) - bad vlan tag (not accounted) - unknown/unregistered protocol (not accounted) We can handle a per-device counter of such dropped frames at core level, and automatically adds it to the device provided stats (rx_dropped), so that standard tools can be used (ifconfig, ip link, cat /proc/net/dev) This is a generalization of commit 8990f468a (net: rx_dropped accounting), thus reverting it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dynamic ingress_queue allocationEric Dumazet2010-10-051-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ingress being not used very much, and net_device->ingress_queue being quite a big object (128 or 256 bytes), use a dynamic allocation if needed (tc qdisc add dev eth0 ingress ...) dev_ingress_queue(dev) helper should be used only with RTNL taken. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: rename netdev rx_queue to ingress_queueEric Dumazet2010-09-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is some confusion with rx_queue name after RPS, and net drivers private rx_queue fields. I suggest to rename "struct net_device"->rx_queue to ingress_queue. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: add a recursion limit in xmit pathEric Dumazet2010-09-291-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As tunnel devices are going to be lockless, we need to make sure a misconfigured machine wont enter an infinite loop. Add a percpu variable, and limit to three the number of stacked xmits. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Allow changing number of RX queues after device allocationBen Hutchings2010-09-271-4/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For RPS, we create a kobject for each RX queue based on the number of queues passed to alloc_netdev_mq(). However, drivers generally do not determine the numbers of hardware queues to use until much later, so this usually represents the maximum number the driver may use and not the actual number in use. For TX queues, drivers can update the actual number using netif_set_real_num_tx_queues(). Add a corresponding function for RX queues, netif_set_real_num_rx_queues(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2010-09-271-1/+1
| |\ \ | | |/ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/qlcnic/qlcnic_init.c net/ipv4/ip_output.c
| * | rps: allocate rx queues in register_netdevice onlyEric Dumazet2010-09-261-44/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of having two places were we allocate dev->_rx, introduce netif_alloc_rx_queues() helper and call it only from register_netdevice(), not from alloc_netdev_mq() Goal is to let drivers change dev->num_rx_queues after allocating netdev and before registering it. This also removes a lot of ifdefs in net/core/dev.c Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: propagate NETIF_F_HIGHDMA to vlansEric Dumazet2010-09-261-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device supports it. On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it can help to reduce cost of illegal_highdma() and __skb_linearize() calls. Tested on tg3 , bnx2, bonding, this worked very well. This is a generalization of a patch provided by Yi Zou & Jeff Kirsher. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netns: keep vlan slaves on master netns moveDavid Lamparter2010-09-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | previously, if a vlan master device was moved from one network namespace to another, all 802.1q and macvlan slaves were deleted. we can use dev->reg_state to figure out whether dev_change_net_namespace is happening, since that won't set dev->reg_state NETREG_UNREGISTERING. so, this changes 8021q and macvlan to ignore NETDEV_UNREGISTER when reg_state is not NETREG_UNREGISTERING. Signed-off-by: David Lamparter <equinox@diac24.net> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: include inetdevice.h for rcu_dereference_raw api changeStephen Rothwell2010-09-161-0/+1
| | | | | | | | | | | | | | | | | | | | | rcu_dereference_raw() now needs to know the type of its argument. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: enable GRO by default for vlan devicesBrandon Philips2010-09-151-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently vlan devices don't have GRO by default as none of the Ethernet drivers add NETIF_F_GRO to their vlan_features. As GRO is a software feature add GRO to dev->vlan_features in register_netdevice() and let vlan_dev_init() take care that it gets enabled only when dev->features has NETIF_F_GRO too. Signed-off-by: Brandon Philips <bphilips@suse.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: ip_ptr cleanupsEric Dumazet2010-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | dev->ip_ptr is protected by rtnl and rcu. Yet some places dont use appropriate primitives and/or locking rules. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of ↵David S. Miller2010-09-091-8/+8
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c
| * | | net: rps: add the shortcut for one rps_cpusChangli Gao2010-09-081-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When there is only one rps_cpus, skb_get_rxhash() can be eliminated. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: dev_add_pack() & __dev_remove_pack() changesEric Dumazet2010-09-021-17/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a small helper ptype_head() to get the head to manipulate dev_add_pack() & __dev_remove_pack() can use a spinlock without blocking BH, since softirq use RCU, and these functions are run from process context only. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | skge: add GRO supportEric Dumazet2010-09-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - napi_gro_flush() is exported from net/core/dev.c, to avoid an irq_save/irq_restore in the packet receive path. - use napi_gro_receive() instead of netif_receive_skb() - use napi_gro_flush() before calling __napi_complete() - turn on NETIF_F_GRO by default - Tested on a Marvell 88E8001 Gigabit NIC Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | gro: __napi_gro_receive() optimizationsEric Dumazet2010-08-261-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | compare_ether_header() can have a special implementation on 64 bit arches if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. __napi_gro_receive() and vlan_gro_common() can avoid a conditional branch to perform device match. On x86_64, __napi_gro_receive() has now 38 instructions instead of 53 As gcc-4.4.3 still choose to not inline it, add inline keyword to this performance critical function. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Rename skb_has_frags to skb_has_frag_listDavid S. Miller2010-08-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, it's name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: 802.1q: make vlan_hwaccel_do_receive() return voidChangli Gao2010-08-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | vlan_hwaccel_do_receive() always returns 0, so make it return void. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'master' of ↵David S. Miller2010-08-211-1/+1
| |\ \ \ | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * | | | net: rps: fix the wrong network header pointerChangli Gao2010-08-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __skb_get_rxhash() was broken after the commit: commit bfb564e7391340638afe4ad67744a8f3858e7566 Author: Krishna Kumar <krkumar2@in.ibm.com> Date: Wed Aug 4 06:15:52 2010 +0000 core: Factor out flow calculation from get_rps_cpu Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: rps: use proto_ports_offset() to handle the AH message correctlyChangli Gao2010-08-191-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SPI isn't at the beginning of an AH message. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: rps: skip fragment when computing rxhashChangli Gao2010-08-191-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fragmented IP packets may have no transfer header, so when computing rxhash, we should skip them. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: rps: reset network header before calling skb_get_rxhash()Changli Gao2010-08-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_get_rxhash() assumes the network header pointer of the skb is set properly after the commit: commit bfb564e7391340638afe4ad67744a8f3858e7566 Author: Krishna Kumar <krkumar2@in.ibm.com> Date: Wed Aug 4 06:15:52 2010 +0000 core: Factor out flow calculation from get_rps_cpu Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: simplify flags for tx timestampingOliver Hartkopp2010-08-191-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | core: Factor out flow calculation from get_rps_cpuKrishna Kumar2010-08-161-44/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out flow calculation code from get_rps_cpu, since other functions can use the same code. Revisions: v2 (Ben): Separate flow calcuation out and use in select queue. v3 (Arnd): Don't re-implement MIN. v4 (Changli): skb->data points to ethernet header in macvtap, and make a fast path. Tested macvtap with this patch. v5 (Changli): - Cache skb->rxhash in skb_get_rxhash - macvtap may not have pow(2) queues, so change code for queue selection. (Arnd): - Use first available queue if all fails. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge commit 'v2.6.36-rc5' into perf/coreIngo Molnar2010-09-211-1/+1
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | Merge reason: Pick up the latest fixes in -rc5. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | net: use rcu_barrier() in rollback_registered_manyEric Dumazet2010-09-141-1/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | netdev_wait_allrefs() waits that all references to a device vanishes. It currently uses a _very_ pessimistic 250 ms delay between each probe. Some users reported that no more than 4 devices can be dismantled per second, this is a pretty serious problem for some setups. Most of the time, a refcount is about to be released by an RCU callback, that is still in flight because rollback_registered_many() uses a synchronize_rcu() call instead of rcu_barrier(). Problem is visible if number of online cpus is one, because synchronize_rcu() is then a no op. time to remove 50 ipip tunnels on a UP machine : before patch : real 11.910s after patch : real 1.250s Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'tip/perf/core' of ↵Ingo Molnar2010-09-151-8/+8
|\ \ \ \ | |/ / / | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
| * | | net: fix tx queue selection for bridged devices implementing select_queueHelmut Schaa2010-09-071-8/+8
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a net device is implementing the select_queue callback and is part of a bridge, frames coming from the bridge already have a tx queue associated to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe, "net: Use sk_tx_queue_mapping for connected sockets"). The call to sk_tx_queue_get will then return the tx queue used by the bridge instead of calling the select_queue callback. In case of mac80211 this broke QoS which is implemented by using the select_queue callback. Furthermore it introduced problems with rt2x00 because frames with the same TID and RA sometimes appeared on different tx queues which the hw cannot handle correctly. Fix this by always calling select_queue first if it is available and only afterwards use the socket tx queue mapping. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | skb: Add tracepoints to freeing skbKoki Sanagi2010-09-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds tracepoint to consume_skb and add trace_kfree_skb before __kfree_skb in skb_free_datagram_locked and net_tx_action. Combinating with tracepoint on dev_hard_start_xmit, we can check how long it takes to free transmitted packets. And using it, we can calculate how many packets driver had at that time. It is useful when a drop of transmitted packet is a problem. sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4C724364.50903@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>