aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/skbuff.h
Commit message (Collapse)AuthorAgeFilesLines
* Backport mac80211 from 3.4 kernelWolfgang Wiedmeyer2017-01-211-0/+16
| | | | | | | | | The ath9k_htc driver depends on mac80211, but mac80211 can't be build. The reason is that net/wireless is almost completely backported from a 3.4 kernel. To follow suit, mac80211 is also backported from 3.4, more precisely from 3.4.113. This makes mac80211 build. Signed-off-by: Wolfgang Wiedmeyer <wolfgit@wiedmeyer.de>
* net: add length argument to skb_copy_and_csum_datagram_iovecSabrina Dubroca2016-03-181-1/+2
| | | | | | | | | | | | | | | | | | Without this length argument, we can read past the end of the iovec in memcpy_toiovec because we have no way of knowing the total length of the iovec's buffers. This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb csum races when peeking") has been backported but that don't have the ioviter conversion, which is almost all the stable trees <= 3.18. This also fixes a kernel crash for NFS servers when the client uses -onfsvers=3,proto=udp to mount the export. Change-Id: I1865e3d7a1faee42a5008a9ad58c4d3323ea4bab Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> (cherry picked from commit c91234366e4cfd4f70c73e7d79ede92a6e462a88)
* Merge remote-tracking branch 'korg/linux-3.0.y' into cm-13.0rogersb112015-11-101-0/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: crypto/algapi.c drivers/gpu/drm/i915/i915_debugfs.c drivers/gpu/drm/i915/intel_display.c drivers/video/fbmem.c include/linux/nls.h kernel/cgroup.c kernel/signal.c kernel/timeconst.pl net/ipv4/ping.c Change-Id: I1f532925d1743df74d66bcdd6fc92f05c72ee0dd
| * netfilter: don't reset nf_trace in nf_reset()Patrick McHardy2013-05-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ] Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code to reset nf_trace in nf_reset(). This is wrong and unnecessary. nf_reset() is used in the following cases: - when passing packets up the the socket layer, at which point we want to release all netfilter references that might keep modules pinned while the packet is queued. nf_trace doesn't matter anymore at this point. - when encapsulating or decapsulating IPsec packets. We want to continue tracing these packets after IPsec processing. - when passing packets through virtual network devices. Only devices on that encapsulate in IPv4/v6 matter since otherwise nf_trace is not used anymore. Its not entirely clear whether those packets should be traced after that, however we've always done that. - when passing packets through virtual network devices that make the packet cross network namespace boundaries. This is the only cases where we clearly want to reset nf_trace and is also what the original patch intended to fix. Add a new function nf_reset_trace() and use it in dev_forward_skb() to fix this properly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge remote-tracking branch 'kernelorg/linux-3.0.y' into 3_0_64Andrew Dodd2013-02-271-2/+0
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/Kconfig arch/arm/include/asm/hwcap.h arch/arm/kernel/smp.c arch/arm/plat-samsung/adc.c drivers/gpu/drm/i915/i915_reg.h drivers/gpu/drm/i915/intel_drv.h drivers/mmc/core/sd.c drivers/net/tun.c drivers/net/usb/usbnet.c drivers/regulator/max8997.c drivers/usb/core/hub.c drivers/usb/host/xhci.h drivers/usb/serial/qcserial.c fs/jbd2/transaction.c include/linux/migrate.h kernel/sys.c kernel/time/timekeeping.c lib/genalloc.c mm/memory-failure.c mm/memory_hotplug.c mm/mempolicy.c mm/page_alloc.c mm/vmalloc.c mm/vmscan.c mm/vmstat.c scripts/Kbuild.include Change-Id: I91e2d85c07320c7ccfc04cf98a448e89bed6ade6
| * skb: avoid unnecessary reallocations in __skb_cowFelix Fietkau2012-06-101-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 617c8c11236716dcbda877e764b7bf37c6fd8063 ] At the beginning of __skb_cow, headroom gets set to a minimum of NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not cloned and the headroom is just below NET_SKB_PAD, but still more than the amount requested by the caller. This was showing up frequently in my tests on VLAN tx, where vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN). Locally generated packets should have enough headroom, and for forward paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to add any extra space here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | merge opensource jb u5codeworkx2012-09-221-2/+8
|/ | | | Change-Id: I1aaec157aa196f3448eff8636134fce89a814cf2
* ipsec: be careful of non existing mac headersEric Dumazet2012-03-191-0/+10
| | | | | | | | | | | | | | | | | [ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: hold sock reference while processing tx timestampsRichard Cochran2011-11-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit da92b194cc36b5dc1fbd85206aeeffd80bee0c39 upstream. The pair of functions, * skb_clone_tx_timestamp() * skb_complete_tx_timestamp() were designed to allow timestamping in PHY devices. The first function, called during the MAC driver's hard_xmit method, identifies PTP protocol packets, clones them, and gives them to the PHY device driver. The PHY driver may hold onto the packet and deliver it at a later time using the second function, which adds the packet to the socket's error queue. As pointed out by Johannes, nothing prevents the socket from disappearing while the cloned packet is sitting in the PHY driver awaiting a timestamp. This patch fixes the issue by taking a reference on the socket for each such packet. In addition, the comments regarding the usage of these function are expanded to highlight the rule that PHY drivers must use skb_complete_tx_timestamp() to release the packet, in order to release the socket reference, too. These functions first appeared in v2.6.36. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* vlan: Fix the ingress VLAN_FLAG_REORDER_HDR checkJiri Pirko2011-06-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag but rather in vlan_do_receive. Otherwise the vlan header will not be properly put on the packet in the case of vlan header accelleration. As we remove the check from vlan_check_reorder_header rename it vlan_reorder_header to keep the naming clean. Fix up the skb->pkt_type early so we don't look at the packet after adding the vlan tag, which guarantees we don't goof and look at the wrong field. Use a simple if statement instead of a complicated switch statement to decided that we need to increment rx_stats for a multicast packet. Hopefully at somepoint we will just declare the case where VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove the code. Until then this keeps it working correctly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2011-05-231-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) bnx2x: allow device properly initialize after hotplug bnx2x: fix DMAE timeout according to hw specifications bnx2x: properly handle CFC DEL in cnic flow bnx2x: call dev_kfree_skb_any instead of dev_kfree_skb net: filter: move forward declarations to avoid compile warnings pktgen: refactor pg_init() code pktgen: use vzalloc_node() instead of vmalloc_node() + memset() net: skb_trim explicitely check the linearity instead of data_len ipv4: Give backtrace in ip_rt_bug(). net: avoid synchronize_rcu() in dev_deactivate_many net: remove synchronize_net() from netdev_set_master() rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN event net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE bridge: call NETDEV_JOIN notifiers when add a slave netpoll: disable netpoll when enslave a device macvlan: Forward unicast frames in bridge mode to lowerdev net: Remove linux/prefetch.h include from linux/skbuff.h ipv4: Include linux/prefetch.h in fib_trie.c netlabel: Remove prefetches from list handlers. drivers/net: add prefetch header for prefetch users ... Fixed up prefetch parts: removed a few duplicate prefetch.h includes, fixed the location of the igb prefetch.h, took my version of the skbuff.h code without the extra parentheses etc.
| * net: skb_trim explicitely check the linearity instead of data_lenEmmanuel Grumbach2011-05-221-1/+1
| | | | | | | | | | | | | | | | The purpose of the check on data_len is to check linearity, so use the inline helper for this. No overhead and more explicit. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Remove linux/prefetch.h include from linux/skbuff.hDavid S. Miller2011-05-221-1/+0
| | | | | | | | | | | | No longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Remove prefetches from SKB list handlers.David S. Miller2011-05-221-3/+3
| | | | | | | | | | | | Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Remove prefetch() from <linux/skbuff.h> and "netlabel_addrlist.h"Linus Torvalds2011-05-221-4/+3
|/ | | | | | | | | | | | | | | | | | Commit e66eed651fd1 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h. The skbuff list traversal still had them. Quoth David Miller: "Please just remove the prefetches. Those are modelled after list.h as I intend to eventually convert SKB list handling to "struct list_head" but we're not there yet. Therefore if we kill prefetches from list.h we should kill it from these things in skbuff.h too." Requested-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: add missing prefetch.h includeHeiko Carstens2011-05-221-0/+1
| | | | | | | | | | | Fixes build errors on s390 and probably other archs as well: In file included from net/ipv4/ip_forward.c:32:0: include/net/udp.h: In function 'udp_csum_outgoing': include/net/udp.h:141:2: error: implicit declaration of function 'prefetch' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: filter: Just In Time compiler for x86-64Eric Dumazet2011-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to speedup packet filtering, here is an implementation of a JIT compiler for x86_64 It is disabled by default, and must be enabled by the admin. echo 1 >/proc/sys/net/core/bpf_jit_enable It uses module_alloc() and module_free() to get memory in the 2GB text kernel range since we call helpers functions from the generated code. EAX : BPF A accumulator EBX : BPF X accumulator RDI : pointer to skb (first argument given to JIT function) RBP : frame pointer (even if CONFIG_FRAME_POINTER=n) r9d : skb->len - skb->data_len (headlen) r8 : skb->data To get a trace of generated code, use : echo 2 >/proc/sys/net/core/bpf_jit_enable Example of generated code : # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24 flen=18 proglen=147 pass=3 image=ffffffffa00b5000 JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60 JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00 JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00 JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0 JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00 JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00 JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24 JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31 JIT code: ffffffffa00b5090: c0 c9 c3 BPF program is 144 bytes long, so native program is almost same size ;) (000) ldh [12] (001) jeq #0x800 jt 2 jf 8 (002) ld [26] (003) and #0xffffff00 (004) jeq #0xc0a81400 jt 16 jf 5 (005) ld [30] (006) and #0xffffff00 (007) jeq #0xc0a81400 jt 16 jf 17 (008) jeq #0x806 jt 10 jf 9 (009) jeq #0x8035 jt 10 jf 17 (010) ld [28] (011) and #0xffffff00 (012) jeq #0xc0a81400 jt 16 jf 13 (013) ld [38] (014) and #0xffffff00 (015) jeq #0xc0a81400 jt 16 jf 17 (016) ret #65535 (017) ret #0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds2011-04-071-1/+1
|\ | | | | | | | | * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
| * Fix common misspellingsLucas De Marchi2011-03-311-1/+1
| | | | | | | | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* | net: Fix warnings caused by MAX_SKB_FRAGS change.David S. Miller2011-03-291-1/+1
|/ | | | | | | | | | | | | | | | | After commit a715dea3c8e9ef2771c534e05ee1d36f65987e64 ("net: Always allocate at least 16 skb frags regardless of page size"), the value of MAX_SKB_FRAGS can now take on either an "unsigned long" or an "int" value. This causes warnings like: net/packet/af_packet.c: In function ‘tpacket_fill_skb’: net/packet/af_packet.c:948: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘int’ Fix by forcing the constant to be unsigned long, otherwise we have a situation where the type of a system wide constant is variable. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Always allocate at least 16 skb frags regardless of page sizeAnton Blanchard2011-03-281-1/+7
| | | | | | | | | | | | | | | When analysing performance of the cxgb3 on a ppc64 box I noticed that we weren't doing much GRO merging. It turns out we are limited by the number of SKB frags: #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2) With a 4kB page size we have 18 frags, but with a 64kB page size we only have 3 frags. I ran a single stream TCP bandwidth test to compare the performance of Signed-off-by: David S. Miller <davem@davemloft.net>
* net: introduce rx_handler results and logic around thatJiri Pirko2011-03-161-4/+1
| | | | | | | | | | | This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: change netdev->features to u32Michał Mirosław2011-01-241-1/+1
| | | | | | | | | | | | | Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add safe reverse SKB queue walkers.David S. Miller2011-01-201-0/+9
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: fix compilation when conntrack is disabled but tproxy is enabledKOVACS Krisztian2011-01-121-0/+15
| | | | | | | | | | | | | | | | | The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2 Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* net: Introduce skb_checksum_start_offset()Michał Mirosław2010-12-161-0/+5
| | | | | | | Introduce skb_checksum_start_offset() to replace repetitive calculation. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* bnx2x: Take the distribution range definition out of skb_tx_hash()Vladislav Zolotarov2010-12-161-2/+3
| | | | | | | | | | | Move the calcualation of the Tx hash for a given hash range into a separate function and define the skb_tx_hash(), which calculates a Tx hash for a [0; dev->real_num_tx_queues - 1] hash values range, using this function (__skb_tx_hash()). Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xps: Improvements in TX queue selectionTom Herbert2010-11-241-1/+2
| | | | | | | | | | | | | | | | | | | | In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if device are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possbility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid RCU for NOCACHE dstEric Dumazet2010-10-201-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no point using RCU for dst we allocate for a very short time (used once). Change dst_release() to take DST_NOCACHE into account, but also change skb_dst_set_noref() to force a refcount increment for such dst. This is a _huge_ gain, because we dont waste memory to store xx thousand of dsts. Instead of queueing them to RCU, we can free them instantly. CPU caches can stay hot, re-using same memory blocks to hold temporary dsts. Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(), since atomic_dec_return() implies a full memory barrier. Stress test, 160.000.000 udp frames sent, IP route cache disabled (DDOS). Before: real 0m38.091s user 0m13.189s sys 7m53.018s After: real 0m29.946s user 0m12.157s sys 7m40.605s For reference, if IP route cache was enabled : real 0m32.030s user 0m10.521s sys 8m15.243s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: allocate skbs on local nodeEric Dumazet2010-10-161-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit b30973f877 (node-aware skb allocation) spread a wrong habit of allocating net drivers skbs on a given memory node : The one closest to the NIC hardware. This is wrong because as soon as we try to scale network stack, we need to use many cpus to handle traffic and hit slub/slab management on cross-node allocations/frees when these cpus have to alloc/free skbs bound to a central node. skb allocated in RX path are ephemeral, they have a very short lifetime : Extra cost to maintain NUMA affinity is too expensive. What appeared as a nice idea four years ago is in fact a bad one. In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load, and two 10Gb NIC might deliver more than 28 million packets per second, needing all the available cpus. Cost of cross-node handling in network and vm stacks outperforms the small benefit hardware had when doing its DMA transfert in its 'local' memory node at RX time. Even trying to differentiate the two allocations done for one skb (the sk_buff on local node, the data part on NIC hardware node) is not enough to bring good performance. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb_frag_t can be smaller on small archesEric Dumazet2010-09-261-0/+5
| | | | | | | | | On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit offset and size fields. This patch saves 72 bytes per skb on i386, or 128 bytes after rounding. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: return operator cleanupEric Dumazet2010-09-231-3/+3
| | | | | | | | | Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net: avoid some skb->ip_summed initializationsEric Dumazet2010-09-021-0/+15
| | | | | | | | | | | | | | | | | | | | fresh skbs have ip_summed set to CHECKSUM_NONE (0) We can avoid setting again skb->ip_summed to CHECKSUM_NONE in drivers. Introduce skb_checksum_none_assert() helper so that we keep this assertion documented in driver sources. Change most occurrences of : skb->ip_summed = CHECKSUM_NONE; by : skb_checksum_none_assert(skb); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename skb_has_frags to skb_has_frag_listDavid S. Miller2010-08-231-2/+2
| | | | | | | | | | | SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, it's name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: simplify flags for tx timestampingOliver Hartkopp2010-08-191-28/+16
| | | | | | | | | | | | | This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* core: Factor out flow calculation from get_rps_cpuKrishna Kumar2010-08-161-0/+9
| | | | | | | | | | | | | | | | | | | | | Factor out flow calculation code from get_rps_cpu, since other functions can use the same code. Revisions: v2 (Ben): Separate flow calcuation out and use in select queue. v3 (Arnd): Don't re-implement MIN. v4 (Changli): skb->data points to ethernet header in macvtap, and make a fast path. Tested macvtap with this patch. v5 (Changli): - Cache skb->rxhash in skb_get_rxhash - macvtap may not have pow(2) queues, so change code for queue selection. (Arnd): - Use first available queue if all fails. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sk_buff: introduce pskb_network_may_pull()Changli Gao2010-08-041-0/+5
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* can-raw: Fix skb_orphan_try handlingOliver Hartkopp2010-08-031-1/+3
| | | | | | | | | | | | | | | | | | | Commit fc6055a5ba31e2c14e36e8939f9bf2b6d586a7f5 (net: Introduce skb_orphan_try()) allows an early orphan of the skb and takes care on tx timestamping, which needs the sk-reference in the skb on driver level. So does the can-raw socket, which has not been taken into account here. The patch below adds a 'prevent_sk_orphan' bit in the skb tx shared info, which fixes the problem discovered by Matthias Fuchs here: http://marc.info/?t=128030411900003&r=1&w=2 Even if it's not a primary tx timestamp topic it fits well into some skb shared tx context. Or should be find a different place for the information to protect the sk reference until it reaches the driver level? Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pskb_expand_head() optimizationEric Dumazet2010-07-241-1/+2
| | | | | | | | | | | Move frags[] at the end of struct skb_shared_info, and make pskb_expand_head() copy only the used part of it instead of whole array. This should avoid kmemcheck warnings and speedup pskb_expand_head() as well, avoiding a lot of cache misses. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: support time stamping in phy devices.Richard Cochran2010-07-181-0/+31
| | | | | | | | | | | | | | | This patch adds a new networking option to allow hardware time stamps from PHY devices. When enabled, likely candidates among incoming and outgoing network packets are offered to the PHY driver for possible time stamping. When accepted by the PHY driver, incoming packets are deferred for later delivery by the driver. The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl and callbacks for transmit and receive time stamping. Drivers may optionally implement these functions. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add driver hook for tx time stamping.Richard Cochran2010-07-181-0/+21
| | | | | | | | | | This patch adds a hook for transmit time stamps. The transmit hook allows a software fallback for transmit time stamps, for MACs lacking time stamping hardware. Using the hook will still require adding an inline function call to each MAC driver. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: NET_SKB_PAD should depend on L1_CACHE_BYTESEric Dumazet2010-06-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | In old kernels, NET_SKB_PAD was defined to 16. Then commit d6301d3dd1c2 (net: Increase default NET_SKB_PAD to 32), and commit 18e8c134f4e9 (net: Increase NET_SKB_PAD to 64 bytes) increased it to 64. While first patch was governed by network stack needs, second was more driven by performance issues on current hardware. Real intent was to align data on a cache line boundary. So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic. Remove microblaze and powerpc own NET_SKB_PAD definitions. Thanks to Alexander Duyck and David Miller for their comments. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-06-111-1/+4
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: deliver skbs on inactive slaves to exact matchesJohn Fastabend2010-06-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx | vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | skbuff: add check for non-linear to warn_if_lro and needs_linearizeAlexander Duyck2010-06-051-1/+2
|/ | | | | | | | | | We can avoid an unecessary cache miss by checking if the skb is non-linear before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be done to avoid a cache miss on nr_frags if data_len is 0. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* skb: make skb_recycle_check() return a bool valueChangli Gao2010-05-291-1/+1
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add a noref bit on skb dstEric Dumazet2010-05-171-4/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are catched. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set an notrefcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Increase NET_SKB_PAD to 64 bytesEric Dumazet2010-05-061-1/+4
| | | | | | | | | | | | | eth_type_trans() & get_rps_cpus() currently need two 64bytes cache lines in packet to compute rxhash. Increasing NET_SKB_PAD from 32 to 64 reduces the need to one cache line only, and makes RPS faster. NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: __alloc_skb() speedupEric Dumazet2010-05-051-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | With following patch I can reach maximum rate of my pktgen+udpsink simulator : - 'old' machine : dual quad core E5450 @3.00GHz - 64 UDP rx flows (only differ by destination port) - RPS enabled, NIC interrupts serviced on cpu0 - rps dispatched on 7 other cores. (~130.000 IPI per second) - SLAB allocator (faster than SLUB in this workload) - tg3 NIC - 1.080.000 pps without a single drop at NIC level. Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch first sk_buff cache line, the second to prefetch the shinfo part. Also using one memset() to initialize all skb_shared_info fields instead of one by one to reduce number of instructions, using long word moves. All skb_shared_info fields before 'dataref' are cleared in __alloc_skb(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Inline skb_pull() in eth_type_trans().David S. Miller2010-05-021-0/+5
| | | | | | | | | | | | | | | In commit 6be8ac2f ("[NET]: uninline skb_pull, de-bloats a lot") we uninlined skb_pull. But in some critical paths it makes sense to inline this thing and it helps performance significantly. Create an skb_pull_inline() so that we can do this in a way that serves also as annotation. Based upon a patch by Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>