aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/datagram.c
Commit message (Collapse)AuthorAgeFilesLines
* Fix a typo in datagram.c and sctp/socket.c.David Shwatrz2010-12-061-1/+1
| | | | | | | | | | | Hi, This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c Regards, David Shwartz Signed-off-by: David Shwartz <dshwatrz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2010-10-231-3/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David
| * net: poll() optimizationsEric Dumazet2010-09-061-3/+2
| | | | | | | | | | | | | | No need to test twice sk->sk_shutdown & RCV_SHUTDOWN Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | skb: Add tracepoints to freeing skbKoki Sanagi2010-09-071-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | This patch adds tracepoint to consume_skb and add trace_kfree_skb before __kfree_skb in skb_free_datagram_locked and net_tx_action. Combinating with tracepoint on dev_hard_start_xmit, we can check how long it takes to free transmitted packets. And using it, we can calculate how many packets driver had at that time. It is useful when a drop of transmitted packet is a problem. sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4C724364.50903@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* net/core: EXPORT_SYMBOL cleanupsEric Dumazet2010-07-121-5/+3
| | | | | | | | | CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix lock_sock_bh/unlock_sock_bhEric Dumazet2010-05-271-2/+4
| | | | | | | | | | | | | | | | | | | | | This new sock lock primitive was introduced to speedup some user context socket manipulation. But it is unsafe to protect two threads, one using regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh This patch changes lock_sock_bh to be careful against 'owned' state. If owned is found to be set, we must take the slow path. lock_sock_bh() now returns a boolean to say if the slow path was taken, and this boolean is used at unlock_sock_bh time to call the appropriate unlock function. After this change, BH are either disabled or enabled during the lock_sock_bh/unlock_sock_bh protected section. This might be misleading, so we rename these functions to lock_sock_fast()/unlock_sock_fast(). Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb_free_datagram_locked() fixEric Dumazet2010-05-031-2/+7
| | | | | | | | | | | | | | | Commit 4b0b72f7dd617b ( net: speedup udp receive path ) introduced a bug in skb_free_datagram_locked(). We should not skb_orphan() skb if we dont have the guarantee we are the last skb user, this might happen with MSG_PEEK concurrent users. To keep socket locked for the smallest period of time, we split consume_skb() logic, inlined in skb_free_datagram_locked() Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: speedup udp receive pathEric Dumazet2010-04-281-3/+7
| | | | | | | | | | | | | | | | | | | | | Since commit 95766fff ([UDP]: Add memory accounting.), each received packet needs one extra sock_lock()/sock_release() pair. This added latency because of possible backlog handling. Then later, ticket spinlocks added yet another latency source in case of DDOS. This patch introduces lock_sock_bh() and unlock_sock_bh() synchronization primitives, avoiding one atomic operation and backlog processing. skb_free_datagram_locked() uses them instead of full blown lock_sock()/release_sock(). skb is orphaned inside locked section for proper socket memory reclaim, and finally freed outside of it. UDP receive path now take the socket spinlock only once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sk_sleep() helperEric Dumazet2010-04-201-3/+3
| | | | | | | | | | | | | | | | | Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* Merge branch 'master' of ↵David S. Miller2009-11-061-1/+9
|\ | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/cdc_ether.c All CDC ethernet devices of type USB_CLASS_COMM need to use '&mbm_info'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: fix sk_forward_alloc corruptionEric Dumazet2009-10-301-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sk_drops consolidation part 2Eric Dumazet2009-10-181-0/+1
|/ | | | | | | | | - skb_kill_datagram() can increment sk->sk_drops itself, not callers. - UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb ftracer - add tracepoint to skb_copy_datagram_iovec (v3)Neil Horman2009-08-131-0/+3
| | | | | | | | | | | | | | | | | skb allocation / cosumption tracer - Add consumption tracepoint This patch adds a tracepoint to skb_copy_datagram_iovec, which is called each time a userspace process copies a frame from a socket receive queue to a user space buffer. It allows us to hook in and examine each sk_buff that the system receives on a per-socket bases, and can be use to compile a list of which skb's were received by which processes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/trace/events/skb.h | 20 ++++++++++++++++++++ net/core/datagram.c | 3 +++ 2 files changed, 23 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
* net: adding memory barrier to the poll and receive callbacksJiri Olsa2009-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding memory barrier after the poll_wait function, paired with receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, following race can happen. The race fires, when following code paths meet, and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches. CPU1 CPU2 sys_select receive packet ... ... __add_wait_queue update tp->rcv_nxt ... ... tp->rcv_nxt check sock_def_readable ... { schedule ... if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible(sk->sk_sleep) ... } If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposit to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then endup calling schedule and sleep forever if there are no more data on the socket. Calls to poll_wait in following modules were ommited: net/bluetooth/af_bluetooth.c net/irda/af_irda.c net/irda/irnet/irnet_ppp.c net/mac80211/rc80211_pid_debugfs.c net/phonet/socket.c net/rds/af_rds.c net/rfkill/core.c net/sunrpc/cache.c net/sunrpc/rpc_pipe.c net/tipc/socket.c Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* datagram: Use frag list abstraction interfaces.David S. Miller2009-06-091-95/+83
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix skb_copy_datagram_from_iovec() to pass the right offsetSridhar Samudrala2009-06-081-1/+2
| | | | | | | | | | | | | | | | | | | I am working on enabling UFO between KVM guests using virtio-net and i have some patches that i got working with 2.6.30-rc8. When i wanted to try them with net-next-2.6, i noticed that virtio-net is not working with that tree. After some debugging, it turned out to be several bugs in the recent patches to fix aio with tun driver, specifically the following 2 commits. http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=0a1ec07a67bd8b0033dace237249654d015efa21 http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=6f26c9a7555e5bcca3560919db9b852015077dae Fix the call to memcpy_from_iovecend() in skb_copy_datagram_from_iovec to pass the right iovec offset. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Network Drop Monitor: Fix skb_kill_datagramJohn Dykstra2009-05-081-1/+3
| | | | | | | | | | | | | Commit ead2ceb0ec9f85cff19c43b5cdb2f8a054484431 ("Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs") established new conventions for identifying dropped packets. Align skb_kill_datagram() with these conventions so that packets that get dropped just before the copy to userspace are properly tracked. Signed-off-by: John Dykstra <john.dykstra1@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-04-291-1/+13
|\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c
| * net: Avoid extra wakeups of threads blocked in wait_for_packet()Eric Dumazet2009-04-281-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 2.6.25 we added UDP mem accounting. This unfortunatly added a penalty when a frame is transmitted, since we have at TX completion time to call sock_wfree() to perform necessary memory accounting. This calls sock_def_write_space() and utimately scheduler if any thread is waiting on the socket. Thread(s) waiting for an incoming frame was scheduled, then had to sleep again as event was meaningless. (All threads waiting on a socket are using same sk_sleep anchor) This adds lot of extra wakeups and increases latencies, as noted by Christoph Lameter, and slows down softirq handler. Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2 Fortunatly, Davide Libenzi recently added concept of keyed wakeups into kernel, and particularly for sockets (see commit 37e5540b3c9d838eb20f2ca8ea2eb8072271e403 epoll keyed wakeups: make sockets use keyed wakeups) Davide goal was to optimize epoll, but this new wakeup infrastructure can help non epoll users as well, if they care to setup an appropriate handler. This patch introduces new DEFINE_WAIT_FUNC() helper and uses it in wait_for_packet(), so that only relevant event can wakeup a thread blocked in this function. Trace of function calls from bnx2 TX completion bnx2_poll_work() is : __kfree_skb() skb_release_head_state() sock_wfree() sock_def_write_space() __wake_up_sync_key() __wake_up_common() receiver_wake_function() : Stops here since thread is waiting for an INPUT Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tun: fix tun_chr_aio_write so that aio worksMichael S. Tsirkin2009-04-211-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct iovec * and modifies the iovec. As a result, attempts to use io_submit to send packets to a tun device fail with weird errors such as EINVAL. Since tun is the only user of skb_copy_datagram_from_iovec, we can fix this simply by changing the later so that it does not touch the iovec passed to it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: skb_copy_datagram_const_iovec()Michael S. Tsirkin2009-04-211-0/+92
|/ | | | | | | | | | | | | There's an skb_copy_datagram_iovec() to copy out of a paged skb, but it modifies the iovec, and does not support starting at an offset in the destination. We want both in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying ↵Neil Horman2009-03-131-1/+1
| | | | | | | | | | | | | | | end-of-line points for skbs Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/linux/skbuff.h | 4 +++- net/core/datagram.c | 2 +- net/core/skbuff.c | 22 ++++++++++++++++++++++ net/ipv4/arp.c | 2 +- net/ipv4/udp.c | 2 +- net/packet/af_packet.c | 2 +- 6 files changed, 29 insertions(+), 5 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sk_free_datagram() should use sk_mem_reclaim_partial()Eric Dumazet2008-11-051-3/+2
| | | | | | | | | | | | | | | | | | | | | | I noticed a contention on udp_memory_allocated on regular UDP applications. While tcp_memory_allocated is seldom used, it appears each incoming UDP frame is currently touching udp_memory_allocated when queued, and when received by application. One possible solution is to use sk_mem_reclaim_partial() instead of sk_mem_reclaim(), so that we keep a small reserve (less than one page) of memory for each UDP socket. We did something very similar on TCP side in commit 9993e7d313e80bdc005d09c7def91903e0068f07 ([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer()) A more complex solution would need to convert prot->memory_allocated to use a percpu_counter with batches of 64 or 128 pages. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rationalise email address: Network Specific PartsAlan Cox2008-10-131-1/+1
| | | | | | | | | | Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb_copy_datagram_from_iovec()Rusty Russell2008-08-151-0/+87
| | | | | | | | | | | | There's an skb_copy_datagram_iovec() to copy out of a paged skb, but nothing the other way around (because we don't do that). We want to allocate big skbs in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen2008-07-251-4/+4
| | | | | | | | | | | | | | Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] CORE: Introducing new memory accounting interface.Hideo Aoki2008-01-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removeing sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Only increment counter on first peek/recvHerbert Xu2008-01-281-16/+27
| | | | | | | | | | | | | | | | | | | | | | The previous move of the the UDP inDatagrams counter caused each peek of the same packet to be counted separately. This may be undesirable. This patch fixes this by adding a bit to sk_buff to record whether this packet has already been seen through skb_recv_datagram. We then only increment the counter when the packet is seen for the first time. The only dodgy part is the fact that skb_recv_datagram doesn't have a good way of returning this new bit of information. So I've added a new function __skb_recv_datagram that does return this and made skb_recv_datagram a wrapper around it. The plan is to eventually replace all uses of skb_recv_datagram with this new function at which time it can be renamed its proper name. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Avoid repeated counting of checksum errors due to peekingHerbert Xu2008-01-281-1/+8
| | | | | | | | | | | | Currently it is possible for two processes to peek on the same socket and end up incrementing the error counter twice for the same packet. This patch fixes it by making skb_kill_datagram return whether it succeeded in unlinking the packet and only incrementing the counter if it did. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Do not dereference iov if length is zeroHerbert Xu2007-09-111-0/+3
| | | | | | | | | | | | When msg_iovlen is zero we shouldn't try to dereference msg_iov. Right now the only thing that tries to do so is skb_copy_and_csum_datagram_iovec. Since the total length should also be zero if msg_iovlen is zero, it's sufficient to check the total length there and simply return if it's zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Revert sk_buff walker cleanups.David S. Miller2007-04-271-17/+33
| | | | | | | | | | | This reverts eefa3906283a2b60a6d02a2cda593a7d7d7946c5 The simplification made in that change works with the assumption that the 'offset' parameter to these functions is always positive or zero, which is not true. It can be and often is negative in order to access SKB header values in front of skb->data. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Clean up sk_buff walkers.Jean Delvare2007-04-261-33/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed recently that, in skb_checksum(), "offset" and "start" are essentially the same thing and have the same value throughout the function, despite being computed differently. Using a single variable allows some cleanups and makes the skb_checksum() function smaller, more readable, and presumably marginally faster. We appear to have many other "sk_buff walker" functions built on the exact same model, so the cleanup applies to them, too. Here is a list of the functions I found to be affected: net/appletalk/ddp.c:atalk_sum_skb() net/core/datagram.c:skb_copy_datagram_iovec() net/core/datagram.c:skb_copy_and_csum_datagram() net/core/skbuff.c:skb_copy_bits() net/core/skbuff.c:skb_store_bits() net/core/skbuff.c:skb_checksum() net/core/skbuff.c:skb_copy_and_csum_bit() net/core/user_dma.c:dma_skb_copy_datagram_iovec() net/xfrm/xfrm_algo.c:skb_icv_walk() net/xfrm/xfrm_algo.c:skb_to_sgvec() OTOH, I admit I'm a bit surprised, the cleanup is rather obvious so I'm really wondering if I am missing something. Can anyone please comment on this? Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Clean up UDP-Lite receive checksumHerbert Xu2007-04-251-2/+8
| | | | | | | | | | | | | | | | | | | | | | This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] CORE: Fix whitespace errors.YOSHIFUJI Hideaki2007-02-101-1/+1
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Annotate __skb_checksum_complete() and friends.Al Viro2006-12-021-1/+1
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Annotate callers of csum_partial_copy_...() and csum_and_copy...() in ↵Al Viro2006-12-021-3/+3
| | | | | | | net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Annotate callers of csum_fold() in net/*Al Viro2006-12-021-4/+4
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETEPatrick McHardy2006-09-221-2/+2
| | | | | | | | | | | Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notificationsDavide Libenzi2006-03-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NET]: Revert skb_copy_datagram_iovec() recursion elimination.David S. Miller2006-02-131-28/+53
| | | | | | | | | | | | | Revert the following changeset: bc8dfcb93970ad7139c976356bfc99d7e251deaf Recursive SKB frag lists are really possible and disallowing them breaks things. Noticed by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IP]: Simplify and consolidate MSG_PEEK error handlingHerbert Xu2006-01-031-0/+36
| | | | | | | | | | | | | | | | When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled it is left on the socket receive queue. This means that when we detect a checksum error we have to be careful when trying to free the packet as someone could have dequeued it in the time being. Currently this delicate logic is duplicated three times between UDPv4, UDPv6 and RAWv6. This patch moves them into a one place and simplifies the code somewhat. This is based on a suggestion by Eric Dumazet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Detect hardware rx checksum faults correctlyHerbert Xu2005-11-101-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix zero-size datagram receptionHerbert Xu2005-11-021-0/+4
| | | | | | | | The recent rewrite of skb_copy_datagram_iovec broke the reception of zero-size datagrams. This patch fixes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
* [NET]: Use non-recursive algorithm in skb_copy_datagram_iovec()Daniel Phillips2005-09-271-55/+26
| | | | | | | | Use iteration instead of recursion. Fraglists within fraglists should never occur, so we BUG check this. Signed-off-by: Daniel Phillips <phillips@istop.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Move the tcp sock states to net/tcp_states.hArnaldo Carvalho de Melo2005-08-291-3/+3
| | | | | | | | | | | Lots of places just needs the states, not even linux/tcp.h, where this enum was, needs it. This speeds up development of the refactorings as less sources are rebuilt when things get moved from net/tcp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] DocBook: fix some descriptionsMartin Waitz2005-05-011-2/+2
| | | | | | | | | Some KernelDoc descriptions are updated to match the current code. No code changes. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] DocBook: changes and extensions to the kernel documentationPavel Pisa2005-05-011-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have recompiled Linux kernel 2.6.11.5 documentation for me and our university students again. The documentation could be extended for more sources which are equipped by structured comments for recent 2.6 kernels. I have tried to proceed with that task. I have done that more times from 2.6.0 time and it gets boring to do same changes again and again. Linux kernel compiles after changes for i386 and ARM targets. I have added references to some more files into kernel-api book, I have added some section names as well. So please, check that changes do not break something and that categories are not too much skewed. I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved by kernel convention. Most of the other changes are modifications in the comments to make kernel-doc happy, accept some parameters description and do not bail out on errors. Changed <pid> to @pid in the description, moved some #ifdef before comments to correct function to comments bindings, etc. You can see result of the modified documentation build at http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz Some more sources are ready to be included into kernel-doc generated documentation. Sources has been added into kernel-api for now. Some more section names added and probably some more chaos introduced as result of quick cleanup work. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2Linus Torvalds2005-04-161-0/+482
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!