aboutsummaryrefslogtreecommitdiffstats
path: root/net/sctp/socket.c
Commit message (Collapse)AuthorAgeFilesLines
* merged 3.0.101 tagWolfgang Wiedmeyer2015-10-221-0/+6
|
* net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfreeDaniel Borkmann2013-02-141-1/+1
| | | | | | | | | | | | | | | [ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ] In sctp_setsockopt_auth_key, we create a temporary copy of the user passed shared auth key for the endpoint or association and after internal setup, we free it right away. Since it's sensitive data, we should zero out the key before returning the memory back to the allocator. Thus, use kzfree instead of kfree, just as we do in sctp_auth_key_put(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscallTommi Rantala2013-01-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6e51fe7572590d8d86e93b547fab6693d305fd0d ] Consider the following program, that sets the second argument to the sendto() syscall incorrectly: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } We get -ENOMEM: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory) Propagate the error code from sctp_user_addto_chunk(), so that we will tell user space what actually went wrong: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address) Noticed while running Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Fix list corruption resulting from freeing an association on a listNeil Horman2012-08-091-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ] A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] <IRQ> [22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210 [22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80 [22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680 [22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320 [22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910 [22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0 [22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440 [22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0 [22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130 [22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0 [22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130 [22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420 [22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30 [22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110 [22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0 [22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0 [22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f [22766.387283] <EOI> [22766.387284] [22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP <ffff880147c039b0> [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Allow struct sctp_event_subscribe to grow without breaking binariesThomas Graf2012-04-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit acdd5985364f8dc511a0762fab2e683f29d9d692 ] getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: fix incorrect overflow check on autocloseXi Wang2012-01-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2692ba61a82203404abd7dd2a027bda962861f74 ] Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sctp: ABORT if receive, reassmbly, or reodering queue is not empty while ↵Thomas Graf2011-07-081-5/+8
| | | | | | | | | | | | | | | | | | | | | closing socket Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe itWei Yongjun2011-07-071-0/+23
| | | | | | | | | | | | | | We forgot to send up SCTP_SENDER_DRY_EVENT notification when user app subscribes to this event, and there is no data to be sent or retransmit. This is required by the Socket API and used by the DTLS/SCTP implementation. Reported-by: Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Tested-by: Robin Seggelmann <seggelmann@fh-muenster.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_sendmsg: Don't test known non-null sinfoJoe Perches2011-05-121-6/+4
| | | | | | | It's already known non-null above. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: sctp_sendmsg: Don't initialize default_sinfoJoe Perches2011-05-121-1/+2
| | | | | | | | | | | This variable only needs initialization when cmsgs.info is NULL. Use memset to ensure padding is also zeroed so kernel doesn't leak any data. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: cache the ipv6 source after route lookupVlad Yasevich2011-04-271-1/+1
| | | | | | | | | | | | | | | The ipv6 routing lookup does give us a source address, but instead of filling it into the dst, it's stored in the flowi. We can use that instead of going through the entire source address selection again. Also the useless ->dst_saddr member of sctp_pf is removed. And sctp_v6_dst_saddr() is removed, instead by introduce sctp_v6_to_addr(), which can be reused to cleanup some dup code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: implement socket option SCTP_GET_ASSOC_ID_LISTWei Yongjun2011-04-211-0/+52
| | | | | | | | | | | | | | | This patch Implement socket option SCTP_GET_ASSOC_ID_LIST. SCTP Socket API Extension: 8.2.6. Get the Current Identifiers of Associations (SCTP_GET_ASSOC_ID_LIST) This option gets the current list of SCTP association identifiers of the SCTP associations handled by a one-to-many style socket. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: Allow bindx_del to accept 0 portVlad Yasevich2011-04-191-1/+5
| | | | | | | | | | | We allow 0 port when adding new addresses. It only makes sence to allow 0 port when removing addresses. When removing the currently bound port will be used when the port in the address is set to 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: use memdup_user to copy data from userspaceShan Wei2011-04-191-16/+6
| | | | | | | | | Use common function to simply code. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Fix common misspellingsLucas De Marchi2011-03-311-1/+1
| | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* sctp: several declared/set but unused fixesHagen Paul Pfeifer2011-03-071-2/+0
| | | | | Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add __rcu annotations to sk_wq and wqEric Dumazet2011-02-221-4/+5
| | | | | | | | | | | Add proper RCU annotations/verbs to sk_wq and wq members Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access) Fix sunrpc sk_sleep() abuse too Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: user perfect name for Delayed SACK Timer optionShan Wei2011-01-191-2/+2
| | | | | | | | | | | | | | | | | The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK, not SCTP_DELAYED_ACK. Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK, for making compatibility with existing applications. Reference: 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK) (http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25) Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-12-171-1/+9
|\ | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/vhost/vhost.c
| * sctp: fix the return value of getting the sctp partial delivery pointWei Yongjun2010-12-161-1/+1
| | | | | | | | | | | | | | | | | | Get the sctp partial delivery point using SCTP_PARTIAL_DELIVERY_POINT socket option should return 0 if success, not -ENOTSUPP. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped addressWei Yongjun2010-12-101-0/+8
| | | | | | | | | | | | | | | | | | | | | | SCTP_SET_PEER_PRIMARY_ADDR does not accpet v4mapped address, using v4mapped address in SCTP_SET_PEER_PRIMARY_ADDR socket option will get -EADDRNOTAVAIL error if v4map is enabled. This patch try to fix it by mapping v4mapped address to v4 address if allowed. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Fix a typo in datagram.c and sctp/socket.c.David Shwatrz2010-12-061-1/+1
|/ | | | | | | | | | | Hi, This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c Regards, David Shwartz Signed-off-by: David Shwartz <dshwatrz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid limits overflowEric Dumazet2010-11-101-2/+2
| | | | | | | | | | | | | | Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-10-041-1/+12
|\ | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/Kconfig net/ipv4/tcp_timer.c
| * sctp: prevent reading out-of-bounds memoryDan Rosenberg2010-10-031-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: Fix break indentation in sctp_ioctl().David S. Miller2010-10-031-1/+1
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: return operator cleanupEric Dumazet2010-09-231-3/+3
| | | | | | | | | | | | | | | | | | Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: implement SIOCINQ ioctl() (take 3)Diego Elio 'Flameeyes' Pettenò2010-09-081-1/+34
| | | | | | | | | | | | | | | | | | | | This simple patch copies the current approach for SIOCINQ ioctl() from DCCP into SCTP so that the userland code working with SCTP can use a similar interface across different protocols to know how much space to allocate for a buffer. Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: poll() optimizationsEric Dumazet2010-09-061-3/+2
| | | | | | | | | | | | | | No need to test twice sk->sk_shutdown & RCV_SHUTDOWN Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sctp: Use pr_fmt and pr_<level>Joe Perches2010-08-261-24/+15
|/ | | | | | | | | | | | Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to use do { print } while (0) guards. Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when lines were continued. Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Add a missing newline in "Failed bind hash alloc" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: reserve ports for applications using fixed port numbersAmerigo Wang2010-05-151-0/+2
| | | | | | | | | | | | | | | | | | | (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'net-next' of ↵David S. Miller2010-05-031-1/+1
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev Add missing linux/vmalloc.h include to net/sctp/probe.c Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Use correct address family in sctp_getsockopt_peer_addrs()Vlad Yasevich2010-04-301-1/+1
| | | | | | | | | | | | | | The function should use the address family of the address when trying to determine the length of the structure. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* | Merge branch 'master' of ↵David S. Miller2010-05-021-2/+15
|\ \ | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * | sctp: per_cpu variables should be in bh_disabled sectionVlad Yasevich2010-04-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Since the change of the atomics to percpu variables, we now have to disable BH in process context when touching percpu variables. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sctp: avoid irq lock inversion while call sk->sk_data_ready()Wei Yongjun2010-04-281-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sock_def_readable() and friends RCU conversionEric Dumazet2010-05-011-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sk_add_backlog() take rmem_alloc into accountEric Dumazet2010-04-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sk_sleep() helperEric Dumazet2010-04-201-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-04-111-0/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c
| * include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* | sctp: eliminate useless codeHagen Paul Pfeifer2010-03-301-1/+0
|/ | | | | | | | | Remove duplicate declaration of symbol: struct hlist_node *node was already declared, the seconds declaration shadows the first one. CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: use limited socket backlogZhu Yi2010-03-051-0/+3
| | | | | | | | | Make sctp adapt to the limited socket backlog change. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sctp: Eliminate useless codeJulia Lawall2010-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The variable newinet is initialized twice to the same (side effect-free) expression. Drop one initialization. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) // <smpl> @forall@ idexpression *x; identifier f!=ERR_PTR; @@ x = f(...) ... when != x ( x = f(...,<+...x...+>,...) | * x = f(...) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sctp/socket.c: squish warningAndrew Morton2010-01-031-2/+1
| | | | | | | | | | | net/sctp/socket.c: In function 'sctp_setsockopt_autoclose': net/sctp/socket.c:2090: warning: comparison is always false due to limited range of data type Cc: Andrei Pelinescu-Onciul <andrei@iptel.org> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix sctp_setsockopt_autoclose compile warningAndrei Pelinescu-Onciul2009-12-021-1/+1
| | | | | | | | | | | Fix the following warning, when building on 64 bits: net/sctp/socket.c:2091: warning: large integer implicitly truncated to unsigned type Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move && and || to end of previous lineJoe Perches2009-11-291-2/+2
| | | | | | | | | | | Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: limit maximum autoclose setsockopt valueAndrei Pelinescu-Onciul2009-11-231-0/+3
| | | | | | | | | To avoid overflowing the maximum timer interval when transforming the autoclose interval from seconds to jiffies, limit the maximum autoclose value to MAX_SCHEDULE_TIMEOUT/HZ. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: remove deprecated SCTP_GET_*_OLD stuffsAmerigo Wang2009-11-231-325/+0
| | | | | | | | SCTP_GET_*_OLD stuffs are schedlued to be removed. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: allow setting path_maxrxt independent of SPP_PMTUD_ENABLEAndrei Pelinescu-Onciul2009-11-231-4/+3
| | | | | | | | | | Since draft-ietf-tsvwg-sctpsocket-15.txt, setting the SPP_MTUD_ENABLE flag when changing pathmaxrxt via the SCTP_PEER_ADDR_PARAMS setsockopt is not required any longer. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>