aboutsummaryrefslogtreecommitdiffstats
path: root/net/sctp
Commit message (Collapse)AuthorAgeFilesLines
* net: sctp: sctp_endpoint_free: zero out secret key dataDaniel Borkmann2013-02-141-0/+5
| | | | | | | | | | | | | [ Upstream commit b5c37fe6e24eec194bb29d22fdd55d73bcc709bf ] On sctp_endpoint_destroy, previously used sensitive keying material should be zeroed out before the memory is returned, as we already do with e.g. auth keys when released. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfreeDaniel Borkmann2013-02-141-1/+1
| | | | | | | | | | | | | | | [ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ] In sctp_setsockopt_auth_key, we create a temporary copy of the user passed shared auth key for the endpoint or association and after internal setup, we free it right away. Since it's sensitive data, we should zero out the key before returning the memory back to the allocator. Thus, use kzfree instead of kfree, just as we do in sctp_auth_key_put(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: refactor sctp_outq_teardown to insure proper re-initalizationNeil Horman2013-02-141-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86 ] Jamie Parsons reported a problem recently, in which the re-initalization of an association (The duplicate init case), resulted in a loss of receive window space. He tracked down the root cause to sctp_outq_teardown, which discarded all the data on an outq during a re-initalization of the corresponding association, but never reset the outq->outstanding_data field to zero. I wrote, and he tested this fix, which does a proper full re-initalization of the outq, fixing this problem, and hopefully future proofing us from simmilar issues down the road. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Jamie Parsons <Jamie.Parsons@metaswitch.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscallTommi Rantala2013-01-112-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6e51fe7572590d8d86e93b547fab6693d305fd0d ] Consider the following program, that sets the second argument to the sendto() syscall incorrectly: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } We get -ENOMEM: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory) Propagate the error code from sctp_user_addto_chunk(), so that we will tell user space what actually went wrong: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address) Noticed while running Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space ↵Tommi Rantala2013-01-111-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fails [ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ] Trinity (the syscall fuzzer) discovered a memory leak in SCTP, reproducible e.g. with the sendto() syscall by passing invalid user space pointer in the second argument: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } As far as I can tell, the leak has been around since ~2003. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: fix call to SCTP_CMD_PROCESS_SACK in sctp_cmd_interpreter()Zijie Pan2012-11-171-1/+2
| | | | | | | | | | | | | [ Upstream commit f6e80abeab928b7c47cc1fbf53df13b4398a2bec ] Bug introduced by commit edfee0339e681a784ebacec7e8c2dc97dc6d2839 (sctp: check src addr when processing SACK to update transport state) Signed-off-by: Zijie Pan <zijie.pan@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Don't charge for data in sndbuf again when transmitting packetThomas Graf2012-10-131-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4c3a5bdae293f75cdf729c6c00124e8489af2276 ] SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up by __sctp_write_space(). Buffer space charged via sctp_set_owner_w() is released in sctp_wfree() which calls __sctp_write_space() directly. Buffer space charged via skb_set_owner_w() is released via sock_wfree() which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set. sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets. Therefore if sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ... Charging for the data twice does not make sense in the first place, it leads to overcharging sndbuf by a factor 2. Therefore this patch only charges a single byte in wmem_alloc when transmitting an SCTP packet to ensure that the socket stays alive until the packet has been released. This means that control chunks are no longer accounted for in wmem_alloc which I believe is not a problem as skb->truesize will typically lead to overcharging anyway and thus compensates for any control overhead. Signed-off-by: Thomas Graf <tgraf@suug.ch> CC: Vlad Yasevich <vyasevic@redhat.com> CC: Neil Horman <nhorman@tuxdriver.com> CC: David Miller <davem@davemloft.net> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Fix list corruption resulting from freeing an association on a listNeil Horman2012-08-092-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ] A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] <IRQ> [22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210 [22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80 [22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680 [22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320 [22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910 [22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0 [22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440 [22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0 [22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130 [22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0 [22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130 [22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420 [22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30 [22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110 [22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0 [22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0 [22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f [22766.387283] <EOI> [22766.387284] [22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP <ffff880147c039b0> [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: check cached dst before using itNicolas Dichtel2012-06-102-20/+1
| | | | | | | | | | | | [ Upstream commit e0268868ba064980488fc8c194db3d8e9fb2959c ] dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Allow struct sctp_event_subscribe to grow without breaking binariesThomas Graf2012-04-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit acdd5985364f8dc511a0762fab2e683f29d9d692 ] getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sctp: Do not account for sizeof(struct sk_buff) in estimated rwndThomas Graf2012-01-062-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a76c0adf60f6ca5ff3481992e4ea0383776b24d2 ] When checking whether a DATA chunk fits into the estimated rwnd a full sizeof(struct sk_buff) is added to the needed chunk size. This quickly exhausts the available rwnd space and leads to packets being sent which are much below the PMTU limit. This can lead to much worse performance. The reason for this behaviour was to avoid putting too much memory pressure on the receiver. The concept is not completely irational because a Linux receiver does in fact clone an skb for each DATA chunk delivered. However, Linux also reserves half the available socket buffer space for data structures therefore usage of it is already accounted for. When proposing to change this the last time it was noted that this behaviour was introduced to solve a performance issue caused by rwnd overusage in combination with small DATA chunks. Trying to reproduce this I found that with the sk_buff overhead removed, the performance would improve significantly unless socket buffer limits are increased. The following numbers have been gathered using a patched iperf supporting SCTP over a live 1 Gbit ethernet network. The -l option was used to limit DATA chunk sizes. The numbers listed are based on the average of 3 test runs each. Default values have been used for sk_(r|w)mem. Chunk Size Unpatched No Overhead ------------------------------------- 4 15.2 Kbit [!] 12.2 Mbit [!] 8 35.8 Kbit [!] 26.0 Mbit [!] 16 95.5 Kbit [!] 54.4 Mbit [!] 32 106.7 Mbit 102.3 Mbit 64 189.2 Mbit 188.3 Mbit 128 331.2 Mbit 334.8 Mbit 256 537.7 Mbit 536.0 Mbit 512 766.9 Mbit 766.6 Mbit 1024 810.1 Mbit 808.6 Mbit Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sctp: fix incorrect overflow check on autocloseXi Wang2012-01-064-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2692ba61a82203404abd7dd2a027bda962861f74 ] Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net: sctp: fix checksum marking for outgoing packetsMichał Mirosław2011-07-141-11/+8
| | | | | | | | | | | | | | Packets to devices without NETIF_F_SCTP_CSUM (including NETIF_F_NO_CSUM) should be properly checksummed because the packets can be diverted or rerouted after construction. This still leaves packets diverted from NETIF_F_SCTP_CSUM-enabled devices with broken checksums. Fixing this needs implementing software offload fallback in networking core. For users of sctp_checksum_disable, skb->ip_summed should be left as CHECKSUM_NONE and not CHECKSUM_UNNECESSARY as per include/linux/skbuff.h. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: ABORT if receive, reassmbly, or reodering queue is not empty while ↵Thomas Graf2011-07-082-8/+21
| | | | | | | | | | | | | | | | | | | | | closing socket Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: Enforce retransmission limit during shutdownThomas Graf2011-07-074-13/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When initiating a graceful shutdown while having data chunks on the retransmission queue with a peer which is in zero window mode the shutdown is never completed because the retransmission error count is reset periodically by the following two rules: - Do not timeout association while doing zero window probe. - Reset overall error count when a heartbeat request has been acknowledged. The graceful shutdown will wait for all outstanding TSN to be acknowledged before sending the SHUTDOWN request. This never happens due to the peer's zero window not acknowledging the continuously retransmitted data chunks. Although the error counter is incremented for each failed retransmission, the receiving of the SACK announcing the zero window clears the error count again immediately. Also heartbeat requests continue to be sent periodically. The peer acknowledges these requests causing the error counter to be reset as well. This patch changes behaviour to only reset the overall error counter for the above rules while not in shutdown. After reaching the maximum number of retransmission attempts, the T5 shutdown guard timer is scheduled to give the receiver some additional time to recover. The timer is stopped as soon as the receiver acknowledges any data. The issue can be easily reproduced by establishing a sctp association over the loopback device, constantly queueing data at the sender while not reading any at the receiver. Wait for the window to reach zero, then initiate a shutdown by killing both processes simultaneously. The association will never be freed and the chunks on the retransmission queue will be retransmitted indefinitely. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe itWei Yongjun2011-07-071-0/+23
| | | | | | | | | | | | | | We forgot to send up SCTP_SENDER_DRY_EVENT notification when user app subscribes to this event, and there is no data to be sent or retransmit. This is required by the Socket API and used by the DTLS/SCTP implementation. Reported-by: Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Tested-by: Robin Seggelmann <seggelmann@fh-muenster.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: refine {udp|tcp|sctp}_mem limitsEric Dumazet2011-07-071-10/+1
| | | | | | | | | | | | | | | | | Current tcp/udp/sctp global memory limits are not taking into account hugepages allocations, and allow 50% of ram to be used by buffers of a single protocol [ not counting space used by sockets / inodes ...] Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram per protocol, and a minimum of 128 pages. Heavy duty machines sysadmins probably need to tweak limits anyway. References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032 Reported-by: starlight <starlight@binnacle.cx> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: stop pending timers and purge queues when peer restart asocWei Yongjun2011-05-313-11/+29
| | | | | | | | | If the peer restart the asoc, we should not only fail any unsent/unacked data, but also stop the T3-rtx, SACK, T4-rto timers, and teardown ASCONF queues. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: fix memory leak of the ASCONF queue when free asocWei Yongjun2011-05-251-0/+16
| | | | | | | | | | If an ASCONF chunk is outstanding, then the following ASCONF chunk will be queued for later transmission. But when we free the asoc, we forget to free the ASCONF queue at the same time, this will cause memory leak. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert %p usage to %pKDan Rosenberg2011-05-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: Fix build failure.David S. Miller2011-05-211-1/+1
| | | | | | | | | | | | Commit c182f90bc1f22ce5039b8722e45621d5f96862c2 ("SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()") and commit 1231f0baa547a541a7481119323b7f964dda4788 ("net,rcu: convert call_rcu(sctp_local_addr_free) to kfree_rcu()"), happening in different trees, introduced a build failure. Simply make the SCTP race fix use kfree_rcu() too. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2011-05-2014-319/+396
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
| * SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()Jacek Luczak2011-05-191-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the sctp_close() call, we do not use rcu primitives to destroy the address list attached to the endpoint. At the same time, we do the removal of addresses from this list before attempting to remove the socket from the port hash As a result, it is possible for another process to find the socket in the port hash that is in the process of being closed. It then proceeds to traverse the address list to find the conflict, only to have that address list suddenly disappear without rcu() critical section. Fix issue by closing address list removal inside RCU critical section. Race can result in a kernel crash with general protection fault or kernel NULL pointer dereference: kernel: general protection fault: 0000 [#1] SMP kernel: RIP: 0010:[<ffffffffa02f3dde>] [<ffffffffa02f3dde>] sctp_bind_addr_conflict+0x64/0x82 [sctp] kernel: Call Trace: kernel: [<ffffffffa02f415f>] ? sctp_get_port_local+0x17b/0x2a3 [sctp] kernel: [<ffffffffa02f3d45>] ? sctp_bind_addr_match+0x33/0x68 [sctp] kernel: [<ffffffffa02f4416>] ? sctp_do_bind+0xd3/0x141 [sctp] kernel: [<ffffffffa02f5030>] ? sctp_bindx_add+0x4d/0x8e [sctp] kernel: [<ffffffffa02f5183>] ? sctp_setsockopt_bindx+0x112/0x4a4 [sctp] kernel: [<ffffffff81089e82>] ? generic_file_aio_write+0x7f/0x9b kernel: [<ffffffffa02f763e>] ? sctp_setsockopt+0x14f/0xfee [sctp] kernel: [<ffffffff810c11fb>] ? do_sync_write+0xab/0xeb kernel: [<ffffffff810e82ab>] ? fsnotify+0x239/0x282 kernel: [<ffffffff810c2462>] ? alloc_file+0x18/0xb1 kernel: [<ffffffff8134a0b1>] ? compat_sys_setsockopt+0x1a5/0x1d9 kernel: [<ffffffff8134aaf1>] ? compat_sys_socketcall+0x143/0x1a4 kernel: [<ffffffff810467dc>] ? sysenter_dispatch+0x7/0x32 Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: sctp_sendmsg: Don't test known non-null sinfoJoe Perches2011-05-121-6/+4
| | | | | | | | | | | | | | It's already known non-null above. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: sctp_sendmsg: Don't initialize default_sinfoJoe Perches2011-05-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | This variable only needs initialization when cmsgs.info is NULL. Use memset to ensure padding is also zeroed so kernel doesn't leak any data. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Remove rt->rt_src usage in sctp_v4_get_saddr()David S. Miller2011-05-101-1/+1
| | | | | | | | | | | | Flow key is available, so fetch it from there. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Fix debug message args.David S. Miller2011-05-081-2/+2
| | | | | | | | | | | | | | | | I messed things up when I converted over to the transport flow, I passed the ipv4 address value instead of it's address. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Don't use rt->rt_{src,dst} in sctp_v4_xmit()David S. Miller2011-05-081-2/+2
| | | | | | | | | | | | Now we can pick it out of the transport's flow key. Signed-off-by: David S. Miller <davem@davemloft.net>
| * inet: Pass flowi to ->queue_xmit().David S. Miller2011-05-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Store a flowi in transports to provide persistent keying.David S. Miller2011-05-081-6/+3
| | | | | | | | | | | | | | | | | | Several future simplifications are possible now because of this. For example, the sctp_addr unions can simply refer directly to the flowi information. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Use flowi4's {saddr,daddr} in sctp_v4_dst_saddr() and sctp_v4_get_dst()David S. Miller2011-05-031-5/+4
| | | | | | | | | | | | Instead of rt->rt_{src,dst} Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: clean up route lookup callsVlad Yasevich2011-04-273-29/+23
| | | | | | | | | | | | | | | | | | | | | | | | Change the call to take the transport parameter and set the cached 'dst' appropriately inside the get_dst() function calls. This will allow us in the future to clean up source address storage as well. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: remove useless arguments from get_saddr() callVlad Yasevich2011-04-273-6/+2
| | | | | | | | | | | | | | | | | | There is no point in passing a destination address to a get_saddr() call. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: make sctp over IPv6 work with IPsecVlad Yasevich2011-04-271-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | SCTP never called xfrm_output after it's v6 route lookups so that never really worked with ipsec. Additioanlly, we never passed port nubmers and protocol in the flowi, so any port based policies were never applied as well. Now that we can fixed ipv6 routing lookup code, using ip6_dst_lookup_flow() and pass port numbers. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: cache the ipv6 source after route lookupVlad Yasevich2011-04-274-119/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ipv6 routing lookup does give us a source address, but instead of filling it into the dst, it's stored in the flowi. We can use that instead of going through the entire source address selection again. Also the useless ->dst_saddr member of sctp_pf is removed. And sctp_v6_dst_saddr() is removed, instead by introduce sctp_v6_to_addr(), which can be reused to cleanup some dup code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: fix sctp to work with ipv6 source address routingWeixing Shi2011-04-271-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the below test case, using the source address routing, sctp can not work. Node-A 1)ifconfig eth0 inet6 add 2001:1::1/64 2)ip -6 rule add from 2001:1::1 table 100 pref 100 3)ip -6 route add 2001:2::1 dev eth0 table 100 4)sctp_darn -H 2001:1::1 -P 250 -l & Node-B 1)ifconfig eth0 inet6 add 2001:2::1/64 2)ip -6 rule add from 2001:2::1 table 100 pref 100 3)ip -6 route add 2001:1::1 dev eth0 table 100 4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s root cause: Node-A and Node-B use the source address routing, and at begining, source address will be NULL,sctp will search the routing table by the destination address, because using the source address routing table, and the result dst_entry will be NULL. solution: walk through the bind address list to get the source address and then lookup the routing table again to get the correct dst_entry. Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * inet: constify ip headers and in6_addrEric Dumazet2011-04-222-2/+2
| | | | | | | | | | | | | | | | Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: implement event notification SCTP_SENDER_DRY_EVENTWei Yongjun2011-04-213-1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implement event notification SCTP_SENDER_DRY_EVENT. SCTP Socket API Extensions: 6.1.9. SCTP_SENDER_DRY_EVENT When the SCTP stack has no more user data to send or retransmit, this notification is given to the user. Also, at the time when a user app subscribes to this event, if there is no data to be sent or retransmit, the stack will immediately send up this notification. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: change auth event type name to SCTP_AUTHENTICATION_EVENTWei Yongjun2011-04-211-1/+1
| | | | | | | | | | | | | | | | | | This patch change the auth event type name to SCTP_AUTHENTICATION_EVENT, which is based on API extension compliance. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: implement socket option SCTP_GET_ASSOC_ID_LISTWei Yongjun2011-04-211-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch Implement socket option SCTP_GET_ASSOC_ID_LIST. SCTP Socket API Extension: 8.2.6. Get the Current Identifiers of Associations (SCTP_GET_ASSOC_ID_LIST) This option gets the current list of SCTP association identifiers of the SCTP associations handled by a one-to-many style socket. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: move chunk from retransmit queue to abandoned listWei Yongjun2011-04-201-0/+7
| | | | | | | | | | | | | | | | | | | | If there is still data waiting to retransmit and remain in retransmit queue, while doing the next retransmit, if the chunk is abandoned, we should move it to abandoned list. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: make heartbeat information in sctp_make_heartbeat()Wei Yongjun2011-04-202-15/+14
| | | | | | | | | | | | | | | | | | Make heartbeat information in sctp_make_heartbeat() instead of make it in sctp_sf_heartbeat() directly for common using. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: fix to check the source address of COOKIE-ECHO chunkWei Yongjun2011-04-203-16/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | SCTP does not check whether the source address of COOKIE-ECHO chunk is the original address of INIT chunk or part of the any address parameters saved in COOKIE in CLOSED state. So even if the COOKIE-ECHO chunk is from any address but with correct COOKIE, the COOKIE-ECHO chunk still be accepted. If the COOKIE is not from a valid address, the assoc should not be established. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: handle ootb packet in chunk order as definedShan Wei2011-04-202-15/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Changed the order of processing SHUTDOWN ACK and COOKIE ACK refer to section 8.4:Handle "Out of the Blue" Packets. SHUTDOWN ACK chunk should be processed before processing "Stale Cookie" ERROR or a COOKIE ACK. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: bail from sctp_endpoint_lookup_assoc() if not boundVlad Yasevich2011-04-201-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | The sctp_endpoint_lookup_assoc() function uses a port hash to lookup the association and then checks to see if any of them are on the current endpoint. However, if the current endpoint is not bound, there can't be any associations on it, thus we can bail early. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: remove completely unsed EMPTY stateVlad Yasevich2011-04-203-78/+0
| | | | | | | | | | | | | | | | | | SCTP does not SCTP_STATE_EMPTY and we can never be in that state. Remove useless code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: check invalid value of length parameter in error causeShan Wei2011-04-201-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC4960, section 3.3.7 said: If an endpoint receives an ABORT with a format error or no TCB is found, it MUST silently discard it. When an endpoint receives ABORT that parameter value is invalid, drop it. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: check parameter value of length in ERROR chunkShan Wei2011-04-201-0/+5
| | | | | | | | | | | | | | | | | | | | When an endpoint receives ERROR that parameter value is invalid, send an ABORT to peer with a Protocol Violation error code. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Release all routes when processing acks ADD_IP or DEL_IPVlad Yasevich2011-04-191-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When processing an ACK for ADD_IP parameter, we only release the routes on non-active transports. This can cause a wrong source address to be used. We can release the routes and cause new route lookups and source address selection so that new addresses can be used as source. Additionally, we don't need to lookup routes for all transports at the same time. We can let the transmit code path update the cached route when the transport actually sends something. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sctp: Allow bindx_del to accept 0 portVlad Yasevich2011-04-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | We allow 0 port when adding new addresses. It only makes sence to allow 0 port when removing addresses. When removing the currently bound port will be used when the port in the address is set to 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>