aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter
Commit message (Collapse)AuthorAgeFilesLines
* netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables ↵KOVACS Krisztian2010-10-253-8/+16
| | | | | | | | | | | | | | | | | and xt_TPROXY One of the previous tproxy related patches split IPv6 defragmentation and connection tracking, but did not correctly add Kconfig stanzas to handle the new dependencies correctly. This patch fixes that by making the config options mirror the setup we have for IPv4: a distinct config option for defragmentation that is automatically selected by both connection tracking and xt_TPROXY/xt_socket. The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket and xt_TPROXY: we only compile these in case we have ip6tables support enabled. Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-next' of ↵Linus Torvalds2010-10-241-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing|ed] => interest[ing|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c
| * Update broken web addresses in the kernel.Justin P. Mattock2010-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch below updates broken web addresses in the kernel Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Acked-by: Ben Pfaff <blp@cs.stanford.edu> Acked-by: Hans J. Koch <hjk@linutronix.de> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2010-10-2329-1288/+2993
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David
| * \ Merge branch 'master' of ↵David S. Miller2010-10-2128-1248/+2953
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| | * \ Merge branch 'for-patrick' of ↵Patrick McHardy2010-10-2110-384/+903
| | |\ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6
| | | * | ipvs: provide address family for debuggingJulian Anastasov2010-10-217-84/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As skb->protocol is not valid in LOCAL_OUT add parameter for address family in packet debugging functions. Even if ports are not present in AH and ESP change them to use ip_vs_tcpudp_debug_packet to show at least valid addresses as before. This patch removes the last user of skb->protocol in IPVS. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: inherit forwarding method in backupJulian Anastasov2010-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connections in backup server should inherit the forwarding method from real server. It is a way to fix a problem where the forwarding method in backup connection is damaged by logical OR operation with the real server's connection flags. And the change is needed for setups where the backup server uses different forwarding method for the same real servers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: changes for local clientJulian Anastasov2010-10-212-92/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch deals with local client processing. Prefer LOCAL_OUT hook for scheduling connections from local clients. LOCAL_IN is still supported if the packets are not marked as processed in LOCAL_OUT. The idea to process requests in LOCAL_OUT is to alter conntrack reply before it is confirmed at POST_ROUTING. If the local requests are processed in LOCAL_IN the conntrack can not be updated and matching by state is impossible. Add the following handlers: - ip_vs_reply[46] at LOCAL_IN:99 to process replies from remote real servers to local clients. Now when both replies from remote real servers (ip_vs_reply*) and local real servers (ip_vs_local_reply*) are handled it is safe to remove the conn_out_get call from ip_vs_in because it does not support related ICMP packets. - ip_vs_local_request[46] at LOCAL_OUT:-98 to process requests from local client Handling in LOCAL_OUT causes some changes: - as skb->dev, skb->protocol and skb->pkt_type are not defined in LOCAL_OUT make sure we set skb->dev before calling icmpv6_send, prefer skb_dst(skb) for struct net and remove the skb->protocol checks from TUN transmitters. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: changes for local real serverJulian Anastasov2010-10-213-117/+457
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch deals with local real servers: - Add support for DNAT to local address (different real server port). It needs ip_vs_out hook in LOCAL_OUT for both families because skb->protocol is not set for locally generated packets and can not be used to set 'af'. - Skip packets in ip_vs_in marked with skb->ipvs_property because ip_vs_out processing can be executed in LOCAL_OUT but we still have the conn_out_get check in ip_vs_in. - Ignore packets with inet->nodefrag from local stack - Require skb_dst(skb) != NULL because we use it to get struct net - Add support for changing the route to local IPv4 stack after DNAT depending on the source address type. Local client sets output route and the remote client sets input route. It looks like IPv6 does not need such rerouting because the replies use addresses from initial incoming header, not from skb route. - All transmitters now have strict checks for the destination address type: redirect from non-local address to local real server requires NAT method, local address can not be used as source address when talking to remote real server. - Now LOCALNODE is not set explicitly as forwarding method in real server to allow the connections to provide correct forwarding method to the backup server. Not sure if this breaks tools that expect to see 'Local' real server type. If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL. Now it should be possible connections in backup that lost their fwmark information during sync to be forwarded properly to their daddr, even if it is local address in the backup server. By this way backup could be used as real server for DR or TUN, for NAT there are some restrictions because tuple collisions in conntracks can create problems for the traffic. - Call ip_vs_dst_reset when destination is updated in case some real server IP type is changed between local and remote. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: move ip_route_me_harder for ICMPJulian Anastasov2010-10-211-20/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ip_route_me_harder after ip_vs_out_icmp is called even if packet is not related to IPVS connection. Move it into handle_response_icmp. Also, force rerouting if sending to local client because IPv4 stack uses addresses from the route. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: create ip_vs_defrag_userJulian Anastasov2010-10-211-21/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create new function ip_vs_defrag_user to return correct IP_DEFRAG_xxx user depending on the hooknum. It will be needed when we add handlers in LOCAL_OUT. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: fix CHECKSUM_PARTIAL for TUN methodJulian Anastasov2010-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change in IP_VS_XMIT_TUNNEL to set CHECKSUM_NONE is not correct. After adding IPIP header skb->csum becomes invalid but the CHECKSUM_PARTIAL case must be supported. So, use skb_forward_csum() which is most suitable for us to allow local clients to send IPIP to remote real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: stop ICMP from FORWARD to localJulian Anastasov2010-10-211-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delivering locally ICMP from FORWARD hook is not supported. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: do not schedule conns from real serversJulian Anastasov2010-10-214-8/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is needed to avoid scheduling of packets from local real server when we add ip_vs_in in LOCAL_OUT hook to support local client. Currently, when ip_vs_in can not find existing connection it tries to create new one by calling ip_vs_schedule. The default indication from ip_vs_schedule was if connection was scheduled to real server. If real server is not available we try to use the bypass forwarding method or to send ICMP error. But in some cases we do not want to use the bypass feature. So, add flag 'ignored' to indicate if the scheduler ignores this packet. Make sure we do not create new connections from replies. We can hit this problem for persistent services and local real server when ip_vs_in is added to LOCAL_OUT hook to handle local clients. Also, make sure ip_vs_schedule ignores SYN packets for Active FTP DATA from local real server. The FTP DATA connection should be created on SYN+ACK from client to assign correct connection daddr. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: switch to notrack modeJulian Anastasov2010-10-212-37/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change skb->ipvs_property semantic. This is preparation to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1 will be used to avoid expensive lookups for traffic sent by transmitters. Now when conntrack support is not used we call ip_vs_notrack method to avoid problems in OUTPUT and POST_ROUTING hooks instead of exiting POST_ROUTING as before. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: optimize checksums for appsJulian Anastasov2010-10-213-13/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid full checksum calculation for apps that can provide info whether csum was broken after payload mangling. For now only ip_vs_ftp mangles payload and it updates the csum, so the full recalculation is avoided for all packets. Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP). It is needed to support SNAT from local address for the case when csum is fully recalculated. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | | * | ipvs: fix CHECKSUM_PARTIAL for TCP, UDPJulian Anastasov2010-10-212-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP, UDP not tested because it needs network card with HW CSUM support. May be fixes problem where IPVS can not be used in virtual boxes. Problem appears with DNAT to local address when the local stack sends reply in CHECKSUM_PARTIAL mode. Fix tcp_dnat_handler and udp_dnat_handler to provide vaddr and daddr in right order (old and new IP) when calling tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| | * | | tproxy: use the interface primary IP address as a default value for --on-ipBalazs Scheidler2010-10-211-70/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The REDIRECT target and the older TProxy versions used the primary address of the incoming interface as the default value of the --on-ip parameter. This was unintentionally changed during the initial TProxy submission and caused confusion among users. Since IPv6 has no notion of primary address, we just select the first address on the list: this way the socket lookup finds wildcard bound sockets properly and we cannot really do better without the user telling us the IPv6 address of the proxy. This is implemented for both IPv4 and IPv6. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | tproxy: added IPv6 support to the socket matchBalazs Scheidler2010-10-211-11/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ICMP extraction bits were contributed by Harry Mason. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | tproxy: added IPv6 support to the TPROXY targetBalazs Scheidler2010-10-211-37/+225
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This requires a new revision as the old target structure was IPv4 specific. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | tproxy: add lookup type checks for UDP in nf_tproxy_get_sock_v4()Balazs Scheidler2010-10-211-48/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, inline this function as the lookup_type is always a literal and inlining removes branches performed at runtime. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | tproxy: kick out TIME_WAIT sockets in case a new connection comes in with ↵Balazs Scheidler2010-10-213-14/+85
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the same tuple Without tproxy redirections an incoming SYN kicks out conflicting TIME_WAIT sockets, in order to handle clients that reuse ports within the TIME_WAIT period. The same mechanism didn't work in case TProxy is involved in finding the proper socket, as the time_wait processing code looked up the listening socket assuming that the listener addr/port matches those of the established connection. This is not the case with TProxy as the listener addr/port is possibly changed with the tproxy rule. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | Fixed race condition at ip_vs.ko module init.Eduardo Blanco2010-10-191-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lists were initialized after the module was registered. Multiple ipvsadm processes at module load triggered a race condition that resulted in a null pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex was left locked preventing all further ipvsadm commands. Signed-off-by: Eduardo J. Blanco <ejblanco@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>
| | * | ipvs: IPv6 tunnel modeHans Schillstrom2010-10-191-79/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv6 encapsulation uses a bad source address for the tunnel. i.e. VIP will be used as local-addr and encap. dst addr. Decapsulation will not accept this. Example LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100) (eth0 2003::1:0:1/96) RS (ethX 2003::1:0:5/96) tcpdump 2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0 In Linux IPv6 impl. you can't have a tunnel with an any cast address receiving packets (I have not tried to interpret RFC 2473) To have receive capabilities the tunnel must have: - Local address set as multicast addr or an unicast addr - Remote address set as an unicast addr. - Loop back addres or Link local address are not allowed. This causes us to setup a tunnel in the Real Server with the LVS as the remote address, here you can't use the VIP address since it's used inside the tunnel. Solution Use outgoing interface IPv6 address (match against the destination). i.e. use ip6_route_output() to look up the route cache and then use ipv6_dev_get_saddr(...) to set the source address of the encapsulated packet. Additionally, cache the results in new destination fields: dst_cookie and dst_saddr and properly check the returned dst from ip6_route_output. We now add xfrm_lookup call only for the tunneling method where the source address is a local one. Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: ctnetlink: add expectation deletion eventsPablo Neira Ayuso2010-10-192-11/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows to listen to events that inform about expectations destroyed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | IPVS: ip_vs_dbg_callid() is only needed for debuggingSimon Horman2010-10-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_vs_dbg_callid() and IP_VS_DEBUG_CALLID() are only needed it CONFIG_IP_VS_DEBUG is defined. This resolves the following build warning when CONFIG_IP_VS_DEBUG is not defined. net/netfilter/ipvs/ip_vs_pe_sip.c:11: warning: 'ip_vs_dbg_callid' defined but not used Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: unregister nf hooks, matches and targets in the reverse orderChangli Gao2010-10-042-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we register nf hooks, matches and targets in order, we'd better unregister them in the reverse order. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: remove duplicated includeNicolas Kaiser2010-10-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated include. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | IPVS: sip persistence engineSimon Horman2010-10-043-0/+177
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the SIP callid as a key for persistence. This allows multiple connections from the same IP address to be differentiated on the basis of the callid. When used in conjunction with the persistence mask, it allows connections from different IP addresses to be aggregated on the basis of the callid. It is envisaged that a persistence mask of 0.0.0.0 will be a useful setting. That is, ignore the source IP address when checking for persistence. It is envisaged that this option will be used in conjunction with one-packet scheduling. This only works with UDP and cannot be made to work with TCP within the current framework. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Fallback if persistence engine failsSimon Horman2010-10-042-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fall back to normal persistence handling if the persistence engine fails to recognise a packet. This way, at least the packet will go somewhere. It is envisaged that iptables could be used to block packets such if this is not desired although nf_conntrack_sip would likely need to be enhanced first. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Allow configuration of persistence enginesSimon Horman2010-10-041-3/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the persistence engine of a virtual service to be set, edited and unset. This feature only works with the netlink user-space interface. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: management of persistence engine modulesSimon Horman2010-10-042-1/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based heavily on the scheduler management code Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Add persistence engine data to /proc/net/ip_vs_connSimon Horman2010-10-041-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This shouldn't break compatibility with userspace as the new data is at the end of the line. I have confirmed that this doesn't break ipvsadm, the main (only?) user-space user of this data. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Add struct ip_vs_peSimon Horman2010-10-043-20/+100
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: ip_vs_{un,}bind_scheduler NULL argumentsSimon Horman2010-10-042-22/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In general NULL arguments aren't passed by the few callers that exist, so don't test for them. The exception is to make passing NULL to ip_vs_unbind_scheduler() a noop. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Allow null argument to ip_vs_scheduler_put()Simon Horman2010-10-042-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies caller logic sightly. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: Add struct ip_vs_conn_paramSimon Horman2010-10-046-189/+181
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | IPVS: compact ip_vs_sched_persist()Simon Horman2010-10-041-105/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compact ip_vs_sched_persist() by setting up parameters and calling functions once. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | netfilter: nf_conntrack_sip: Add callid parserSimon Horman2010-10-041-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | netfilter: nf_conntrack_sip: Allow ct_sip_get_header() to be called with a ↵Simon Horman2010-10-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | null ct argument Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
| | * | netfilter: ctnetlink: add support for user-space expectation helpersPablo Neira Ayuso2010-09-282-31/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the basic infrastructure to support user-space expectation helpers via ctnetlink and the netfilter queuing infrastructure NFQUEUE. Basically, this patch: * adds NF_CT_EXPECT_USERSPACE flag to identify user-space created expectations. I have also added a sanity check in __nf_ct_expect_check() to avoid that kernel-space helpers may create an expectation if the master conntrack has no helper assigned. * adds some branches to check if the master conntrack helper exists, otherwise we skip the code that refers to kernel-space helper such as the local expectation list and the expectation policy. * allows to set the timeout for user-space expectations with no helper assigned. * a list of expectations created from user-space that depends on ctnetlink (if this module is removed, they are deleted). * includes USERSPACE in the /proc output for expectations that have been created by a user-space helper. This patch also modifies ctnetlink to skip including the helper name in the Netlink messages if no kernel-space helper is set (since no user-space expectation has not kernel-space kernel assigned). You can access an example user-space FTP conntrack helper at: http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: ctnetlink: allow to specify the expectation flagsPablo Neira Ayuso2010-09-221-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch, you can specify the expectation flags for user-space created expectations. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: ctnetlink: missing validation of CTA_EXPECT_ZONE attributePablo Neira Ayuso2010-09-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the missing validation of the CTA_EXPECT_ZONE attribute in the ctnetlink code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | ipvs: changes related to service usecntJulian Anastasov2010-09-212-154/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the usage of svc usecnt during command execution: - we check if svc is registered but we do not need to hold usecnt reference while under __ip_vs_mutex, only the packet handling needs it during scheduling - change __ip_vs_service_get to __ip_vs_service_find and __ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because now caller will increase svc->usecnt - put common code that calls update_service in __ip_vs_update_dest - put common code in ip_vs_unlink_service() and use it to unregister the service - add comment that svc should not be accessed after ip_vs_del_service anymore - all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0 - Properly log the app ports As result, some problems are fixed: - possible use-after-free of svc in ip_vs_genl_set_cmd after ip_vs_del_service because our usecnt reference does not guarantee that svc is not freed on refcnt==0, eg. when no dests are moved to trash - possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service when the service is not freed now, for example, when some destionations are moved into trash and svc->refcnt remains above 0. It is harmless because svc is not in hash anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: save the hash of the tuple in the original direction for latter useChangli Gao2010-09-211-34/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we don't change the tuple in the original direction, we can save it in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for __nf_conntrack_confirm() use. __hash_conntrack() is split into two steps: hash_conntrack_raw() is used to get the raw hash, and __hash_bucket() is used to get the bucket id. In SYN-flood case, early_drop() doesn't need to recompute the hash again. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | ipvs: make rerouting optional with snat_rerouteJulian Anastasov2010-09-212-8/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new sysctl flag "snat_reroute". Recent kernels use ip_route_me_harder() to route LVS-NAT responses properly by VIP when there are multiple paths to client. But setups that do not have alternative default routes can skip this routing lookup by using snat_reroute=0. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | ipvs: netfilter connection tracking changesJulian Anastasov2010-09-218-195/+430
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add more code to IPVS to work with Netfilter connection tracking and fix some problems. - Allow IPVS to be compiled without connection tracking as in 2.6.35 and before. This can avoid keeping conntracks for all IPVS connections because this costs memory. ip_vs_ftp still depends on connection tracking and NAT as implemented for 2.6.36. - Add sysctl var "conntrack" to enable connection tracking for all IPVS connections. For loaded IPVS directors it needs tuning of nf_conntrack_max limit. - Add IP_VS_CONN_F_NFCT connection flag to request the connection to use connection tracking. This allows user space to provide this flag, for example, in dest->conn_flags. This can be useful to request connection tracking per real server instead of forcing it for all connections with the "conntrack" sysctl. This flag is set currently only by ip_vs_ftp and of course by "conntrack" sysctl. - Add ip_vs_nfct.c file to hold all connection tracking code, by this way main code should not depend of netfilter conntrack support. - Return back the ip_vs_post_routing handler as in 2.6.35 and use skb->ipvs_property=1 to allow IPVS to work without connection tracking Connection tracking: - most of the code is already in 2.6.36-rc - alter conntrack reply tuple for LVS-NAT connections when first packet from client is forwarded and conntrack state is NEW or RELATED. Additionally, alter reply for RELATED connections from real server, again for packet in original direction. - add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering reply) for LVS-TUN early because we want to call nf_reset. It is needed because we add IPIP header and the original conntrack should be preserved, not destroyed. The transmitted IPIP packets can reuse same conntrack, so we do not set skb->ipvs_property. - try to destroy conntrack when the IPVS connection is destroyed. It is not fatal if conntrack disappears before that, it depends on the used timers. Fix problems from long time: - add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | ipvs: extend connection flags to 32 bitsJulian Anastasov2010-09-173-13/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - the sync protocol supports 16 bits only, so bits 0..15 should be used only for flags that should go to backup server, bits 16 and above should be allocated for flags not sent to backup. - use IP_VS_CONN_F_DEST_MASK as mask of connection flags in destination that can be changed by user space - allow IP_VS_CONN_F_ONE_PACKET to be set in destination Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | netfilter: nf_conntrack: fix the hash random initializing raceChangli Gao2010-09-161-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nf_conntrack_alloc() isn't called with nf_conntrack_lock locked, so hash random initializing code maybe executed more than once on different CPUs. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>