aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/ipvs/ip_vs_core.c
Commit message (Collapse)AuthorAgeFilesLines
* update network codemigrate_3.2Wolfgang Wiedmeyer2015-10-241-72/+67
|
* ipvs: fix matching of fwmark templates during schedulingSimon Horman2012-02-291-1/+1
| | | | | | | | | | | | | | | | | | commit e0aac52e17a3db68fe2ceae281780a70fc69957f upstream. Commit f11017ec2d1859c661f4e2b12c4a8d250e1f47cf (2.6.37) moved the fwmark variable in subcontext that is invalidated before reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer in the param structure make sure the fwmark variable is in same context. As the fwmark templates can not be matched, more and more template connections are created and the controlled connections can not go to single real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IPVS netns exit causes crash in conntrackHans Schillstrom2011-06-131-0/+1
| | | | | | | | | | | | | | | | | Quote from Patric Mc Hardy "This looks like nfnetlink.c excited and destroyed the nfnl socket, but ip_vs was still holding a reference to a conntrack. When the conntrack got destroyed it created a ctnetlink event, causing an oops in netlink_has_listeners when trying to use the destroyed nfnetlink socket." If nf_conntrack_netlink is loaded before ip_vs this is not a problem. This patch simply avoids calling ip_vs_conn_drop_conntrack() when netns is dying as suggested by Julian. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: restore support for iptables SNATJulian Anastasov2011-06-061-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the IPVS priority in LOCAL_IN hook, so that SNAT target in POSTROUTING is supported for IPVS traffic as in 2.6.36 where it worked depending on module load order. Before 2.6.37 we used priority 100 in LOCAL_IN to process remote requests. We used the same priority as iptables SNAT and if IPVS handlers are installed before SNAT handlers we supported SNAT in POSTROUTING for the IPVS traffic. If SNAT is installed before IPVS, the netfilter handlers are before IPVS and netfilter checks the NAT table twice for the IPVS requests: once in LOCAL_IN where IPS_SRC_NAT_DONE is set and second time in POSTROUTING where the SNAT rules are ignored because IPS_SRC_NAT_DONE was already set in LOCAL_IN. But in 2.6.37 we changed the IPVS priority for LOCAL_IN with the goal to be unique (101) forgetting the fact that for IPVS traffic we should not walk both LOCAL_IN and POSTROUTING nat tables. So, change the priority for processing remote IPVS requests from 101 to 99, i.e. before NAT_SRC (100) because we prefer to support SNAT in POSTROUTING instead of LOCAL_IN. It also moves the priority for IPVS replies from 99 to 98. Use constants instead of magic numbers at these places. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* ipvs: Remove all remaining references to rt->rt_{src,dst}Julian Anastasov2011-05-121-22/+2
| | | | | | | | | | | | | | | Remove all remaining references to rt->rt_{src,dst} by using dest->dst_saddr to cache saddr (used for TUN mode). For ICMP in FORWARD hook just restrict the rt_mode for NAT to disable LOCALNODE. All other modes do not allow IP_VS_RT_MODE_RDR, so we should be safe with the ICMP forwarding. Using cp->daddr as replacement for rt_dst is safe for all modes except BYPASS, even when cp->dest is NULL because it is cp->daddr that is used to assign cp->dest for sync-ed connections. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPVS: init and cleanup restructuringHans Schillstrom2011-05-101-9/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | DESCRIPTION This patch tries to restore the initial init and cleanup sequences that was before namspace patch. Netns also requires action when net devices unregister which has never been implemented. I.e this patch also covers when a device moves into a network namespace, and has to be released. IMPLEMENTATION The number of calls to register_pernet_device have been reduced to one for the ip_vs.ko Schedulers still have their own calls. This patch adds a function __ip_vs_service_cleanup() and an enable flag for the netfilter hooks. The nf hooks will be enabled when the first service is loaded and never disabled again, except when a namespace exit starts. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Change of socket usage to enable name space exit.Hans Schillstrom2011-05-101-1/+1
| | | | | | | | | | | | | | | | | If the sync daemons run in a name space while it crashes or get killed, there is no way to stop them except for a reboot. When all patches are there, ip_vs_core will handle register_pernet_(), i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed. Kernel threads should not increment the use count of a socket. By calling sk_change_net() after creating a socket this is avoided. sock_release cant be used intead sk_release_kernel() should be used. Thanks Eric W Biederman for your advices. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> [horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefinedSimon Horman2011-03-151-2/+7
| | | | | | | | | | | | Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined. I tried an approach of breaking the now #ifdef'ed portions out into a separate function. However this appeared to grow the compiled code on x86_64 by about 200 bytes in the case where CONFIG_SYSCTL is defined. So I have gone with the simpler though less elegant #ifdef'ed solution for now. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_expire_nodest_conn()Simon Horman2011-03-151-1/+7
| | | | | | | In preparation for not including sysctl_expire_nodest_conn in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add {sysctl_sync_threshold,period}()Simon Horman2011-03-151-5/+5
| | | | | | | In preparation for not including sysctl_sync_threshold in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_nat_icmp_send()Simon Horman2011-03-151-3/+8
| | | | | | | In preparation for not including sysctl_nat_icmp_send in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add sysctl_snat_reroute()Simon Horman2011-03-151-4/+16
| | | | | | | In preparation for not including sysctl_snat_reroute in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Add ip_vs_route_me_harder()Simon Horman2011-03-151-26/+22
| | | | | | Add ip_vs_route_me_harder() to avoid repeating the same code twice. Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: reorganize tot_statsJulian Anastasov2011-03-151-3/+3
| | | | | | | | | | | | | | | | | The global tot_stats contains cpustats field just like the stats for dest and svc, so better use it to simplify the usage in estimation_timer. As tot_stats is registered as estimator we can remove the special ip_vs_read_cpu_stats call for tot_stats. Fix ip_vs_read_cpu_stats to be called under stats lock because it is still used as synchronization between estimation timer and user context (the stats readers). Also, make sure ip_vs_stats_percpu_show reads properly the u64 stats from user context. Signed-off-by: Julian Anastasov <ja@ssi.bg> Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: remove extra lookups for ICMP packetsJulian Anastasov2011-02-161-25/+3
| | | | | | | | | | | | Remove code that should not be called anymore. Now when ip_vs_out handles replies for local clients at LOCAL_IN hook we do not need to call conn_out_get and handle_response_icmp from ip_vs_in_icmp* because such lookups were already performed for the ICMP packet and no connection was found. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: use z modifier for sizeof() argumentSimon Horman2011-02-011-1/+1
| | | | | | | | | Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Tested-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ipvs: fix compiler warningsChangli Gao2011-01-251-3/+1
| | | | | | | | Fix compiler warnings when IP_VS_DBG() isn't defined. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, final patch enabling network name space.Hans Schillstrom2011-01-131-4/+0
| | | | | | | | | all init_net removed, (except for some alloc related that needs to be there) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, misc init_net removal in core.Hans Schillstrom2011-01-131-2/+4
| | | | | | | | init_net removed in __ip_vs_addr_is_local_v6, and got net as param. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, defense work timer.Hans Schillstrom2011-01-131-0/+1
| | | | | | | | | | | | | This patch makes defense work timer per name-space, A net ptr had to be added to the ipvs struct, since it's needed by defense_work_handler. [ horms@verge.net.au: Use cancel_delayed_work_sync() instead of cancel_rearming_delayed_work(). Found during merge conflict resoliution ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.Hans Schillstrom2011-01-131-12/+22
| | | | | | | | | | | | | Moving global vars to ipvs struct, except for svc table lock. Next patch for ctl will be drop-rate handling. *v3 __ip_vs_mutex remains global ip_vs_conntrack_enabled(struct netns_ipvs *ipvs) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, connection hash got net as param.Hans Schillstrom2011-01-131-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | Connection hash table is now name space aware. i.e. net ptr >> 8 is xor:ed to the hash, and this is the first param to be compared. The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s) and cache-line aligned, so a ptr >> 5 might be a more clever solution ? All lookups where net is compared uses net_eq() which returns 1 when netns is disabled, and the compiler seems to do something clever in that case. ip_vs_conn_fill_param() have *net as first param now. Three new inlines added to keep conn struct smaller when names space is disabled. - ip_vs_conn_net() - ip_vs_conn_net_set() - ip_vs_conn_net_eq() *v3 moved net compare to the end in "fast path" Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, ip_vs_stats and its procfsHans Schillstrom2011-01-131-38/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The statistic counter locks for every packet are now removed, and that statistic is now per CPU, i.e. no locks needed. However summing is made in ip_vs_est into ip_vs_stats struct which is moved to ipvs struc. procfs, ip_vs_stats now have a "per cpu" count and a grand total. A new function seq_file_single_net() in ip_vs.h created for handling of single_open_net() since it does not place net ptr in a struct, like others. /var/lib/lxc # cat /proc/net/ip_vs_stats_percpu Total Incoming Outgoing Incoming Outgoing CPU Conns Packets Packets Bytes Bytes 0 0 3 1 9D 34 1 0 1 2 49 70 2 0 1 2 34 76 3 1 2 2 70 74 ~ 1 7 7 18A 18E Conns/s Pkts/s Pkts/s Bytes/s Bytes/s 0 0 0 0 0 *v3 ip_vs_stats reamains as before, instead ip_vs_stats_percpu is added. u64 seq lock added *v4 Bug correction inbytes and outbytes as own vars.. per_cpu counter for all stats now as suggested by Julian. [horms@verge.net.au: removed whitespace-change-only hunk] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns awareness to ip_vs_syncHans Schillstrom2011-01-131-6/+9
| | | | | | | | | | All global variables moved to struct ipvs, most external changes fixed (i.e. init_net removed) in sync_buf create + 4 replaced by sizeof(struct..) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, use ip_vs_proto_data as param.Hans Schillstrom2011-01-131-30/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_vs_protocol *pp is replaced by ip_vs_proto_data *pd in function call in ip_vs_protocol struct i.e. :, - timeout_change() - state_transition() ip_vs_protocol_timeout_change() got ipvs as param, due to above and a upcoming patch - defence work Most of this changes are triggered by Julians comment: "tcp_timeout_change should work with the new struct ip_vs_proto_data so that tcp_state_table will go to pd->state_table and set_tcp_state will get pd instead of pp" *v3 Mostly comments from Julian The pp -> pd conversion should start from functions like ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol), now they should use ip_vs_proto_data_get(net, iph.protocol). conn_in_get() and conn_out_get() unused param *pp, removed. *v4 ip_vs_protocol_timeout_change() walk the proto_data path. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns to services part 1Hans Schillstrom2011-01-131-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Services hash tables got netns ptr a hash arg, While Real Servers (rs) has been moved to ipvs struct. Two new inline functions added to get net ptr from skb. Since ip_vs is called from different contexts there is two places to dig for the net ptr skb->dev or skb->sk this is handled in skb_net() and skb_sknet() Global functions, ip_vs_service_get() ip_vs_lookup_real_service() etc have got struct net *net as first param. If possible get net ptr skb etc, - if not &init_net is used at this early stage of patching. ip_vs_ctl.c procfs not ready for netns yet. *v3 Comments by Julian - __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path, net_eq(svc->net, net) so the check is at the end now. - net = skb_net(skb) in ip_vs_out moved after check for skb_dst. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: netns, add basic init per netns.Hans Schillstrom2011-01-131-2/+61
| | | | | | | | | | | | | | | | | | | Preparation for network name-space init, in this stage some empty functions exists. In most files there is a check if it is root ns i.e. init_net if (!net_eq(net, &init_net)) return ... this will be removed by the last patch, when enabling name-space. *v3 ip_vs_conn.c merge error corrected. net_ipvs #ifdef removed as sugested by Jan Engelhardt [ horms@verge.net.au: Removed whitespace-change-only hunks ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Backup, Change sending to Version 1 formatHans Schillstrom2010-11-251-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable sending and removal of version 0 sending Affected functions, ip_vs_sync_buff_create() ip_vs_sync_conn() ip_vs_core.c removal of IPv4 check. *v5 Just check cp->pe_data_len in ip_vs_sync_conn Check if padding needed before adding a new sync_conn to the buffer, i.e. avoid sending padding at the end. *v4 moved sanity check and pe_name_len after sloop. use cp->pe instead of cp->dest->svc->pe real length in each sync_conn, not padded length however total size of a sync_msg includes padding. *v3 Sending ip_vs_sync_conn_options in network order. Sending Templates for ONE_PACKET conn. Renaming of ip_vs_sync_mesg to ip_vs_sync_mesg_v0 Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Handle Scheduling errors.Hans Schillstrom2010-11-251-16/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ip_vs_conn_fill_param_persist return an error to ip_vs_sched_persist, this error must propagate as ignored=-1 to ip_vs_schedule(). Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule() should also return *ignored=-1; This patch just relies on the fact that ignored is 1 before calling ip_vs_sched_persist(). Sent from Julian: "The new case when ip_vs_conn_fill_param_persist fails should set *ignored = -1, so that we can use NF_DROP, see below. *ignored = -1 should be also used for ip_vs_conn_new failure in ip_vs_sched_persist() and ip_vs_schedule(). The new negative value should be handled in tcp,udp,sctp" "To summarize: - *ignored = 1: protocol tried to schedule (eg. on SYN), found svc but the svc/scheduler decides that this packet should be accepted with NF_ACCEPT because it must not be scheduled. - *ignored = 0: scheduler can not find destination, so try bypass or return ICMP and then NF_DROP (ip_vs_leave). - *ignored = -1: scheduler tried to schedule but fatal error occurred, eg. ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param failure such as missing Call-ID, ENOMEM on skb_linearize or pe_data. In this case we should return NF_DROP without any attempts to send ICMP with ip_vs_leave." More or less all ideas and input to this patch is work from Julian Anastasov Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Split ports[2] into src_port and dst_portHans Schillstrom2010-11-251-10/+11
| | | | | | | | | | | | Avoid sending invalid pointer due to skb_linearize() call. This patch prepares for next patch where skb_linearize is a part. In ip_vs_sched_persist() params the ports ptr will be replaced by src and dst port. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Backup, Prepare for transferring firewall marks (fwmark) to the backup ↵Hans Schillstrom2010-11-251-4/+4
| | | | | | | | | | | | | | | daemon. One struct will have fwmark added: * ip_vs_conn ip_vs_conn_new() and ip_vs_find_dest() will have an extra param - fwmark The effects of that, is in this patch. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: provide address family for debuggingJulian Anastasov2010-10-211-17/+24
| | | | | | | | | | | | As skb->protocol is not valid in LOCAL_OUT add parameter for address family in packet debugging functions. Even if ports are not present in AH and ESP change them to use ip_vs_tcpudp_debug_packet to show at least valid addresses as before. This patch removes the last user of skb->protocol in IPVS. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: changes for local clientJulian Anastasov2010-10-211-77/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch deals with local client processing. Prefer LOCAL_OUT hook for scheduling connections from local clients. LOCAL_IN is still supported if the packets are not marked as processed in LOCAL_OUT. The idea to process requests in LOCAL_OUT is to alter conntrack reply before it is confirmed at POST_ROUTING. If the local requests are processed in LOCAL_IN the conntrack can not be updated and matching by state is impossible. Add the following handlers: - ip_vs_reply[46] at LOCAL_IN:99 to process replies from remote real servers to local clients. Now when both replies from remote real servers (ip_vs_reply*) and local real servers (ip_vs_local_reply*) are handled it is safe to remove the conn_out_get call from ip_vs_in because it does not support related ICMP packets. - ip_vs_local_request[46] at LOCAL_OUT:-98 to process requests from local client Handling in LOCAL_OUT causes some changes: - as skb->dev, skb->protocol and skb->pkt_type are not defined in LOCAL_OUT make sure we set skb->dev before calling icmpv6_send, prefer skb_dst(skb) for struct net and remove the skb->protocol checks from TUN transmitters. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: changes for local real serverJulian Anastasov2010-10-211-16/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch deals with local real servers: - Add support for DNAT to local address (different real server port). It needs ip_vs_out hook in LOCAL_OUT for both families because skb->protocol is not set for locally generated packets and can not be used to set 'af'. - Skip packets in ip_vs_in marked with skb->ipvs_property because ip_vs_out processing can be executed in LOCAL_OUT but we still have the conn_out_get check in ip_vs_in. - Ignore packets with inet->nodefrag from local stack - Require skb_dst(skb) != NULL because we use it to get struct net - Add support for changing the route to local IPv4 stack after DNAT depending on the source address type. Local client sets output route and the remote client sets input route. It looks like IPv6 does not need such rerouting because the replies use addresses from initial incoming header, not from skb route. - All transmitters now have strict checks for the destination address type: redirect from non-local address to local real server requires NAT method, local address can not be used as source address when talking to remote real server. - Now LOCALNODE is not set explicitly as forwarding method in real server to allow the connections to provide correct forwarding method to the backup server. Not sure if this breaks tools that expect to see 'Local' real server type. If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL. Now it should be possible connections in backup that lost their fwmark information during sync to be forwarded properly to their daddr, even if it is local address in the backup server. By this way backup could be used as real server for DR or TUN, for NAT there are some restrictions because tuple collisions in conntracks can create problems for the traffic. - Call ip_vs_dst_reset when destination is updated in case some real server IP type is changed between local and remote. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: move ip_route_me_harder for ICMPJulian Anastasov2010-10-211-20/+21
| | | | | | | | | | | Currently, ip_route_me_harder after ip_vs_out_icmp is called even if packet is not related to IPVS connection. Move it into handle_response_icmp. Also, force rerouting if sending to local client because IPv4 stack uses addresses from the route. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: create ip_vs_defrag_userJulian Anastasov2010-10-211-21/+34
| | | | | | | | | Create new function ip_vs_defrag_user to return correct IP_DEFRAG_xxx user depending on the hooknum. It will be needed when we add handlers in LOCAL_OUT. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: stop ICMP from FORWARD to localJulian Anastasov2010-10-211-2/+19
| | | | | | | Delivering locally ICMP from FORWARD hook is not supported. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: do not schedule conns from real serversJulian Anastasov2010-10-211-2/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is needed to avoid scheduling of packets from local real server when we add ip_vs_in in LOCAL_OUT hook to support local client. Currently, when ip_vs_in can not find existing connection it tries to create new one by calling ip_vs_schedule. The default indication from ip_vs_schedule was if connection was scheduled to real server. If real server is not available we try to use the bypass forwarding method or to send ICMP error. But in some cases we do not want to use the bypass feature. So, add flag 'ignored' to indicate if the scheduler ignores this packet. Make sure we do not create new connections from replies. We can hit this problem for persistent services and local real server when ip_vs_in is added to LOCAL_OUT hook to handle local clients. Also, make sure ip_vs_schedule ignores SYN packets for Active FTP DATA from local real server. The FTP DATA connection should be created on SYN+ACK from client to assign correct connection daddr. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* ipvs: switch to notrack modeJulian Anastasov2010-10-211-35/+4
| | | | | | | | | | | | Change skb->ipvs_property semantic. This is preparation to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1 will be used to avoid expensive lookups for traffic sent by transmitters. Now when conntrack support is not used we call ip_vs_notrack method to avoid problems in OUTPUT and POST_ROUTING hooks instead of exiting POST_ROUTING as before. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
* IPVS: Fallback if persistence engine failsSimon Horman2010-10-041-6/+4
| | | | | | | | | | | | | | | Fall back to normal persistence handling if the persistence engine fails to recognise a packet. This way, at least the packet will go somewhere. It is envisaged that iptables could be used to block packets such if this is not desired although nf_conntrack_sip would likely need to be enhanced first. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
* IPVS: Add struct ip_vs_peSimon Horman2010-10-041-7/+29
| | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
* IPVS: Add struct ip_vs_conn_paramSimon Horman2010-10-041-30/+34
| | | | | | Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
* IPVS: compact ip_vs_sched_persist()Simon Horman2010-10-041-105/+51
| | | | | | | | | Compact ip_vs_sched_persist() by setting up parameters and calling functions once. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
* ipvs: make rerouting optional with snat_rerouteJulian Anastasov2010-09-211-8/+29
| | | | | | | | | | | Add new sysctl flag "snat_reroute". Recent kernels use ip_route_me_harder() to route LVS-NAT responses properly by VIP when there are multiple paths to client. But setups that do not have alternative default routes can skip this routing lookup by using snat_reroute=0. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
* ipvs: netfilter connection tracking changesJulian Anastasov2010-09-211-4/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add more code to IPVS to work with Netfilter connection tracking and fix some problems. - Allow IPVS to be compiled without connection tracking as in 2.6.35 and before. This can avoid keeping conntracks for all IPVS connections because this costs memory. ip_vs_ftp still depends on connection tracking and NAT as implemented for 2.6.36. - Add sysctl var "conntrack" to enable connection tracking for all IPVS connections. For loaded IPVS directors it needs tuning of nf_conntrack_max limit. - Add IP_VS_CONN_F_NFCT connection flag to request the connection to use connection tracking. This allows user space to provide this flag, for example, in dest->conn_flags. This can be useful to request connection tracking per real server instead of forcing it for all connections with the "conntrack" sysctl. This flag is set currently only by ip_vs_ftp and of course by "conntrack" sysctl. - Add ip_vs_nfct.c file to hold all connection tracking code, by this way main code should not depend of netfilter conntrack support. - Return back the ip_vs_post_routing handler as in 2.6.35 and use skb->ipvs_property=1 to allow IPVS to work without connection tracking Connection tracking: - most of the code is already in 2.6.36-rc - alter conntrack reply tuple for LVS-NAT connections when first packet from client is forwarded and conntrack state is NEW or RELATED. Additionally, alter reply for RELATED connections from real server, again for packet in original direction. - add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering reply) for LVS-TUN early because we want to call nf_reset. It is needed because we add IPIP header and the original conntrack should be preserved, not destroyed. The transmitted IPIP packets can reuse same conntrack, so we do not set skb->ipvs_property. - try to destroy conntrack when the IPVS connection is destroyed. It is not fatal if conntrack disappears before that, it depends on the used timers. Fix problems from long time: - add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
* ipvs: extend connection flags to 32 bitsJulian Anastasov2010-09-171-5/+6
| | | | | | | | | | | | | | - the sync protocol supports 16 bits only, so bits 0..15 should be used only for flags that should go to backup server, bits 16 and above should be allocated for flags not sent to backup. - use IP_VS_CONN_F_DEST_MASK as mask of connection flags in destination that can be changed by user space - allow IP_VS_CONN_F_ONE_PACKET to be set in destination Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of ↵David S. Miller2010-09-091-0/+1
|\ | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c
| * ipvs: fix active FTPJulian Anastasov2010-09-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Do not create expectation when forwarding the PORT command to avoid blocking the connection. The problem is that nf_conntrack_ftp.c:help() tries to create the same expectation later in POST_ROUTING and drops the packet with "dropping packet" message after failure in nf_ct_expect_related. - Change ip_vs_update_conntrack to alter the conntrack for related connections from real server. If we do not alter the reply in this direction the next packet from client sent to vport 20 comes as NEW connection. We alter it but may be some collision happens for both conntracks and the second conntrack gets destroyed immediately. The connection stucks too. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipvs: use pkts for SCTP tooJulian Anastasov2010-09-021-3/+3
| | | | | | | | | | | | | | | | Use correctly the in_pkts packet counter also for SCTP Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | IPVS: include net/ip6_checksum.h for csum_ipv6_magicStephen Rothwell2010-08-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | Fixes this build error: net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_nat_icmp_v6': net/netfilter/ipvs/ip_vs_core.c:640: error: implicit declaration of function 'csum_ipv6_magic' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>