aboutsummaryrefslogtreecommitdiffstats
path: root/net/sched
Commit message (Collapse)AuthorAgeFilesLines
* rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2013-01-173-12/+13
| | | | | | | | | | | | | | | | | | | commit c7ac8679bec9397afe8918f788cbcef88c38da54 upstream. The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: integer overflow fixStefan Hasko2013-01-111-1/+1
| | | | | | | | | | | [ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ] Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pkt_sched: fix virtual-start-time update in QFQPaolo Valente2012-10-131-1/+4
| | | | | | | | | | | | | | | | | | [ Upstream commit 71261956973ba9e0637848a5adb4a5819b4bae83 ] If the old timestamps of a class, say cl, are stale when the class becomes active, then QFQ may assign to cl a much higher start time than the maximum value allowed. This may happen when QFQ assigns to the start time of cl the finish time of a group whose classes are characterized by a higher value of the ratio max_class_pkt/weight_of_the_class with respect to that of cl. Inserting a class with a too high start time into the bucket list corrupts the data structure and may eventually lead to crashes. This patch limits the maximum start time assigned to a class. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net-sched: sch_cbq: avoid infinite loopEric Dumazet2012-10-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit bdfc87f7d1e253e0a61e2fc6a75ea9d76f7fc03a ] Its possible to setup a bad cbq configuration leading to an infinite loop in cbq_classify() DEV_OUT=eth0 ICMP="match ip protocol 1 0xff" U32="protocol ip u32" DST="match ip dst" tc qdisc add dev $DEV_OUT root handle 1: cbq avpkt 1000 \ bandwidth 100mbit tc class add dev $DEV_OUT parent 1: classid 1:1 cbq \ rate 512kbit allot 1500 prio 5 bounded isolated tc filter add dev $DEV_OUT parent 1: prio 3 $U32 \ $ICMP $DST 192.168.3.234 flowid 1: Reported-by: Denys Fedoryschenko <denys@visp.net.lb> Tested-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net_sched: gact: Fix potential panic in tcf_gact().Hiroaki SHIMODA2012-10-021-3/+11
| | | | | | | | | | | | | | | [ Upstream commit 696ecdc10622d86541f2e35cc16e15b6b3b1b67e ] gact_rand array is accessed by gact->tcfg_ptype whose value is assumed to less than MAX_RAND, but any range checks are not performed. So add a check in tcf_gact_init(). And in tcf_gact(), we can reduce a branch. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sch_sfb: Fix missing NULL checkAlan Cox2012-08-091-0/+2
| | | | | | | | | | | [ Upstream commit 7ac2908e4b2edaec60e9090ddb4d9ceb76c05e7d ] Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461 Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netem: fix possible skb leakEric Dumazet2012-05-211-4/+2
| | | | | | | | | | | | | [ Upstream commit 116a0fc31c6c9b8fc821be5a96e5bf0b43260131 ] skb_checksum_help(skb) can return an error, we must free skb in this case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if skb_unshare() failed), so lets use this generic helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net_sched: gred: Fix oops in gred_dump() in WRED modeDavid Ward2012-04-271-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 244b65dbfede788f2fa3fe2463c44d0809e97c6b ] A parameter set exists for WRED mode, called wred_set, to hold the same values for qavg and qidlestart across all VQs. The WRED mode values had been previously held in the VQ for the default DP. After these values were moved to wred_set, the VQ for the default DP was no longer created automatically (so that it could be omitted on purpose, to have packets in the default DP enqueued directly to the device without using RED). However, gred_dump() was overlooked during that change; in WRED mode it still reads qavg/qidlestart from the VQ for the default DP, which might not even exist. As a result, this command sequence will cause an oops: tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \ DPs 3 default 2 grio tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS This fixes gred_dump() in WRED mode to use the values held in wred_set. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net_sched: Bug in netem reorderingHagen Paul Pfeifer2012-02-291-2/+2
| | | | | | | | | | | | | | | | | | [ Upstream commit eb10192447370f19a215a8c2749332afa1199d46 ] Not now, but it looks you are correct. q->qdisc is NULL until another additional qdisc is attached (beside tfifo). See 50612537e9ab2969312. The following patch should work. From: Hagen Paul Pfeifer <hagen@jauu.net> netem: catch NULL pointer by updating the real qdisc statistic Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: Make qdisc_skb_cb upper size bound explicit.David S. Miller2012-02-293-6/+3
| | | | | | | | | | | | | | [ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ] Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside of other data structures. This is intended to be used by IPoIB so that it can remember addressing information stored at hard_header_ops->create() time that it can fetch when the packet gets to the transmit routine. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: fix NULL dereferences in check_peer_redir()Eric Dumazet2012-02-131-11/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d3aaeb38c40e5a6c08dd31a1b64da65c4352be36, along with dependent backports of commits: 69cce1d1404968f78b177a0314f5822d5afdbbfb 9de79c127cccecb11ae6a21ab1499e87aa222880 218fa90f072e4aeff9003d57e390857f4f35513e 580da35a31f91a594f3090b7a2c39b85cb051a12 f7e57044eeb1841847c24aa06766c8290c202583 e049f28883126c689cf95859480d9ee4ab23b7fa ] Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sch_gred: should not use GFP_KERNEL while holding a spinlockEric Dumazet2012-01-061-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit 3f1e6d3fd37bd4f25e5b19f1c7ca21850426c33f ] gred_change_vq() is called under sch_tree_lock(sch). This means a spinlock is held, and we are not allowed to sleep in this context. We might pre-allocate memory using GFP_KERNEL before taking spinlock, but this is not suitable for stable material. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mqprio: Avoid panic if no options are providedThomas Graf2012-01-061-1/+1
| | | | | | | | | | | | [ Upstream commit 7838f2ce36b6ab5c13ef20b1857e3bbd567f1759 ] Userspace may not provide TCA_OPTIONS, in fact tc currently does so not do so if no arguments are specified on the command line. Return EINVAL instead of panicing. Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net_sched: prio: use qdisc_dequeue_peekedFlorian Westphal2011-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 3557619f0f6f7496ed453d4825e24958ab1884e0 ] commit 07bd8df5df4369487812bf85a237322ff3569b77 (sch_sfq: fix peek() implementation) changed sfq to use generic peek helper. This makes HFSC complain about a non-work-conserving child qdisc, if prio with sfq child is used within hfsc: hfsc peeks into prio qdisc, which will then peek into sfq. returned skb is stashed in sch->gso_skb. Next, hfsc tries to dequeue from prio, but prio will call sfq dequeue directly, which may return NULL instead of previously peeked-at skb. Have prio call qdisc_dequeue_peeked, so sfq->dequeue() is not called in this case. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sch_sfq: fix sfq_enqueue()Eric Dumazet2011-08-151-1/+6
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit e1738bd9cecc5c867b0e2996470c1ff20f66ba79 ] commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals) forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a packet (from another flow) was dropped, leading to various problems. With help from Michal Soltys and Michal Pokrywka, who did a bisection. Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372 Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945 Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com> Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michal Soltys <soltys@ziu.info> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net: Rework netdev_drivername() to avoid warning.David S. Miller2011-06-061-2/+1
| | | | | | | | | | | | | This interface uses a temporary buffer, but for no real reason. And now can generate warnings like: net/sched/sch_generic.c: In function dev_watchdog net/sched/sch_generic.c:254:10: warning: unused variable drivername Just return driver->name directly or "". Reported-by: Connor Hansen <cmdkhh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: fix peek() implementationEric Dumazet2011-05-251-13/+1
| | | | | | | | | | | | | | | Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair), sfq_peek() can return a different skb that would be normally dequeued by sfq_dequeue() [ if current slot->allot is negative ] Use generic qdisc_peek_dequeued() instead of custom implementation, to get consistent result. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: avoid giving spurious NET_XMIT_CN signalsEric Dumazet2011-05-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | While chasing a possible net_sched bug, I found that IP fragments have litle chance to pass a congestioned SFQ qdisc : - Say SFQ qdisc is full because one flow is non responsive. - ip_fragment() wants to send two fragments belonging to an idle flow. - sfq_enqueue() queues first packet, but see queue limit reached : - sfq_enqueue() drops one packet from 'big consumer', and returns NET_XMIT_CN. - ip_fragment() cancel remaining fragments. This patch restores fairness, making sure we return NET_XMIT_CN only if we dropped a packet from the same flow. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid synchronize_rcu() in dev_deactivate_manyEric Dumazet2011-05-221-2/+15
| | | | | | | | | | | | | | | | | | | | | dev_deactivate_many() issues one synchronize_rcu() call after qdiscs set to noop_qdisc. This call is here to make sure they are no outstanding qdisc-less dev_queue_xmit calls before returning to caller. But in dismantle phase, we dont have to wait, because we wont activate again the device, and we are going to wait one rcu grace period later in rollback_registered_many(). After this patch, device dismantle uses one synchronize_net() and one rcu_barrier() call only, so we have a ~30% speedup and a smaller RTNL latency. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net>, CC: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2011-05-205-3/+1151
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
| * networking: NET_CLS_ROUTE4 depends on INETRandy Dunlap2011-05-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | IP_ROUTE_CLASSID depends on INET and NET_CLS_ROUTE4 selects IP_ROUTE_CLASSID, but when INET is not enabled, this kconfig warning is produced, so fix it by making NET_CLS_ROUTE4 depend on INET. warning: (NET_CLS_ROUTE4) selects IP_ROUTE_CLASSID which has unmet direct dependencies (NET && INET) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * pkt_sched: Kill set but unused variable 'protocol' in tc_classify()David S. Miller2011-05-191-2/+0
| | | | | | | | | | | | | | I checked the history and this has been like this since the beginning of time. Signed-off-by: David S. Miller <davem@davemloft.net>
| * inet: constify ip headers and in6_addrEric Dumazet2011-04-221-1/+1
| | | | | | | | | | | | | | | | Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2011-04-115-7/+7
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smsc911x.c
| * | pkt_sched: QFQ - quick fair queue schedulerstephen hemminger2011-04-043-0/+1149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an implementation of the Quick Fair Queue scheduler developed by Fabio Checconi. The same algorithm is already implemented in ipfw in FreeBSD. Fabio had an earlier version developed on Linux, I just cleaned it up. Thanks to Eric Dumazet for testing this under load. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net,act_police,rcu: remove rcu_barrier()Lai Jiangshan2011-05-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no callback of this module maybe queued since we use kfree_rcu(), we can safely remove the rcu_barrier(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* | | net,rcu: convert call_rcu(tcf_police_free_rcu) to kfree_rcu()Lai Jiangshan2011-05-071-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [PATCH 05/17] net,rcu: convert call_rcu(tcf_police_free_rcu) to kfree_rcu() The rcu callback tcf_police_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(tcf_police_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* | | net,rcu: convert call_rcu(tcf_common_free_rcu) to kfree_rcu()Lai Jiangshan2011-05-071-6/+1
| |/ |/| | | | | | | | | | | | | | | | | The rcu callback tcf_common_free_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(tcf_common_free_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* | Fix common misspellingsLucas De Marchi2011-03-315-7/+7
|/ | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* ipv4: Remove flowi from struct rtable.David S. Miller2011-03-042-2/+2
| | | | | | | | | | The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark. The rest is unnecessary bloat, which amounts to nearly 50 bytes on 64-bit. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-03-031-0/+1
|\ | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h
| * net: Fix more stale on-stack list_head objects.Eric W. Biederman2011-02-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Eric W. Biederman <ebiederm@xmission.com> In the beginning with batching unreg_list was a list that was used only once in the lifetime of a network device (I think). Now we have calls using the unreg_list that can happen multiple times in the life of a network device like dev_deactivate and dev_close that are also using the unreg_list. In addition in unregister_netdevice_queue we also do a list_move because for devices like veth pairs it is possible that unregister_netdevice_queue will be called multiple times. So I think the change below to fix dev_deactivate which Eric D. missed will fix this problem. Now to go test that. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: reduce fifo qdisc sizeEric Dumazet2011-03-032-30/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of various alignements [SLUB / qdisc], we use 512 bytes of memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and 192 bytes on 32bit ones. Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs) Change qdisc_alloc(), first trying a regular allocation before an oversized one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sched: protocol only needed when CONFIG_NET_CLS_ACT is enabledHagen Paul Pfeifer2011-02-251-2/+2
| | | | | | | | | | Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_netem: Need to include vmalloc.hDavid S. Miller2011-02-241-0/+1
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_choke: add choke_skb_cbEric Dumazet2011-02-241-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | Better document choke skb->cb[] use, like we did in netem and sfb This adds a compile time check to make sure we dont exhaust skb->cb[] space. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: update version and cleanupstephen hemminger2011-02-241-6/+4
| | | | | | | | | | | | | | | | Get rid of debug message that are not useful, and enable the log messages in case of error. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: revised correlated loss generatorstephen hemminger2011-02-241-4/+270
| | | | | | | | | | | | | | | | | | | | | | This is a patch originated with Stefano Salsano and Fabio Ludovici. It provides several alternative loss models for use with netem. This patch adds two state machine based loss models. See: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "sch_netem: Remove classful functionality"stephen hemminger2011-02-241-8/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Many users have wanted the old functionality that was lost to be able to use pfifo as inner qdisc for netem. The reason that netem could not be classful with the older API was because of the limitations of the old dequeue/requeue interface; now that qdisc API has a peek function, there is no longer a problem with using any inner qdisc's. This reverts commit 02201464119334690fe209849843881b8e9cfa9f. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: define NETEM_DIST_MAXstephen hemminger2011-02-241-1/+1
| | | | | | | | | | | | | | | | | | Rather than magic constant in code, expose the maximum size of packet distribution table in API. In iproute2, q_netem defines MAX_DIST as 16K already. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: use vmalloc for distribution tablestephen hemminger2011-02-241-4/+18
| | | | | | | | | | | | | | | | The netem probability table can be large (up to 64K bytes) which may be too large to allocate in one contiguous chunk. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netem: cleanup dump codestephen hemminger2011-02-241-6/+3
| | | | | | | | | | | | | | Use nla_put_nested to update netlink attribute value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | em_meta: fix sparse warningstephen hemminger2011-02-231-1/+1
| | | | | | | | | | | | | | gfp_t needs to be cast to integer. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mqprio: cleanupsstephen hemminger2011-02-231-2/+4
| | | | | | | | | | | | | | | | | | * make qdisc_ops local * add sparse annotation about expected unlock/unlock in dump_class_stats * fix indentation Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: SFB flow schedulerEric Dumazet2011-02-233-0/+721
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the Stochastic Fair Blue scheduler, based on work from : W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue Management Algorithms. U. Michigan CSE-TR-387-99, April 1999. http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf This implementation is based on work done by Juliusz Chroboczek General SFB algorithm can be found in figure 14, page 15: B[l][n] : L x N array of bins (L levels, N bins per level) enqueue() Calculate hash function values h{0}, h{1}, .. h{L-1} Update bins at each level for i = 0 to L - 1 if (B[i][h{i}].qlen > bin_size) B[i][h{i}].p_mark += p_increment; else if (B[i][h{i}].qlen == 0) B[i][h{i}].p_mark -= p_decrement; p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark); if (p_min == 1.0) ratelimit(); else mark/drop with probabilty p_min; I did the adaptation of Juliusz code to meet current kernel standards, and various changes to address previous comments : http://thread.gmane.org/gmane.linux.network/90225 http://thread.gmane.org/gmane.linux.network/90375 Default flow classifier is the rxhash introduced by RPS in 2.6.35, but we can use an external flow classifier if wanted. tc qdisc add dev $DEV parent 1:11 handle 11: \ est 0.5sec 2sec sfb limit 128 tc filter add dev $DEV protocol ip parent 11: handle 3 \ flow hash keys dst divisor 1024 Notes: 1) SFB default child qdisc is pfifo_fast. It can be changed by another qdisc but a child qdisc MUST not drop a packet previously queued. This is because SFB needs to handle a dequeued packet in order to maintain its virtual queue states. pfifo_head_drop or CHOKe should not be used. 2) ECN is enabled by default, unlike RED/CHOKe/GRED With help from Patrick McHardy & Andi Kleen Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Patrick McHardy <kaber@trash.net> CC: Andi Kleen <andi@firstfloor.org> CC: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cls_u32: fix sparse warningsstephen hemminger2011-02-221-6/+6
| | | | | | | | | | | | | | | | The variable _data is used in asm-generic to define sections which causes sparse warnings, so just rename the variable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sch_mqprio: Always set num_tc to 0 in mqprio_destroy()Ben Hutchings2011-02-141-7/+7
| | | | | | | | | | | | | | | | | | All the cleanup code in mqprio_destroy() is currently conditional on priv->qdiscs being non-null, but that condition should only apply to the per-queue qdisc cleanup. We should always set the number of traffic classes back to 0 here. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
* | sch_choke: Need linux/vmalloc.hDavid S. Miller2011-02-021-0/+1
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | sched: CHOKe flow schedulerstephen hemminger2011-02-023-0/+689
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative packet scheduler based on the Random Exponential Drop (RED) algorithm. The core idea is: For every packet arrival: Calculate Qave if (Qave < minth) Queue the new packet else Select randomly a packet from the queue if (both packets from same flow) then Drop both the packets else if (Qave > maxth) Drop packet else Admit packet with proability p (same as RED) See also: Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active queue management scheme for approximating fair bandwidth allocation", Proceeding of INFOCOM'2000, March 2000. Help from: Eric Dumazet <eric.dumazet@gmail.com> Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sfq: deadlock in error pathstephen hemminger2011-02-021-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The change to allow divisor to be a parameter (in 2.6.38-rc1) commit 817fb15dfd988d8dda916ee04fa506f0c466b9d6 introduced a possible deadlock caught by sparse. The scheduler tree lock was left locked in the case of an incorrect divisor value. Simplest fix is to move test outside of lock which also solves problem of partial update. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>