aboutsummaryrefslogtreecommitdiffstats
path: root/net/sched/sch_sfq.c
Commit message (Collapse)AuthorAgeFilesLines
* sch_sfq: fix sfq_enqueue()Eric Dumazet2011-08-011-1/+6
| | | | | | | | | | | | | | | | | | commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals) forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a packet (from another flow) was dropped, leading to various problems. With help from Michal Soltys and Michal Pokrywka, who did a bisection. Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372 Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945 Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com> Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michal Soltys <soltys@ziu.info> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: introduce ip_is_fragment helper inline functionPaul Gortmaker2011-06-211-1/+1
| | | | | | | | | | | There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: fix peek() implementationEric Dumazet2011-05-251-13/+1
| | | | | | | | | | | | | | | Since commit eeaeb068f139 (sch_sfq: allow big packets and be fair), sfq_peek() can return a different skb that would be normally dequeued by sfq_dequeue() [ if current slot->allot is negative ] Use generic qdisc_peek_dequeued() instead of custom implementation, to get consistent result. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: avoid giving spurious NET_XMIT_CN signalsEric Dumazet2011-05-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | While chasing a possible net_sched bug, I found that IP fragments have litle chance to pass a congestioned SFQ qdisc : - Say SFQ qdisc is full because one flow is non responsive. - ip_fragment() wants to send two fragments belonging to an idle flow. - sfq_enqueue() queues first packet, but see queue limit reached : - sfq_enqueue() drops one packet from 'big consumer', and returns NET_XMIT_CN. - ip_fragment() cancel remaining fragments. This patch restores fairness, making sure we return NET_XMIT_CN only if we dropped a packet from the same flow. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: constify ip headers and in6_addrEric Dumazet2011-04-221-1/+1
| | | | | | | | Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sfq: deadlock in error pathstephen hemminger2011-02-021-4/+5
| | | | | | | | | | | | | | The change to allow divisor to be a parameter (in 2.6.38-rc1) commit 817fb15dfd988d8dda916ee04fa506f0c466b9d6 introduced a possible deadlock caught by sparse. The scheduler tree lock was left locked in the case of an incorrect divisor value. Simplest fix is to move test outside of lock which also solves problem of partial update. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-01-241-3/+2
|\ | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c
| * net_sched: accurate bytes/packets stats/ratesEric Dumazet2011-01-201-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed a problem with pfifo_head drops that incorrectly decreased sch->bstats.bytes and sch->bstats.packets Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a previously enqueued packet, and bstats cannot be changed, so bstats/rates are not accurate (over estimated) This patch changes the qdisc_bstats updates to be done at dequeue() time instead of enqueue() time. bstats counters no longer account for dropped frames, and rates are more correct, since enqueue() bursts dont have effect on dequeue() rate. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: TCQ_F_CAN_BYPASS generalizationEric Dumazet2011-01-211-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq SFQ is special because it can have external classifiers, and in these cases, we cannot bypass queue discipline (packet could be dropped by classifier) without admin asking it, or further changes. Its worth doing this, especially for SFQ, avoiding dirtying memory in case no packets are already waiting in queue. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sfq: allow divisor to be a parameterEric Dumazet2011-01-201-12/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SFQ currently uses a 1024 slots hash table, and its internal structure (sfq_sched_data) allocation needs order-1 page on x86_64 Allow tc command to specify a divisor value (hash table size), between 1 and 65536. If no value is provided, assume the 1024 default size. This allows admins to setup smaller (or bigger) SFQ for specific needs. This also brings back sfq_sched_data allocations to order-0 ones, saving 3KB per SFQ qdisc. Jesper uses ~55.000 SFQ in one machine, this patch should free 165 MB of memory. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: cleanupsEric Dumazet2011-01-191-10/+8
|/ | | | | | | | | Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: factorize qdisc stats handlingEric Dumazet2011-01-101-2/+1
| | | | | | | | | | | | | | | | | HTB takes into account skb is segmented in stats updates. Generalize this to all schedulers. They should use qdisc_bstats_update() helper instead of manipulating bstats.bytes and bstats.packets Add bstats_update() helper too for classes that use gnet_stats_basic_packed fields. Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no stab is setup on qdisc. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sfq: fix slot_dequeue_head()Eric Dumazet2010-12-311-2/+4
| | | | | | | | | | | slot_dequeue_head() should make sure slot skb chain is correct in both ways, or we can crash if all possible flows are in use. Jarek pointed out slot_queue_init() can now be done in sfq_init() once, instead each time a flow is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: allow big packets and be fairEric Dumazet2010-12-311-7/+19
| | | | | | | | | | | | | | SFQ is currently 'limited' to small packets, because it uses a 15bit allotment number per flow. Introduce a scale by 8, so that we can handle full size TSO/GRO packets. Use appropriate handling to make sure allot is positive before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* sfq: fix sfq class stats handlingEric Dumazet2010-12-221-5/+11
| | | | | | | | | | | | | | | | | sfq_walk() runs without qdisc lock. By the time it selects a non empty hash slot and sfq_dump_class_stats() is run (with lock held), slot might have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of bounds, and crash in slot_queue_walk() On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal memory access happens, only possibly wrong data reported to user. Also, slot_dequeue_tail() should make sure slot skb chain is correctly terminated, or sfq_dump_class_stats() can access freed skbs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: sch_sfq: better struct layoutsEric Dumazet2010-12-201-98/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is a respin of patch. I'll send a short patch to make SFQ more fair in presence of large packets as well. Thanks [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff *skblist_next; struct sk_buff *skblist_prev; sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ struct sfq_head dep; /* anchor in dep[] chains */ unsigned short hash; /* hash value (index in ht[]) */ short allot; /* credit for this slot */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-12-201-12/+8
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net_sched: sch_sfq: fix allot handlingEric Dumazet2010-12-201-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When deploying SFQ/IFB here at work, I found the allot management was pretty wrong in sfq, even changing allot from short to int... We should init allot for each new flow, not using a previous value found in slot. Before patch, I saw bursts of several packets per flow, apparently denying the default "quantum 1514" limit I had on my SFQ class. class sfq 11:1 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 7p requeues 0 allot 11546 class sfq 11:46 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 1p requeues 0 allot -23873 class sfq 11:78 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 5p requeues 0 allot 11393 After patch, better fairness among each flow, allot limit being respected, allot is positive : class sfq 11:e parent 11: (dropped 0, overlimits 0 requeues 86) backlog 0b 3p requeues 86 allot 596 class sfq 11:94 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1468 class sfq 11:a4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 4p requeues 0 allot 650 class sfq 11:bb parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 596 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()Eric Dumazet2010-12-201-1/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | We currently return for each active SFQ slot the number of packets in queue. We can also give number of bytes accounted for these packets. tc -s class show dev ifb0 Before patch : class sfq 11:3d9 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1266 After patch : class sfq 11:3e4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 4380b 3p requeues 0 allot 1212 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: sch_sfq: use proto_ports_offset() to support AH messageChangli Gao2010-08-191-17/+16
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Fix sch_sfq vs tc_modify_qdisc oopsJarek Poplawski2010-08-111-0/+6
| | | | | | | | | | sch_sfq as a classful qdisc needs the .leaf handler. Otherwise, there is an oops possible in tc_modify_qdisc()/check_loop(). Fixes commit 7d2681a6ff4f9ab5e48d02550b4c6338f1638998 Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Use NET_XMIT_SUCCESS where possible.Ben Greear2010-08-101-1/+1
| | | | | | | This is based on work originally done by Patric McHardy. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: sch_sfq: Add dummy unbind_tcf and put handles. Was: [PATCH] sfq: ↵Jarek Poplawski2010-08-101-0/+6
| | | | | | | | | | add dummy bind/unbind handles Add dummy .unbind_tcf and .put qdisc class ops for easier verification. (All other schedulers have it like this.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Fix sch_sfq vs tcf_bind_filter oopsJarek Poplawski2010-08-071-0/+7
| | | | | | | | | | | | | Since there was added ->tcf_chain() method without ->bind_tcf() to sch_sfq class options, there is oops when a filter is added with the classid parameter. Fixes commit 7d2681a6ff4f9ab5e48d02550b4c6338f1638998 netdev thread: null pointer at cls_api.c Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Reported-by: Franchoze Eric <franchoze@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: add sanity check for the packet lengthChangli Gao2010-08-041-9/+20
| | | | | | | The packet length should be checked before the packet data is dereferenced. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix various endianness glitchesEric Dumazet2010-04-201-5/+5
| | | | | | | | Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* net_sched: make cls_ops->change and cls_ops->delete optionalPatrick McHardy2009-09-061-7/+0
| | | | | | | | | | | | | | | | | | | | Some schedulers don't support creating, changing or deleting classes. Make the respective callbacks optionally and consistently return -EOPNOTSUPP for unsupported operations, instead of currently either -EOPNOTSUPP, -ENOSYS or no error. In case of sch_prio and sch_multiq, the removed operations additionally checked for an invalid class. This is not necessary since the class argument can only orginate from ->get() or in case of ->change is 0 for creation of new classes, in which case ->change() incorrectly returned -ENOENT. As a side-effect, this patch fixes a possible (root-only) NULL pointer function call in sch_ingress, which didn't implement a so far mandatory ->delete() operation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb->dst accessorsEric Dumazet2009-06-031-1/+1
| | | | | | | | | | | | | | | | | | Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* remove lots of double-semicolonsFernando Carrijo2009-01-081-1/+1
| | | | | | | | | | | | | Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pkt_sched: Annotate uninitialized var in sfq_enqueue()Jarek Poplawski2008-12-211-1/+1
| | | | | | | | Some gcc versions warn that ret may be used uninitialized in sfq_enqueue(). It's a false positive, so let's annotate this. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Remove qdisc->ops->requeue() etc.Jarek Poplawski2008-11-131-63/+0
| | | | | | | | | | | | | | After implementing qdisc->ops->peek() and changing sch_netem into classless qdisc there are no more qdisc->ops->requeue() users. This patch removes this method with its wrappers (qdisc_requeue()), and also unused qdisc->requeue structure. There are a few minor fixes of warnings (htb_enqueue()) and comments btw. The idea to kill ->requeue() and a similar patch were first developed by David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs.Patrick McHardy2008-10-311-0/+12
| | | | | | | | | | | | | | | From: Patrick McHardy <kaber@trash.net> Just as a demonstration how easy adding a peek operation to the work-conserving qdiscs actually is. It doesn't need to keep or change any internal state in many cases thanks to the guarantee that the packet will either be dequeued or, if another packet arrives, the upper qdisc will immediately ->peek again to reevaluate the state. (This is only slightly modified Patrick's patch.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Use hton[sl]() instead of __constant_hton[sl]() where applicableArnaldo Carvalho de Melo2008-09-201-2/+2
| | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: Add qdisc __NET_XMIT_BYPASS flagJarek Poplawski2008-08-041-3/+3
| | | | | | | | | | | | Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: Add qdisc __NET_XMIT_STOLEN flagJarek Poplawski2008-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patrick McHardy <kaber@trash.net> noticed: "The other problem that affects all qdiscs supporting actions is TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS even though the packet is not queued, corrupting upper qdiscs' qlen counters." and later explained: "The reason why it translates it at all seems to be to not increase the drops counter. Within a single qdisc this could be avoided by other means easily, upper qdiscs would still increase the counter when we return anything besides NET_XMIT_SUCCESS though. This means we need a new NET_XMIT return value to indicate this to the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN, return that to upper qdiscs and translate it to NET_XMIT_SUCCESS in dev_queue_xmit, similar to NET_XMIT_BYPASS." David Miller <davem@davemloft.net> noticed: "Maybe these NET_XMIT_* values being passed around should be a set of bits. They could be composed of base meanings, combined with specific attributes. So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT" The attributes get masked out by the top-level ->enqueue() caller, such that the base meanings are the only thing that make their way up into the stack. If it's only about communication within the qdisc tree, let's simply code it that way." This patch is trying to realize these ideas. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "pkt_sched: sch_sfq: dump a real number of flows"David S. Miller2008-07-261-8/+1
| | | | | | | | | | | This reverts commit f867e6af94239a04ec23aeec2fcda5aa58e41db7. Based upon discussions between Jarek and Patrick McHardy this is field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: sch_sfq: dump a real number of flowsJarek Poplawski2008-07-231-1/+8
| | | | | | | | | Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net_sched: Add accessor function for packet length for qdiscsJussi Kivilinna2008-07-201-8/+8
| | | | | Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Remove 'dev' member of struct Qdisc.David S. Miller2008-07-081-2/+2
| | | | | | | | | It can be obtained via the netdev_queue. So create a helper routine, qdisc_dev(), to make the transformations nicer looking. Now, qdisc_alloc() now no longer needs a net_device pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>
* net-sched: change tcf_destroy_chain() to clear start of filter listPatrick McHardy2008-07-011-1/+1
| | | | | | | | Pass double tcf_proto pointers to tcf_destroy_chain() to make it clear the start of the filter list for more consistency. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_sfq: use del_timer_sync() in sfq_destroy()Jarek Poplawski2008-04-291-1/+2
| | | | | | | Let's delete timer reliably in sfq_destroy(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET_SCHED]: sch_sfq: make internal queues visible as classesPatrick McHardy2008-01-311-1/+40
| | | | | | | | Add support for dumping statistics and make internal queues visible as classes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET_SCHED]: sch_sfq: add support for external classifiersPatrick McHardy2008-01-311-4/+91
| | | | | | | | | Add support for external classifiers to allow using different flow hash functions similar to ESFQ. When no classifier is attached the built-in hash is used as before. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink APIPatrick McHardy2008-01-281-6/+6
| | | | | | | | | | | Convert packet schedulers to use the netlink API. Unfortunately a gradual conversion is not possible without breaking compilation in the middle or adding lots of casts, so this patch converts them all in one step. The patch has been mostly generated automatically with some minor edits to at least allow seperate conversion of classifiers and actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED] SFQ: whitespace cleanupStephen Hemminger2008-01-281-17/+25
| | | | | | | | Add whitespace around operators, and add a few blank lines to improve readability. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED] SFQ: use net_randomStephen Hemminger2008-01-281-3/+3
| | | | | | | | | | SFQ doesn't need true random numbers, it is only using them to salt a hash. Therefore it is better to use net_random() and avoid any possible problems with depleting the entropy pool. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED] SFQ: timer is deferrableStephen Hemminger2008-01-281-1/+3
| | | | | | | | | The perturbation timer used for re-keying can be deferred, it doesn't need to be deterministic. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections.Eric Dumazet2008-01-281-1/+1
| | | | | | | | | | Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Convert init_timer into setup_timerPavel Emelyanov2008-01-281-3/+1
| | | | | | | | | | | | | Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>