aboutsummaryrefslogtreecommitdiffstats
path: root/net/core/neighbour.c
Commit message (Collapse)AuthorAgeFilesLines
* merged 3.0.101 tagWolfgang Wiedmeyer2015-10-221-5/+7
|
* rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2013-01-171-5/+6
| | | | | | | | | | | | | | | | | | | commit c7ac8679bec9397afe8918f788cbcef88c38da54 upstream. The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: Fix skb_under_panic oops in neigh_resolve_outputramesh.nagappa@gmail.com2012-10-281-4/+2
| | | | | | | | | | | | | | | | [ Upstream commit e1f165032c8bade3a6bdf546f8faf61fda4dd01c ] The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: Ramesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: Shawn Lu <shawn.lu@ericsson.com> Reviewed-by: Robert Coulson <robert.coulson@ericsson.com> Reviewed-by: Billie Alsup <billie.alsup@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* neighbour: Fixed race condition at tbl->nhtMichel Machado2012-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 84338a6c9dbb6ff3de4749864020f8f25d86fc81 ] When the fixed race condition happens: 1. While function neigh_periodic_work scans the neighbor hash table pointed by field tbl->nht, it unlocks and locks tbl->lock between buckets in order to call cond_resched. 2. Assume that function neigh_periodic_work calls cond_resched, that is, the lock tbl->lock is available, and function neigh_hash_grow runs. 3. Once function neigh_hash_grow finishes, and RCU calls neigh_hash_free_rcu, the original struct neigh_hash_table that function neigh_periodic_work was using doesn't exist anymore. 4. Once back at neigh_periodic_work, whenever the old struct neigh_hash_table is accessed, things can go badly. Signed-off-by: Michel Machado <michel@digirati.com.br> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: fix NULL dereferences in check_peer_redir()Eric Dumazet2012-02-131-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d3aaeb38c40e5a6c08dd31a1b64da65c4352be36, along with dependent backports of commits: 69cce1d1404968f78b177a0314f5822d5afdbbfb 9de79c127cccecb11ae6a21ab1499e87aa222880 218fa90f072e4aeff9003d57e390857f4f35513e 580da35a31f91a594f3090b7a2c39b85cb051a12 f7e57044eeb1841847c24aa06766c8290c202583 e049f28883126c689cf95859480d9ee4ab23b7fa ] Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arp: fix rcu lockdep splat in arp_process()Eric Dumazet2011-10-031-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 20e6074eb8e096b3a595c093d1cb222f378cd671 ] Dave Jones reported a lockdep splat triggered by an arp_process() call from parp_redo(). Commit faa9dcf793be (arp: RCU changes) is the origin of the bug, since it assumed arp_process() was called under rcu_read_lock(), which is not true in this particular path. Instead of adding rcu_read_lock() in parp_redo(), I chose to add it in neigh_proxy_process() to take care of IPv6 side too. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/inetdevice.h:209 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 4 locks held by setfiles/2123: #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff8114cbc4>] walk_component+0x1ef/0x3e8 #1: (&isec->lock){+.+.+.}, at: [<ffffffff81204bca>] inode_doinit_with_dentry+0x3f/0x41f #2: (&tbl->proxy_timer){+.-...}, at: [<ffffffff8106a803>] run_timer_softirq+0x157/0x372 #3: (class){+.-...}, at: [<ffffffff8141f256>] neigh_proxy_process +0x36/0x103 stack backtrace: Pid: 2123, comm: setfiles Tainted: G W 3.1.0-0.rc2.git7.2.fc16.x86_64 #1 Call Trace: <IRQ> [<ffffffff8108ca23>] lockdep_rcu_dereference+0xa7/0xaf [<ffffffff8146a0b7>] __in_dev_get_rcu+0x55/0x5d [<ffffffff8146a751>] arp_process+0x25/0x4d7 [<ffffffff8146ac11>] parp_redo+0xe/0x10 [<ffffffff8141f2ba>] neigh_proxy_process+0x9a/0x103 [<ffffffff8106a8c4>] run_timer_softirq+0x218/0x372 [<ffffffff8106a803>] ? run_timer_softirq+0x157/0x372 [<ffffffff8141f220>] ? neigh_stat_seq_open+0x41/0x41 [<ffffffff8108f2f0>] ? mark_held_locks+0x6d/0x95 [<ffffffff81062bb6>] __do_softirq+0x112/0x25a [<ffffffff8150d27c>] call_softirq+0x1c/0x30 [<ffffffff81010bf5>] do_softirq+0x4b/0xa2 [<ffffffff81062f65>] irq_exit+0x5d/0xcf [<ffffffff8150dc11>] smp_apic_timer_interrupt+0x7c/0x8a [<ffffffff8150baf3>] apic_timer_interrupt+0x73/0x80 <EOI> [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158 [<ffffffff814fc285>] ? __slab_free+0x30/0x24c [<ffffffff814fc283>] ? __slab_free+0x2e/0x24c [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204e74>] ? inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81130cb0>] kfree+0x108/0x131 [<ffffffff81204e74>] inode_doinit_with_dentry+0x2e9/0x41f [<ffffffff81204fc6>] selinux_d_instantiate+0x1c/0x1e [<ffffffff81200f4f>] security_d_instantiate+0x21/0x23 [<ffffffff81154625>] d_instantiate+0x5c/0x61 [<ffffffff811563ca>] d_splice_alias+0xbc/0xd2 [<ffffffff811b17ff>] ext4_lookup+0xba/0xeb [<ffffffff8114bf1e>] d_alloc_and_lookup+0x45/0x6b [<ffffffff8114cbea>] walk_component+0x215/0x3e8 [<ffffffff8114cdf8>] lookup_last+0x3b/0x3d [<ffffffff8114daf3>] path_lookupat+0x82/0x2af [<ffffffff8110fc53>] ? might_fault+0xa5/0xac [<ffffffff8110fc0a>] ? might_fault+0x5c/0xac [<ffffffff8114c564>] ? getname_flags+0x31/0x1ca [<ffffffff8114dd48>] do_path_lookup+0x28/0x97 [<ffffffff8114df2c>] user_path_at+0x59/0x96 [<ffffffff811467ad>] ? cp_new_stat+0xf7/0x10d [<ffffffff811469a6>] vfs_fstatat+0x44/0x6e [<ffffffff811469ee>] vfs_lstat+0x1e/0x20 [<ffffffff81146b3d>] sys_newlstat+0x1a/0x33 [<ffffffff8108f439>] ? trace_hardirqs_on_caller+0x121/0x158 [<ffffffff812535fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8150af82>] system_call_fastpath+0x16/0x1b Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* neigh: __rcu annotationsEric Dumazet2011-01-201-6/+7
| | | | | | | fix some minor issues and sparse (__rcu) warnings Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: kill unused macrosShan Wei2010-12-191-1/+0
| | | | | | | These macros never be used, so remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/neighbour: cancel_delayed_work() + flush_scheduled_work() -> ↵Tejun Heo2010-10-211-2/+1
| | | | | | | | | cancel_delayed_work_sync() flush_scheduled_work() is going away. Prepare for it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Protect neigh->ha[] with a seqlockEric Dumazet2010-10-111-17/+30
| | | | | | | | | | | | | Add a seqlock in struct neighbour to protect neigh->ha[], and avoid dirtying neighbour in stress situation (many different flows / dsts) Dirtying takes place because of read_lock(&n->lock) and n->used writes. Switching to a seqlock, and writing n->used only on jiffies changes permits less dirtying. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: speedup neigh_hh_init()Eric Dumazet2010-10-111-38/+61
| | | | | | | | | | | | | | | | | | | | | | | | | When a new dst is used to send a frame, neigh_resolve_output() tries to associate an struct hh_cache to this dst, calling neigh_hh_init() with the neigh rwlock write locked. Most of the time, hh_cache is already known and linked into neighbour, so we find it and increment its refcount. This patch changes the logic so that we call neigh_hh_init() with neighbour lock read locked only, so that fast path can be run in parallel by concurrent cpus. This brings part of the speedup we got with commit c7d4426a98a5f (introduce DST_NOCACHE flag) for non cached dsts, even for cached ones, removing one of the contention point that routers hit on multiqueue enabled machines. Further improvements would need to use a seqlock instead of an rwlock to protect neigh->ha[], to not dirty neigh too often and remove two atomic ops. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: RCU conversion of struct neighbourEric Dumazet2010-10-061-52/+85
| | | | | | | | | | | | | | | | | | | This is the second step for neighbour RCU conversion. (first was commit d6bf7817 : RCU conversion of neigh hash table) neigh_lookup() becomes lockless, but still take a reference on found neighbour. (no more read_lock()/read_unlock() on tbl->lock) struct neighbour gets an additional rcu_head field and is freed after an RCU grace period. Future work would need to eventually not take a reference on neighbour for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to use a noref bit like we did for skb->_dst. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net neigh: RCU conversion of neigh hash tableEric Dumazet2010-10-051-82/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David This is the first step for RCU conversion of neigh code. Next patches will convert hash_buckets[] and "struct neighbour" to RCU protected objects. Thanks [PATCH net-next] net neigh: RCU conversion of neigh hash table Instead of storing hash_buckets, hash_mask and hash_rnd in "struct neigh_table", a new structure is defined : struct neigh_hash_table { struct neighbour **hash_buckets; unsigned int hash_mask; __u32 hash_rnd; struct rcu_head rcu; }; And "struct neigh_table" has an RCU protected pointer to such a neigh_hash_table. This means the signature of (*hash)() function changed: We need to add a third parameter with the actual hash_rnd value, since this is not anymore a neigh_table field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net neigh: neigh_delete() and neigh_add() changesEric Dumazet2010-10-051-22/+17
| | | | | | | | neigh_delete() and neigh_add() dont need to touch device refcount, we hold RTNL when calling them, so device cannot disappear under us. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: introduce DST_NOCACHE flagEric Dumazet2010-10-031-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: return operator cleanupEric Dumazet2010-09-231-3/+3
| | | | | | | | | Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/core: neighbour update OopsDoug Kehn2010-07-141-1/+4
| | | | | | | | | | | | | | | | | | | | When configuring DMVPN (GRE + openNHRP) and a GRE remote address is configured a kernel Oops is observed. The obserseved Oops is caused by a NULL header_ops pointer (neigh->dev->header_ops) in neigh_update_hhs() when void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *) = neigh->dev->header_ops->cache_update; is executed. The dev associated with the NULL header_ops is the GRE interface. This patch guards against the possibility that header_ops is NULL. This Oops was first observed in kernel version 2.6.26.8. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix __neigh_event_send()Eric Dumazet2010-05-281-0/+1
| | | | | | | | | | | | | | commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot where an skb is enqueued, with a possibly not refcounted dst entry. __neigh_event_send() inserts skb into arp_queue, so we must make sure dst entry is refcounted, or dst entry can be freed by garbage collector after caller exits from rcu protected section. Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* net: Annotates neigh_invalidate()Eric Dumazet2010-03-101-0/+2
| | | | | | | | Annotates neigh_invalidate() with __releases() and __acquires() for sparse sake. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net neigh: Decouple per interface neighbour table controls from binary sysctlsEric W. Biederman2010-02-161-3/+4
| | | | | | | | | | | | | | Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl. Don't pass the binary sysctl path for neighour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: simplify seq_file codeAlexey Dobriyan2010-01-231-7/+4
| | | | | | | | Simpily pass 'struct neigh_table' with seq_file private pointer, and save one dereference. Proc entry itself isn't interesting. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2009-12-081-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
| * net: use net_eq to compare netsOctavian Purdila2009-11-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sysctl net: Remove unused binary sysctl codeEric W. Biederman2009-11-121-41/+6
|/ | | | | | | | | | | | | | | | Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* neigh: Convert garbage collection from softirq to workqueueEric Dumazet2009-08-021-46/+43
| | | | | | | | | | | | | | | | | | Current neigh_periodic_timer() function is fired by timer IRQ, and scans one hash bucket each round (very litle work in fact) As we are supposed to scan whole hash table in 15 seconds, this means neigh_periodic_timer() can be fired very often. (depending on the number of concurrent hash entries we stored in this table) Converting this to a workqueue permits scanning whole table, minimizing icache pollution, and firing this work every 15 seconds, independantly of hash table size. This 15 seconds delay is not a hard number, as work is a deferrable one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rename lookup_neigh_params functionTobias Klauser2009-07-131-3/+3
| | | | | | | | | Rename lookup_neigh_params to lookup_neigh_parms as the struct is named neigh_parms and all other functions dealing with the struct carry neigh_parms in their names. Signed-off-by: Tobias Klauser <klto@zhaw.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: fix state transition INCOMPLETE->FAILED via Netlink requestTimo Teras2009-06-111-18/+28
| | | | | | | | | | | | | | | | | The current code errors out the INCOMPLETE neigh entry skb queue only from the timer if maximum probes have been attempted and there has been no reply. This also causes the transtion to FAILED state. However, the neigh entry can be also updated via Netlink to inform that the address is unavailable. Currently, neigh_update() just stops the timers and leaves the pending skb's unreleased. This results that the clean up code in the timer callback is never called, preventing also proper garbage collection. This fixes neigh_update() to process the pending skb queue immediately if INCOMPLETE -> FAILED state transtion occurs due to a Netlink request. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb->dst accessorsEric Dumazet2009-06-031-6/+5
| | | | | | | | | | | | | | | | | | Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Allow for user space users of the neighbour tableEric Biederman2009-03-041-1/+5
| | | | | | | | | | | | Currently it is possible to do just about everything with the arp table from user space except treat an entry like you are using it. To that end implement and a flag NTF_USE that when set in a netwlink update request treats the neighbour table entry like the kernel does on the output path. This allows user space applications to share the kernel's arp cache. Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* core: remove some pointless conditionals before kfree_skb()Wei Yongjun2009-02-261-4/+2
| | | | | | | Remove some pointless conditionals before kfree_skb(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: change nlmsg_notify() return value logicPablo Neira Ayuso2009-02-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the return value of nlmsg_notify() as follows: If NETLINK_BROADCAST_ERROR is set by any of the listeners and an error in the delivery happened, return the broadcast error; else if there are no listeners apart from the socket that requested a change with the echo flag, return the result of the unicast notification. Thus, with this patch, the unicast notification is handled in the same way of a broadcast listener that has set the NETLINK_BROADCAST_ERROR socket flag. This patch is useful in case that the caller of nlmsg_notify() wants to know the result of the delivery of a netlink notification (including the broadcast delivery) and take any action in case that the delivery failed. For example, ctnetlink can drop packets if the event delivery failed to provide reliable logging and state-synchronization at the cost of dropping packets. This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: some entries can be skipped during dumpingGautam Kachroo2009-02-061-6/+8
| | | | | | | | | | | | | neightbl_dump_info and neigh_dump_table can skip entries if the *fill*info functions return an error. This results in an incomplete dump ((invoked by netlink requests for RTM_GETNEIGHTBL or RTM_GETNEIGH) nidx and idx should not be incremented if the current entry was not placed in the output buffer Signed-off-by: Gautam Kachroo <gk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: netRusty Russell2008-12-291-2/+2
| | | | | | | | | | | | | | In future all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* netdev: add more functions to netdevice opsStephen Hemminger2008-11-201-3/+3
| | | | | | | | | | | | This patch moves neigh_setup and hard_start_xmit into the network device ops structure. For bisection, fix all the previously converted drivers as well. Bonding driver took the biggest hit on this. Added a prefetch of the hard_start_xmit in the fast path to try and reduce any impact this would have. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Cleanup of neighbour codeEric Dumazet2008-11-121-9/+3
| | | | | | | | Using read_pnet() and write_pnet() in neighbour code ease the reading of code. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove struct neigh_table::pdeAlexey Dobriyan2008-11-111-3/+2
| | | | | | | ->pde isn't actually needed, since name is stashed in ->id. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: '&' reduxAlexey Dobriyan2008-11-031-24/+24
| | | | | | | | | | | | | I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: don't use INIT_RCU_HEADAlexey Dobriyan2008-10-281-2/+0
| | | | | | | | | | | | | | | | call_rcu() will unconditionally rewrite RCU head anyway. Applies to struct neigh_parms struct neigh_table struct net struct cipso_v4_doi struct in_ifaddr struct in_device rt->u.dst Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* neigh: Remove by-hand SKB queue handling.David S. Miller2008-09-231-13/+8
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix missing pneigh entries in the neighbor seq_file codeChris Larson2008-08-031-5/+6
| | | | | | | | | | | | | | | | When pneigh entries exist, but the user's read buffer isn't sufficient to hold them all, one of the pneigh entries will be missing from the results. In neigh_get_idx_any, the number of elements which neigh_get_idx encountered is not correctly subtracted from the position number before the call to pneigh_get_idx. neigh_get_idx reduces the position by 1 for each call to neigh_get_next, but it does not reduce it by one for the first element (neigh_get_first). The patch alters the neigh_get_idx and pneigh_get_idx functions to subtract one from pos, for the first element, when pos is non-zero. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: in the first call to neigh_seq_next, call neigh_get_first, not ↵Chris Larson2008-08-031-1/+1
| | | | | | | | | | | neigh_get_idx. neigh_seq_next won't be called both with *pos > 0 && v == SEQ_START_TOKEN, so there's no point calling neigh_get_idx when we're on the start token, just call neigh_get_first directly. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* core: add stat to track unresolved discards in neighbor cacheNeil Horman2008-07-161-3/+5
| | | | | | | | | | | | | | in __neigh_event_send, if we have a neighbour entry which is in NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour to the neighbours arp_queue, which is default capped to a length of 3 skbs. If that queue exceeds its set length, it will drop an skb on the queue to enqueue the newly arrived skb. This results in a drop for which we have no statistics incremented. This patch adds an unresolved_discards stat to /proc/net/stat/ndisc_cache to track these lost frames. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Improve returned error codesThomas Graf2008-06-031-1/+2
| | | | | | | | | | | Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: neighbour table ABI problemStephen Hemminger2008-06-031-3/+3
| | | | | | | | | | | | | | | | | | | The neighbor table time of last use information is returned in the incorrect unit. Kernel to user space ABI's need to use USER_HZ (or milliseconds), otherwise the application has to try and discover the real system HZ value which is problematic. Linux has standardized on keeping USER_HZ consistent (100hz) even when kernel is running internally at some other value. This change is small, but it breaks the ABI for older version of iproute2 utilities. But these utilities are already broken since they are looking at the psched_hz values which are completely different. So let's just go ahead and fix both kernel and user space. Older utilities will just print wrong values. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: assign PDE->data before gluing PDE into /proc treeDenis V. Lunev2008-05-021-3/+2
| | | | | | | | | Simply replace proc_create and further data assigned with proc_create_data. Additionally, there is no need to assign NULL to PDE->data after creation, /proc generic has already done this for us. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] NEIGHBOUR: Extract hash/lookup functions for pneigh entries.YOSHIFUJI Hideaki2008-03-281-32/+29
| | | | | | | | | Extract hash function for pneigh entries from pneigh_lookup(), __pneigh_lookup() and pneigh_delete() as pneigh_hash(). Extract core of pneigh_lookup() and __pneigh_lookup() as __pneigh_lookup_1(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NET] NEIGHBOUR: Make each EXPORT_SYMBOL{,_GPL}() immediately follow its ↵YOSHIFUJI Hideaki2008-03-281-29/+24
| | | | | | function/variable. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* Merge branch 'master' of ↵David S. Miller2008-03-271-0/+23
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/rndis_host.c drivers/net/wireless/b43/dma.c net/ipv6/ndisc.c
| * [NEIGH]: Fix race between pneigh deletion and ipv6's ndisc_recv_ns (v3).Pavel Emelyanov2008-03-241-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Proxy neighbors do not have any reference counting, so any caller of pneigh_lookup (unless it's a netlink triggered add/del routine) should _not_ perform any actions on the found proxy entry. There's one exception from this rule - the ipv6's ndisc_recv_ns() uses found entry to check the flags for NTF_ROUTER. This creates a race between the ndisc and pneigh_delete - after the pneigh is returned to the caller, the nd_tbl.lock is dropped and the deleting procedure may proceed. One of the fixes would be to add a reference counting, but this problem exists for ndisc only. Besides such a patch would be too big for -rc4. So I propose to introduce a __pneigh_lookup() which is supposed to be called with the lock held and use it in ndisc code to check the flags on alive pneigh entry. Changes from v2: As David noticed, Exported the __pneigh_lookup() to ipv6 module. The checkpatch generates a warning on it, since the EXPORT_SYMBOL does not follow the symbol itself, but in this file all the exports come at the end, so I decided no to break this harmony. Changes from v1: Fixed comments from YOSHIFUJI - indentation of prototype in header and the pndisc_check_router() name - and a compilation fix, pointed by Daniel - the is_routed was (falsely) considered as uninitialized by gcc. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>