aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] 3009/1: S3C2410 - io.h offsets too large for LDRH/STRHBen Dooks2005-10-141-16/+42
| | | | | | | | | | | | Patch from Ben Dooks The __inwc/__outwc calls are capable of creating LDRH and STRH instructions with offsets over 8bits as GCC does not have a constraint for an 8bit offset. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2005-10-132-3/+11
|\
| * [ARM] 3005/1: S3C2440 - add definition for s3c2440_set_dsc() call in hardware.hBen Dooks2005-10-131-0/+7
| | | | | | | | | | | | | | | | | | | | Patch from Ben Dooks include/asm-arm/arch-s3c2410/hardware.h was missing the definition for s3c2440_set_dsc() Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * [ARM] 3003/1: SSP channel map register updates for pxa2xxLiam Girdwood2005-10-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Liam Girdwood This patch updates the pxa2xx channel map registers definitions in pxa-regs.h Changes:- o Added description for SSP2 registers o Added definitions for SSP3 registers Signed-off-by:Liam Girdwood <liam.girdwood@wolfsonmicro.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-10-128-14/+35
|\ \
| * | [NETPOLL]: wrong return for null netpoll_poll_lock()Ben Dooks2005-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When netpoll is not being used, the macro that defines the removed routing netpoll_poll_lock defines the return as zero, but the real routine returns a `void *` Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [TWSK]: Grab the module refcount for timewait socketsArnaldo Carvalho de Melo2005-10-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This is required to avoid unloading a module that has active timewait sockets, such as DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER] ctnetlink: allow userspace to change TCP statePablo Neira Ayuso2005-10-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability of changing the state a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER]: Use only 32bit counters for CONNTRACK_ACCTHarald Welte2005-10-102-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such a event mechanism and thus can save 16bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER] ctnetlink: add one nesting level for TCP statePablo Neira Ayuso2005-10-101-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER]: Add missing include to ip_conntrack_tuple.hHarald Welte2005-10-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Without this #include, __be16 is not defined and userspace programs will break. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER] nat: remove bogus structure memberHarald Welte2005-10-101-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When 'rustynat' was merged in 2.6.12, the use of the "helper" pointer of struct ipt_nat_info was obsoleted, but the pointer not removed from the struct. This patch removes the pointer, thereby yet again shrinking struct ip_conntrack. Discovered-by: Rusty Russell <rusty@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLVHarald Welte2005-10-101-4/+8
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e. host byte order tags, net byte order values) are useless, unless a parser can determine whether an attribute is nested or not. This patch steals the highest bit of nfattr.nfa_type to indicate whether the data payload contains a nested nfattr (1) or not (0). This will break userspace compatibility, but luckily no kernel with nfnetlink was released so far. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] ppc32: Tell userland about lack of standard TBBenjamin Herrenschmidt2005-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Glibc is about to get some new high precision timer stuff that relies on the standard timebase of the PPC architecture. However, some (rare & old) CPUs do not have such timebase and it is a bit annoying to have your stuff just crash because you are running on the wrong CPU... This exposes to userland a CPU feature bit that tells that the current processor doesn't have a standard timebase. It's negative logic so that glibc will still "just work" on older kernels (it will just be unhappy on those old CPUs but that doesn't really matter as distro tend to update glibc & kernel at the same time). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] ppc32: Fix timekeepingBenjamin Herrenschmidt2005-10-121-1/+1
|/ | | | | | | | | | | | | | | | | Interestingly enough, ppc32 had broken timekeeping for ages... It worked, but probably drifted a bit more than could be explained by the actual bad precision of the timebase calibration. We discovered that recently when somebody figured out that the common code was using CLOCK_TICK_RATE to correct the timekeeing, and ppc32 had a completely bogus value for it. This patch turns it into something saner. Probably not as good as doing something based on the actual timebase frequency precision but I'll leave that sort of math to others. This at least makes it better for the common HZ values. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Allocate cpu local data for all possible CPUsAndi Kleen2005-10-101-0/+1
| | | | | | | | | CPU hotplug fills up the possible map to NR_CPUs, but it did that after setting up per CPU data. This lead to CPU data not getting allocated for all possible CPUs, which lead to various side effects. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix signal sending in usbdevio on async URB completionHarald Welte2005-10-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | If a process issues an URB from userspace and (starts to) terminate before the URB comes back, we run into the issue described above. This is because the urb saves a pointer to "current" when it is posted to the device, but there's no guarantee that this pointer is still valid afterwards. In fact, there are three separate issues: 1) the pointer to "current" can become invalid, since the task could be completely gone when the URB completion comes back from the device. 2) Even if the saved task pointer is still pointing to a valid task_struct, task_struct->sighand could have gone meanwhile. 3) Even if the process is perfectly fine, permissions may have changed, and we can no longer send it a signal. So what we do instead, is to save the PID and uid's of the process, and introduce a new kill_proc_info_as_uid() function. Signed-off-by: Harald Welte <laforge@gnumonks.org> [ Fixed up types and added symbol exports ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2005-10-102-1/+3
|\
| * [ARM] 2962/1: scoop: Allow GPIO pin suspend state to be specifiedRichard Purdie2005-10-101-0/+2
| | | | | | | | | | | | | | | | | | | | Patch from Richard Purdie Allow the GPIO pin suspend states to be specified for SCOOP devices. This is needed for correct operation on the spitz platform. Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * [ARM] 2958/1: fix definition in imx-regs.hSascha Hauer2005-10-101-1/+1
| | | | | | | | | | | | | | | | | | | | Patch from Sascha Hauer Fix PD7_AF_UART2_DTR definition Signed-off-by: Giancarlo Formicuccia <gformicuccia@atinno.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | [PATCH] x86_64: Set up safe page tables during resumeRafael J. Wysocki2005-10-101-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch makes swsusp avoid the possible temporary corruption of page translation tables during resume on x86-64. This is achieved by creating a copy of the relevant page tables that will not be modified by swsusp and can be safely used by it on resume. The problem is that during resume on x86-64 swsusp may temporarily corrupt the page tables used for the direct mapping of RAM. If that happens, a page fault occurs and cannot be handled properly, which leads to the solid hang of the affected system. This leads to the loss of the system's state from before suspend and may result in the loss of data or the corruption of filesystems, so it is a serious issue. Also, it appears to happen quite often (for me, as often as 50% of the time). The problem is related to the fact that (at least) one of the PMD entries used in the direct memory mapping (starting at PAGE_OFFSET) points to a page table the physical address of which is much greater than the physical address of the PMD entry itself. Moreover, unfortunately, the physical address of the page table before suspend (i.e. the one stored in the suspend image) happens to be different to the physical address of the corresponding page table used during resume (i.e. the one that is valid right before swsusp_arch_resume() in arch/x86_64/kernel/suspend_asm.S is executed). Thus while the image is restored, the "offending" PMD entry gets overwritten, so it does not point to the right physical address any more (i.e. there's no page table at the address pointed to by it, because it points to the address the page table has been at during suspend). Consequently, if the PMD entry is used later on, and it _is_ used in the process of copying the image pages, a page fault occurs, but it cannot be handled in the normal way and the system hangs. In principle we can call create_resume_mapping() from swsusp_arch_resume() (ie. from suspend_asm.S), but then the memory allocations in create_resume_mapping(), resume_pud_mapping(), and resume_pmd_mapping() must be made carefully so that we use _only_ NosaveFree pages in them (the other pages are overwritten by the loop in swsusp_arch_resume()). Additionally, we are in atomic context at that time, so we cannot use GFP_KERNEL. Moreover, if one of the allocations fails, we should free all of the allocated pages, so we need to trace them somehow. All of this is done in the appended patch, except that the functions populating the page tables are located in arch/x86_64/kernel/suspend.c rather than in init.c. It may be done in a more elegan way in the future, with the help of some swsusp patches that are in the works now. [AK: move some externs into headers, renamed a function] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] gfp flags annotations - part 1Al Viro2005-10-0848-138/+133
| | | | | | | | | | | | - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/perex/alsaLinus Torvalds2005-10-082-1/+3
|\
| * [ALSA] emu10k1 - Fix loading of SBLive Game boardTakashi Iwai2005-10-071-1/+1
| | | | | | | | | | | | | | | | | | EMU10K1/EMU10K2 driver Fixed the error at loading SBLive Game board (and possible other models). The PCI SSIDs of this board conflicts with SB Live 5.1 Platinum, which has no AC97 chip. Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * [ALSA] remove redundent assignment to the ac97 device structureNicolas Pitre2005-10-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | AC97 Codec Don't use dev.platform_data to store a reference to the containing ac97_t structure. Such assignment is redundent since we can deduce the ac97_t structure location from the contained device structure. This sets platform_data free for other purposes. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-10-082-12/+31
|\ \
| * | [ATM]: add support for LECS addresses learned from networkEric Kinzie2005-10-061-0/+10
| | | | | | | | | | | | | | | | | | From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [SCTP] Fix SCTP socket options to work with 32-bit apps on 64-bit kernels.Sridhar Samudrala2005-10-061-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds alignment attribute to a few structures used with SCTP socket options so that the sizes and offsets remain the same when built using either 32 or 64 bit tools. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | [SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels.Ivan Skytte Jørgensen2005-10-061-7/+16
| |/ | | | | | | | | | | | | | | | | The old socket options are marked with a _OLD suffix so that the existing 32-bit apps on 32-bit kernels do not break. Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] Keys: Split key permissions checking into a .c fileDavid Howells2005-10-081-86/+5
|/ | | | | | | | | | | The attached patch splits key permissions checking out of key-ui.h and moves it into a .c file. It's quite large and called quite a lot, and it's about to get bigger with the addition of LSM support for keys... key_any_permission() is also discarded as it's no longer used. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-10-0610-13/+19
|\
| * [IPSEC]: Document that policy direction is derived from the index.Herbert Xu2005-10-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | Here is a patch that adds a helper called xfrm_policy_id2dir to document the fact that the policy direction can be and is derived from the index. This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [XFRM]: fix sparse gfp nocast warningsRandy Dunlap2005-10-041-1/+1
| | | | | | | | | | | | | | | | Fix implicit nocast warnings in xfrm code: net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TEXTSEARCH]: fix sparse gfp nocast warningsRandy Dunlap2005-10-041-1/+2
| | | | | | | | | | | | | | | | Fix nocast sparse warnings: include/linux/textsearch.h:165:57: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [RPC]: fix sparse gfp nocast warningsRandy Dunlap2005-10-042-2/+2
| | | | | | | | | | | | | | | | | | | | | | Fix nocast sparse warnings: net/rxrpc/call.c:2013:25: warning: implicit cast to nocast type net/rxrpc/connection.c:538:46: warning: implicit cast to nocast type net/sunrpc/sched.c:730:36: warning: implicit cast to nocast type net/sunrpc/sched.c:734:56: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [IPVS]: fix sparse gfp nocast warningsRandy Dunlap2005-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | From: Randy Dunlap <rdunlap@xenotime.net> Fix implicit nocast warnings in ip_vs code: net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [DECNET]: fix sparse gfp nocast warningsRandy Dunlap2005-10-042-5/+5
| | | | | | | | | | | | | | | | | | | | Fix implicit nocast warnings in decnet code: net/decnet/af_decnet.c:458:40: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:125:35: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:219:29: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [CONNECTOR]: fix sparse gfp nocast warningsRandy Dunlap2005-10-041-1/+1
| | | | | | | | | | | | | | | | | | Fix implicit nocast warnings in connector code: drivers/connector/connector.c:102:24: warning: implicit cast to nocast type drivers/connector/connector.c:114:45: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ATM]: fix sparse gfp nocast warningsRandy Dunlap2005-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Fix implicit nocast warnings in atm code: net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type Also use kzalloc() instead of kmalloc(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [INET]: Shrink struct inet_ehash_bucket on 32 bits UPEric Dumazet2005-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to align struct inet_ehash_bucket on a 8 bytes boundary. On 32 bits Uniprocessor, that's a waste of 4 bytes per struct (50 %) On other platforms, the attribute is useless, natual alignement is already 8. platform | Size before | Size after patch -------------+-------------+------------------ 32 bits, UP | 8 | 4 32 bits, SMP | 8 | 8 64 bits, UP | 8 | 8 64 bits, SMP | 16 | 16 Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] fix the breakage in sparc headersAl Viro2005-10-052-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | If we switch extern inline to static inline, we'd better switch the pre-declarations we use to say that these puppies have __attribute_const__ on them. Otherwise we get extern declaration followed by static inline one. Which makes gcc unhappy, and for a good reason... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Fix broken IXP4xx GPIO macroDeepak Saxena2005-10-041-1/+1
| | | | | | | | | | | | | | Macro ended up backwards during one of cleanups. Found by Alessandro Zummo. Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2005-10-042-15/+39
|\ \ | |/ |/|
| * [ARM] 2950/1: i.MX gpio setup functionSascha Hauer2005-10-041-12/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | Patch from Sascha Hauer Current implementation of imx_gpio_mode does not allow to configure all alternate routing possibilities of the i.MX. With this patch every bit in the gpio setup registers has a corresponding bit in the gpio_mode parameter, so every routing should be possible now. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * [ARM] 2949/1: Hynix h720x Run modeSascha Hauer2005-10-041-3/+5
| | | | | | | | | | | | | | | | | | | | Patch from Sascha Hauer After coming out of idle mode the h720x goes into slow mode. Switch it back to run mode. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | [PATCH] uml: Fix sysrq-r support for skas modeAllan Graves2005-10-043-35/+17
| | | | | | | | | | | | | | | | | | | | | | The old code had the IP and SP coming from the registers in the thread struct, which are completely wrong since those are the userspace registers. This fixes that by pulling the correct values from the jmp_buf in which the kernel state of each thread is stored. Signed-off-by: Allan Graves <allan.graves@oracle.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] bfs endianness annotationsAlexey Dobriyan2005-10-041-21/+21
| | | | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [SPARC]: "extern inline" doesn't make much sense.Adrian Bunk2005-10-0323-114/+114
|/ | | | | Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnlHerbert Xu2005-10-031-2/+10
| | | | | | | | | | | | | | | | | | The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: speedup inet (tcp/dccp) lookupsEric Dumazet2005-10-036-43/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>