aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* zlib: slim down zlib_deflate() workspace when possibleJim Keniston2011-03-221-3/+8
| | | | | | | | | | | | | | | | | | | | | | | Instead of always creating a huge (268K) deflate_workspace with the maximum compression parameters (windowBits=15, memLevel=8), allow the caller to obtain a smaller workspace by specifying smaller parameter values. For example, when capturing oops and panic reports to a medium with limited capacity, such as NVRAM, compression may be the only way to capture the whole report. In this case, a small workspace (24K works fine) is a win, whether you allocate the workspace when you need it (i.e., during an oops or panic) or at boot time. I've verified that this patch works with all accepted values of windowBits (positive and negative), memLevel, and compression level. Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* crc32: add missed brackets in macroKonstantin Khlebnikov2011-03-221-1/+1
| | | | | | | | Add brackets around typecasted argument in crc32() macro. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sigma-firmware: loader for Analog Devices' SigmaStudioMike Frysinger2011-03-221-0/+60
| | | | | | | | | | Analog Devices' SigmaStudio can produce firmware blobs for devices with these DSPs embedded (like some audio codecs). Allow these device drivers to easily parse and load them. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kstrto*: converting strings to integers done (hopefully) rightAlexey Dobriyan2011-03-221-4/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. simple_strto*() do not contain overflow checks and crufty, libc way to indicate failure. 2. strict_strto*() also do not have overflow checks but the name and comments pretend they do. 3. Both families have only "long long" and "long" variants, but users want strtou8() 4. Both "simple" and "strict" prefixes are wrong: Simple doesn't exactly say what's so simple, strict should not exist because conversion should be strict by default. The solution is to use "k" prefix and add convertors for more types. Enter kstrtoull() kstrtoll() kstrtoul() kstrtol() kstrtouint() kstrtoint() kstrtou64() kstrtos64() kstrtou32() kstrtos32() kstrtou16() kstrtos16() kstrtou8() kstrtos8() Include runtime testsuite (somewhat incomplete) as well. strict_strto*() become deprecated, stubbed to kstrto*() and eventually will be removed altogether. Use kstrto*() in code today! Note: on some archs _kstrtoul() and _kstrtol() are left in tree, even if they'll be unused at runtime. This is temporarily solution, because I don't want to hardcode list of archs where these functions aren't needed. Current solution with sizeof() and __alignof__ at least always works. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* smp: move smp setup functions to kernel/smp.cAmerigo Wang2011-03-221-0/+2
| | | | | | | | | | | | | | | | | Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter setup functions from init/main.c to kenrel/smp.c, saves some #ifdef CONFIG_SMP. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Rakib Mullick <rakib.mullick@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/linux/err.h: add a function to cast error-pointers to a return valueUwe Kleine-König2011-03-221-0/+8
| | | | | | | | | | PTR_RET() can be used if you have an error-pointer and are only interested in the eventual error value, but not the pointer. Yields the usual 0 for no error, -ESOMETHING otherwise. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs.h: remove 8 bytes of padding from block_device on 64bit buildsRichard Kennedy2011-03-221-1/+1
| | | | | | | | | | | | Re-ordering struct block_inode to remove 8 bytes of padding on 64 bit builds, which also shrinks bdev_inode by 8 bytes (776 -> 768) allowing it to fit into one fewer cache lines. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/linux/compiler-gcc*.h: unify macro definitionsBorislav Petkov2011-03-223-16/+8
| | | | | | | | Unify identical gcc3.x and gcc4.x macros. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* add the common dma_addr_t typedef to include/linux/types.hFUJITA Tomonori2011-03-222-27/+6
| | | | | | | | | | | | | | | | | | | | | | All architectures can use the common dma_addr_t typedef now. We can remove the arch specific dma_addr_t. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add __GFP_OTHER_NODE flagAndi Kleen2011-03-222-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Add a new __GFP_OTHER_NODE flag to tell the low level numa statistics in zone_statistics() that an allocation is on behalf of another thread. This way the local and remote counters can be still correct, even when background daemons like khugepaged are changing memory mappings. This only affects the accounting, but I think it's worth doing that right to avoid confusing users. I first tried to just pass down the right node, but this required a lot of changes to pass down this parameter and at least one addition of a 10th argument to a 9 argument function. Using the flag is a lot less intrusive. Open: should be also used for migration? [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: vmscan: kswapd should not free an excessive number of pages when ↵Mel Gorman2011-03-221-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | balancing small zones When reclaiming for order-0 pages, kswapd requires that all zones be balanced. Each cycle through balance_pgdat() does background ageing on all zones if necessary and applies equal pressure on the inactive zone unless a lot of pages are free already. A "lot of free pages" is defined as a "balance gap" above the high watermark which is currently 7*high_watermark. Historically this was reasonable as min_free_kbytes was small. However, on systems using huge pages, it is recommended that min_free_kbytes is higher and it is tuned with hugeadm --set-recommended-min_free_kbytes. With the introduction of transparent huge page support, this recommended value is also applied. On X86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would expect around 68M of memory to be free. The Normal zone is approximately 35000 pages so under even normal memory pressure such as copying a large file, it gets exhausted quickly. As it is getting exhausted, kswapd applies pressure equally to all zones, including the DMA32 zone. DMA32 is approximately 700,000 pages with a high watermark of around 23,000 pages. In this situation, kswapd will reclaim around (23000*8 where 8 is the high watermark + balance gap of 7 * high watermark) pages or 718M of pages before the zone is ignored. What the user sees is that free memory far higher than it should be. To avoid an excessive number of pages being reclaimed from the larger zones, explicitely defines the "balance gap" to be either 1% of the zone or the low watermark for the zone, whichever is smaller. While kswapd will check all zones to apply pressure, it'll ignore zones that meets the (high_wmark + balance_gap) watermark. To test this, 80G were copied from a partition and the amount of memory being used was recorded. A comparison of a patch and unpatched kernel can be seen at http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps and shows that kswapd is not reclaiming as much memory with the patch applied. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pagewalk: only split huge pages when necessaryDave Hansen2011-03-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will unconditionally split any transparent huge pages it runs in to. In practice, that means that anyone doing a cat /proc/$pid/smaps will unconditionally break down every huge page in the process and depend on khugepaged to re-collapse it later. This is fairly suboptimal. This patch changes that behavior. It teaches each ->pmd_entry handler (there are five) that they must break down the THPs themselves. Also, the _generic_ code will never break down a THP unless a ->pte_entry handler is actually set. This means that the ->pmd_entry handlers can now choose to deal with THPs without breaking them down. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: move memcg reclaimable page into tail of inactive listMinchan Kim2011-03-221-0/+6
| | | | | | | | | | | | | | | | | | | The rotate_reclaimable_page function moves just written out pages, which the VM wanted to reclaim, to the end of the inactive list. That way the VM will find those pages first next time it needs to free memory. This patch applies the rule in memcg. It can help to prevent unnecessary working page eviction of memcg. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: deactivate invalidated pagesMinchan Kim2011-03-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, there are reported problem about thrashing. (http://marc.info/?l=rsync&m=128885034930933&w=2) It happens by backup workloads(ex, nightly rsync). That's because the workload makes just use-once pages and touches pages twice. It promotes the page into active list so that it results in working set page eviction. Some app developer want to support POSIX_FADV_NOREUSE. But other OSes don't support it, either. (http://marc.info/?l=linux-mm&m=128928979512086&w=2) By other approach, app developers use POSIX_FADV_DONTNEED. But it has a problem. If kernel meets page is writing during invalidate_mapping_pages, it can't work. It makes for application programmer to use it since they always have to sync data before calling fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could be discardable. At last, they can't use deferred write of kernel so that they could see performance loss. (http://insights.oetiker.ch/linux/fadvise.html) In fact, invalidation is very big hint to reclaimer. It means we don't use the page any more. So let's move the writing page into inactive list's head if we can't truncate it right now. Why I move page to head of lru on this patch, Dirty/Writeback page would be flushed sooner or later. It can prevent writeout of pageout which is less effective than flusher's writeout. Originally, I reused lru_demote of Peter with some change so added his Signed-off-by. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reported-by: Ben Gamari <bgamari.foss@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: mm_struct: remove 16 bytes of alignment padding on 64 bit buildsRichard Kennedy2011-03-221-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reorder mm_struct to remove 16 bytes of alignment padding on 64 bit builds. On my config this shrinks mm_struct by enough to fit in one fewer cache lines and allows more objects per slab in mm_struct kmem_cache under SLUB. slabinfo before patch :- Sizes (bytes) Slabs -------------------------------- Object : 848 Total : 9 SlabObj: 896 Full : 2 SlabSiz: 16384 Partial: 5 Loss : 48 CpuSlab: 2 Align : 64 Objects: 18 slabinfo after :- Sizes (bytes) Slabs -------------------------------- Object : 832 Total : 7 SlabObj: 832 Full : 2 SlabSiz: 16384 Partial: 3 Loss : 0 CpuSlab: 2 Align : 64 Objects: 19 Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove unused TestSetPageLocked() interfaceMichel Lespinasse2011-03-221-1/+1
| | | | | | | | | | TestSetPageLocked() isn't being used anywhere. Also, using it would likely be an error, since the proper interface trylock_page() provides stronger ordering guarantees. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: simplify anon_vma refcountsPeter Zijlstra2011-03-221-2/+9
| | | | | | | | | | | | | | | | | | This patch changes the anon_vma refcount to be 0 when the object is free. It does this by adding 1 ref to being in use in the anon_vma structure (iow. the anon_vma->head list is not empty). This allows a simpler release scheme without having to check both the refcount and the list as well as avoids taking a ref for each entry on the list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: move anon_vma ref out from under CONFIG_fooPeter Zijlstra2011-03-221-36/+4
| | | | | | | | | | | | | We need the anon_vma refcount unconditionally to simplify the anon_vma lifetime rules. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: rename drop_anon_vma() to put_anon_vma()Peter Zijlstra2011-03-221-2/+2
| | | | | | | | | | | | The normal code pattern used in the kernel is: get/put. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: change __remove_from_page_cache()Minchan Kim2011-03-221-1/+1
| | | | | | | | | | | | | | | | Now we renamed remove_from_page_cache with delete_from_page_cache. As consistency of __remove_from_swap_cache and remove_from_swap_cache, we change internal page cache handling function name, too. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: goodbye remove_from_page_cache()Minchan Kim2011-03-221-1/+0
| | | | | | | | | | | | | | | | Now delete_from_page_cache() replaces remove_from_page_cache(). So we remove remove_from_page_cache so fs or something out of mainline will notice it when compile time and can fix it. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: introduce delete_from_page_cache()Minchan Kim2011-03-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presently we increase the page refcount in add_to_page_cache() but don't decrease it in remove_from_page_cache(). Such asymmetry adds confusion, requiring that callers notice it and a comment explaining why they release a page reference. It's not a good API. A long time ago, Hugh tried it (http://lkml.org/lkml/2004/10/24/140) but gave up because reiser4's drop_page() had to unlock the page between removing it from page cache and doing the page_cache_release(). But now the situation is changed. I think at least things in current mainline don't have any obstacles. The problem is for out-of-mainline filesystems - if they have done such things as reiser4, this patch could be a problem but they will discover this at compile time since we remove remove_from_page_cache(). This patch: This function works as just wrapper remove_from_page_cache(). The difference is that it decreases page references in itself. So caller have to make sure it has a page reference before calling. This patch is ready for removing remove_from_page_cache(). Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add replace_page_cache_page() functionMiklos Szeredi2011-03-222-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | This function basically does: remove_from_page_cache(old); page_cache_release(old); add_to_page_cache_locked(new); Except it does this atomically, so there's no possibility for the "add" to fail because of a race. If memory cgroups are enabled, then the memory cgroup charge is also moved from the old page to the new. This function is currently used by fuse to move pages into the page cache on read, instead of copying the page contents. [minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: allow GUP to fail instead of waiting on a pageGleb Natapov2011-03-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | GUP user may want to try to acquire a reference to a page if it is already in memory, but not if IO, to bring it in, is needed. For example KVM may tell vcpu to schedule another guest process if current one is trying to access swapped out page. Meanwhile, the page will be swapped in and the guest process, that depends on it, will be able to run again. This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and FOLL_NOWAIT follow_page flags. FAULT_FLAG_RETRY_NOWAIT, when used in conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY instead. [akpm@linux-foundation.org: improve FOLL_NOWAIT comment] Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: suppress nodes that are not allowed from meminfo on oom killDavid Rientjes2011-03-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The oom killer is extremely verbose for machines with a large number of cpus and/or nodes. This verbosity can often be harmful if it causes other important messages to be scrolled from the kernel log and incurs a signicant time delay, specifically for kernels with CONFIG_NODES_SHIFT > 8. This patch causes only memory information to be displayed for nodes that are allowed by current's cpuset when dumping the VM state. Information for all other nodes is irrelevant to the oom condition; we don't care if there's an abundance of memory elsewhere if we can't access it. This only affects the behavior of dumping memory information when an oom is triggered. Other dumps, such as for sysrq+m, still display the unfiltered form when using the existing show_mem() interface. Additionally, the per-cpu pageset statistics are extremely verbose in oom killer output, so it is now suppressed. This removes nodes_weight(current->mems_allowed) * (1 + nr_cpus) lines from the oom killer output. Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed nodes. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kthread: NUMA aware kthread_create_on_node()Eric Dumazet2011-03-222-4/+11
| | | | | | | | | | | | | | | | | | | | | | | All kthreads being created from a single helper task, they all use memory from a single node for their kernel stack and task struct. This patch suite creates kthread_create_on_node(), adding a 'cpu' parameter to parameters already used by kthread_create(). This parameter serves in allocating memory for the new kthread on its memory node if possible. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: prevent kswapd compacting memory to reduce CPU usageAndrea Arcangeli2011-03-221-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reverts 5a03b051 ("thp: use compaction in kswapd for GFP_ATOMIC order > 0") due to reports stating that kswapd CPU usage was higher and IRQs were being disabled more frequently. This was reported at http://www.spinics.net/linux/fedora/alsa-user/msg09885.html. Without this patch applied, CPU usage by kswapd hovers around the 20% mark according to the tester (Arthur Marsh: http://www.spinics.net/linux/fedora/alsa-user/msg09899.html). With this patch applied, it's around 2%. The problem is not related to THP which specifies __GFP_NO_KSWAPD but is triggered by high-order allocations hitting the low watermark for their order and waking kswapd on kernels with CONFIG_COMPACTION set. The most common trigger for this is network cards configured for jumbo frames but it's also possible it'll be triggered by fork-heavy workloads (order-1) and some wireless cards which depend on order-1 allocations. The symptoms for the user will be high CPU usage by kswapd in low-memory situations which could be confused with another writeback problem. While a patch like 5a03b051 may be reintroduced in the future, this patch plays it safe for now and reverts it. [mel@csn.ul.ie: Beefed up the changelog] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Tested-by: Arthur Marsh <arthur.marsh@internode.on.net> Cc: <stable@kernel.org> [2.6.38.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pwm_backlight: add check_fb() hookRobert Morell2011-03-221-0/+3
| | | | | | | | | | | | | | | | | | In systems with multiple framebuffer devices, one of the devices might be blanked while another is unblanked. In order for the backlight blanking logic to know whether to turn off the backlight for a particular framebuffer's blanking notification, it needs to be able to check if a given framebuffer device corresponds to the backlight. This plumbs the check_fb hook from core backlight through the pwm_backlight helper to allow platform code to plug in a check_fb hook. Signed-off-by: Robert Morell <rmorell@nvidia.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Arun Murthy <arun.murthy@stericsson.com> Cc: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* backlight: add backlight typeMatthew Garrett2011-03-221-0/+9
| | | | | | | | | | | | | | | | | | | There may be multiple ways of controlling the backlight on a given machine. Allow drivers to expose the type of interface they are providing, making it possible for userspace to make appropriate policy decisions. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Airlie <airlied@linux.ie> Cc: Alex Deucher <alexdeucher@gmail.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* leds: make *struct gpio_led_platform_data.leds constUwe Kleine-König2011-03-221-2/+2
| | | | | | | | | | And fix a typo. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* leds: add driver for LM3530 ALSShreshtha Kumar Sahu2011-03-221-0/+107
| | | | | | | | | | | Simple backlight driver for National Semiconductor LM3530. Presently only manual mode is supported, PWM and ALS support to be added. Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com> Cc: Linus Walleij <linus.walleij@stericsson.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/asm-generic/unistd.h: fix syncfs syscall numberAndrew Morton2011-03-221-1/+1
| | | | | | | | | syncfs() is duplicating name_to_handle_at() due to a merging mistake. Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'slab/urgent' of ↵Linus Torvalds2011-03-221-0/+1
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Add statistics for this_cmpxchg_double failures slub: Add missing irq restore for the OOM path
| * slub: Add statistics for this_cmpxchg_double failuresChristoph Lameter2011-03-221-0/+1
| | | | | | | | | | | | | | Add some statistics for debugging. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds2011-03-224-16/+100
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use watch/notify for changes in rbd header libceph: add lingering request and watch/notify event framework rbd: update email address in Documentation ceph: rename dentry_release -> d_release, fix comment ceph: add request to the tail of unsafe write list ceph: remove request from unsafe list if it is canceled/timed out ceph: move readahead default to fs/ceph from libceph ceph: add ino32 mount option ceph: update common header files ceph: remove debugfs debug cruft libceph: fix osd request queuing on osdmap updates ceph: preserve I_COMPLETE across rename libceph: Fix base64-decoding when input ends in newline.
| * | libceph: add lingering request and watch/notify event frameworkYehuda Sadeh2011-03-221-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: move readahead default to fs/ceph from libcephSage Weil2011-03-211-1/+0
| | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph: update common header filesYehuda Sadeh2011-03-212-13/+45
| | | | | | | | | | | | | | | | | | | | | | | | This updates the common header files used by the different ceph related modules. Specifically it adds definitions required by the rbd watch/notify feature. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| * | libceph: fix osd request queuing on osdmap updatesSage Weil2011-03-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>
* | | tty: stop using "delayed_work" in the tty layerLinus Torvalds2011-03-222-2/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using delayed-work for tty flip buffers ends up causing us to wait for the next tick to complete some actions. That's usually not all that noticeable, but for certain latency-critical workloads it ends up being totally unacceptable. As an extreme case of this, passing a token back-and-forth over a pty will take two ticks per iteration, so even just a thousand iterations will take 8 seconds assuming a common 250Hz configuration. Avoiding the whole delayed work issue brings that ping-pong test-case down to 0.009s on my machine. In more practical terms, this latency has been a performance problem for things like dive computer simulators (simulating the serial interface using the ptys) and for other environments (Alan mentions a CP/M emulator). Reported-by: Jef Driesen <jefdriesen@telenet.be> Acked-by: Greg KH <gregkh@suse.de> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linuxLinus Torvalds2011-03-221-0/+82
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linux: i2c-pxa2xx: Don't clear isr bits too early i2c-pxa2xx: Fix register offsets i2c-pxa2xx: pass of_node from platform driver to adapter and publish i2c-pxa2xx: check timeout correctly i2c-pxa2xx: add support for shared IRQ handler i2c-pxa2xx: Add PCI support for PXA I2C controller ARM: pxa2xx: reorganize I2C files i2c-pxa2xx: use dynamic register layout i2c-mxs: set controller to pio queue mode after reset i2c-eg20t: support new device OKI SEMICONDUCTOR ML7213 IOH i2c/busses: Add support for Diolan U2C-12 USB-I2C adapter
| * | ARM: pxa2xx: reorganize I2C filesSebastian Andrzej Siewior2011-03-211-0/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the platform data definition from arch/arm/plat-pxa/include/plat/i2c.h to include/linux/i2c/pxa-i2c.h so it can be accessed from x86 the same way as on ARM. This change should make no functional change to the PXA code. The move is verified by building the following defconfigs: cm_x2xx_defconfig corgi_defconfig em_x270_defconfig ezx_defconfig imote2_defconfig pxa3xx_defconfig spitz_defconfig zeus_defconfig raumfeld_defconfig magician_defconfig mmp2_defconfig pxa168_defconfig pxa910_defconfig Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-03-222-3/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Dont define useless label in the !CONFIG_CMPXCHG_LOCAL case slab,rcu: don't assume the size of struct rcu_head slub,rcu: don't assume the size of struct rcu_head slub: automatically reserve bytes at the end of slab Lockless (and preemptless) fastpaths for slub slub: Get rid of slab_free_hook_irq() slub: min_partial needs to be in first cacheline slub: fix ksize() build error slub: fix kmemcheck calls to match ksize() hints Revert "slab: Fix missing DEBUG_SLAB last user" mm: Remove support for kmem_cache_name()
| * \ \ Merge branch 'slub/lockless' into for-linusPekka Enberg2011-03-203-15/+155
| |\ \ \ | | | | | | | | | | | | | | | | | | | | Conflicts: include/linux/slub_def.h
| | * | | Lockless (and preemptless) fastpaths for slubChristoph Lameter2011-03-111-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the this_cpu_cmpxchg_double functionality to implement a lockless allocation algorithm on arches that support fast this_cpu_ops. Each of the per cpu pointers is paired with a transaction id that ensures that updates of the per cpu information can only occur in sequence on a certain cpu. A transaction id is a "long" integer that is comprised of an event number and the cpu number. The event number is incremented for every change to the per cpu state. This means that the cmpxchg instruction can verify for an update that nothing interfered and that we are updating the percpu structure for the processor where we picked up the information and that we are also currently on that processor when we update the information. This results in a significant decrease of the overhead in the fastpaths. It also makes it easy to adopt the fast path for realtime kernels since this is lockless and does not require the use of the current per cpu area over the critical section. It is only important that the per cpu area is current at the beginning of the critical section and at the end. So there is no need even to disable preemption. Test results show that the fastpath cycle count is reduced by up to ~ 40% (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree adds a few cycles. Sadly this does nothing for the slowpath which is where the main issues with performance in slub are but the best case performance rises significantly. (For that see the more complex slub patches that require cmpxchg_double) Kmalloc: alloc/free test Before: 10000 times kmalloc(8)/kfree -> 134 cycles 10000 times kmalloc(16)/kfree -> 152 cycles 10000 times kmalloc(32)/kfree -> 144 cycles 10000 times kmalloc(64)/kfree -> 142 cycles 10000 times kmalloc(128)/kfree -> 142 cycles 10000 times kmalloc(256)/kfree -> 132 cycles 10000 times kmalloc(512)/kfree -> 132 cycles 10000 times kmalloc(1024)/kfree -> 135 cycles 10000 times kmalloc(2048)/kfree -> 135 cycles 10000 times kmalloc(4096)/kfree -> 135 cycles 10000 times kmalloc(8192)/kfree -> 144 cycles 10000 times kmalloc(16384)/kfree -> 754 cycles After: 10000 times kmalloc(8)/kfree -> 78 cycles 10000 times kmalloc(16)/kfree -> 78 cycles 10000 times kmalloc(32)/kfree -> 82 cycles 10000 times kmalloc(64)/kfree -> 88 cycles 10000 times kmalloc(128)/kfree -> 79 cycles 10000 times kmalloc(256)/kfree -> 79 cycles 10000 times kmalloc(512)/kfree -> 85 cycles 10000 times kmalloc(1024)/kfree -> 82 cycles 10000 times kmalloc(2048)/kfree -> 82 cycles 10000 times kmalloc(4096)/kfree -> 85 cycles 10000 times kmalloc(8192)/kfree -> 82 cycles 10000 times kmalloc(16384)/kfree -> 706 cycles Kmalloc: Repeatedly allocate then free test Before: 10000 times kmalloc(8) -> 211 cycles kfree -> 113 cycles 10000 times kmalloc(16) -> 174 cycles kfree -> 115 cycles 10000 times kmalloc(32) -> 235 cycles kfree -> 129 cycles 10000 times kmalloc(64) -> 222 cycles kfree -> 120 cycles 10000 times kmalloc(128) -> 343 cycles kfree -> 139 cycles 10000 times kmalloc(256) -> 827 cycles kfree -> 147 cycles 10000 times kmalloc(512) -> 1048 cycles kfree -> 272 cycles 10000 times kmalloc(1024) -> 2043 cycles kfree -> 528 cycles 10000 times kmalloc(2048) -> 4002 cycles kfree -> 571 cycles 10000 times kmalloc(4096) -> 7740 cycles kfree -> 628 cycles 10000 times kmalloc(8192) -> 8062 cycles kfree -> 850 cycles 10000 times kmalloc(16384) -> 8895 cycles kfree -> 1249 cycles After: 10000 times kmalloc(8) -> 190 cycles kfree -> 129 cycles 10000 times kmalloc(16) -> 76 cycles kfree -> 123 cycles 10000 times kmalloc(32) -> 126 cycles kfree -> 124 cycles 10000 times kmalloc(64) -> 181 cycles kfree -> 128 cycles 10000 times kmalloc(128) -> 310 cycles kfree -> 140 cycles 10000 times kmalloc(256) -> 809 cycles kfree -> 165 cycles 10000 times kmalloc(512) -> 1005 cycles kfree -> 269 cycles 10000 times kmalloc(1024) -> 1999 cycles kfree -> 527 cycles 10000 times kmalloc(2048) -> 3967 cycles kfree -> 570 cycles 10000 times kmalloc(4096) -> 7658 cycles kfree -> 637 cycles 10000 times kmalloc(8192) -> 8111 cycles kfree -> 859 cycles 10000 times kmalloc(16384) -> 8791 cycles kfree -> 1173 cycles Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| | * | | slub: min_partial needs to be in first cachelineChristoph Lameter2011-03-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is used in unfreeze_slab() which is a performance critical function. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| | * | | Merge branch 'for-2.6.39' of ↵Pekka Enberg2011-03-112-13/+150
| | |\ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into slub/lockless
| * | \ \ \ Merge branch 'slab/next' into for-linusPekka Enberg2011-03-202-1/+1
| |\ \ \ \ \ | | |_|_|_|/ | |/| | | |
| | * | | | Merge branch 'slab/rcu' into slab/nextPekka Enberg2011-03-1170-187/+355
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: mm/slub.c
| | | * | | | slub: automatically reserve bytes at the end of slabLai Jiangshan2011-03-111-0/+1
| | | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no "struct" for slub's slab, it shares with struct page. But struct page is very small, it is insufficient when we need to add some metadata for slab. So we add a field "reserved" to struct kmem_cache, when a slab is allocated, kmem_cache->reserved bytes are automatically reserved at the end of the slab for slab's metadata. Changed from v1: Export the reserved field via sysfs Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>