From da55e1b8ab1b32978beaaffe9c18e367da161966 Mon Sep 17 00:00:00 2001
From: Dan Magenheimer
Date: Thu, 7 Jul 2011 07:37:19 -0700
Subject: staging: zcache: support multiple clients, prep for KVM and RAMster

This is version 3 of an update to zcache, incorporating feedback from
the list. This patch adds support to the in-kernel transcendent memory
("tmem") code and the zcache driver for multiple clients, which will be
needed for both RAMster and KVM support. It also adds additional tmem
callbacks to support RAMster and corresponding no-op stubs in the
zcache driver. In v2, I've also taken the liberty of adding some
additional sysfs variables to both surface information and allow policy
control. Those experimenting with zcache should find them useful.

V3 clarifies some code walking and declaring arrays.

Signed-off-by: Dan Magenheimer
[v3: error27@gmail.com: fix array bounds/walking]
[v2: konrad.wilk@oracle.com: fix bools, add check for NULL, fix a comment]
[v2: sjenning@linux.vnet.ibm.com: add info/tunables for poor compression]
[v2: marcusklemm@googlemail.com: add tunable for max persistent pages]
Acked-by: Dan Carpenter
Cc: Nitin Gupta
Cc: linux-mm@kvack.org
Cc: kvm@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

staging: fix zcache building

zcache is only building tmem.c and not building zcache.c. To keep the
module name, zcache.c must be renamed if symbols from tmem.c are to
remain unexported.

Signed-off-by: Thadeu Lima de Souza Cascardo
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: module is GPL

This avoids tainting the kernel as if a proprietary module was loaded.
The kernel will still be tainted because this is a staging driver.

Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Greg Kroah-Hartman

staging: zcache: include module.h for MODULE_LICENSE

The oncoming cleanup of module.h usage requires the explicit inclusion
of module.h when it was otherwise being included indirectly. Otherwise,
building zcache will fail.
Reported-by: Stephen Rothwell
Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Greg Kroah-Hartman

zcache: Use div_u64 for 64-bit division

xv_get_total_size_bytes returns a u64 value and it's used in a
division. This causes build failures in 32-bit architectures, as
reported by Randy Dunlap.

Reported-by: Randy Dunlap
Signed-off-by: Thadeu Lima de Souza Cascardo
Cc: Stephen Rothwell
Cc: Dan Magenheimer
Cc: Nitin Gupta
Acked-by: Randy Dunlap
Signed-off-by: Greg Kroah-Hartman

zcache: Fix build error when sysfs is not defined

Signed-off-by: Nitin Gupta
Signed-off-by: Greg Kroah-Hartman

staging: zcache: fix possible sleep under lock

zcache_new_pool() calls kmalloc() with GFP_KERNEL which has __GFP_WAIT
set. However, zcache_new_pool() gets called on a stack that holds the
swap_lock spinlock, leading to a possible sleep-with-lock situation.
The lock is obtained in enable_swap_info().

The patch replaces GFP_KERNEL with GFP_ATOMIC.

v2: replace with GFP_ATOMIC, not GFP_IOFS

Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: fix typos

The patch fixes two typos in zcache-main.c

Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: fix crash on high memory swap

zcache_put_page() was modified to pass page_address(page) instead of
the actual page structure. In combination with the function signature
changes to tmem_put() and zcache_pampd_create(), zcache_pampd_create()
tries to (re)derive the page structure from the virtual address.
However, if the original page is a high memory page (or any unmapped
page), this virt_to_page() fails because the page_address() in
zcache_put_page() returned NULL.

This patch changes zcache_put_page() and zcache_get_page() to pass the
page structure instead of the page's virtual address, which may or may
not exist.
Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

Staging: zcache: signedness bug in tmem_get()

"ret" needs to be signed for the error handling to work properly.

Signed-off-by: Dan Carpenter
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: fix cleancache crash

After commit c5f5c4db3938 ("staging: zcache: fix crash on high memory
swap") cleancache crashes on the first successful get. This was caused
by a remaining virt_to_page() call in zcache_pampd_get_data_and_free()
that only gets run in the cleancache path.

The patch converts the virt_to_page() to struct page casting, as was
done for other instances in c5f5c4db3938.

Signed-off-by: Seth Jennings
Tested-By: Valdis Kletnieks
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

staging: zcache: fix crash on cpu remove

In the case that a cpu is taken offline before zcache_do_preload() is
ever called on the cpu, the per-cpu zcache_preloads structure will be
uninitialized. In the CPU_DEAD case for zcache_cpu_notifier(), kp->obj
is not checked before calling kmem_cache_free() on it. If it is NULL, a
crash results.

This patch ensures that both kp->obj and kp->page are not NULL before
calling the respective free functions. In practice, just checking one
or the other should be sufficient since they are assigned together in
zcache_do_preload(), but I check both for safety.

Signed-off-by: Seth Jennings
Acked-by: Dave Hansen
Signed-off-by: Greg Kroah-Hartman

staging: zcache: reduce tmem bucket lock contention

tmem uses hash buckets, each with its own rbtree and lock, to quickly
look up tmem objects. tmem has TMEM_HASH_BUCKETS (256) buckets per
pool. However, because of the way the tmem_oid is generated for
frontswap pages, only 16 unique tmem_oids are being generated,
resulting in only 16 of the 256 buckets being used. This causes high
lock contention for the per-bucket locks.
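The oid arithmetic behind the paragraph above can be illustrated with a small userspace sketch. It assumes the frontswap oid is formed by shifting the swap type left by SWIZ_BITS and OR-ing in the low SWIZ_BITS bits of the page offset. That formula, the helper names sketch_oid() and count_unique_oids(), and the old SWIZ_BITS value of 4 (inferred from the 16 oids reported) are assumptions made for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * ASSUMPTION (not from the patch text): the oid keeps the swap type in
 * the upper bits and the low swiz_bits bits of the offset in the lower
 * bits. The remaining offset bits would select the index in the object.
 */
static uint64_t sketch_oid(unsigned swiz_bits, unsigned type, uint64_t offset)
{
	uint64_t mask = ((uint64_t)1 << swiz_bits) - 1;

	return ((uint64_t)type << swiz_bits) | (offset & mask);
}

/*
 * Count the distinct oids produced for offsets 0..n_offsets-1 of a
 * single swap type. Hypothetical helper, userspace only.
 */
static size_t count_unique_oids(unsigned swiz_bits, unsigned type,
				uint64_t n_offsets)
{
	size_t slots, unique = 0;
	unsigned char *seen;
	uint64_t off;

	assert(swiz_bits <= 16);	/* keep the bitmap small */
	slots = (size_t)1 << swiz_bits;
	seen = calloc(slots, 1);
	assert(seen != NULL);

	for (off = 0; off < n_offsets; off++) {
		uint64_t oid = sketch_oid(swiz_bits, type, off);
		/* The type bits are constant here, so track the low bits. */
		size_t low = (size_t)(oid & (slots - 1));

		if (!seen[low]) {
			seen[low] = 1;
			unique++;
		}
	}
	free(seen);
	return unique;
}
```

Under these assumptions, count_unique_oids(4, 0, 4096) yields 16 and count_unique_oids(8, 0, 4096) yields 256, so a 4-bit setting can never populate more than 16 of the 256 buckets, however the oids are hashed.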
This patch changes SWIZ_BITS to include more bits of the offset. The
result is that all 256 hash buckets are potentially used, resulting in
a 95% drop in hash bucket lock contention.

Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: remove zcache_direct_reclaim_lock

zcache_do_preload() currently does a spin_trylock() on the
zcache_direct_reclaim_lock. Holding this lock is intended to prevent
shrink_zcache_memory() from evicting zbud pages as a result of a
preload.

However, it also prevents two threads from executing
zcache_do_preload() at the same time. The first thread will obtain the
lock and the second thread's spin_trylock() will fail (an aborted
preload), causing the page to be either lost (cleancache) or pushed out
to the swap device (frontswap). It also doesn't ensure that the call to
shrink_zcache_memory() is on the same thread as the call to
zcache_do_preload().

Additionally, there is no need for this mechanism because all
zcache_do_preload() calls that come down from cleancache already have
PF_MEMALLOC set in the process flags, which prevents direct reclaim in
the memory manager. If the zcache_do_preload() call is done from the
frontswap path, we _want_ reclaim to be done (which it isn't right
now).

This patch removes the zcache_direct_reclaim_lock and related
statistics in zcache.

Based on v3.1-rc8

Signed-off-by: Seth Jennings
Reviewed-by: Dave Hansen
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

Staging: zcache: Fix calls to obsolete function

Function "strict_strtol" replaced by "kstrtol" as suggested by the
checkpatch script

Signed-off-by: Bernhard Heinloth
Signed-off-by: Greg Kroah-Hartman

zcache: fix deadlock condition

I discovered this deadlock condition a while ago working on RAMster,
but it affects zcache as well. The list spinlock must be locked prior
to the page spinlock and released after. As a result, the page copy
must also be done while the locks are held.

Applies to 3.2.
Konrad, please push (via GregKH?)... this is definitely a bug fix so
need not be pushed during a -rc0 window.

Signed-off-by: Dan Magenheimer
Acked-by: Konrad Rzeszutek Wilk
Cc: stable
Signed-off-by: Greg Kroah-Hartman

zcache: Set SWIZ_BITS to 8 to reduce tmem bucket lock contention.

SWIZ_BITS > 8 results in a much larger number of "tmem_obj"
allocations, likely one per page-placed-in-frontswap. The tmem_obj is
not huge (roughly 100 bytes), but it is large enough to add a
not-insignificant memory overhead to zcache.

The SWIZ_BITS=8 will get roughly the same lock contention without the
space wastage.

The effect of SWIZ_BITS can be thought of as "2^SWIZ_BITS is the number
of unique oids that can be generated" (this concept is limited to
frontswap's use of tmem).

Acked-by: Seth Jennings
Signed-off-by: Konrad Rzeszutek Wilk
Cc: stable
Signed-off-by: Greg Kroah-Hartman

staging: zcache: fix serialization bug in zv stats

In a multithreaded workload, the zv_curr_dist_counts and
zv_cumul_dist_counts statistics are being corrupted because the
increments and decrements in zv_create and zv_free are not atomic.

This patch converts these statistics and their corresponding
increments/decrements/reads to atomic operations.

Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: crypto API support

This patch allows zcache to use the crypto API for page compression. It
replaces the direct LZO compress/decompress calls with calls into the
crypto compression API. The compressor to be used is specified in the
kernel boot line with the zcache parameter like: zcache=lzo or
zcache=deflate. If the specified compressor can't be loaded, zcache
uses lzo as the default compressor.

Signed-off-by: Seth Jennings
Acked-by: Dan Magenheimer
Signed-off-by: Greg Kroah-Hartman

staging: zcache: avoid AB-BA deadlock condition

Commit 9256a47 fixed a deadlock condition, being sure that the buddy
list spinlock is always taken before the page spinlock.
However in zbud_free_and_delist() locking order is the opposite (page
lock -> list lock).

Possible unsafe locking scenario (reported by lockdep):

       CPU0                    CPU1
       ----                    ----
  lock(&(&zbpg->lock)->rlock);
                               lock(zbud_budlists_spinlock);
                               lock(&(&zbpg->lock)->rlock);
  lock(zbud_budlists_spinlock);

Fix by grabbing the locks in opposite order in zbud_free_and_delist().

Signed-off-by: Andrea Righi
Cc: stable
Signed-off-by: Greg Kroah-Hartman
---
 drivers/staging/zcache/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'drivers/staging/zcache/Makefile')

diff --git a/drivers/staging/zcache/Makefile b/drivers/staging/zcache/Makefile
index f5ec64f..60daa27 100644
--- a/drivers/staging/zcache/Makefile
+++ b/drivers/staging/zcache/Makefile
@@ -1,3 +1,3 @@
-zcache-y := tmem.o
+zcache-y := zcache-main.o tmem.o
 
 obj-$(CONFIG_ZCACHE) += zcache.o
-- 
cgit v1.1