aboutsummaryrefslogtreecommitdiffstats
path: root/fs/gfs2/glock.c
Commit message (Collapse)AuthorAgeFilesLines
* getting newer filesystem code workingWolfgang Wiedmeyer2015-10-231-10/+29
|
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2011-06-071-2/+7
|\ | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Processes waiting on inode glock that no processes are holding
| * GFS2: Processes waiting on inode glock that no processes are holdingBob Peterson2011-05-251-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a race in the GFS2 glock state machine that may result in lockups. The symptom is that all nodes but one will hang, waiting for a particular glock. All the holder records will have the "W" (Waiting) bit set. The other node will typically have the glock stuck in Exclusive mode (EX) with no holder records, but the dinode will be cached. In other words, an entry with "I:" will appear in the glock dump for that glock, but nothing else. The race has to do with the glock "Pending Demote" bit, which can be set, then immediately reset, thus losing the fact that another node needs the glock. The sequence of events is: 1. Something schedules the glock workqueue (e.g. glock request from fs) 2. The glock workqueue gets to the point between the test of the reply pending bit and the spin lock: if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags)) { finish_xmote(gl, gl->gl_reply); drop_ref = 1; } down_read(&gfs2_umount_flush_sem); <---- i.e. here spin_lock(&gl->gl_spin); 3. In comes (a) the reply to our EX lock request setting GLF_REPLY_PENDING and (b) the demote request which sets GLF_PENDING_DEMOTE 4. The following test is executed: if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) && gl->gl_state != LM_ST_UNLOCKED && gl->gl_demote_state != LM_ST_EXCLUSIVE) { This resets the pending demote flag, and gl->gl_demote_state is not equal to exclusive, however because the reply from the dlm arrived after we checked for the GLF_REPLY_PENDING flag, gl->gl_state is still equal to unlocked, so although we reset the GLF_PENDING_DEMOTE flag, we didn't then set the GLF_DEMOTE flag or reinstate the GLF_PENDING_DEMOTE_FLAG. The patch closes the timing window by only transitioning the "Pending demote" bit to the "demote" flag once we know the other conditions (not unlocked and not exclusive) are met. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | vmscan: change shrinker API by passing shrink_control structYing Han2011-05-251-1/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2011-05-201-46/+48
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (32 commits) GFS2: Move all locking inside the inode creation function GFS2: Clean up symlink creation GFS2: Clean up mkdir GFS2: Use UUID field in generic superblock GFS2: Rename ops_inode.c to inode.c GFS2: Inode.c is empty now, remove it GFS2: Move final part of inode.c into super.c GFS2: Move most of the remaining inode.c into ops_inode.c GFS2: Move gfs2_refresh_inode() and friends into glops.c GFS2: Remove gfs2_dinode_print() function GFS2: When adding a new dir entry, inc link count if it is a subdir GFS2: Make gfs2_dir_del update link count when required GFS2: Don't use gfs2_change_nlink in link syscall GFS2: Don't use a try lock when promoting to a higher mode GFS2: Double check link count under glock GFS2: Improve bug trap code in ->releasepage() GFS2: Fix ail list traversal GFS2: make sure fallocate bytes is a multiple of blksize GFS2: Add an AIL writeback tracepoint GFS2: Make writeback more responsive to system conditions ...
| * GFS2: Don't use a try lock when promoting to a higher modeSteven Whitehouse2011-05-051-5/+0
| | | | | | | | | | | | | | | | | | | | Previously we marked all locks being promoted to a higher mode with the try flag to avoid any potential deadlocks issues. The DLM is able to detect these and report them in way that GFS2 can deal with them correctly. So we can just request the required mode and wait for a response without needing to perform this check. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Make writeback more responsive to system conditionsSteven Whitehouse2011-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Optimise glock lru and end of life inodesSteven Whitehouse2011-04-201-20/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Improve tracing support (adds two flags)Steven Whitehouse2011-04-201-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for two new flags. One keeps track of whether the glock is on the LRU list or not. The other isn't really a flag as such, but an indication of whether the glock has an attached object or not. This indication is reported without any locking, which is ok since we do not dereference the object pointer but merely report whether it is NULL or not. Also, this fixes one place where a tracepoint was missing, which was at the point we remove deallocated blocks from the journal. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Alter point of entry to glock lru list for glocks with an address_spaceSteven Whitehouse2011-04-201-18/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than allowing the glocks to be scheduled for possible reclaim as soon as they have exited the journal, this patch delays their entry to the list until the glocks in question are no longer in use. This means that we will rely on the vm for writeback of all dirty data and metadata from now on. When glocks are added to the lru list they should be freeable much faster since all the I/O required to free them should have already been completed. This should lead to much better I/O patterns under low memory conditions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | add hlist_bl_lock/unlock helpersChristoph Hellwig2011-04-251-4/+2
|/ | | | | | | | | | | | Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix common misspellingsLucas De Marchi2011-03-311-1/+1
| | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2011-03-161-252/+158
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Don't use _raw version of RCU dereference GFS2: Adding missing unlock_page() GFS2: Update to AIL list locking GFS2: introduce AIL lock GFS2: fix block allocation check for fallocate GFS2: Optimize glock multiple-dequeue code GFS2: Remove potential race in flock code GFS2: Fix glock deallocation race GFS2: quota allows exceeding hard limit GFS2: deallocation performance patch GFS2: panics on quotacheck update GFS2: Improve cluster mmap scalability GFS2: Fix glock queue trace point GFS2: Post-VFS scale update for RCU path walk GFS2: Use RCU for glock hash table
| * GFS2: Don't use _raw version of RCU dereferenceSteven Whitehouse2011-03-151-1/+1
| | | | | | | | | | | | | | | | | | As per RCU glock patch review comments, don't use the _raw version of this function here. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * GFS2: Optimize glock multiple-dequeue codeBob Peterson2011-03-111-8/+4
| | | | | | | | | | | | | | | | | | | | | | This is a small patch that optimizes multiple glock dequeue operations. It changes the unlock order to be more efficient and makes it easier for lock debugging tools to unravel. It also eliminates the need for the temp variable x, although that would likely be optimized out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Fix glock deallocation raceSteven Whitehouse2011-03-091-2/+8
| | | | | | | | | | | | | | | | | | | | | | This patch fixes a race in deallocating glocks which was introduced in the RCU glock patch. We need to ensure that the glock count is kept correct even in the case that there is a race to add a new glock into the hash table. Also, to avoid having to wait for an RCU grace period, the glock counter can be decremented before call_rcu() is called. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Fix glock queue trace pointSteven Whitehouse2011-01-311-1/+1
| | | | | | | | | | | | | | Somehow this tracepoint landed up in the wrong place. This moves it to where it should be. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * GFS2: Use RCU for glock hash tableSteven Whitehouse2011-01-211-243/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'Tejun Heo2011-02-161-2/+2
|/ | | | | | | | | | | | | | | | | There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge glock state fields into a bitfieldSteven Whitehouse2010-11-301-2/+7
| | | | | | | | | | | | | | | | | | We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
* GFS2: Clean up of gdlm_lock functionSteven Whitehouse2010-11-301-22/+9
| | | | | | | | | | | | The DLM never returns -EAGAIN in response to dlm_lock(), and even if it did, the test in gdlm_lock() was wrong anyway. Once that test is removed, it is possible to greatly simplify this code by simply using a "normal" error return code (0 for success). We then no longer need the LM_OUT_ASYNC return code which can be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: fs/gfs2/glock.c: Use printf extension %pVJoe Perches2010-11-301-2/+7
| | | | | | | | | Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: fs/gfs2/glock.c: Convert sprintf_symbol to %pSJoe Perches2010-11-301-8/+7
| | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Change two WQ_RESCUERs into WQ_MEM_RECLAIMSteven Whitehouse2010-11-301-3/+4
| | | | | | | | The WQ_RESCUER flag should only be used internally to the workqueue implementation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>
* GFS2: Fix inode deallocation raceSteven Whitehouse2010-11-151-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix spectator umount issueSteven Whitehouse2010-09-291-1/+1
| | | | | | | | | | The tests further down the recovery function relating to unlocking the journal need to be updated to match the intial test. Also, a test in the umount code which was surplus to requirements has been removed. Umounting spectator mounts now works correctly, as expected. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use new workqueue schemeSteven Whitehouse2010-09-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | The recovery workqueue can be freezable since we want it to finish what it is doing if the system is to be frozen (although why you'd want to freeze a cluster node is beyond me since it will result in it being ejected from the cluster). It does still make sense for single node GFS2 filesystems though. The glock workqueue will benefit from being able to run more work items concurrently. A test running postmark shows improved performance and multi-threaded workloads are likely to benefit even more. It needs to be high priority because the latency directly affects the latency of filesystem glock operations. The delete workqueue is similar to the recovery workqueue in that it must not get blocked by memory allocations, and may run for a long time. Potentially other GFS2 threads might also be converted to workqueues, but I'll leave that for a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Tejun Heo <tj@kernel.org>
* GFS2: Don't enforce min hold time when two demotes occur in rapid successionSteven Whitehouse2010-09-201-4/+11
| | | | | | | | | | | | | | | | | | | | | Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
* GFS2: Fix recovery stuck bug (try #2)Steven Whitehouse2010-08-021-7/+39
| | | | | | | | | This is a clean up of the code which deals with LM_FLAG_NOEXP which aims to remove any possible race conditions by using gl_spin to cover the gap between testing for the LM_FLAG_NOEXP and the GL_FROZEN flag. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Revert "GFS2: recovery stuck on transaction lock"Steven Whitehouse2010-07-291-10/+0
| | | | | | | | | | This reverts commit b7dc2df5725fe7355fd76000ead7e39728e1b8a9. The initial patch didn't quite work since it doesn't cover all the possible routes by which the GLF_FROZEN flag might be set. A revised fix is coming up in the next patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Make "try" lock not try quite so hardSteven Whitehouse2010-07-291-24/+25
| | | | | | | | | | | | | | | | | | | This looks like a big change, but in reality its only a single line of actual code change, the rest is just moving a function to before its new caller. The "try" flag for glocks is a rather subtle and delicate setting since it requires that the state machine tries just hard enough to ensure that it has a good chance of getting the requested lock, but no so hard that the request can land up blocked behind another. The patch adds in an additional check which will fail any queued try locks if there is another request blocking the try lock request which is not granted and compatible, nor in progress already. The check is made only after all pending locks which may be granted have been granted. I've checked this with the reproducer for the reported flock bug which this is intended to fix, and it now passes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* mm: add context argument to shrinker callbackDave Chinner2010-07-191-1/+1
| | | | | | | | | | | | The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* GFS2: recovery stuck on transaction lockBob Peterson2010-07-151-0/+10
| | | | | | | | | | | | | | | | This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* GFS2: glock livelockBob Peterson2010-04-141-0/+3
| | | | | | | | | | | | | | This patch fixes a couple gfs2 problems with the reclaiming of unlinked dinodes. First, there were a couple of livelocks where everything would come to a halt waiting for a glock that was seemingly held by a process that no longer existed. In fact, the process did exist, it just had the wrong pid number in the holder information. Second, there was a lock ordering problem between inode locking and glock locking. Third, glock/inode contention could sometimes cause inodes to be improperly marked invalid by iget_failed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
* GFS2: print glock numbers in hexBob Peterson2010-03-011-1/+1
| | | | | | | | | This patch changes glock numbers from printing in decimal to hex. Since DLM prints corresponding resource IDs in hex, it makes debugging easier. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove loopy umount codeSteven Whitehouse2010-03-011-31/+2
| | | | | | | | | | | | | As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Metadata address space clean upSteven Whitehouse2010-03-011-19/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Extend umount wait coverage to full glock lifetimeSteven Whitehouse2010-02-031-0/+4
| | | | | | | | | | | | | | Although all glocks are, by the time of the umount glock wait, scheduled for demotion, some of them haven't made it far enough through the process for the original set of waiting code to wait for them. This extends the ref count to the whole glock lifetime in order to ensure that the waiting does catch all glocks. It does make it a bit more invasive, but it seems the only sensible solution at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix glock refcount issuesSteven Whitehouse2009-12-031-13/+8
| | | | | | | | | | | | | | This patch fixes some ref counting issues. Firstly by moving the point at which we drop the ref count after a dlm lock operation has completed we ensure that we never call gfs2_glock_hold() on a lock with a zero ref count. Secondly, by using atomic_dec_and_lock() in gfs2_glock_put() we ensure that at no time will a glock with zero ref count appear on the lru_list. That means that we can remove the check for this in our shrinker (which was racy). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix potential race in glock codeSteven Whitehouse2009-12-031-2/+8
| | | | | | | We need to be careful of the ordering between clearing the GLF_LOCK bit and scheduling the workqueue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: remove dcache entries for remote deleted inodesBenjamin Marzinski2009-07-301-5/+38
| | | | | | | | | | | | | When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Don't put unlikely reclaim candidates on the reclaim list.Benjamin Marzinski2009-07-301-26/+46
| | | | | | | | | | | GFS2 was placing far too many glocks on the reclaim list that were not good candidates for freeing up from cache. These locks would sit there and repeatedly get scanned to see if they could be reclaimed, wasting a lot of time when there was memory pressure. This fix does more checks on the locks to see if they are actually likely to be removable from cache. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix panic in glock memory shrinkerBenjamin Marzinski2009-07-301-0/+4
| | | | | | | | | | | | | | | | It is possible for gfs2_shrink_glock_memory() to check a glock for demotion that's in the process of being freed by gfs2_glock_put(). In this case, gfs2_shrink_glock_memory() will acquire a new reference to this glock, and then try to free the glock itself when it drops the refernce. To solve this, gfs2_shrink_glock_memory() just needs to check if the glock is in the process of being freed, and if so skip it without ever unlocking the lru_lock. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Shrink the shrinkerSteven Whitehouse2009-07-301-18/+5
| | | | | | | | | | This patch removes some of the special cases that the shrinker was trying to deal with. As a result we leave fewer items on the list and none at all which cannot be demoted. This makes the list scanning more efficient and solves some issues seen with large numbers of inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add tracepointsSteven Whitehouse2009-06-121-1/+11
| | | | | | | | | | | This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Umount recovery race fixSteven Whitehouse2009-05-191-3/+18
| | | | | | | | | | | | | | | | | | | This patch fixes a race condition where we can receive recovery requests part way through processing a umount. This was causing problems since the recovery thread had already gone away. Looking in more detail at the recovery code, it was really trying to implement a slight variation on a work queue, and that happens to align nicely with the recently introduced slow-work subsystem. As a result I've updated the code to use slow-work, rather than its own home grown variety of work queue. When using the wait_on_bit() function, I noticed that the wait function that was supplied as an argument was appearing in the WCHAN field, so I've updated the function names in order to produce more meaningful output. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix glock ref counting bugSteven Whitehouse2009-05-091-0/+1
| | | | | | | | | Depending on the ordering of events as we go around the glock shrinker loop, it is possible to drop the ref count of a glock incorrectly. It doesn't happen very often. This patch corrects the got_ref variable, fixing the problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Move umount flush rwsemSteven Whitehouse2009-04-151-5/+5
| | | | | | | | The rwsem, used only on umount, is in the wrong place in glock.c. This patch moves it up a bit so that it does not get called under a spinlock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add a "demote a glock" interface to sysfsSteven Whitehouse2009-03-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a sysfs file called demote_rq to GFS2's per filesystem directory. Its possible to use this file to demote arbitrary glocks in exactly the same way as if a request had come in from a remote node. This is intended for testing issues relating to caching of data under glocks. Despite that, the interface is generic enough to send requests to any type of glock, but be careful as its not always safe to send an arbitrary message to an arbitrary glock. For that reason and to prevent DoS, this interface is restricted to root only. The messages look like this: <type>:<glocknumber> <mode> Example: echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq Which means "please demote inode glock (type 2) number 13324 so that I can get an EX (exclusive) lock". The lock modes are those which would normally be sent by a remote node in its callback so if you want to unlock a glock, you use EX, to demote to shared, use SH or PR (depending on whether you like GFS2 or DLM lock modes better!). If the glock doesn't exist, you'll get -ENOENT returned. If the arguments don't make sense, you'll get -EINVAL returned. The plan is that this interface will be used in combination with the blktrace patch which I recently posted for comments although it is, of course, still useful in its own right. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix deadlock on journal flushSteven Whitehouse2009-03-241-14/+12
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since it all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>