aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* UBIFS: introduce categorized lprops counterArtem Bityutskiy2012-11-262-0/+9
| | | | | | | | | | | commit 98a1eebda3cb2a84ecf1f219bb3a95769033d1bf upstream. This commit is a preparation for a subsequent bugfix. We introduce a counter for categorized lprops. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* UBIFS: fix mounting problems after power cutsArtem Bityutskiy2012-11-261-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a28ad42a4a0c6f302f488f26488b8b37c9b30024 upstream. This is a bugfix for a problem with the following symptoms: 1. A power cut happens 2. After reboot, we try to mount UBIFS 3. Mount fails with "No space left on device" error message UBIFS complains like this: UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB The root cause of this problem is that when we mount, not all LEBs are categorized. Only those which were read are. However, the 'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were categorized and 'c->freeable_cnt' is valid, which is a false assumption. This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()' to always fall back to LPT scanning if no freeable LEBs were found. This problem was reported by few people in the past, but Brent Taylor was able to reproduce it and send me a flash image which cannot be mounted, which made it easy to hunt the bug. Kudos to Brent. Reported-by: Brent Taylor <motobud@gmail.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fanotify: fix missing breakEric Paris2012-11-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | commit 848561d368751a1c0f679b9f045a02944506a801 upstream. Anders Blomdell noted in 2010 that Fanotify lost events and provided a test case. Eric Paris confirmed it was a bug and posted a fix to the list https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE but never applied it. Repeated attempts over time to actually get him to apply it have never had a reply from anyone who has raised it So apply it anyway Signed-off-by: Alan Cox <alan@linux.intel.com> Reported-by: Anders Blomdell <anders.blomdell@control.lth.se> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xfs: fix reading of wrapped log dataDave Chinner2012-11-171-1/+1
| | | | | | | | | | | | | | | | | | | | commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream. Commit 4439647 ("xfs: reset buffer pointers before freeing them") in 3.0-rc1 introduced a regression when recovering log buffers that wrapped around the end of log. The second part of the log buffer at the start of the physical log was being read into the header buffer rather than the data buffer, and hence recovery was seeing garbage in the data buffer when it got to the region of the log buffer that was incorrectly read. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidateTrond Myklebust2012-11-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [Fixed upstream as part of 0b728e1911c, but that's a much larger patch, this is only the nfs portion backported as needed.] Fix the following Oops in 3.5.1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] PGD 337c63067 PUD 0 Oops: 0000 [#1] SMP CPU 5 Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal processor thermal_sys Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT RIP: 0010:[<ffffffffa03789cd>] [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP: 0018:ffff8801b418bd38 EFLAGS: 00010292 RAX: 00000000fffffff6 RBX: ffff88032016d800 RCX: 0000000000000020 RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff8801824a7b00 RBP: ffff8801b418bdf8 R08: 7fffff0034323030 R09: fffffffff04c03ed R10: ffff8801824a7b00 R11: 0000000000000002 R12: ffff8801824a7b00 R13: ffff8801824a7b00 R14: 0000000000000000 R15: ffff8803201725d0 FS: 00002b53a46cb700(0000) GS:ffff88033fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 000000020a426000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process java (pid: 30431, threadinfo ffff8801b418a000, task ffff8801b5d20600) Stack: ffff8801b418be44 ffff88032016d800 ffff8801b418bdf8 0000000000000000 ffff8801824a7b00 ffff8801b418bdd7 ffff8803201725d0 ffffffff8116a9c0 ffff8801b5c38dc0 0000000000000007 ffff88032016d800 0000000000000000 Call Trace: [<ffffffff8116a9c0>] lookup_dcache+0x80/0xe0 [<ffffffff8116aa43>] __lookup_hash+0x23/0x90 [<ffffffff8116b4a5>] lookup_one_len+0xc5/0x100 [<ffffffffa03869a3>] nfs_sillyrename+0xe3/0x210 [nfs] [<ffffffff8116cadf>] vfs_unlink.part.25+0x7f/0xe0 [<ffffffff8116f22c>] do_unlinkat+0x1ac/0x1d0 [<ffffffff815717b9>] system_call_fastpath+0x16/0x1b [<00002b5348b5f527>] 0x2b5348b5f526 Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 <f6> 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89 RIP [<ffffffffa03789cd>] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP <ffff8801b418bd38> CR2: 0000000000000038 ---[ end trace 845113ed191985dd ]--- This Oops affects 3.5 kernels and older, and is due to lookup_one_len() calling down to the dentry revalidation code with a NULL pointer to struct nameidata. It is fixed upstream by commit 0b728e1911c (stop passing nameidata * to ->d_revalidate()) Reported-by: Richard Ems <richard.ems@cape-horn-eng.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: fix bug in legacy DNS resolver.NeilBrown2012-11-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8d96b10639fb402357b75b055b1e82a65ff95050 upstream. The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfsd: add get_uint for u32'sJ. Bruce Fields2012-11-171-3/+3
| | | | | | | | | | | | commit a007c4c3e943ecc054a806c259d95420a188754b upstream. I don't think there's a practical difference for the range of values these interfaces should see, but it would be safer to be unambiguous. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: nfs4_locku_done must release the sequence idTrond Myklebust2012-11-171-0/+1
| | | | | | | | | | | commit 2b1bc308f492589f7d49012ed24561534ea2be8c upstream. If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: Show original device name verbatim in /proc/*/mount{s,info}Ben Hutchings2012-11-174-9/+20
| | | | | | | | | | | | | | | | | | | | | | commit 97a54868262da1629a3e65121e65b8e8c4419d9f upstream. Since commit c7f404b ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeoutsScott Mayhew2012-11-171-1/+1
| | | | | | | | | | | | | | | | | | | commit acce94e68a0f346115fd41cdc298197d2d5a59ad upstream. In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat()Geert Uytterhoeven2012-10-311-8/+8
| | | | | | | | | | | | | | | | | commit 66081a72517a131430dcf986775f3268aafcb546 upstream. The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error checkKees Cook2012-10-311-0/+2
| | | | | | | | | | | | | | | | | | | commit 12176503366885edd542389eed3aaf94be163fdb upstream. The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check while converting ioctl arguments. This could lead to leaking kernel stack contents into userspace. Patch extracted from existing fix in grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert: lockd: use rpc client's cl_nodename for id encodingGreg Kroah-Hartman2012-10-281-3/+1
| | | | | | | | | | | | | | This reverts 12d63702c53bc2230dfc997e91ca891f39cb6446 which was commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream. Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
* NLM: nlm_lookup_file() may return NLMv4-specific error codesTrond Myklebust2012-10-282-2/+3
| | | | | | | | | | | | | | | | | | commit cd0b16c1c3cda12dbed1f8de8f1a9b0591990724 upstream. If the filehandle is stale, or open access is denied for some reason, nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh or nlm4_failed. These get passed right through nlm_lookup_file(), and so when nlmsvc_retrieve_args() calls the latter, it needs to filter the result through the cast_status() machinery. Failure to do so, will trigger the BUG_ON() in encode_nlm_stat... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Reported-by: Larry McVoy <lm@bitmover.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* udf: fix retun value on error path in udf_load_logicalvolNikola Pajkovsky2012-10-211-1/+4
| | | | | | | | | | | | commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream. In case we detect a problem and bail out, we fail to set "ret" to a nonzero value, and udf_load_logicalvol will mistakenly report success. Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* jbd: Fix assertion failure in commit code due to lacking transaction creditsJan Kara2012-10-212-31/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream. ext3 users of data=journal mode with blocksize < pagesize were occasionally hitting assertion failure in journal_commit_transaction() checking whether the transaction has at least as many credits reserved as buffers attached. The core of the problem is that when a file gets truncated, buffers that still need checkpointing or that are attached to the committing transaction are left with buffer_mapped set. When this happens to buffers beyond i_size attached to a page stradding i_size, subsequent write extending the file will see these buffers and as they are mapped (but underlying blocks were freed) things go awry from here. The assertion failure just coincidentally (and in this case luckily as we would start corrupting filesystem) triggers due to journal_head not being properly cleaned up as well. Under some rare circumstances this bug could even hit data=ordered mode users. There the assertion won't trigger and we would end up corrupting the filesystem. We fix the problem by unmapping buffers if possible (in lots of cases we just need a buffer attached to a transaction as a place holder but it must not be written out anyway). And in one case, we just have to bite the bullet and wait for transaction commit to finish. Reviewed-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* autofs4 - fix reset pending flag on mount failIan Kent2012-10-211-2/+4
| | | | | | | | | | | | | | | | commit 49999ab27eab6289a8e4f450e148bdab521361b2 upstream. In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING mount pending flag is not cleared. One effect of this is when using the "browse" option, directory entry attributes show up with all "?"s due to the incorrect callback and subsequent failure return (when in fact no callback should be made). Signed-off-by: Ian Kent <ikent@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checkingHugh Dickins2012-10-215-6/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream. Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(), u64 inum = fid->raw[2]; which is unhelpfully reported as at the end of shmem_alloc_inode(): BUG: unable to handle kernel paging request at ffff880061cd3000 IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Call Trace: [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6 Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present. But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Sage Weil <sage@inktank.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lockd: use rpc client's cl_nodename for id encodingStanislav Kinsbursky2012-10-211-1/+3
| | | | | | | | | | | | | commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream. Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: fix fdatasync() for files with only i_size changesJan Kara2012-10-131-2/+6
| | | | | | | | | | | | | | | | | commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: always set i_op in ext4_mknod()Bernd Schubert2012-10-131-2/+0
| | | | | | | | | | | | | commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: online defrag is not supported for journaled filesDmitry Monakhov2012-10-131-1/+6
| | | | | | | | | | | | | commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* coredump: prevent double-free on an error path in core dumperDenys Vlasenko2012-10-071-15/+4
| | | | | | | | | | | | | | | | | | | | commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream. In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate memory for info->fields, it frees already allocated stuff and returns error to its caller, fill_note_info. Which in turn returns error to its caller, elf_core_dump. Which jumps to cleanup label and calls free_note_info, which will happily try to free all info->fields again. BOOM. This is the fix. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Venu Byravarasu <vbyravarasu@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: dcache: fix deadlock in tree traversalMiklos Szeredi2012-10-071-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream. IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Squashfs: fix mount time sanity check for corrupted superblockPhillip Lougher2012-10-021-1/+1
| | | | | | | | | | | | | | | | commit cc37f75a9ffbbfcb1c3297534f293c8284e3c5a6 upstream. A Squashfs filesystem containing nothing but an empty directory, although unusual and ultimately pointless, is still valid. The directory_table >= next_table sanity check rejects these filesystems as invalid because the directory_table is empty and equal to next_table. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: return error from decode_getfh in decode openWeston Andros Adamson2012-10-021-1/+2
| | | | | | | | | | | | commit 01913b49cf1dc6409a07dd2a4cc6af2e77f3c410 upstream. If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last decode_* call must have succeeded. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix a problem with the legacy binary mount codeTrond Myklebust2012-10-021-0/+2
| | | | | | | | | | | | | | | commit 872ece86ea5c367aa92f44689c2d01a1c767aeb3 upstream. Apparently, am-utils is still using the legacy binary mountdata interface, and is having trouble parsing /proc/mounts due to the 'port=' field being incorrectly set. The following patch should fix up the regression. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust2012-10-023-4/+4
| | | | | | | | | | | | | | | | | | | | | | commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* eCryptfs: Copy up attributes of the lower target inode after renameTyler Hicks2012-10-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 8335eafc2859e1a26282bef7c3d19f3d68868b8a upstream. After calling into the lower filesystem to do a rename, the lower target inode's attributes were not copied up to the eCryptfs target inode. This resulted in the eCryptfs target inode staying around, rather than being evicted, because i_nlink was not updated for the eCryptfs inode. This also meant that eCryptfs didn't do the final iput() on the lower target inode so it stayed around, as well. This would result in a failure to free up space occupied by the target file in the rename() operation. Both target inodes would eventually be evicted when the eCryptfs filesystem was unmounted. This patch calls fsstack_copy_attr_all() after the lower filesystem does its ->rename() so that important inode attributes, such as i_nlink, are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called and eCryptfs can drop its final reference on the lower inode. http://launchpad.net/bugs/561129 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi2012-10-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream. IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: make O_PATH file descriptors usable for 'fstat()'Linus Torvalds2012-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 upstream. We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit 332a2e1244bd, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fuse: fix retrieve lengthMiklos Szeredi2012-09-141-0/+1
| | | | | | | | | | | commit c9e67d483776d8d2a5f3f70491161b205930ffe1 upstream. In some cases fuse_retrieve() would return a short byte count if offset was non-zero. The data returned was correct, though. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext3: Fix fdatasync() for files with only i_size changesJan Kara2012-09-141-3/+14
| | | | | | | | | | | | | | | | | commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* udf: Fix data corruption for files in ICBJan Kara2012-09-141-6/+29
| | | | | | | | | | | | | | | | | | | commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream. When a file is stored in ICB (inode), we overwrite part of the file, and the page containing file's data is not in page cache, we end up corrupting file's data by overwriting them with zeros. The problem is we use simple_write_begin() which simply zeroes parts of the page which are not written to. The problem has been introduced by be021ee4 (udf: convert to new aops). Fix the problem by providing a ->write_begin function which makes the page properly uptodate. Reported-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* block: replace __getblk_slow misfix by grow_dev_page fixHugh Dickins2012-09-141-36/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 676ce6d5ca3098339c028d44fe0427d1566a4d2d upstream. Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Alias the nfs module to nfs4bjschuma@gmail.com2012-09-141-0/+2
| | | | | | | | | | | | | commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_doneTrond Myklebust2012-09-141-6/+2
| | | | | | | | | | | | | | | | | | | | | | commit 47fbf7976e0b7d9dcdd799e2a1baba19064d9631 upstream. Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh <bharrosh@panasas.com> Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv3: Ensure that do_proc_get_root() reports errors correctlyTrond Myklebust2012-09-141-1/+1
| | | | | | | | | | | | | | commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen <hikvision_linux@163.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: canonicalize create mode in build_open_flags()Miklos Szeredi2012-09-141-3/+4
| | | | | | | | | | | | | | | | | commit e68726ff72cf7ba5e7d789857fcd9a75ca573f03 upstream. Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfs: missed source of ->f_pos racesAl Viro2012-09-141-2/+8
| | | | | | | | | | | | commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: avoid kmemcheck complaint from reading uninitialized memoryTheodore Ts'o2012-08-261-0/+1
| | | | | | | | | | | | | | | | | | | commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fuse: verify all ioctl retry iov elementsZach Brown2012-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi2012-08-154-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: don't let i_reserved_meta_blocks go negativeBrian Foster2012-08-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: pass a char * to ext4_count_free() instead of a buffer_head ptrTheodore Ts'o2012-08-094-8/+8
| | | | | | | | | | | commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream. Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton2012-08-091-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfsd4: our filesystems are normally case sensitiveJ. Bruce Fields2012-08-091-1/+1
| | | | | | | | | | | commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream. Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Btrfs: call the ordered free operation without any locks heldChris Mason2012-08-091-1/+8
| | | | | | | | | | | | | | | | commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* locks: fix checking of fcntl_setlease argumentJ. Bruce Fields2012-08-091-3/+3
| | | | | | | | | | | | | | | | | | commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream. The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually *don't* cause problems in mainline, as of Dave Jones's commit 8d657eb3b438 "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: compaction: introduce sync-light migration for use by compactionMel Gorman2012-08-014-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream. Stable note: Not tracked in Buzilla. This was part of a series that reduced interactivity stalls experienced when THP was enabled. These stalls were particularly noticable when copying data to a USB stick but the experiences for users varied a lot. This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>