From 1541d27bdbcf4637162f4cedf7e3954bee3dace2 Mon Sep 17 00:00:00 2001 From: Josh Cartwright Date: Thu, 29 Mar 2012 19:34:53 -0400 Subject: jffs2: Fix lock acquisition order bug in gc path commit 226bb7df3d22bcf4a1c0fe8206c80cc427498eae upstream. The locking policy is such that the erase_complete_block spinlock is nested within the alloc_sem mutex. This fixes a case in which the acquisition order was erroneously reversed. This issue was caught by the following lockdep splat: ======================================================= [ INFO: possible circular locking dependency detected ] 3.0.5 #1 ------------------------------------------------------- jffs2_gcd_mtd6/299 is trying to acquire lock: (&c->alloc_sem){+.+.+.}, at: [] jffs2_garbage_collect_pass+0x314/0x890 but task is already holding lock: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [] jffs2_garbage_collect_pass+0x308/0x890 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&c->erase_completion_lock)->rlock){+.+...}: [] validate_chain+0xe6c/0x10bc [] __lock_acquire+0x54c/0xba4 [] lock_acquire+0xa4/0x114 [] _raw_spin_lock+0x3c/0x4c [] jffs2_garbage_collect_pass+0x4c/0x890 [] jffs2_garbage_collect_thread+0x1b4/0x1cc [] kthread+0x98/0xa0 [] kernel_thread_exit+0x0/0x8 -> #0 (&c->alloc_sem){+.+.+.}: [] print_circular_bug+0x70/0x2c4 [] validate_chain+0x1034/0x10bc [] __lock_acquire+0x54c/0xba4 [] lock_acquire+0xa4/0x114 [] mutex_lock_nested+0x74/0x33c [] jffs2_garbage_collect_pass+0x314/0x890 [] jffs2_garbage_collect_thread+0x1b4/0x1cc [] kthread+0x98/0xa0 [] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); lock(&(&c->erase_completion_lock)->rlock); lock(&c->alloc_sem); *** DEADLOCK *** 1 lock held by jffs2_gcd_mtd6/299: #0: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [] jffs2_garbage_collect_pass+0x308/0x890 stack backtrace: [] (unwind_backtrace+0x0/0x100) from [] (dump_stack+0x20/0x24) [] (dump_stack+0x20/0x24) from [] (print_circular_bug+0x1c8/0x2c4) [] (print_circular_bug+0x1c8/0x2c4) from [] (validate_chain+0x1034/0x10bc) [] (validate_chain+0x1034/0x10bc) from [] (__lock_acquire+0x54c/0xba4) [] (__lock_acquire+0x54c/0xba4) from [] (lock_acquire+0xa4/0x114) [] (lock_acquire+0xa4/0x114) from [] (mutex_lock_nested+0x74/0x33c) [] (mutex_lock_nested+0x74/0x33c) from [] (jffs2_garbage_collect_pass+0x314/0x890) [] (jffs2_garbage_collect_pass+0x314/0x890) from [] (jffs2_garbage_collect_thread+0x1b4/0x1cc) [] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [] (kthread+0x98/0xa0) [] (kthread+0x98/0xa0) from [] (kernel_thread_exit+0x0/0x8) This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'. Signed-off-by: Josh Cartwright Signed-off-by: Artem Bityutskiy Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- fs/jffs2/gc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c index 31dce61..4bbd521 100644 --- a/fs/jffs2/gc.c +++ b/fs/jffs2/gc.c @@ -225,8 +225,8 @@ int jffs2_garbage_collect_pass(struct jffs2_sb_info *c) return 0; D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n")); - spin_lock(&c->erase_completion_lock); mutex_lock(&c->alloc_sem); + spin_lock(&c->erase_completion_lock); } /* First, work out which block we're garbage-collecting */ -- cgit v1.1 From 797c09ed348c2e22715cbfaeb3ab658436753570 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Mon, 20 Feb 2012 23:06:18 -0500 Subject: ext4: avoid deadlock on sync-mounted FS w/o journal commit c1bb05a657fb3d8c6179a4ef7980261fae4521d7 upstream. Processes hang forever on a sync-mounted ext2 file system that is mounted with the ext4 module (default in Fedora 16). I can reproduce this reliably by mounting an ext2 partition with "-o sync" and opening a new file an that partition with vim. vim will hang in "D" state forever. The same happens on ext4 without a journal. I am attaching a small patch here that solves this issue for me. In the sync mounted case without a journal, ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which can't be called with buffer lock held. Also move mb_cache_entry_release inside lock to avoid race fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix Note too that ext2 fixed this same problem in 2006 with b2f49033 [PATCH] fix deadlock in ext2 Signed-off-by: Martin.Wilck@ts.fujitsu.com [sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg] Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/xattr.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 19fe4e3..c2865cc 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -487,18 +487,19 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, ext4_free_blocks(handle, inode, bh, 0, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); + unlock_buffer(bh); } else { le32_add_cpu(&BHDR(bh)->h_refcount, -1); + if (ce) + mb_cache_entry_release(ce); + unlock_buffer(bh); error = ext4_handle_dirty_metadata(handle, inode, bh); if (IS_SYNC(inode)) ext4_handle_sync(handle); dquot_free_block(inode, 1); ea_bdebug(bh, "refcount now=%d; releasing", le32_to_cpu(BHDR(bh)->h_refcount)); - if (ce) - mb_cache_entry_release(ce); } - unlock_buffer(bh); out: ext4_std_error(inode->i_sb, error); return; -- cgit v1.1 From 19165bdbb3622cfca0ff66e8b30248d469b849d6 Mon Sep 17 00:00:00 2001 From: Jonathan Nieder Date: Fri, 11 May 2012 04:20:20 -0500 Subject: NFSv4: Revalidate uid/gid after open This is a shorter (and more appropriate for stable kernels) analog to the following upstream commit: commit 6926afd1925a54a13684ebe05987868890665e2b Author: Trond Myklebust Date: Sat Jan 7 13:22:46 2012 -0500 NFSv4: Save the owner/group name string when doing open ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust Without this patch, logging into two different machines with home directories mounted over NFS4 and then running "vim" and typing ":q" in each reliably produces the following error on the second machine: E137: Viminfo file is not writable: /users/system/rtheys/.viminfo This regression was introduced by 80e52aced138 ("NFSv4: Don't do idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32 cycle) --- after the OPEN call, .viminfo has the default values for st_uid and st_gid (0xfffffffe) cached because we do not want to let rpciod wait for an idmapper upcall to fill them in. The fix used in mainline is to save the owner and group as strings and perform the upcall in _nfs4_proc_open outside the rpciod context, which takes about 600 lines. For stable, we can do something similar with a one-liner: make open check for the stale fields and make a (synchronous) GETATTR call to fill them when needed. Trond dictated the patch, I typed it in, and Rik tested it. Addresses http://bugs.debian.org/659111 and https://bugzilla.redhat.com/789298 Reported-by: Rik Theys Explained-by: David Flyn Signed-off-by: Jonathan Nieder Tested-by: Rik Theys Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3d67302..30f6548 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1771,6 +1771,7 @@ static int _nfs4_do_open(struct inode *dir, struct path *path, fmode_t fmode, in nfs_setattr_update_inode(state->inode, sattr); nfs_post_op_update_inode(state->inode, opendata->o_res.f_attr); } + nfs_revalidate_inode(server, state->inode); nfs4_opendata_put(opendata); nfs4_put_state_owner(sp); *res = state; -- cgit v1.1 From 8e8a21270cbfc41bf3bf2e014b99dc113ba554ec Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 8 Dec 2011 21:13:46 +0100 Subject: ext3: Fix error handling on inode bitmap corruption commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream. When insert_inode_locked() fails in ext3_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Reviewed-by: Eric Sandeen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/ext3/ialloc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c index bfc2dc4..0b3da7c 100644 --- a/fs/ext3/ialloc.c +++ b/fs/ext3/ialloc.c @@ -561,8 +561,12 @@ got: if (IS_DIRSYNC(inode)) handle->h_sync = 1; if (insert_inode_locked(inode) < 0) { - err = -EINVAL; - goto fail_drop; + /* + * Likely a bitmap corruption causing inode to be allocated + * twice. + */ + err = -EIO; + goto fail; } spin_lock(&sbi->s_next_gen_lock); inode->i_generation = sbi->s_next_generation++; -- cgit v1.1 From 1a28fbbebe6bd7e3f0338663302b3b3ce500e088 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Sun, 18 Dec 2011 17:37:02 -0500 Subject: ext4: fix error handling on inode bitmap corruption commit acd6ad83517639e8f09a8c5525b1dccd81cd2a10 upstream. When insert_inode_locked() fails in ext4_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since the inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ialloc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 21bb2f6..412469b 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1021,8 +1021,12 @@ got: if (IS_DIRSYNC(inode)) ext4_handle_sync(handle); if (insert_inode_locked(inode) < 0) { - err = -EINVAL; - goto fail_drop; + /* + * Likely a bitmap corruption causing inode to be allocated + * twice. + */ + err = -EIO; + goto fail; } spin_lock(&sbi->s_next_gen_lock); inode->i_generation = sbi->s_next_generation++; -- cgit v1.1 From fcb2c2e95085336d1d6ac9cc1a07a783f80e60c0 Mon Sep 17 00:00:00 2001 From: Kazuya Mio Date: Thu, 1 Dec 2011 16:51:07 +0900 Subject: wake up s_wait_unfrozen when ->freeze_fs fails commit e1616300a20c80396109c1cf013ba9a36055a3da upstream. dd slept infinitely when fsfeeze failed because of EIO. To fix this problem, if ->freeze_fs fails, freeze_super() wakes up the tasks waiting for the filesystem to become unfrozen. When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(), the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen. However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then freeze_super() returns an error number. In this case, FITHAW ioctl returns EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely. Signed-off-by: Kazuya Mio Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/super.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/super.c b/fs/super.c index ab3d672..caf4dfa 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1009,6 +1009,8 @@ int freeze_super(struct super_block *sb) printk(KERN_ERR "VFS:Filesystem freeze failed\n"); sb->s_frozen = SB_UNFROZEN; + smp_wmb(); + wake_up(&sb->s_wait_unfrozen); deactivate_locked_super(sb); return ret; } -- cgit v1.1 From 37a8457773c266cdb77fcddec008cd73e81786be Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Fri, 11 May 2012 16:34:10 +0200 Subject: block: don't mark buffers beyond end of disk as mapped commit 080399aaaf3531f5b8761ec0ac30ff98891e8686 upstream. Hi, We have a bug report open where a squashfs image mounted on ppc64 would exhibit errors due to trying to read beyond the end of the disk. It can easily be reproduced by doing the following: [root@ibm-p750e-02-lp3 ~]# ls -l install.img -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null dd: reading `/dev/loop0': Input/output error 277376+0 records in 277376+0 records out 142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s In dmesg, you'll find the following: squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 43.106012] attempt to access beyond end of device [ 43.106029] loop0: rw=0, want=277410, limit=277408 [ 43.106039] Buffer I/O error on device loop0, logical block 138704 [ 43.106053] attempt to access beyond end of device [ 43.106057] loop0: rw=0, want=277412, limit=277408 [ 43.106061] Buffer I/O error on device loop0, logical block 138705 [ 43.106066] attempt to access beyond end of device [ 43.106070] loop0: rw=0, want=277414, limit=277408 [ 43.106073] Buffer I/O error on device loop0, logical block 138706 [ 43.106078] attempt to access beyond end of device [ 43.106081] loop0: rw=0, want=277416, limit=277408 [ 43.106085] Buffer I/O error on device loop0, logical block 138707 [ 43.106089] attempt to access beyond end of device [ 43.106093] loop0: rw=0, want=277418, limit=277408 [ 43.106096] Buffer I/O error on device loop0, logical block 138708 [ 43.106101] attempt to access beyond end of device [ 43.106104] loop0: rw=0, want=277420, limit=277408 [ 43.106108] Buffer I/O error on device loop0, logical block 138709 [ 43.106112] attempt to access beyond end of device [ 43.106116] loop0: rw=0, want=277422, limit=277408 [ 43.106120] Buffer I/O error on device loop0, logical block 138710 [ 43.106124] attempt to access beyond end of device [ 43.106128] loop0: rw=0, want=277424, limit=277408 [ 43.106131] Buffer I/O error on device loop0, logical block 138711 [ 43.106135] attempt to access beyond end of device [ 43.106139] loop0: rw=0, want=277426, limit=277408 [ 43.106143] Buffer I/O error on device loop0, logical block 138712 [ 43.106147] attempt to access beyond end of device [ 43.106151] loop0: rw=0, want=277428, limit=277408 [ 43.106154] Buffer I/O error on device loop0, logical block 138713 [ 43.106158] attempt to access beyond end of device [ 43.106162] loop0: rw=0, want=277430, limit=277408 [ 43.106166] attempt to access beyond end of device [ 43.106169] loop0: rw=0, want=277432, limit=277408 ... [ 43.106307] attempt to access beyond end of device [ 43.106311] loop0: rw=0, want=277470, limit=2774 Squashfs manages to read in the end block(s) of the disk during the mount operation. Then, when dd reads the block device, it leads to block_read_full_page being called with buffers that are beyond end of disk, but are marked as mapped. Thus, it would end up submitting read I/O against them, resulting in the errors mentioned above. I fixed the problem by modifying init_page_buffers to only set the buffer mapped if it fell inside of i_size. Cheers, Jeff Signed-off-by: Jeff Moyer Acked-by: Nick Piggin -- Changes from v1->v2: re-used max_block, as suggested by Nick Piggin. Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- fs/block_dev.c | 6 +++--- fs/buffer.c | 4 +++- 2 files changed, 6 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/block_dev.c b/fs/block_dev.c index 74fc5ed..a580028 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -64,7 +64,7 @@ static void bdev_inode_switch_bdi(struct inode *inode, spin_unlock(&inode_wb_list_lock); } -static sector_t max_block(struct block_device *bdev) +sector_t blkdev_max_block(struct block_device *bdev) { sector_t retval = ~((sector_t)0); loff_t sz = i_size_read(bdev->bd_inode); @@ -135,7 +135,7 @@ static int blkdev_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh, int create) { - if (iblock >= max_block(I_BDEV(inode))) { + if (iblock >= blkdev_max_block(I_BDEV(inode))) { if (create) return -EIO; @@ -157,7 +157,7 @@ static int blkdev_get_blocks(struct inode *inode, sector_t iblock, struct buffer_head *bh, int create) { - sector_t end_block = max_block(I_BDEV(inode)); + sector_t end_block = blkdev_max_block(I_BDEV(inode)); unsigned long max_blocks = bh->b_size >> inode->i_blkbits; if ((iblock + max_blocks) > end_block) { diff --git a/fs/buffer.c b/fs/buffer.c index 1a80b04..330cbce 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -968,6 +968,7 @@ init_page_buffers(struct page *page, struct block_device *bdev, struct buffer_head *head = page_buffers(page); struct buffer_head *bh = head; int uptodate = PageUptodate(page); + sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode)); do { if (!buffer_mapped(bh)) { @@ -976,7 +977,8 @@ init_page_buffers(struct page *page, struct block_device *bdev, bh->b_blocknr = block; if (uptodate) set_buffer_uptodate(bh); - set_buffer_mapped(bh); + if (block < end_block) + set_buffer_mapped(bh); } block++; bh = bh->b_this_page; -- cgit v1.1 From 2ec196c975ffb8076df77f6fa929448717e5141b Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 21 May 2012 16:06:20 -0700 Subject: vfs: make AIO use the proper rw_verify_area() area helpers commit a70b52ec1aaeaf60f4739edb1b422827cb6f3893 upstream. We had for some reason overlooked the AIO interface, and it didn't use the proper rw_verify_area() helper function that checks (for example) mandatory locking on the file, and that the size of the access doesn't cause us to overflow the provided offset limits etc. Instead, AIO did just the security_file_permission() thing (that rw_verify_area() also does) directly. This fixes it to do all the proper helper functions, which not only means that now mandatory file locking works with AIO too, we can actually remove lines of code. Reported-by: Manish Honap Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/aio.c | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) (limited to 'fs') diff --git a/fs/aio.c b/fs/aio.c index b4a88cc..278ed7d 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1395,6 +1395,10 @@ static ssize_t aio_setup_vectored_rw(int type, struct kiocb *kiocb, bool compat) if (ret < 0) goto out; + ret = rw_verify_area(type, kiocb->ki_filp, &kiocb->ki_pos, ret); + if (ret < 0) + goto out; + kiocb->ki_nr_segs = kiocb->ki_nbytes; kiocb->ki_cur_seg = 0; /* ki_nbytes/left now reflect bytes instead of segs */ @@ -1406,11 +1410,17 @@ out: return ret; } -static ssize_t aio_setup_single_vector(struct kiocb *kiocb) +static ssize_t aio_setup_single_vector(int type, struct file * file, struct kiocb *kiocb) { + int bytes; + + bytes = rw_verify_area(type, file, &kiocb->ki_pos, kiocb->ki_left); + if (bytes < 0) + return bytes; + kiocb->ki_iovec = &kiocb->ki_inline_vec; kiocb->ki_iovec->iov_base = kiocb->ki_buf; - kiocb->ki_iovec->iov_len = kiocb->ki_left; + kiocb->ki_iovec->iov_len = bytes; kiocb->ki_nr_segs = 1; kiocb->ki_cur_seg = 0; return 0; @@ -1435,10 +1445,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat) if (unlikely(!access_ok(VERIFY_WRITE, kiocb->ki_buf, kiocb->ki_left))) break; - ret = security_file_permission(file, MAY_READ); - if (unlikely(ret)) - break; - ret = aio_setup_single_vector(kiocb); + ret = aio_setup_single_vector(READ, file, kiocb); if (ret) break; ret = -EINVAL; @@ -1453,10 +1460,7 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat) if (unlikely(!access_ok(VERIFY_READ, kiocb->ki_buf, kiocb->ki_left))) break; - ret = security_file_permission(file, MAY_WRITE); - if (unlikely(ret)) - break; - ret = aio_setup_single_vector(kiocb); + ret = aio_setup_single_vector(WRITE, file, kiocb); if (ret) break; ret = -EINVAL; @@ -1467,9 +1471,6 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat) ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_READ))) break; - ret = security_file_permission(file, MAY_READ); - if (unlikely(ret)) - break; ret = aio_setup_vectored_rw(READ, kiocb, compat); if (ret) break; @@ -1481,9 +1482,6 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat) ret = -EBADF; if (unlikely(!(file->f_mode & FMODE_WRITE))) break; - ret = security_file_permission(file, MAY_WRITE); - if (unlikely(ret)) - break; ret = aio_setup_vectored_rw(WRITE, kiocb, compat); if (ret) break; -- cgit v1.1 From 93b71523531161bfdd2ac30dd1ea4bb70d1708bd Mon Sep 17 00:00:00 2001 From: Shirish Pargaonkar Date: Mon, 21 May 2012 09:20:12 -0500 Subject: cifs: fix oops while traversing open file list (try #4) commit 2c0c2a08bed7a3b791f88d09d16ace56acb3dd98 upstream. While traversing the linked list of open file handles, if the identfied file handle is invalid, a reopen is attempted and if it fails, we resume traversing where we stopped and cifs can oops while accessing invalid next element, for list might have changed. So mark the invalid file handle and attempt reopen if no valid file handle is found in rest of the list. If reopen fails, move the invalid file handle to the end of the list and start traversing the list again from the begining. Repeat this four times before giving up and returning an error if file reopen keeps failing. Signed-off-by: Shirish Pargaonkar Reviewed-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifsglob.h | 1 + fs/cifs/file.c | 57 +++++++++++++++++++++++++++++++----------------------- 2 files changed, 34 insertions(+), 24 deletions(-) (limited to 'fs') diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 6255fa8..7cb9dd2 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -43,6 +43,7 @@ #define CIFS_MIN_RCV_POOL 4 +#define MAX_REOPEN_ATT 5 /* these many maximum attempts to reopen a file */ /* * default attribute cache timeout (jiffies) */ diff --git a/fs/cifs/file.c b/fs/cifs/file.c index a9b4a24..9040cb0 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -973,10 +973,11 @@ struct cifsFileInfo *find_readable_file(struct cifsInodeInfo *cifs_inode, struct cifsFileInfo *find_writable_file(struct cifsInodeInfo *cifs_inode, bool fsuid_only) { - struct cifsFileInfo *open_file; + struct cifsFileInfo *open_file, *inv_file = NULL; struct cifs_sb_info *cifs_sb; bool any_available = false; int rc; + unsigned int refind = 0; /* Having a null inode here (because mapping->host was set to zero by the VFS or MM) should not happen but we had reports of on oops (due to @@ -996,40 +997,25 @@ struct cifsFileInfo *find_writable_file(struct cifsInodeInfo *cifs_inode, spin_lock(&cifs_file_list_lock); refind_writable: + if (refind > MAX_REOPEN_ATT) { + spin_unlock(&cifs_file_list_lock); + return NULL; + } list_for_each_entry(open_file, &cifs_inode->openFileList, flist) { if (!any_available && open_file->pid != current->tgid) continue; if (fsuid_only && open_file->uid != current_fsuid()) continue; if (OPEN_FMODE(open_file->f_flags) & FMODE_WRITE) { - cifsFileInfo_get(open_file); - if (!open_file->invalidHandle) { /* found a good writable file */ + cifsFileInfo_get(open_file); spin_unlock(&cifs_file_list_lock); return open_file; + } else { + if (!inv_file) + inv_file = open_file; } - - spin_unlock(&cifs_file_list_lock); - - /* Had to unlock since following call can block */ - rc = cifs_reopen_file(open_file, false); - if (!rc) - return open_file; - - /* if it fails, try another handle if possible */ - cFYI(1, "wp failed on reopen file"); - cifsFileInfo_put(open_file); - - spin_lock(&cifs_file_list_lock); - - /* else we simply continue to the next entry. Thus - we do not loop on reopen errors. If we - can not reopen the file, for example if we - reconnected to a server with another client - racing to delete or lock the file we would not - make progress if we restarted before the beginning - of the loop here. */ } } /* couldn't find useable FH with same pid, try any available */ @@ -1037,7 +1023,30 @@ refind_writable: any_available = true; goto refind_writable; } + + if (inv_file) { + any_available = false; + cifsFileInfo_get(inv_file); + } + spin_unlock(&cifs_file_list_lock); + + if (inv_file) { + rc = cifs_reopen_file(inv_file, false); + if (!rc) + return inv_file; + else { + spin_lock(&cifs_file_list_lock); + list_move_tail(&inv_file->flist, + &cifs_inode->openFileList); + spin_unlock(&cifs_file_list_lock); + cifsFileInfo_put(inv_file); + spin_lock(&cifs_file_list_lock); + ++refind; + goto refind_writable; + } + } + return NULL; } -- cgit v1.1 From 6ab5902511d32b50b4b973320979ba99693aa5be Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 28 May 2012 11:36:28 -0400 Subject: NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO commit fb13bfa7e1bcfdcfdece47c24b62f1a1cad957e9 upstream. If a file OPEN is denied due to a share lock, the resulting NFS4ERR_SHARE_DENIED is currently mapped to the default EIO. This patch adds a more appropriate mapping, and brings Linux into line with what Solaris 10 does. See https://bugzilla.kernel.org/show_bug.cgi?id=43286 Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 30f6548..b7a7e5f 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -94,6 +94,8 @@ static int nfs4_map_errors(int err) case -NFS4ERR_BADOWNER: case -NFS4ERR_BADNAME: return -EINVAL; + case -NFS4ERR_SHARE_DENIED: + return -EACCES; default: dprintk("%s could not handle NFSv4 error %d\n", __func__, -err); -- cgit v1.1 From 6baeff72b712f09aec676153d5b752dffaf90c17 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 29 May 2012 22:03:48 -0400 Subject: vfs: umount_tree() might be called on subtree that had never made it commit 63d37a84ab6004c235314ffd7a76c5eb28c2fae0 upstream. __mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm() we'd done back when we set ->mnt_ns non-NULL; it should not be done to vfsmounts that had never gone through commit_tree() and friends. Kudos to lczerner for catching that one... Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/namespace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/namespace.c b/fs/namespace.c index edc1c4a..b3d8f51 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1244,8 +1244,9 @@ void umount_tree(struct vfsmount *mnt, int propagate, struct list_head *kill) list_del_init(&p->mnt_expire); list_del_init(&p->mnt_list); __touch_mnt_namespace(p->mnt_ns); + if (p->mnt_ns) + __mnt_make_shortterm(p); p->mnt_ns = NULL; - __mnt_make_shortterm(p); list_del_init(&p->mnt_child); if (p->mnt_parent != p) { p->mnt_parent->mnt_ghosts++; -- cgit v1.1 From e36db7f818de59e3b0fdeaabda97e696a177b9a9 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Mon, 28 May 2012 14:17:25 -0400 Subject: ext4: force ro mount if ext4_setup_super() fails commit 7e84b6216467b84cd332c8e567bf5aa113fd2f38 upstream. If ext4_setup_super() fails i.e. due to a too-high revision, the error is logged in dmesg but the fs is not mounted RO as indicated. Tested by: # mkfs.ext4 -r 4 /dev/sdb6 # mount /dev/sdb6 /mnt/test # dmesg | grep "too high" [164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode # grep sdb6 /proc/mounts /dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0 Reviewed-by: Andreas Dilger Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index df121b2..be6c0cc 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3618,7 +3618,8 @@ no_journal: goto failed_mount4; } - ext4_setup_super(sb, es, sb->s_flags & MS_RDONLY); + if (ext4_setup_super(sb, es, sb->s_flags & MS_RDONLY)) + sb->s_flags |= MS_RDONLY; /* determine the minimum size of new large inodes, if present */ if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE) { -- cgit v1.1 From 801bdd926b6229da233f6db25770c9e817f98d4e Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 30 May 2012 23:00:16 -0400 Subject: ext4: add missing save_error_info() to ext4_error() commit f3fc0210c0fc91900766c995f089c39170e68305 upstream. The ext4_error() function is missing a call to save_error_info(). Since this is the function which marks the file system as containing an error, this oversight (which was introduced in 2.6.36) is quite significant, and should be backported to older stable kernels with high urgency. Reported-by: Ken Sumrall Signed-off-by: "Theodore Ts'o" Cc: ksumrall@google.com Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index be6c0cc..113b107 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -433,6 +433,7 @@ void __ext4_error(struct super_block *sb, const char *function, printk(KERN_CRIT "EXT4-fs error (device %s): %s:%d: comm %s: %pV\n", sb->s_id, function, line, current->comm, &vaf); va_end(args); + save_error_info(sb, function, line); ext4_handle_error(sb); } -- cgit v1.1 From eeb7cb57cf619ae9ab8210b21b49820ed40a472f Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 31 May 2012 23:46:01 -0400 Subject: ext4: don't trash state flags in EXT4_IOC_SETFLAGS commit 79906964a187c405db72a3abc60eb9b50d804fbc upstream. In commit 353eb83c we removed i_state_flags with 64-bit longs, But when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags directly, which trashes the state flags which are stored in the high 32-bits of i_flags on 64-bit platforms. So use the the ext4_{set,clear}_inode_flags() functions which use atomic bit manipulation functions instead. Reported-by: Tao Ma Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ioctl.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 808c554..892427d 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -35,7 +35,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) handle_t *handle = NULL; int err, migrate = 0; struct ext4_iloc iloc; - unsigned int oldflags; + unsigned int oldflags, mask, i; unsigned int jflag; if (!inode_owner_or_capable(inode)) @@ -112,8 +112,14 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (err) goto flags_err; - flags = flags & EXT4_FL_USER_MODIFIABLE; - flags |= oldflags & ~EXT4_FL_USER_MODIFIABLE; + for (i = 0, mask = 1; i < 32; i++, mask <<= 1) { + if (!(mask & EXT4_FL_USER_MODIFIABLE)) + continue; + if (mask & flags) + ext4_set_inode_flag(inode, i); + else + ext4_clear_inode_flag(inode, i); + } ei->i_flags = flags; ext4_set_inode_flags(inode); -- cgit v1.1 From 97434cf53353728708c133af183a11a158c8c26a Mon Sep 17 00:00:00 2001 From: Salman Qazi Date: Thu, 31 May 2012 23:51:27 -0400 Subject: ext4: add ext4_mb_unload_buddy in the error path commit 02b7831019ea4e7994968c84b5826fa8b248ffc8 upstream. ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching ext4_mb_unload_buddy when it fails a memory allocation. Signed-off-by: Salman Qazi Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/mballoc.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 0f1be7f..e3d5575 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4583,6 +4583,7 @@ do_more: */ new_entry = kmem_cache_alloc(ext4_free_ext_cachep, GFP_NOFS); if (!new_entry) { + ext4_mb_unload_buddy(&e4b); err = -ENOMEM; goto error_return; } -- cgit v1.1 From 32e090b1f4bdfe9756e1b8f0b5280acb036d1c61 Mon Sep 17 00:00:00 2001 From: Salman Qazi Date: Thu, 31 May 2012 23:52:14 -0400 Subject: ext4: remove mb_groups before tearing down the buddy_cache commit 95599968d19db175829fb580baa6b68939b320fb upstream. We can't have references held on pages in the s_buddy_cache while we are trying to truncate its pages and put the inode. All the pages must be gone before we reach clear_inode. This can only be gauranteed if we can prevent new users from grabbing references to s_buddy_cache's pages. The original bug can be reproduced and the bug fix can be verified by: while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \ umount /export/hda3/ram0; done & while true; do cat /proc/fs/ext4/ram0/mb_groups; done Signed-off-by: Salman Qazi Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/mballoc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index e3d5575..b6adf68 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2528,6 +2528,9 @@ int ext4_mb_release(struct super_block *sb) struct ext4_sb_info *sbi = EXT4_SB(sb); struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits); + if (sbi->s_proc) + remove_proc_entry("mb_groups", sbi->s_proc); + if (sbi->s_group_info) { for (i = 0; i < ngroups; i++) { grinfo = ext4_get_group_info(sb, i); @@ -2575,8 +2578,6 @@ int ext4_mb_release(struct super_block *sb) } free_percpu(sbi->s_locality_groups); - if (sbi->s_proc) - remove_proc_entry("mb_groups", sbi->s_proc); return 0; } -- cgit v1.1 From 749c8151fdf307fb7527aded025850027d44aadc Mon Sep 17 00:00:00 2001 From: Tao Ma Date: Thu, 7 Jun 2012 19:04:19 -0400 Subject: ext4: don't set i_flags in EXT4_IOC_SETFLAGS commit b22b1f178f6799278d3178d894f37facb2085765 upstream. Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to change the i_flags automatically but fails to remove the error setting of i_flags. So we still have the problem of trashing state flags. Fix this by removing the assignment. Signed-off-by: Tao Ma Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ioctl.c | 1 - 1 file changed, 1 deletion(-) (limited to 'fs') diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 892427d..4cbe1c2 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -120,7 +120,6 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) else ext4_clear_inode_flag(inode, i); } - ei->i_flags = flags; ext4_set_inode_flags(inode); inode->i_ctime = ext4_current_time(inode); -- cgit v1.1 From 6140710c5dba509a20b26dfe38b58f40baf2a2c8 Mon Sep 17 00:00:00 2001 From: Pavel Shilovsky Date: Thu, 10 May 2012 19:49:38 +0400 Subject: fuse: fix stat call on 32 bit platforms commit 45c72cd73c788dd18c8113d4a404d6b4a01decf1 upstream. Now we store attr->ino at inode->i_ino, return attr->ino at the first time and then return inode->i_ino if the attribute timeout isn't expired. That's wrong on 32 bit platforms because attr->ino is 64 bit and inode->i_ino is 32 bit in this case. Fix this by saving 64 bit ino in fuse_inode structure and returning it every time we call getattr. Also squash attr->ino into inode->i_ino explicitly. Signed-off-by: Pavel Shilovsky Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dir.c | 1 + fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 17 ++++++++++++++++- 3 files changed, 20 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index d5016071..c04a025 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -858,6 +858,7 @@ int fuse_update_attributes(struct inode *inode, struct kstat *stat, if (stat) { generic_fillattr(inode, stat); stat->mode = fi->orig_i_mode; + stat->ino = fi->orig_ino; } } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index b788bec..f621550 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -82,6 +82,9 @@ struct fuse_inode { preserve the original mode */ mode_t orig_i_mode; + /** 64 bit inode number */ + u64 orig_ino; + /** Version of last attribute change */ u64 attr_version; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 38f84cd..69a1e0f 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -91,6 +91,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->nlookup = 0; fi->attr_version = 0; fi->writectr = 0; + fi->orig_ino = 0; INIT_LIST_HEAD(&fi->write_files); INIT_LIST_HEAD(&fi->queued_writes); INIT_LIST_HEAD(&fi->writepages); @@ -140,6 +141,18 @@ static int fuse_remount_fs(struct super_block *sb, int *flags, char *data) return 0; } +/* + * ino_t is 32-bits on 32-bit arch. We have to squash the 64-bit value down + * so that it will fit. + */ +static ino_t fuse_squash_ino(u64 ino64) +{ + ino_t ino = (ino_t) ino64; + if (sizeof(ino_t) < sizeof(u64)) + ino ^= ino64 >> (sizeof(u64) - sizeof(ino_t)) * 8; + return ino; +} + void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, u64 attr_valid) { @@ -149,7 +162,7 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, fi->attr_version = ++fc->attr_version; fi->i_time = attr_valid; - inode->i_ino = attr->ino; + inode->i_ino = fuse_squash_ino(attr->ino); inode->i_mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777); inode->i_nlink = attr->nlink; inode->i_uid = attr->uid; @@ -175,6 +188,8 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr, fi->orig_i_mode = inode->i_mode; if (!(fc->flags & FUSE_DEFAULT_PERMISSIONS)) inode->i_mode &= ~S_ISVTX; + + fi->orig_ino = attr->ino; } void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr, -- cgit v1.1 From 1466988e8be36b25f01123798ce430176911c3c5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Janne=20Kalliom=C3=A4ki?= Date: Sun, 17 Jun 2012 17:05:24 -0400 Subject: hfsplus: fix overflow in sector calculations in hfsplus_submit_bio MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a6dc8c04218eb752ff79cdc24a995cf51866caed upstream. The variable io_size was unsigned int, which caused the wrong sector number to be calculated after aligning it. This then caused mount to fail with big volumes, as backup volume header information was searched from a wrong sector. Signed-off-by: Janne Kalliomäki Signed-off-by: Christoph Hellwig Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/hfsplus/wrapper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c index 7b8112da..aac1563 100644 --- a/fs/hfsplus/wrapper.c +++ b/fs/hfsplus/wrapper.c @@ -56,7 +56,7 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct bio *bio; int ret = 0; - unsigned int io_size; + u64 io_size; loff_t start; int offset; -- cgit v1.1 From f66c6795bdc2dd43b749e90a07c3e1f12caafede Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Wed, 20 Jun 2012 12:52:57 -0700 Subject: nilfs2: ensure proper cache clearing for gc-inodes commit fbb24a3a915f105016f1c828476be11aceac8504 upstream. A gc-inode is a pseudo inode used to buffer the blocks to be moved by garbage collection. Block caches of gc-inodes must be cleared every time a garbage collection function (nilfs_clean_segments) completes. Otherwise, stale blocks buffered in the caches may be wrongly reused in successive calls of the GC function. For user files, this is not a problem because their gc-inodes are distinguished by a checkpoint number as well as an inode number. They never buffer different blocks if either an inode number, a checkpoint number, or a block offset differs. However, gc-inodes of sufile, cpfile and DAT file can store different data for the same block offset. Thus, the nilfs_clean_segments function can move incorrect block for these meta-data files if an old block is cached. I found this is really causing meta-data corruption in nilfs. This fixes the issue by ensuring cache clear of gc-inodes and resolves reported GC problems including checkpoint file corruption, b-tree corruption, and the following warning during GC. nilfs_palloc_freev: entry number 307234 already freed. ... Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/gcinode.c | 2 ++ fs/nilfs2/segment.c | 2 ++ 2 files changed, 4 insertions(+) (limited to 'fs') diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c index 08a07a2..57ceaf3 100644 --- a/fs/nilfs2/gcinode.c +++ b/fs/nilfs2/gcinode.c @@ -191,6 +191,8 @@ void nilfs_remove_all_gcinodes(struct the_nilfs *nilfs) while (!list_empty(head)) { ii = list_first_entry(head, struct nilfs_inode_info, i_dirty); list_del_init(&ii->i_dirty); + truncate_inode_pages(&ii->vfs_inode.i_data, 0); + nilfs_btnode_cache_clear(&ii->i_btnode_cache); iput(&ii->vfs_inode); } } diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index bb24ab6..6f24e67 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2309,6 +2309,8 @@ nilfs_remove_written_gcinodes(struct the_nilfs *nilfs, struct list_head *head) if (!test_bit(NILFS_I_UPDATED, &ii->i_state)) continue; list_del_init(&ii->i_dirty); + truncate_inode_pages(&ii->vfs_inode.i_data, 0); + nilfs_btnode_cache_clear(&ii->i_btnode_cache); iput(&ii->vfs_inode); } } -- cgit v1.1 From f8db7530c08115356964240e4f9cc90fd112d251 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 27 Jun 2012 20:08:44 +0200 Subject: udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol() commit cb14d340ef1737c24125dd663eff77734a482d47 upstream. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/super.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/udf/super.c b/fs/udf/super.c index 7f0e18a..0bb6a6d 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -1262,11 +1262,9 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, BUG_ON(ident != TAG_IDENT_LVD); lvd = (struct logicalVolDesc *)bh->b_data; - i = udf_sb_alloc_partition_maps(sb, le32_to_cpu(lvd->numPartitionMaps)); - if (i != 0) { - ret = i; + ret = udf_sb_alloc_partition_maps(sb, le32_to_cpu(lvd->numPartitionMaps)); + if (ret) goto out_bh; - } for (i = 0, offset = 0; i < sbi->s_partitions && offset < le32_to_cpu(lvd->mapTableLength); -- cgit v1.1 From 8411aa07c7aa22ef3fe269a05e45e672590e4f7f Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 27 Jun 2012 20:20:22 +0200 Subject: udf: Avoid run away loop when partition table length is corrupted commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream. Check provided length of partition table so that (possibly maliciously) corrupted partition table cannot cause accessing data beyond current buffer. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/super.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/udf/super.c b/fs/udf/super.c index 0bb6a6d..ee31a2a 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -1254,6 +1254,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, struct genericPartitionMap *gpm; uint16_t ident; struct buffer_head *bh; + unsigned int table_len; int ret = 0; bh = udf_read_tagged(sb, block, block, &ident); @@ -1261,13 +1262,20 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, return 1; BUG_ON(ident != TAG_IDENT_LVD); lvd = (struct logicalVolDesc *)bh->b_data; + table_len = le32_to_cpu(lvd->mapTableLength); + if (sizeof(*lvd) + table_len > sb->s_blocksize) { + udf_error(sb, __func__, "error loading logical volume descriptor: " + "Partition table too long (%u > %lu)\n", table_len, + sb->s_blocksize - sizeof(*lvd)); + goto out_bh; + } ret = udf_sb_alloc_partition_maps(sb, le32_to_cpu(lvd->numPartitionMaps)); if (ret) goto out_bh; for (i = 0, offset = 0; - i < sbi->s_partitions && offset < le32_to_cpu(lvd->mapTableLength); + i < sbi->s_partitions && offset < table_len; i++, offset += gpm->partitionMapLength) { struct udf_part_map *map = &sbi->s_partmaps[i]; gpm = (struct genericPartitionMap *) -- cgit v1.1 From b1c5701ad6b3e5d21d16f65475651cfaaa41e7aa Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 27 Jun 2012 21:23:07 +0200 Subject: udf: Fortify loading of sparing table commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream. Add sanity checks when loading sparing table from disk to avoid accessing unallocated memory or writing to it. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/super.c | 86 ++++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 53 insertions(+), 33 deletions(-) (limited to 'fs') diff --git a/fs/udf/super.c b/fs/udf/super.c index ee31a2a..a8e867a 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -56,6 +56,7 @@ #include #include #include +#include #include #include "udf_sb.h" @@ -1244,11 +1245,59 @@ out_bh: return ret; } +static int udf_load_sparable_map(struct super_block *sb, + struct udf_part_map *map, + struct sparablePartitionMap *spm) +{ + uint32_t loc; + uint16_t ident; + struct sparingTable *st; + struct udf_sparing_data *sdata = &map->s_type_specific.s_sparing; + int i; + struct buffer_head *bh; + + map->s_partition_type = UDF_SPARABLE_MAP15; + sdata->s_packet_len = le16_to_cpu(spm->packetLength); + if (!is_power_of_2(sdata->s_packet_len)) { + udf_error(sb, __func__, "error loading logical volume descriptor: " + "Invalid packet length %u\n", + (unsigned)sdata->s_packet_len); + return -EIO; + } + if (spm->numSparingTables > 4) { + udf_error(sb, __func__, "error loading logical volume descriptor: " + "Too many sparing tables (%d)\n", + (int)spm->numSparingTables); + return -EIO; + } + + for (i = 0; i < spm->numSparingTables; i++) { + loc = le32_to_cpu(spm->locSparingTable[i]); + bh = udf_read_tagged(sb, loc, loc, &ident); + if (!bh) + continue; + + st = (struct sparingTable *)bh->b_data; + if (ident != 0 || + strncmp(st->sparingIdent.ident, UDF_ID_SPARING, + strlen(UDF_ID_SPARING)) || + sizeof(*st) + le16_to_cpu(st->reallocationTableLen) > + sb->s_blocksize) { + brelse(bh); + continue; + } + + sdata->s_spar_map[i] = bh; + } + map->s_partition_func = udf_get_pblock_spar15; + return 0; +} + static int udf_load_logicalvol(struct super_block *sb, sector_t block, struct kernel_lb_addr *fileset) { struct logicalVolDesc *lvd; - int i, j, offset; + int i, offset; uint8_t type; struct udf_sb_info *sbi = UDF_SB(sb); struct genericPartitionMap *gpm; @@ -1310,38 +1359,9 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, } else if (!strncmp(upm2->partIdent.ident, UDF_ID_SPARABLE, strlen(UDF_ID_SPARABLE))) { - uint32_t loc; - struct sparingTable *st; - struct sparablePartitionMap *spm = - (struct sparablePartitionMap *)gpm; - - map->s_partition_type = UDF_SPARABLE_MAP15; - map->s_type_specific.s_sparing.s_packet_len = - le16_to_cpu(spm->packetLength); - for (j = 0; j < spm->numSparingTables; j++) { - struct buffer_head *bh2; - - loc = le32_to_cpu( - spm->locSparingTable[j]); - bh2 = udf_read_tagged(sb, loc, loc, - &ident); - map->s_type_specific.s_sparing. - s_spar_map[j] = bh2; - - if (bh2 == NULL) - continue; - - st = (struct sparingTable *)bh2->b_data; - if (ident != 0 || strncmp( - st->sparingIdent.ident, - UDF_ID_SPARING, - strlen(UDF_ID_SPARING))) { - brelse(bh2); - map->s_type_specific.s_sparing. - s_spar_map[j] = NULL; - } - } - map->s_partition_func = udf_get_pblock_spar15; + if (udf_load_sparable_map(sb, map, + (struct sparablePartitionMap *)gpm) < 0) + goto out_bh; } else if (!strncmp(upm2->partIdent.ident, UDF_ID_METADATA, strlen(UDF_ID_METADATA))) { -- cgit v1.1 From d0f7cf8a1ab0479ecc37989bd332c28d5ff04f89 Mon Sep 17 00:00:00 2001 From: Chris Mason Date: Mon, 2 Jul 2012 15:29:53 -0400 Subject: Btrfs: run delayed directory updates during log replay commit b6305567e7d31b0bec1b8cb9ec0cadd7f7086f5f upstream. While we are resolving directory modifications in the tree log, we are triggering delayed metadata updates to the filesystem btrees. This commit forces the delayed updates to run so the replay code can find any modifications done. It stops us from crashing because the directory deleltion replay expects items to be removed immediately from the tree. Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'fs') diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 7fa128d..faf7d0b 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -691,6 +691,8 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans, kfree(name); iput(inode); + + btrfs_run_delayed_items(trans, root); return ret; } @@ -896,6 +898,7 @@ again: ret = btrfs_unlink_inode(trans, root, dir, inode, victim_name, victim_name_len); + btrfs_run_delayed_items(trans, root); } kfree(victim_name); ptr = (unsigned long)(victim_ref + 1) + victim_name_len; @@ -1476,6 +1479,9 @@ again: ret = btrfs_unlink_inode(trans, root, dir, inode, name, name_len); BUG_ON(ret); + + btrfs_run_delayed_items(trans, root); + kfree(name); iput(inode); -- cgit v1.1 From a2f2aa2f0c648d1dc22cf9ef4990cfc0b721add8 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 7 Jul 2012 10:17:00 -0700 Subject: vfs: make O_PATH file descriptors usable for 'fchdir()' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 332a2e1244bd08b9e3ecd378028513396a004a24 upstream. We already use them for openat() and friends, but fchdir() also wants to be able to use O_PATH file descriptors. This should make it comparable to the O_SEARCH of Solaris. In particular, O_PATH allows you to access (not-quite-open) a directory you don't have read persmission to, only execute permission. Noticed during development of multithread support for ksh93. Reported-by: ольга крыжановская Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/open.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/open.c b/fs/open.c index b52cf01..7e18c4d 100644 --- a/fs/open.c +++ b/fs/open.c @@ -396,10 +396,10 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) { struct file *file; struct inode *inode; - int error; + int error, fput_needed; error = -EBADF; - file = fget(fd); + file = fget_raw_light(fd, &fput_needed); if (!file) goto out; @@ -413,7 +413,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) if (!error) set_fs_pwd(current->fs, &file->f_path); out_putf: - fput(file); + fput_light(file, fput_needed); out: return error; } -- cgit v1.1 From 542b7a372e617cf3db2f659378e1ed3342fdc31b Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Mon, 11 Jun 2012 09:24:11 -0700 Subject: eCryptfs: Gracefully refuse miscdev file ops on inherited/passed files commit 8dc6780587c99286c0d3de747a2946a76989414a upstream. File operations on /dev/ecryptfs would BUG() when the operations were performed by processes other than the process that originally opened the file. This could happen with open files inherited after fork() or file descriptors passed through IPC mechanisms. Rather than calling BUG(), an error code can be safely returned in most situations. In ecryptfs_miscdev_release(), eCryptfs still needs to handle the release even if the last file reference is being held by a process that didn't originally open the file. ecryptfs_find_daemon_by_euid() will not be successful, so a pointer to the daemon is stored in the file's private_data. The private_data pointer is initialized when the miscdev file is opened and only used when the file is released. https://launchpad.net/bugs/994247 Signed-off-by: Tyler Hicks Reported-by: Sasha Levin Tested-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/miscdev.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c index 0dc5a3d..a050e4b 100644 --- a/fs/ecryptfs/miscdev.c +++ b/fs/ecryptfs/miscdev.c @@ -49,7 +49,10 @@ ecryptfs_miscdev_poll(struct file *file, poll_table *pt) mutex_lock(&ecryptfs_daemon_hash_mux); /* TODO: Just use file->private_data? */ rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); - BUG_ON(rc || !daemon); + if (rc || !daemon) { + mutex_unlock(&ecryptfs_daemon_hash_mux); + return -EINVAL; + } mutex_lock(&daemon->mux); mutex_unlock(&ecryptfs_daemon_hash_mux); if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) { @@ -122,6 +125,7 @@ ecryptfs_miscdev_open(struct inode *inode, struct file *file) goto out_unlock_daemon; } daemon->flags |= ECRYPTFS_DAEMON_MISCDEV_OPEN; + file->private_data = daemon; atomic_inc(&ecryptfs_num_miscdev_opens); out_unlock_daemon: mutex_unlock(&daemon->mux); @@ -152,9 +156,9 @@ ecryptfs_miscdev_release(struct inode *inode, struct file *file) mutex_lock(&ecryptfs_daemon_hash_mux); rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); - BUG_ON(rc || !daemon); + if (rc || !daemon) + daemon = file->private_data; mutex_lock(&daemon->mux); - BUG_ON(daemon->pid != task_pid(current)); BUG_ON(!(daemon->flags & ECRYPTFS_DAEMON_MISCDEV_OPEN)); daemon->flags &= ~ECRYPTFS_DAEMON_MISCDEV_OPEN; atomic_dec(&ecryptfs_num_miscdev_opens); @@ -246,8 +250,16 @@ ecryptfs_miscdev_read(struct file *file, char __user *buf, size_t count, mutex_lock(&ecryptfs_daemon_hash_mux); /* TODO: Just use file->private_data? */ rc = ecryptfs_find_daemon_by_euid(&daemon, euid, current_user_ns()); - BUG_ON(rc || !daemon); + if (rc || !daemon) { + mutex_unlock(&ecryptfs_daemon_hash_mux); + return -EINVAL; + } mutex_lock(&daemon->mux); + if (task_pid(current) != daemon->pid) { + mutex_unlock(&daemon->mux); + mutex_unlock(&ecryptfs_daemon_hash_mux); + return -EPERM; + } if (daemon->flags & ECRYPTFS_DAEMON_ZOMBIE) { rc = 0; mutex_unlock(&ecryptfs_daemon_hash_mux); @@ -284,9 +296,6 @@ check_list: * message from the queue; try again */ goto check_list; } - BUG_ON(euid != daemon->euid); - BUG_ON(current_user_ns() != daemon->user_ns); - BUG_ON(task_pid(current) != daemon->pid); msg_ctx = list_first_entry(&daemon->msg_ctx_out_queue, struct ecryptfs_msg_ctx, daemon_out_list); BUG_ON(!msg_ctx); -- cgit v1.1 From 092c1927ef479124a34fcf0646cf5c403f9b16e5 Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Mon, 11 Jun 2012 10:21:34 -0700 Subject: eCryptfs: Fix lockdep warning in miscdev operations commit 60d65f1f07a7d81d3eb3b91fc13fca80f2fdbb12 upstream. Don't grab the daemon mutex while holding the message context mutex. Addresses this lockdep warning: ecryptfsd/2141 is trying to acquire lock: (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] but task is already holding lock: (&(*daemon)->mux){+.+...}, at: [] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(*daemon)->mux){+.+...}: [] lock_acquire+0x9d/0x220 [] __mutex_lock_common+0x5a/0x4b0 [] mutex_lock_nested+0x44/0x50 [] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs] [] ecryptfs_send_message+0x134/0x1e0 [ecryptfs] [] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs] [] ecryptfs_write_metadata+0x108/0x250 [ecryptfs] [] ecryptfs_create+0x130/0x250 [ecryptfs] [] vfs_create+0xb4/0x120 [] do_last+0x8c5/0xa10 [] path_openat+0xd9/0x460 [] do_filp_open+0x42/0xa0 [] do_sys_open+0xf8/0x1d0 [] sys_open+0x21/0x30 [] system_call_fastpath+0x16/0x1b -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}: [] __lock_acquire+0x1bf8/0x1c50 [] lock_acquire+0x9d/0x220 [] __mutex_lock_common+0x5a/0x4b0 [] mutex_lock_nested+0x44/0x50 [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs] [] vfs_read+0xb3/0x180 [] sys_read+0x4d/0x90 [] system_call_fastpath+0x16/0x1b Signed-off-by: Tyler Hicks Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/miscdev.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) (limited to 'fs') diff --git a/fs/ecryptfs/miscdev.c b/fs/ecryptfs/miscdev.c index a050e4b..de42310 100644 --- a/fs/ecryptfs/miscdev.c +++ b/fs/ecryptfs/miscdev.c @@ -195,31 +195,32 @@ int ecryptfs_send_miscdev(char *data, size_t data_size, struct ecryptfs_msg_ctx *msg_ctx, u8 msg_type, u16 msg_flags, struct ecryptfs_daemon *daemon) { - int rc = 0; + struct ecryptfs_message *msg; - mutex_lock(&msg_ctx->mux); - msg_ctx->msg = kmalloc((sizeof(*msg_ctx->msg) + data_size), - GFP_KERNEL); - if (!msg_ctx->msg) { - rc = -ENOMEM; + msg = kmalloc((sizeof(*msg) + data_size), GFP_KERNEL); + if (!msg) { printk(KERN_ERR "%s: Out of memory whilst attempting " "to kmalloc(%zd, GFP_KERNEL)\n", __func__, - (sizeof(*msg_ctx->msg) + data_size)); - goto out_unlock; + (sizeof(*msg) + data_size)); + return -ENOMEM; } + + mutex_lock(&msg_ctx->mux); + msg_ctx->msg = msg; msg_ctx->msg->index = msg_ctx->index; msg_ctx->msg->data_len = data_size; msg_ctx->type = msg_type; memcpy(msg_ctx->msg->data, data, data_size); msg_ctx->msg_size = (sizeof(*msg_ctx->msg) + data_size); - mutex_lock(&daemon->mux); list_add_tail(&msg_ctx->daemon_out_list, &daemon->msg_ctx_out_queue); + mutex_unlock(&msg_ctx->mux); + + mutex_lock(&daemon->mux); daemon->num_queued_msg_ctx++; wake_up_interruptible(&daemon->wait); mutex_unlock(&daemon->mux); -out_unlock: - mutex_unlock(&msg_ctx->mux); - return rc; + + return 0; } /** -- cgit v1.1 From ad54262e86a5220e853f4aaf2cda51926e03f650 Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Tue, 12 Jun 2012 11:17:01 -0700 Subject: eCryptfs: Properly check for O_RDONLY flag before doing privileged open commit 9fe79d7600497ed8a95c3981cbe5b73ab98222f0 upstream. If the first attempt at opening the lower file read/write fails, eCryptfs will retry using a privileged kthread. However, the privileged retry should not happen if the lower file's inode is read-only because a read/write open will still be unsuccessful. The check for determining if the open should be retried was intended to be based on the access mode of the lower file's open flags being O_RDONLY, but the check was incorrectly performed. This would cause the open to be retried by the privileged kthread, resulting in a second failed open of the lower file. This patch corrects the check to determine if the open request should be handled by the privileged kthread. Signed-off-by: Tyler Hicks Reported-by: Dan Carpenter Acked-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/kthread.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ecryptfs/kthread.c b/fs/ecryptfs/kthread.c index 69f994a..0dbe58a 100644 --- a/fs/ecryptfs/kthread.c +++ b/fs/ecryptfs/kthread.c @@ -149,7 +149,7 @@ int ecryptfs_privileged_open(struct file **lower_file, (*lower_file) = dentry_open(lower_dentry, lower_mnt, flags, cred); if (!IS_ERR(*lower_file)) goto out; - if (flags & O_RDONLY) { + if ((flags & O_ACCMODE) == O_RDONLY) { rc = PTR_ERR((*lower_file)); goto out; } -- cgit v1.1 From f8e252d7a5687e0c9f11d3c36e3a867a1e64b418 Mon Sep 17 00:00:00 2001 From: Bob Liu Date: Wed, 11 Jul 2012 14:02:35 -0700 Subject: fs: ramfs: file-nommu: add SetPageUptodate() commit fea9f718b3d68147f162ed2d870183ce5e0ad8d8 upstream. There is a bug in the below scenario for !CONFIG_MMU: 1. create a new file 2. mmap the file and write to it 3. read the file can't get the correct value Because sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page() which causes the page to be zeroed. Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that generic_file_aio_read() do not call simple_readpage(). Signed-off-by: Bob Liu Cc: Hugh Dickins Cc: David Howells Cc: Geert Uytterhoeven Cc: Greg Ungerer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/ramfs/file-nommu.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c index fbb0b47..d5378d0 100644 --- a/fs/ramfs/file-nommu.c +++ b/fs/ramfs/file-nommu.c @@ -110,6 +110,7 @@ int ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize) /* prevent the page from being discarded on memory pressure */ SetPageDirty(page); + SetPageUptodate(page); unlock_page(page); put_page(page); -- cgit v1.1 From 4ff1ddad40c57605cc33a78699e4559217a06a46 Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Thu, 12 Jul 2012 09:43:14 -0400 Subject: block: fix infinite loop in __getblk_slow commit 91f68c89d8f35fe98ea04159b9a3b42d0149478f upstream. Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as mapped") exposed a bug in __getblk_slow that causes mount to hang as it loops infinitely waiting for a buffer that lies beyond the end of the disk to become uptodate. The problem was initially reported by Torsten Hilbrich here: https://lkml.org/lkml/2012/6/18/54 and also reported independently here: http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511 and then Richard W.M. Jones and Marcos Mello noted a few separate bugzillas also associated with the same issue. This patch has been confirmed to fix: https://bugzilla.redhat.com/show_bug.cgi?id=835019 The main problem is here, in __getblk_slow: for (;;) { struct buffer_head * bh; int ret; bh = __find_get_block(bdev, block, size); if (bh) return bh; ret = grow_buffers(bdev, block, size); if (ret < 0) return NULL; if (ret == 0) free_more_memory(); } __find_get_block does not find the block, since it will not be marked as mapped, and so grow_buffers is called to fill in the buffers for the associated page. I believe the for (;;) loop is there primarily to retry in the case of memory pressure keeping grow_buffers from succeeding. However, we also continue to loop for other cases, like the block lying beond the end of the disk. So, the fix I came up with is to only loop when grow_buffers fails due to memory allocation issues (return value of 0). The attached patch was tested by myself, Torsten, and Rich, and was found to resolve the problem in call cases. Signed-off-by: Jeff Moyer Reported-and-Tested-by: Torsten Hilbrich Tested-by: Richard W.M. Jones Reviewed-by: Josh Boyer [ Jens is on vacation, taking this directly - Linus ] Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/buffer.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/buffer.c b/fs/buffer.c index 330cbce..d421626 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1084,6 +1084,9 @@ grow_buffers(struct block_device *bdev, sector_t block, int size) static struct buffer_head * __getblk_slow(struct block_device *bdev, sector_t block, int size) { + int ret; + struct buffer_head *bh; + /* Size must be multiple of hard sectorsize */ if (unlikely(size & (bdev_logical_block_size(bdev)-1) || (size < 512 || size > PAGE_SIZE))) { @@ -1096,20 +1099,21 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size) return NULL; } - for (;;) { - struct buffer_head * bh; - int ret; +retry: + bh = __find_get_block(bdev, block, size); + if (bh) + return bh; + ret = grow_buffers(bdev, block, size); + if (ret == 0) { + free_more_memory(); + goto retry; + } else if (ret > 0) { bh = __find_get_block(bdev, block, size); if (bh) return bh; - - ret = grow_buffers(bdev, block, size); - if (ret < 0) - return NULL; - if (ret == 0) - free_more_memory(); } + return NULL; } /* -- cgit v1.1 From 7d50b51a460072da28e45f2c6ebd6f85af34f2c4 Mon Sep 17 00:00:00 2001 From: Anders Kaseorg Date: Sun, 15 Jul 2012 17:14:25 -0400 Subject: fifo: Do not restart open() if it already found a partner MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 05d290d66be6ef77a0b962ebecf01911bd984a78 upstream. If a parent and child process open the two ends of a fifo, and the child immediately exits, the parent may receive a SIGCHLD before its open() returns. In that case, we need to make sure that open() will return successfully after the SIGCHLD handler returns, instead of throwing EINTR or being restarted. Otherwise, the restarted open() would incorrectly wait for a second partner on the other end. The following test demonstrates the EINTR that was wrongly thrown from the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART to see a deadlock instead, in which the restarted open() waits for a second reader that will never come. (On my systems, this happens pretty reliably within about 5 to 500 iterations. Others report that it manages to loop ~forever sometimes; YMMV.) #include #include #include #include #include #include #include #include #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0) void handler(int signum) {} int main() { struct sigaction act = {.sa_handler = handler, .sa_flags = 0}; CHECK(sigaction(SIGCHLD, &act, NULL)); CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0)); for (;;) { int fd; pid_t pid; putc('.', stderr); CHECK(pid = fork()); if (pid == 0) { CHECK(fd = open("fifo", O_RDONLY)); _exit(0); } CHECK(fd = open("fifo", O_WRONLY)); CHECK(close(fd)); CHECK(waitpid(pid, NULL, 0)); } } This is what I suspect was causing the Git test suite to fail in t9010-svn-fe.sh: http://bugs.debian.org/678852 Signed-off-by: Anders Kaseorg Reviewed-by: Jonathan Nieder Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/fifo.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/fifo.c b/fs/fifo.c index b1a524d..cf6f434 100644 --- a/fs/fifo.c +++ b/fs/fifo.c @@ -14,7 +14,7 @@ #include #include -static void wait_for_partner(struct inode* inode, unsigned int *cnt) +static int wait_for_partner(struct inode* inode, unsigned int *cnt) { int cur = *cnt; @@ -23,6 +23,7 @@ static void wait_for_partner(struct inode* inode, unsigned int *cnt) if (signal_pending(current)) break; } + return cur == *cnt ? -ERESTARTSYS : 0; } static void wake_up_partner(struct inode* inode) @@ -67,8 +68,7 @@ static int fifo_open(struct inode *inode, struct file *filp) * seen a writer */ filp->f_version = pipe->w_counter; } else { - wait_for_partner(inode, &pipe->w_counter); - if(signal_pending(current)) + if (wait_for_partner(inode, &pipe->w_counter)) goto err_rd; } } @@ -90,8 +90,7 @@ static int fifo_open(struct inode *inode, struct file *filp) wake_up_partner(inode); if (!pipe->readers) { - wait_for_partner(inode, &pipe->r_counter); - if (signal_pending(current)) + if (wait_for_partner(inode, &pipe->r_counter)) goto err_wr; } break; -- cgit v1.1 From adccea444c2df5660fff32fe75563075b7d237f7 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 6 Jul 2012 07:09:42 -0400 Subject: cifs: always update the inode cache with the results from a FIND_* commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream. When we get back a FIND_FIRST/NEXT result, we have some info about the dentry that we use to instantiate a new inode. We were ignoring and discarding that info when we had an existing dentry in the cache. Fix this by updating the inode in place when we find an existing dentry and the uniqueid is the same. Reported-and-Tested-by: Andrew Bartlett Reported-by: Bill Robertson Reported-by: Dion Edwards Signed-off-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/readdir.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/cifs/readdir.c b/fs/cifs/readdir.c index 6751e74..c71032b 100644 --- a/fs/cifs/readdir.c +++ b/fs/cifs/readdir.c @@ -85,9 +85,12 @@ cifs_readdir_lookup(struct dentry *parent, struct qstr *name, dentry = d_lookup(parent, name); if (dentry) { - /* FIXME: check for inode number changes? */ - if (dentry->d_inode != NULL) + inode = dentry->d_inode; + /* update inode in place if i_ino didn't change */ + if (inode && CIFS_I(inode)->uniqueid == fattr->cf_uniqueid) { + cifs_fattr_to_inode(inode, fattr); return dentry; + } d_drop(dentry); dput(dentry); } -- cgit v1.1 From cd050f56481c26e8f2a1d2fc89188d6c92537545 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Sat, 14 Jul 2012 14:33:09 +0300 Subject: UBIFS: fix a bug in empty space fix-up commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream. UBIFS has a feature called "empty space fix-up" which is a quirk to work-around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks to be unusable for UBIFS. This feature is relatively new (introduced in v3.0). The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal. There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom: UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0 The root-cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store in it the master node and finds out by scanning the log on every mount. The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'. Signed-off-by: Artem Bityutskiy Reported-by: Iwo Mergler Tested-by: Iwo Mergler Reported-by: James Nute Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/sb.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ubifs/sb.c b/fs/ubifs/sb.c index c606f01..1250016 100644 --- a/fs/ubifs/sb.c +++ b/fs/ubifs/sb.c @@ -715,8 +715,12 @@ static int fixup_free_space(struct ubifs_info *c) lnum = ubifs_next_log_lnum(c, lnum); } - /* Fixup the current log head */ - err = fixup_leb(c, c->lhead_lnum, c->lhead_offs); + /* + * Fixup the log head which contains the only a CS node at the + * beginning. + */ + err = fixup_leb(c, c->lhead_lnum, + ALIGN(UBIFS_CS_NODE_SZ, c->min_io_size)); if (err) goto out; -- cgit v1.1 From 397d9c507ff1c9c5afc80c80ee245c2455d6a1db Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Jan 2012 17:19:34 -0800 Subject: mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream. Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging information by reducing LRU list churning had the side-effect of reducing THP allocation success rates. This was part of a series to restore the success rates while preserving the reclaim fix. Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check; if (PageDirty(page) && !sync && mapping->a_ops->migratepage != migrate_page) rc = -EBUSY; This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Andrea Arcangeli Cc: Minchan Kim Cc: Dave Jones Cc: Jan Kara Cc: Andy Isaacson Cc: Nai Xia Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/disk-io.c | 4 ++-- fs/hugetlbfs/inode.c | 3 ++- fs/nfs/internal.h | 2 +- fs/nfs/write.c | 4 ++-- 4 files changed, 7 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 1ac8db5d..522cb2a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -801,7 +801,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, #ifdef CONFIG_MIGRATION static int btree_migratepage(struct address_space *mapping, - struct page *newpage, struct page *page) + struct page *newpage, struct page *page, bool sync) { /* * we can't safely write a btree page from here, @@ -816,7 +816,7 @@ static int btree_migratepage(struct address_space *mapping, if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL)) return -EAGAIN; - return migrate_page(mapping, newpage, page); + return migrate_page(mapping, newpage, page, sync); } #endif diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 8b0c875..6ca608b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -568,7 +568,8 @@ static int hugetlbfs_set_page_dirty(struct page *page) } static int hugetlbfs_migrate_page(struct address_space *mapping, - struct page *newpage, struct page *page) + struct page *newpage, struct page *page, + bool sync) { int rc; diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 2a55347..a74442a 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -315,7 +315,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data); #ifdef CONFIG_MIGRATION extern int nfs_migrate_page(struct address_space *, - struct page *, struct page *); + struct page *, struct page *, bool); #else #define nfs_migrate_page NULL #endif diff --git a/fs/nfs/write.c b/fs/nfs/write.c index f2f80c0..22a48fd 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1662,7 +1662,7 @@ out_error: #ifdef CONFIG_MIGRATION int nfs_migrate_page(struct address_space *mapping, struct page *newpage, - struct page *page) + struct page *page, bool sync) { /* * If PagePrivate is set, then the page is currently associated with @@ -1677,7 +1677,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage, nfs_fscache_release_page(page, GFP_KERNEL); - return migrate_page(mapping, newpage, page); + return migrate_page(mapping, newpage, page, sync); } #endif -- cgit v1.1 From f869774c37710ef2b773d167d184b9936988d07f Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 12 Jan 2012 17:19:43 -0800 Subject: mm: compaction: introduce sync-light migration for use by compaction commit a6bc32b899223a877f595ef9ddc1e89ead5072b8 upstream. Stable note: Not tracked in Buzilla. This was part of a series that reduced interactivity stalls experienced when THP was enabled. These stalls were particularly noticable when copying data to a USB stick but the experiences for users varied a lot. This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Andrea Arcangeli Cc: Minchan Kim Cc: Dave Jones Cc: Jan Kara Cc: Andy Isaacson Cc: Nai Xia Cc: Johannes Weiner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/disk-io.c | 5 +++-- fs/hugetlbfs/inode.c | 2 +- fs/nfs/internal.h | 2 +- fs/nfs/write.c | 4 ++-- 4 files changed, 7 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 522cb2a..57106a9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -801,7 +801,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, #ifdef CONFIG_MIGRATION static int btree_migratepage(struct address_space *mapping, - struct page *newpage, struct page *page, bool sync) + struct page *newpage, struct page *page, + enum migrate_mode mode) { /* * we can't safely write a btree page from here, @@ -816,7 +817,7 @@ static int btree_migratepage(struct address_space *mapping, if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL)) return -EAGAIN; - return migrate_page(mapping, newpage, page, sync); + return migrate_page(mapping, newpage, page, mode); } #endif diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 6ca608b..6327a06 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -569,7 +569,7 @@ static int hugetlbfs_set_page_dirty(struct page *page) static int hugetlbfs_migrate_page(struct address_space *mapping, struct page *newpage, struct page *page, - bool sync) + enum migrate_mode mode) { int rc; diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index a74442a..4f10d81 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -315,7 +315,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data); #ifdef CONFIG_MIGRATION extern int nfs_migrate_page(struct address_space *, - struct page *, struct page *, bool); + struct page *, struct page *, enum migrate_mode); #else #define nfs_migrate_page NULL #endif diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 22a48fd..58bb999 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1662,7 +1662,7 @@ out_error: #ifdef CONFIG_MIGRATION int nfs_migrate_page(struct address_space *mapping, struct page *newpage, - struct page *page, bool sync) + struct page *page, enum migrate_mode mode) { /* * If PagePrivate is set, then the page is currently associated with @@ -1677,7 +1677,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage, nfs_fscache_release_page(page, GFP_KERNEL); - return migrate_page(mapping, newpage, page, sync); + return migrate_page(mapping, newpage, page, mode); } #endif -- cgit v1.1 From dc525df9895f810ba777feec1540c7b822512c04 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 23 Jul 2012 15:17:17 -0400 Subject: locks: fix checking of fcntl_setlease argument commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream. The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.) are done after converting the long to an int. Thus some illegal values may be let through and cause problems in later code. [ They actually *don't* cause problems in mainline, as of Dave Jones's commit 8d657eb3b438 "Remove easily user-triggerable BUG from generic_setlease", but we should fix this anyway. And this patch will be necessary to fix real bugs on earlier kernels. ] Signed-off-by: J. Bruce Fields Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/locks.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/locks.c b/fs/locks.c index b286539..35388d5 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -315,7 +315,7 @@ static int flock_make_lock(struct file *filp, struct file_lock **lock, return 0; } -static int assign_type(struct file_lock *fl, int type) +static int assign_type(struct file_lock *fl, long type) { switch (type) { case F_RDLCK: @@ -452,7 +452,7 @@ static const struct lock_manager_operations lease_manager_ops = { /* * Initialize a lease, use the default lock manager operations */ -static int lease_init(struct file *filp, int type, struct file_lock *fl) +static int lease_init(struct file *filp, long type, struct file_lock *fl) { if (assign_type(fl, type) != 0) return -EINVAL; @@ -470,7 +470,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl) } /* Allocate a file_lock initialised to this type of lease */ -static struct file_lock *lease_alloc(struct file *filp, int type) +static struct file_lock *lease_alloc(struct file *filp, long type) { struct file_lock *fl = locks_alloc_lock(); int error = -ENOMEM; -- cgit v1.1 From 4ffd3692dd84a5979310c5e86bcf942e4b46c9ce Mon Sep 17 00:00:00 2001 From: Chris Mason Date: Wed, 25 Jul 2012 15:57:13 -0400 Subject: Btrfs: call the ordered free operation without any locks held commit e9fbcb42201c862fd6ab45c48ead4f47bb2dea9d upstream. Each ordered operation has a free callback, and this was called with the worker spinlock held. Josef made the free callback also call iput, which we can't do with the spinlock. This drops the spinlock for the free operation and grabs it again before moving through the rest of the list. We'll circle back around to this and find a cleaner way that doesn't bounce the lock around so much. Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/async-thread.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c index 7ec1409..8006a28 100644 --- a/fs/btrfs/async-thread.c +++ b/fs/btrfs/async-thread.c @@ -212,10 +212,17 @@ static noinline int run_ordered_completions(struct btrfs_workers *workers, work->ordered_func(work); - /* now take the lock again and call the freeing code */ + /* now take the lock again and drop our item from the list */ spin_lock(&workers->order_lock); list_del(&work->order_list); + spin_unlock(&workers->order_lock); + + /* + * we don't want to call the ordered free functions + * with the lock held though + */ work->ordered_free(work); + spin_lock(&workers->order_lock); } spin_unlock(&workers->order_lock); -- cgit v1.1 From 9d0ed6ec043bb8cbb895d13ce780878e2e50df80 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 5 Jun 2012 16:52:06 -0400 Subject: nfsd4: our filesystems are normally case sensitive commit 2930d381d22b9c56f40dd4c63a8fa59719ca2c3c upstream. Actually, xfs and jfs can optionally be case insensitive; we'll handle that case in later patches. Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6c74097..f91d589 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2010,7 +2010,7 @@ out_acl: if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) { if ((buflen -= 4) < 0) goto out_resource; - WRITE32(1); + WRITE32(0); } if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) { if ((buflen -= 4) < 0) -- cgit v1.1 From eb65b85e1bce06b665b3568c19a249cd886fa6ff Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 23 Jul 2012 13:58:51 -0400 Subject: nfs: skip commit in releasepage if we're freeing memory for fs-related reasons commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/file.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/file.c b/fs/nfs/file.c index dd2f130..6c6e2c4 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -493,8 +493,11 @@ static int nfs_release_page(struct page *page, gfp_t gfp) dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page); - /* Only do I/O if gfp is a superset of GFP_KERNEL */ - if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) { + /* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not + * doing this memory reclaim for a fs-related allocation. + */ + if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL && + !(current->flags & PF_FSTRANS)) { int how = FLUSH_SYNC; /* Don't let kswapd deadlock waiting for OOM RPC calls */ -- cgit v1.1 From 6ff2c41b81bd0778aa44ffcfce0ea623fa660887 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sat, 30 Jun 2012 19:14:57 -0400 Subject: ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr commit f6fb99cadcd44660c68e13f6eab28333653621e6 upstream. Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/balloc.c | 3 ++- fs/ext4/bitmap.c | 8 +++----- fs/ext4/ext4.h | 2 +- fs/ext4/ialloc.c | 3 ++- 4 files changed, 8 insertions(+), 8 deletions(-) (limited to 'fs') diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 264f694..ebe95f5 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -514,7 +514,8 @@ ext4_fsblk_t ext4_count_free_blocks(struct super_block *sb) if (bitmap_bh == NULL) continue; - x = ext4_count_free(bitmap_bh, sb->s_blocksize); + x = ext4_count_free(bitmap_bh->b_data, + EXT4_BLOCKS_PER_GROUP(sb) / 8); printk(KERN_DEBUG "group %u: stored = %d, counted = %u\n", i, ext4_free_blks_count(sb, gdp), x); bitmap_count += x; diff --git a/fs/ext4/bitmap.c b/fs/ext4/bitmap.c index fa3af81..012faaa 100644 --- a/fs/ext4/bitmap.c +++ b/fs/ext4/bitmap.c @@ -15,15 +15,13 @@ static const int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0}; -unsigned int ext4_count_free(struct buffer_head *map, unsigned int numchars) +unsigned int ext4_count_free(char *bitmap, unsigned int numchars) { unsigned int i, sum = 0; - if (!map) - return 0; for (i = 0; i < numchars; i++) - sum += nibblemap[map->b_data[i] & 0xf] + - nibblemap[(map->b_data[i] >> 4) & 0xf]; + sum += nibblemap[bitmap[i] & 0xf] + + nibblemap[(bitmap[i] >> 4) & 0xf]; return sum; } diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 1a34c1c..e0113aa 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1713,7 +1713,7 @@ struct mmpd_data { # define NORET_AND noreturn, /* bitmap.c */ -extern unsigned int ext4_count_free(struct buffer_head *, unsigned); +extern unsigned int ext4_count_free(char *bitmap, unsigned numchars); /* balloc.c */ extern unsigned int ext4_block_group(struct super_block *sb, diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 412469b..29272de 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1193,7 +1193,8 @@ unsigned long ext4_count_free_inodes(struct super_block *sb) if (!bitmap_bh) continue; - x = ext4_count_free(bitmap_bh, EXT4_INODES_PER_GROUP(sb) / 8); + x = ext4_count_free(bitmap_bh->b_data, + EXT4_INODES_PER_GROUP(sb) / 8); printk(KERN_DEBUG "group %lu: stored = %d, counted = %lu\n", (unsigned long) i, ext4_free_inodes_count(sb, gdp), x); bitmap_count += x; -- cgit v1.1 From b4cbf953e071121a4082e35c9820c3c3c10aeb55 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Sun, 22 Jul 2012 23:59:40 -0400 Subject: ext4: don't let i_reserved_meta_blocks go negative commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'fs') diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c1e6a72..18fee6d 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1134,6 +1134,15 @@ void ext4_da_update_reserve_space(struct inode *inode, used = ei->i_reserved_data_blocks; } + if (unlikely(ei->i_allocated_meta_blocks > ei->i_reserved_meta_blocks)) { + ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, allocated %d " + "with only %d reserved metadata blocks\n", __func__, + inode->i_ino, ei->i_allocated_meta_blocks, + ei->i_reserved_meta_blocks); + WARN_ON(1); + ei->i_allocated_meta_blocks = ei->i_reserved_meta_blocks; + } + /* Update per-inode reservations */ ei->i_reserved_data_blocks -= used; ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks; -- cgit v1.1 From 85e937dcf1cc31e7f649c200ac3c3def01b54766 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Mon, 30 Jul 2012 14:42:07 -0700 Subject: nilfs2: fix deadlock issue between chcp and thaw ioctls commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi Cc: Fernando Luis Vazquez Cao Tested-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/ioctl.c | 4 ++-- fs/nilfs2/super.c | 3 +++ fs/nilfs2/the_nilfs.c | 1 + fs/nilfs2/the_nilfs.h | 2 ++ 4 files changed, 8 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 3e65427..0d1c9bd 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -182,7 +182,7 @@ static int nilfs_ioctl_change_cpmode(struct inode *inode, struct file *filp, if (copy_from_user(&cpmode, argp, sizeof(cpmode))) goto out; - down_read(&inode->i_sb->s_umount); + mutex_lock(&nilfs->ns_snapshot_mount_mutex); nilfs_transaction_begin(inode->i_sb, &ti, 0); ret = nilfs_cpfile_change_cpmode( @@ -192,7 +192,7 @@ static int nilfs_ioctl_change_cpmode(struct inode *inode, struct file *filp, else nilfs_transaction_commit(inode->i_sb); /* never fails */ - up_read(&inode->i_sb->s_umount); + mutex_unlock(&nilfs->ns_snapshot_mount_mutex); out: mnt_drop_write(filp->f_path.mnt); return ret; diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 8351c44..97bfbdd 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -951,6 +951,8 @@ static int nilfs_attach_snapshot(struct super_block *s, __u64 cno, struct nilfs_root *root; int ret; + mutex_lock(&nilfs->ns_snapshot_mount_mutex); + down_read(&nilfs->ns_segctor_sem); ret = nilfs_cpfile_is_snapshot(nilfs->ns_cpfile, cno); up_read(&nilfs->ns_segctor_sem); @@ -975,6 +977,7 @@ static int nilfs_attach_snapshot(struct super_block *s, __u64 cno, ret = nilfs_get_root_dentry(s, root, root_dentry); nilfs_put_root(root); out: + mutex_unlock(&nilfs->ns_snapshot_mount_mutex); return ret; } diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index 35a8970..1c98f53 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -76,6 +76,7 @@ struct the_nilfs *alloc_nilfs(struct block_device *bdev) nilfs->ns_bdev = bdev; atomic_set(&nilfs->ns_ndirtyblks, 0); init_rwsem(&nilfs->ns_sem); + mutex_init(&nilfs->ns_snapshot_mount_mutex); INIT_LIST_HEAD(&nilfs->ns_dirty_files); INIT_LIST_HEAD(&nilfs->ns_gc_inodes); spin_lock_init(&nilfs->ns_inode_lock); diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 9992b11..de7435f 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -47,6 +47,7 @@ enum { * @ns_flags: flags * @ns_bdev: block device * @ns_sem: semaphore for shared states + * @ns_snapshot_mount_mutex: mutex to protect snapshot mounts * @ns_sbh: buffer heads of on-disk super blocks * @ns_sbp: pointers to super block data * @ns_sbwtime: previous write time of super block @@ -99,6 +100,7 @@ struct the_nilfs { struct block_device *ns_bdev; struct rw_semaphore ns_sem; + struct mutex ns_snapshot_mount_mutex; /* * used for -- cgit v1.1 From bd697182ee0264c6a6c9ac108d5d765e24b82724 Mon Sep 17 00:00:00 2001 From: Zach Brown Date: Tue, 24 Jul 2012 12:10:11 -0700 Subject: fuse: verify all ioctl retry iov elements commit fb6ccff667712c46b4501b920ea73a326e49626a upstream. Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that the total iovec from the client doesn't overflow iov_length() but it only checked the first element. The iovec could still overflow by starting with a small element. The obvious fix is to check all the elements. The overflow case doesn't look dangerous to the kernel as the copy is limited by the length after the overflow. This fix restores the intention of returning an error instead of successfully copying less than the iovec represented. I found this by code inspection. I built it but don't have a test case. I'm cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 82a6646..79fca8d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1710,7 +1710,7 @@ static int fuse_verify_ioctl_iov(struct iovec *iov, size_t count) size_t n; u32 max = FUSE_MAX_PAGES_PER_REQ << PAGE_SHIFT; - for (n = 0; n < count; n++) { + for (n = 0; n < count; n++, iov++) { if (iov->iov_len > (size_t) max) return -ENOMEM; max -= iov->iov_len; -- cgit v1.1 From b1aa47aec9ca0de65c67d811c291153503972a08 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 5 Aug 2012 23:28:16 -0400 Subject: ext4: avoid kmemcheck complaint from reading uninitialized memory commit 7e731bc9a12339f344cddf82166b82633d99dd86 upstream. Commit 03179fe923 introduced a kmemcheck complaint in ext4_da_get_block_prep() because we save and restore ei->i_da_metadata_calc_last_lblock even though it is left uninitialized in the case where i_da_metadata_calc_len is zero. This doesn't hurt anything, but silencing the kmemcheck complaint makes it easier for people to find real bugs. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631 (which is marked as a regression). Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 113b107..489d406 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -860,6 +860,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) ei->i_reserved_meta_blocks = 0; ei->i_allocated_meta_blocks = 0; ei->i_da_metadata_calc_len = 0; + ei->i_da_metadata_calc_last_lblock = 0; spin_lock_init(&(ei->i_block_reservation_lock)); #ifdef CONFIG_QUOTA ei->i_reserved_quota = 0; -- cgit v1.1 From 57dba9b60a1ad31042a8e90fafe2a58fba789177 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 20 Aug 2012 15:28:00 +0100 Subject: vfs: missed source of ->f_pos races commit 0e665d5d1125f9f4ccff56a75e814f10f88861a2 upstream. compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing as sys_{read,write}{,v}(). Signed-off-by: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/compat.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/compat.c b/fs/compat.c index 0ea0083..e5358c2 100644 --- a/fs/compat.c +++ b/fs/compat.c @@ -1177,11 +1177,14 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, struct file *file; int fput_needed; ssize_t ret; + loff_t pos; file = fget_light(fd, &fput_needed); if (!file) return -EBADF; - ret = compat_readv(file, vec, vlen, &file->f_pos); + pos = file->f_pos; + ret = compat_readv(file, vec, vlen, &pos); + file->f_pos = pos; fput_light(file, fput_needed); return ret; } @@ -1236,11 +1239,14 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, struct file *file; int fput_needed; ssize_t ret; + loff_t pos; file = fget_light(fd, &fput_needed); if (!file) return -EBADF; - ret = compat_writev(file, vec, vlen, &file->f_pos); + pos = file->f_pos; + ret = compat_writev(file, vec, vlen, &pos); + file->f_pos = pos; fput_light(file, fput_needed); return ret; } -- cgit v1.1 From 3db5984ef1f9ca0f47be7751a26ce776eba20b1b Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Wed, 15 Aug 2012 13:01:24 +0200 Subject: vfs: canonicalize create mode in build_open_flags() commit e68726ff72cf7ba5e7d789857fcd9a75ca573f03 upstream. Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi Tested-by: Richard W.M. Jones Signed-off-by: Greg Kroah-Hartman --- fs/open.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/open.c b/fs/open.c index 7e18c4d..bf00a86 100644 --- a/fs/open.c +++ b/fs/open.c @@ -900,9 +900,10 @@ static inline int build_open_flags(int flags, int mode, struct open_flags *op) int lookup_flags = 0; int acc_mode; - if (!(flags & O_CREAT)) - mode = 0; - op->mode = mode; + if (flags & O_CREAT) + op->mode = (mode & S_IALLUGO) | S_IFREG; + else + op->mode = 0; /* Must never be set by userspace */ flags &= ~FMODE_NONOTIFY; -- cgit v1.1 From 9516c03e553f8947ab63592a42e0a6b381a690bf Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 20 Aug 2012 12:42:15 -0400 Subject: NFSv3: Ensure that do_proc_get_root() reports errors correctly commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream. If the rpc call to NFS3PROC_FSINFO fails, then we need to report that error so that the mount fails. Otherwise we can end up with a superblock with completely unusable values for block sizes, maxfilesize, etc. Reported-by: Yuanming Chen Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs3proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index 771741f..edfca53 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -68,7 +68,7 @@ do_proc_get_root(struct rpc_clnt *client, struct nfs_fh *fhandle, nfs_fattr_init(info->fattr); status = rpc_call_sync(client, &msg, 0); dprintk("%s: reply fsinfo: %d\n", __func__, status); - if (!(info->fattr->valid & NFS_ATTR_FATTR)) { + if (status == 0 && !(info->fattr->valid & NFS_ATTR_FATTR)) { msg.rpc_proc = &nfs3_procedures[NFS3PROC_GETATTR]; msg.rpc_resp = info->fattr; status = rpc_call_sync(client, &msg, 0); -- cgit v1.1 From 4ad55ffb366e07fdaefc2372e77b98569c9b4619 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 8 Aug 2012 16:03:13 -0400 Subject: NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done commit 47fbf7976e0b7d9dcdd799e2a1baba19064d9631 upstream. Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence disconnected data server) we've been sending layoutreturn calls while there is potentially still outstanding I/O to the data servers. The reason we do this is to avoid races between replayed writes to the MDS and the original writes to the DS. When this happens, the BUG_ON() in nfs4_layoutreturn_done can be triggered because it assumes that we would never call layoutreturn without knowing that all I/O to the DS is finished. The fix is to remove the BUG_ON() now that the assumptions behind the test are obsolete. Reported-by: Boaz Harrosh Reported-by: Tigran Mkrtchyan Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index b7a7e5f..3da1166 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5766,12 +5766,8 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata) return; } spin_lock(&lo->plh_inode->i_lock); - if (task->tk_status == 0) { - if (lrp->res.lrs_present) { - pnfs_set_layout_stateid(lo, &lrp->res.stateid, true); - } else - BUG_ON(!list_empty(&lo->plh_segs)); - } + if (task->tk_status == 0 && lrp->res.lrs_present) + pnfs_set_layout_stateid(lo, &lrp->res.stateid, true); lo->plh_block_lgets--; spin_unlock(&lo->plh_inode->i_lock); dprintk("<-- %s\n", __func__); -- cgit v1.1 From 002d4127ed4f52323b0dc6573df907a5c2c7cef0 Mon Sep 17 00:00:00 2001 From: "bjschuma@gmail.com" Date: Wed, 8 Aug 2012 13:57:10 -0400 Subject: NFS: Alias the nfs module to nfs4 commit 425e776d93a7a5070b77d4f458a5bab0f924652c upstream. This allows distros to remove the line from their modprobe configuration. Signed-off-by: Bryan Schumaker Signed-off-by: Trond Myklebust [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- fs/nfs/super.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 8e7b61d..d709112 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -3096,4 +3096,6 @@ static struct dentry *nfs4_referral_mount(struct file_system_type *fs_type, return res; } +MODULE_ALIAS("nfs4"); + #endif /* CONFIG_NFS_V4 */ -- cgit v1.1 From 72013257f37b157958a9b3a9a102fc6d3f7dde0a Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Aug 2012 12:17:36 +0200 Subject: block: replace __getblk_slow misfix by grow_dev_page fix commit 676ce6d5ca3098339c028d44fe0427d1566a4d2d upstream. Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- fs/buffer.c | 66 ++++++++++++++++++++++++++++--------------------------------- 1 file changed, 30 insertions(+), 36 deletions(-) (limited to 'fs') diff --git a/fs/buffer.c b/fs/buffer.c index d421626..166028b 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -961,7 +961,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head) /* * Initialise the state of a blockdev page's buffers. */ -static void +static sector_t init_page_buffers(struct page *page, struct block_device *bdev, sector_t block, int size) { @@ -983,33 +983,41 @@ init_page_buffers(struct page *page, struct block_device *bdev, block++; bh = bh->b_this_page; } while (bh != head); + + /* + * Caller needs to validate requested block against end of device. + */ + return end_block; } /* * Create the page-cache page that contains the requested block. * - * This is user purely for blockdev mappings. + * This is used purely for blockdev mappings. */ -static struct page * +static int grow_dev_page(struct block_device *bdev, sector_t block, - pgoff_t index, int size) + pgoff_t index, int size, int sizebits) { struct inode *inode = bdev->bd_inode; struct page *page; struct buffer_head *bh; + sector_t end_block; + int ret = 0; /* Will call free_more_memory() */ page = find_or_create_page(inode->i_mapping, index, (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE); if (!page) - return NULL; + return ret; BUG_ON(!PageLocked(page)); if (page_has_buffers(page)) { bh = page_buffers(page); if (bh->b_size == size) { - init_page_buffers(page, bdev, block, size); - return page; + end_block = init_page_buffers(page, bdev, + index << sizebits, size); + goto done; } if (!try_to_free_buffers(page)) goto failed; @@ -1029,15 +1037,15 @@ grow_dev_page(struct block_device *bdev, sector_t block, */ spin_lock(&inode->i_mapping->private_lock); link_dev_buffers(page, bh); - init_page_buffers(page, bdev, block, size); + end_block = init_page_buffers(page, bdev, index << sizebits, size); spin_unlock(&inode->i_mapping->private_lock); - return page; +done: + ret = (block < end_block) ? 1 : -ENXIO; failed: - BUG(); unlock_page(page); page_cache_release(page); - return NULL; + return ret; } /* @@ -1047,7 +1055,6 @@ failed: static int grow_buffers(struct block_device *bdev, sector_t block, int size) { - struct page *page; pgoff_t index; int sizebits; @@ -1071,22 +1078,14 @@ grow_buffers(struct block_device *bdev, sector_t block, int size) bdevname(bdev, b)); return -EIO; } - block = index << sizebits; + /* Create a page with the proper size buffers.. */ - page = grow_dev_page(bdev, block, index, size); - if (!page) - return 0; - unlock_page(page); - page_cache_release(page); - return 1; + return grow_dev_page(bdev, block, index, size, sizebits); } static struct buffer_head * __getblk_slow(struct block_device *bdev, sector_t block, int size) { - int ret; - struct buffer_head *bh; - /* Size must be multiple of hard sectorsize */ if (unlikely(size & (bdev_logical_block_size(bdev)-1) || (size < 512 || size > PAGE_SIZE))) { @@ -1099,21 +1098,20 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size) return NULL; } -retry: - bh = __find_get_block(bdev, block, size); - if (bh) - return bh; + for (;;) { + struct buffer_head *bh; + int ret; - ret = grow_buffers(bdev, block, size); - if (ret == 0) { - free_more_memory(); - goto retry; - } else if (ret > 0) { bh = __find_get_block(bdev, block, size); if (bh) return bh; + + ret = grow_buffers(bdev, block, size); + if (ret < 0) + return NULL; + if (ret == 0) + free_more_memory(); } - return NULL; } /* @@ -1369,10 +1367,6 @@ EXPORT_SYMBOL(__find_get_block); * which corresponds to the passed block_device, block and size. The * returned buffer has its reference count incremented. * - * __getblk() cannot fail - it just keeps trying. If you pass it an - * illegal block number, __getblk() will happily return a buffer_head - * which represents the non-existent block. Very weird. - * * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers() * attempt is failing. FIXME, perhaps? */ -- cgit v1.1 From 31147bc619c3379e335726e071ea012784ad9877 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 5 Sep 2012 15:48:23 +0200 Subject: udf: Fix data corruption for files in ICB commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream. When a file is stored in ICB (inode), we overwrite part of the file, and the page containing file's data is not in page cache, we end up corrupting file's data by overwriting them with zeros. The problem is we use simple_write_begin() which simply zeroes parts of the page which are not written to. The problem has been introduced by be021ee4 (udf: convert to new aops). Fix the problem by providing a ->write_begin function which makes the page properly uptodate. Reported-by: Ian Abbott Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/file.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/udf/file.c b/fs/udf/file.c index 3438b00..8eb9628 100644 --- a/fs/udf/file.c +++ b/fs/udf/file.c @@ -39,20 +39,24 @@ #include "udf_i.h" #include "udf_sb.h" -static int udf_adinicb_readpage(struct file *file, struct page *page) +static void __udf_adinicb_readpage(struct page *page) { struct inode *inode = page->mapping->host; char *kaddr; struct udf_inode_info *iinfo = UDF_I(inode); - BUG_ON(!PageLocked(page)); - kaddr = kmap(page); - memset(kaddr, 0, PAGE_CACHE_SIZE); memcpy(kaddr, iinfo->i_ext.i_data + iinfo->i_lenEAttr, inode->i_size); + memset(kaddr + inode->i_size, 0, PAGE_CACHE_SIZE - inode->i_size); flush_dcache_page(page); SetPageUptodate(page); kunmap(page); +} + +static int udf_adinicb_readpage(struct file *file, struct page *page) +{ + BUG_ON(!PageLocked(page)); + __udf_adinicb_readpage(page); unlock_page(page); return 0; @@ -77,6 +81,25 @@ static int udf_adinicb_writepage(struct page *page, return 0; } +static int udf_adinicb_write_begin(struct file *file, + struct address_space *mapping, loff_t pos, + unsigned len, unsigned flags, struct page **pagep, + void **fsdata) +{ + struct page *page; + + if (WARN_ON_ONCE(pos >= PAGE_CACHE_SIZE)) + return -EIO; + page = grab_cache_page_write_begin(mapping, 0, flags); + if (!page) + return -ENOMEM; + *pagep = page; + + if (!PageUptodate(page) && len != PAGE_CACHE_SIZE) + __udf_adinicb_readpage(page); + return 0; +} + static int udf_adinicb_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -98,8 +121,8 @@ static int udf_adinicb_write_end(struct file *file, const struct address_space_operations udf_adinicb_aops = { .readpage = udf_adinicb_readpage, .writepage = udf_adinicb_writepage, - .write_begin = simple_write_begin, - .write_end = udf_adinicb_write_end, + .write_begin = udf_adinicb_write_begin, + .write_end = udf_adinicb_write_end, }; static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov, -- cgit v1.1 From 04234b36211285e5242794b75137f42f177e0ef5 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 3 Sep 2012 16:50:42 +0200 Subject: ext3: Fix fdatasync() for files with only i_size changes commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/ext3/inode.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index db9ba1a..0aedb27 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -3013,6 +3013,8 @@ static int ext3_do_update_inode(handle_t *handle, struct ext3_inode_info *ei = EXT3_I(inode); struct buffer_head *bh = iloc->bh; int err = 0, rc, block; + int need_datasync = 0; + __le32 disksize; again: /* we can't allow multiple procs in here at once, its a bit racey */ @@ -3050,7 +3052,11 @@ again: raw_inode->i_gid_high = 0; } raw_inode->i_links_count = cpu_to_le16(inode->i_nlink); - raw_inode->i_size = cpu_to_le32(ei->i_disksize); + disksize = cpu_to_le32(ei->i_disksize); + if (disksize != raw_inode->i_size) { + need_datasync = 1; + raw_inode->i_size = disksize; + } raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec); raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec); raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec); @@ -3066,8 +3072,11 @@ again: if (!S_ISREG(inode->i_mode)) { raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl); } else { - raw_inode->i_size_high = - cpu_to_le32(ei->i_disksize >> 32); + disksize = cpu_to_le32(ei->i_disksize >> 32); + if (disksize != raw_inode->i_size_high) { + raw_inode->i_size_high = disksize; + need_datasync = 1; + } if (ei->i_disksize > 0x7fffffffULL) { struct super_block *sb = inode->i_sb; if (!EXT3_HAS_RO_COMPAT_FEATURE(sb, @@ -3120,6 +3129,8 @@ again: ext3_clear_inode_state(inode, EXT3_STATE_NEW); atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid); + if (need_datasync) + atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid); out_brelse: brelse (bh); ext3_std_error(inode->i_sb, err); -- cgit v1.1 From fd63204e4873e9cbabd819ffdc15f3bcd85c6674 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Tue, 4 Sep 2012 18:45:54 +0200 Subject: fuse: fix retrieve length commit c9e67d483776d8d2a5f3f70491161b205930ffe1 upstream. In some cases fuse_retrieve() would return a short byte count if offset was non-zero. The data returned was correct, though. Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dev.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 2aaf3ea..5c029fb 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1524,6 +1524,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode, req->pages[req->num_pages] = page; req->num_pages++; + offset = 0; num -= this_num; total_len += this_num; index++; -- cgit v1.1 From c168d49dbb4bc6cbd58b855166e5f98585479307 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 14 Sep 2012 14:48:21 -0700 Subject: vfs: make O_PATH file descriptors usable for 'fstat()' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2 upstream. We already use them for openat() and friends, but fstat() also wants to be able to use O_PATH file descriptors. This should make it more directly comparable to the O_SEARCH of Solaris. Note that you could already do the same thing with "fstatat()" and an empty path, but just doing "fstat()" directly is simpler and faster, so there is no reason not to just allow it directly. See also commit 332a2e1244bd, which did the same thing for fchdir, for the same reasons. Reported-by: ольга крыжановская Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/stat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/stat.c b/fs/stat.c index 02a6061..aec24ec 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -57,7 +57,7 @@ EXPORT_SYMBOL(vfs_getattr); int vfs_fstat(unsigned int fd, struct kstat *stat) { - struct file *f = fget(fd); + struct file *f = fget_raw(fd); int error = -EBADF; if (f) { -- cgit v1.1 From 8b2b69f4e7a4cd7768323f84f17f09ab608e620d Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 17 Sep 2012 22:31:38 +0200 Subject: vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream. IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi Cc: Trond Myklebust Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/dcache.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index 0b51cfc..bd8aaf6 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -290,7 +290,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent) * Inform try_to_ascend() that we are no longer attached to the * dentry tree */ - dentry->d_flags |= DCACHE_DISCONNECTED; + dentry->d_flags |= DCACHE_DENTRY_KILLED; if (parent) spin_unlock(&parent->d_lock); dentry_iput(dentry); @@ -1015,7 +1015,7 @@ static struct dentry *try_to_ascend(struct dentry *old, int locked, unsigned seq * or deletion */ if (new != old->d_parent || - (old->d_flags & DCACHE_DISCONNECTED) || + (old->d_flags & DCACHE_DENTRY_KILLED) || (!locked && read_seqretry(&rename_lock, seq))) { spin_unlock(&new->d_lock); new = NULL; -- cgit v1.1 From 047b8d01518246de438ba76a957889b6e661638e Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Thu, 13 Sep 2012 12:00:56 -0700 Subject: eCryptfs: Copy up attributes of the lower target inode after rename commit 8335eafc2859e1a26282bef7c3d19f3d68868b8a upstream. After calling into the lower filesystem to do a rename, the lower target inode's attributes were not copied up to the eCryptfs target inode. This resulted in the eCryptfs target inode staying around, rather than being evicted, because i_nlink was not updated for the eCryptfs inode. This also meant that eCryptfs didn't do the final iput() on the lower target inode so it stayed around, as well. This would result in a failure to free up space occupied by the target file in the rename() operation. Both target inodes would eventually be evicted when the eCryptfs filesystem was unmounted. This patch calls fsstack_copy_attr_all() after the lower filesystem does its ->rename() so that important inode attributes, such as i_nlink, are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called and eCryptfs can drop its final reference on the lower inode. http://launchpad.net/bugs/561129 Signed-off-by: Tyler Hicks Tested-by: Colin Ian King Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/inode.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'fs') diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 2717329..4a91a05 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -653,6 +653,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *lower_old_dir_dentry; struct dentry *lower_new_dir_dentry; struct dentry *trap = NULL; + struct inode *target_inode; lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry); lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry); @@ -660,6 +661,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, dget(lower_new_dentry); lower_old_dir_dentry = dget_parent(lower_old_dentry); lower_new_dir_dentry = dget_parent(lower_new_dentry); + target_inode = new_dentry->d_inode; trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry); /* source should not be ancestor of target */ if (trap == lower_old_dentry) { @@ -675,6 +677,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, lower_new_dir_dentry->d_inode, lower_new_dentry); if (rc) goto out_lock; + if (target_inode) + fsstack_copy_attr_all(target_inode, + ecryptfs_inode_to_lower(target_inode)); fsstack_copy_attr_all(new_dir, lower_new_dir_dentry->d_inode); if (new_dir != old_dir) fsstack_copy_attr_all(old_dir, lower_old_dir_dentry->d_inode); -- cgit v1.1 From 839e17b7fb531eec48714abf5962b0a1310f71d9 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 3 Sep 2012 14:56:02 -0400 Subject: NFS: Fix the initialisation of the readdir 'cookieverf' array commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen Reported-by: David Binderman Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/inode.c | 2 +- fs/nfs/nfs3proc.c | 2 +- fs/nfs/nfs4proc.c | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index c48f9f6..873c6f2 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -150,7 +150,7 @@ static void nfs_zap_caches_locked(struct inode *inode) nfsi->attrtimeo = NFS_MINATTRTIMEO(inode); nfsi->attrtimeo_timestamp = jiffies; - memset(NFS_COOKIEVERF(inode), 0, sizeof(NFS_COOKIEVERF(inode))); + memset(NFS_I(inode)->cookieverf, 0, sizeof(NFS_I(inode)->cookieverf)); if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) nfsi->cache_validity |= NFS_INO_INVALID_ATTR|NFS_INO_INVALID_DATA|NFS_INO_INVALID_ACCESS|NFS_INO_INVALID_ACL|NFS_INO_REVAL_PAGECACHE; else diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c index edfca53..f0a6990 100644 --- a/fs/nfs/nfs3proc.c +++ b/fs/nfs/nfs3proc.c @@ -633,7 +633,7 @@ nfs3_proc_readdir(struct dentry *dentry, struct rpc_cred *cred, u64 cookie, struct page **pages, unsigned int count, int plus) { struct inode *dir = dentry->d_inode; - __be32 *verf = NFS_COOKIEVERF(dir); + __be32 *verf = NFS_I(dir)->cookieverf; struct nfs3_readdirargs arg = { .fh = NFS_FH(dir), .cookie = cookie, diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 3da1166..c722905 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -3018,11 +3018,11 @@ static int _nfs4_proc_readdir(struct dentry *dentry, struct rpc_cred *cred, dentry->d_parent->d_name.name, dentry->d_name.name, (unsigned long long)cookie); - nfs4_setup_readdir(cookie, NFS_COOKIEVERF(dir), dentry, &args); + nfs4_setup_readdir(cookie, NFS_I(dir)->cookieverf, dentry, &args); res.pgbase = args.pgbase; status = nfs4_call_sync(NFS_SERVER(dir)->client, NFS_SERVER(dir), &msg, &args.seq_args, &res.seq_res, 0); if (status >= 0) { - memcpy(NFS_COOKIEVERF(dir), res.verifier.data, NFS4_VERIFIER_SIZE); + memcpy(NFS_I(dir)->cookieverf, res.verifier.data, NFS4_VERIFIER_SIZE); status += args.pgbase; } -- cgit v1.1 From d351ebe91e308cebfefc6808fd96a53c9d8f0c7b Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 4 Sep 2012 11:05:07 -0400 Subject: NFS: Fix a problem with the legacy binary mount code commit 872ece86ea5c367aa92f44689c2d01a1c767aeb3 upstream. Apparently, am-utils is still using the legacy binary mountdata interface, and is having trouble parsing /proc/mounts due to the 'port=' field being incorrectly set. The following patch should fix up the regression. Reported-by: Marius Tolzmann Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/super.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/nfs/super.c b/fs/nfs/super.c index d709112..a1f3d6e 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -1815,6 +1815,7 @@ static int nfs_validate_mount_data(void *options, memcpy(sap, &data->addr, sizeof(data->addr)); args->nfs_server.addrlen = sizeof(data->addr); + args->nfs_server.port = ntohs(data->addr.sin_port); if (!nfs_verify_server_address(sap)) goto out_no_address; @@ -2528,6 +2529,7 @@ static int nfs4_validate_mount_data(void *options, return -EFAULT; if (!nfs_verify_server_address(sap)) goto out_no_address; + args->nfs_server.port = ntohs(((struct sockaddr_in *)sap)->sin_port); if (data->auth_flavourlen) { if (data->auth_flavourlen > 1) -- cgit v1.1 From 863f36bf5ad7edde2ec1196b098fec951fca16d3 Mon Sep 17 00:00:00 2001 From: Weston Andros Adamson Date: Thu, 6 Sep 2012 15:54:27 -0400 Subject: NFS: return error from decode_getfh in decode open commit 01913b49cf1dc6409a07dd2a4cc6af2e77f3c410 upstream. If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last decode_* call must have succeeded. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4xdr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index fc97fd5..5fcc67b 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -5745,7 +5745,8 @@ static int nfs4_xdr_dec_open(struct rpc_rqst *rqstp, struct xdr_stream *xdr, status = decode_open(xdr, res); if (status) goto out; - if (decode_getfh(xdr, &res->fh) != 0) + status = decode_getfh(xdr, &res->fh); + if (status) goto out; if (decode_getfattr(xdr, res->f_attr, res->server, !RPC_IS_ASYNC(rqstp->rq_task)) != 0) -- cgit v1.1 From 9523d5244a925f514e5daf04e5030a68e91f55cd Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Mon, 2 Jan 2012 17:47:14 +0000 Subject: Squashfs: fix mount time sanity check for corrupted superblock commit cc37f75a9ffbbfcb1c3297534f293c8284e3c5a6 upstream. A Squashfs filesystem containing nothing but an empty directory, although unusual and ultimately pointless, is still valid. The directory_table >= next_table sanity check rejects these filesystems as invalid because the directory_table is empty and equal to next_table. Signed-off-by: Phillip Lougher Cc: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index 7438850..b5a8636 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -290,7 +290,7 @@ handle_fragments: check_directory_table: /* Sanity check directory_table */ - if (msblk->directory_table >= next_table) { + if (msblk->directory_table > next_table) { err = -EINVAL; goto failed_mount; } -- cgit v1.1 From d2212d278786f0e0a1942796c7e5549a49fa5b34 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 17 Sep 2012 22:23:30 +0200 Subject: vfs: dcache: fix deadlock in tree traversal commit 8110e16d42d587997bcaee0c864179e6d93603fe upstream. IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/dcache.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index bd8aaf6..8b64f38 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1101,6 +1101,8 @@ positive: return 1; rename_retry: + if (locked) + goto again; locked = 1; write_seqlock(&rename_lock); goto again; @@ -1203,6 +1205,8 @@ out: rename_retry: if (found) return found; + if (locked) + goto again; locked = 1; write_seqlock(&rename_lock); goto again; @@ -2990,6 +2994,8 @@ resume: return; rename_retry: + if (locked) + goto again; locked = 1; write_seqlock(&rename_lock); goto again; -- cgit v1.1 From 7c36d46d0852fa548eed324772a117fd1eda6eb5 Mon Sep 17 00:00:00 2001 From: Denys Vlasenko Date: Wed, 26 Sep 2012 11:34:50 +1000 Subject: coredump: prevent double-free on an error path in core dumper commit f34f9d186df35e5c39163444c43b4fc6255e39c5 upstream. In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate memory for info->fields, it frees already allocated stuff and returns error to its caller, fill_note_info. Which in turn returns error to its caller, elf_core_dump. Which jumps to cleanup label and calls free_note_info, which will happily try to free all info->fields again. BOOM. This is the fix. Signed-off-by: Oleg Nesterov Signed-off-by: Denys Vlasenko Cc: Venu Byravarasu Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/binfmt_elf.c | 19 ++++--------------- 1 file changed, 4 insertions(+), 15 deletions(-) (limited to 'fs') diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 618493e..7e8299f 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1669,30 +1669,19 @@ static int elf_note_info_init(struct elf_note_info *info) return 0; info->psinfo = kmalloc(sizeof(*info->psinfo), GFP_KERNEL); if (!info->psinfo) - goto notes_free; + return 0; info->prstatus = kmalloc(sizeof(*info->prstatus), GFP_KERNEL); if (!info->prstatus) - goto psinfo_free; + return 0; info->fpu = kmalloc(sizeof(*info->fpu), GFP_KERNEL); if (!info->fpu) - goto prstatus_free; + return 0; #ifdef ELF_CORE_COPY_XFPREGS info->xfpu = kmalloc(sizeof(*info->xfpu), GFP_KERNEL); if (!info->xfpu) - goto fpu_free; + return 0; #endif return 1; -#ifdef ELF_CORE_COPY_XFPREGS - fpu_free: - kfree(info->fpu); -#endif - prstatus_free: - kfree(info->prstatus); - psinfo_free: - kfree(info->psinfo); - notes_free: - kfree(info->notes); - return 0; } static int fill_note_info(struct elfhdr *elf, int phdrs, -- cgit v1.1 From 48fa0772b93d6e2482d23a41a9c6474cfa2e5e35 Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov Date: Wed, 26 Sep 2012 12:32:54 -0400 Subject: ext4: online defrag is not supported for journaled files commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream. Proper block swap for inodes with full journaling enabled is truly non obvious task. In order to be on a safe side let's explicitly disable it for now. Signed-off-by: Dmitry Monakhov Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/move_extent.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index f57455a..72f9732 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -1209,7 +1209,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, orig_inode->i_ino, donor_inode->i_ino); return -EINVAL; } - + /* TODO: This is non obvious task to swap blocks for inodes with full + jornaling enabled */ + if (ext4_should_journal_data(orig_inode) || + ext4_should_journal_data(donor_inode)) { + return -EINVAL; + } /* Protect orig and donor inodes against a truncate */ ret1 = mext_inode_double_lock(orig_inode, donor_inode); if (ret1 < 0) -- cgit v1.1 From 985f704d74944dc66b5185aa9ccebcb936f2b8e0 Mon Sep 17 00:00:00 2001 From: Bernd Schubert Date: Wed, 26 Sep 2012 21:24:57 -0400 Subject: ext4: always set i_op in ext4_mknod() commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream. ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/namei.c | 2 -- 1 file changed, 2 deletions(-) (limited to 'fs') diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 3d36d5a..78585fc 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1799,9 +1799,7 @@ retry: err = PTR_ERR(inode); if (!IS_ERR(inode)) { init_special_inode(inode, inode->i_mode, rdev); -#ifdef CONFIG_EXT4_FS_XATTR inode->i_op = &ext4_special_inode_operations; -#endif err = ext4_add_nondir(handle, dentry, inode); } ext4_journal_stop(handle); -- cgit v1.1 From a6c0070c1f5a6c7b0bba5bb5be44b1dabe88af56 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 26 Sep 2012 21:52:20 -0400 Subject: ext4: fix fdatasync() for files with only i_size changes commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 18fee6d..1dbf758 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5151,6 +5151,7 @@ static int ext4_do_update_inode(handle_t *handle, struct ext4_inode_info *ei = EXT4_I(inode); struct buffer_head *bh = iloc->bh; int err = 0, rc, block; + int need_datasync = 0; /* For fields not not tracking in the in-memory inode, * initialise them to zero for new inodes. */ @@ -5199,7 +5200,10 @@ static int ext4_do_update_inode(handle_t *handle, raw_inode->i_file_acl_high = cpu_to_le16(ei->i_file_acl >> 32); raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl); - ext4_isize_set(raw_inode, ei->i_disksize); + if (ei->i_disksize != ext4_isize(raw_inode)) { + ext4_isize_set(raw_inode, ei->i_disksize); + need_datasync = 1; + } if (ei->i_disksize > 0x7fffffffULL) { struct super_block *sb = inode->i_sb; if (!EXT4_HAS_RO_COMPAT_FEATURE(sb, @@ -5252,7 +5256,7 @@ static int ext4_do_update_inode(handle_t *handle, err = rc; ext4_clear_inode_state(inode, EXT4_STATE_NEW); - ext4_update_inode_fsync_trans(handle, inode, 0); + ext4_update_inode_fsync_trans(handle, inode, need_datasync); out_brelse: brelse(bh); ext4_std_error(inode->i_sb, err); -- cgit v1.1 From 12d63702c53bc2230dfc997e91ca891f39cb6446 Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Tue, 18 Sep 2012 13:37:18 +0400 Subject: lockd: use rpc client's cl_nodename for id encoding commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream. Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Stanislav Kinsbursky Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/lockd/mon.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c index 23d7451..df753a1 100644 --- a/fs/lockd/mon.c +++ b/fs/lockd/mon.c @@ -40,6 +40,7 @@ struct nsm_args { u32 proc; char *mon_name; + char *nodename; }; struct nsm_res { @@ -93,6 +94,7 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res) .vers = 3, .proc = NLMPROC_NSM_NOTIFY, .mon_name = nsm->sm_mon_name, + .nodename = utsname()->nodename, }; struct rpc_message msg = { .rpc_argp = &args, @@ -429,7 +431,7 @@ static void encode_my_id(struct xdr_stream *xdr, const struct nsm_args *argp) { __be32 *p; - encode_nsm_string(xdr, utsname()->nodename); + encode_nsm_string(xdr, argp->nodename); p = xdr_reserve_space(xdr, 4 + 4 + 4); *p++ = cpu_to_be32(argp->prog); *p++ = cpu_to_be32(argp->vers); -- cgit v1.1 From f38039a248831d279cca77ab1dab773684a96c1e Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Sun, 7 Oct 2012 20:32:51 -0700 Subject: tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream. Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(), u64 inum = fid->raw[2]; which is unhelpfully reported as at the end of shmem_alloc_inode(): BUG: unable to handle kernel paging request at ffff880061cd3000 IP: [] shmem_alloc_inode+0x40/0x40 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Call Trace: [] ? exportfs_decode_fh+0x79/0x2d0 [] do_handle_open+0x163/0x2c0 [] sys_open_by_handle_at+0xc/0x10 [] tracesys+0xe1/0xe6 Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present. But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those. Reported-by: Sasha Levin Signed-off-by: Hugh Dickins Cc: Al Viro Cc: Sage Weil Cc: Steven Whitehouse Cc: Christoph Hellwig Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/ceph/export.c | 18 ++++++++++++++---- fs/gfs2/export.c | 4 ++++ fs/isofs/export.c | 2 +- fs/reiserfs/inode.c | 6 +++++- fs/xfs/linux-2.6/xfs_export.c | 3 +++ 5 files changed, 27 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/ceph/export.c b/fs/ceph/export.c index f67b687..a080779 100644 --- a/fs/ceph/export.c +++ b/fs/ceph/export.c @@ -84,7 +84,7 @@ static int ceph_encode_fh(struct dentry *dentry, u32 *rawfh, int *max_len, * FIXME: we should try harder by querying the mds for the ino. */ static struct dentry *__fh_to_dentry(struct super_block *sb, - struct ceph_nfs_fh *fh) + struct ceph_nfs_fh *fh, int fh_len) { struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc; struct inode *inode; @@ -92,6 +92,9 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, struct ceph_vino vino; int err; + if (fh_len < sizeof(*fh) / 4) + return ERR_PTR(-ESTALE); + dout("__fh_to_dentry %llx\n", fh->ino); vino.ino = fh->ino; vino.snap = CEPH_NOSNAP; @@ -136,7 +139,7 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, * convert connectable fh to dentry */ static struct dentry *__cfh_to_dentry(struct super_block *sb, - struct ceph_nfs_confh *cfh) + struct ceph_nfs_confh *cfh, int fh_len) { struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc; struct inode *inode; @@ -144,6 +147,9 @@ static struct dentry *__cfh_to_dentry(struct super_block *sb, struct ceph_vino vino; int err; + if (fh_len < sizeof(*cfh) / 4) + return ERR_PTR(-ESTALE); + dout("__cfh_to_dentry %llx (%llx/%x)\n", cfh->ino, cfh->parent_ino, cfh->parent_name_hash); @@ -193,9 +199,11 @@ static struct dentry *ceph_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { if (fh_type == 1) - return __fh_to_dentry(sb, (struct ceph_nfs_fh *)fid->raw); + return __fh_to_dentry(sb, (struct ceph_nfs_fh *)fid->raw, + fh_len); else - return __cfh_to_dentry(sb, (struct ceph_nfs_confh *)fid->raw); + return __cfh_to_dentry(sb, (struct ceph_nfs_confh *)fid->raw, + fh_len); } /* @@ -216,6 +224,8 @@ static struct dentry *ceph_fh_to_parent(struct super_block *sb, if (fh_type == 1) return ERR_PTR(-ESTALE); + if (fh_len < sizeof(*cfh) / 4) + return ERR_PTR(-ESTALE); pr_debug("fh_to_parent %llx/%d\n", cfh->parent_ino, cfh->parent_name_hash); diff --git a/fs/gfs2/export.c b/fs/gfs2/export.c index fe9945f..5235d6e 100644 --- a/fs/gfs2/export.c +++ b/fs/gfs2/export.c @@ -167,6 +167,8 @@ static struct dentry *gfs2_fh_to_dentry(struct super_block *sb, struct fid *fid, case GFS2_SMALL_FH_SIZE: case GFS2_LARGE_FH_SIZE: case GFS2_OLD_FH_SIZE: + if (fh_len < GFS2_SMALL_FH_SIZE) + return NULL; this.no_formal_ino = ((u64)be32_to_cpu(fh[0])) << 32; this.no_formal_ino |= be32_to_cpu(fh[1]); this.no_addr = ((u64)be32_to_cpu(fh[2])) << 32; @@ -186,6 +188,8 @@ static struct dentry *gfs2_fh_to_parent(struct super_block *sb, struct fid *fid, switch (fh_type) { case GFS2_LARGE_FH_SIZE: case GFS2_OLD_FH_SIZE: + if (fh_len < GFS2_LARGE_FH_SIZE) + return NULL; parent.no_formal_ino = ((u64)be32_to_cpu(fh[4])) << 32; parent.no_formal_ino |= be32_to_cpu(fh[5]); parent.no_addr = ((u64)be32_to_cpu(fh[6])) << 32; diff --git a/fs/isofs/export.c b/fs/isofs/export.c index dd4687f..516eb21 100644 --- a/fs/isofs/export.c +++ b/fs/isofs/export.c @@ -179,7 +179,7 @@ static struct dentry *isofs_fh_to_parent(struct super_block *sb, { struct isofs_fid *ifid = (struct isofs_fid *)fid; - if (fh_type != 2) + if (fh_len < 2 || fh_type != 2) return NULL; return isofs_export_iget(sb, diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 4fd5bb3..0363aa4 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -1568,8 +1568,10 @@ struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, struct fid *fid, reiserfs_warning(sb, "reiserfs-13077", "nfsd/reiserfs, fhtype=%d, len=%d - odd", fh_type, fh_len); - fh_type = 5; + fh_type = fh_len; } + if (fh_len < 2) + return NULL; return reiserfs_get_dentry(sb, fid->raw[0], fid->raw[1], (fh_type == 3 || fh_type >= 5) ? fid->raw[2] : 0); @@ -1578,6 +1580,8 @@ struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, struct fid *fid, struct dentry *reiserfs_fh_to_parent(struct super_block *sb, struct fid *fid, int fh_len, int fh_type) { + if (fh_type > fh_len) + fh_type = fh_len; if (fh_type < 4) return NULL; diff --git a/fs/xfs/linux-2.6/xfs_export.c b/fs/xfs/linux-2.6/xfs_export.c index fed3f3c..844b22b 100644 --- a/fs/xfs/linux-2.6/xfs_export.c +++ b/fs/xfs/linux-2.6/xfs_export.c @@ -195,6 +195,9 @@ xfs_fs_fh_to_parent(struct super_block *sb, struct fid *fid, struct xfs_fid64 *fid64 = (struct xfs_fid64 *)fid; struct inode *inode = NULL; + if (fh_len < xfs_fileid_length(fileid_type)) + return NULL; + switch (fileid_type) { case FILEID_INO32_GEN_PARENT: inode = xfs_nfs_get_inode(sb, fid->i32.parent_ino, -- cgit v1.1 From 3a738a8aa3dd4fae6998b4cbc7a1043f44086035 Mon Sep 17 00:00:00 2001 From: Ian Kent Date: Thu, 11 Oct 2012 08:00:33 +0800 Subject: autofs4 - fix reset pending flag on mount fail commit 49999ab27eab6289a8e4f450e148bdab521361b2 upstream. In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING mount pending flag is not cleared. One effect of this is when using the "browse" option, directory entry attributes show up with all "?"s due to the incorrect callback and subsequent failure return (when in fact no callback should be made). Signed-off-by: Ian Kent Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/autofs4/root.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index f55ae23..790fa63 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -392,10 +392,12 @@ static struct vfsmount *autofs4_d_automount(struct path *path) ino->flags |= AUTOFS_INF_PENDING; spin_unlock(&sbi->fs_lock); status = autofs4_mount_wait(dentry); - if (status) - return ERR_PTR(status); spin_lock(&sbi->fs_lock); ino->flags &= ~AUTOFS_INF_PENDING; + if (status) { + spin_unlock(&sbi->fs_lock); + return ERR_PTR(status); + } } done: if (!(ino->flags & AUTOFS_INF_EXPIRING)) { -- cgit v1.1 From b88ac13a3f1ea5666872c343e54ffb3a9667d3f2 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 11 Jul 2012 23:16:25 +0200 Subject: jbd: Fix assertion failure in commit code due to lacking transaction credits commit 09e05d4805e6c524c1af74e524e5d0528bb3fef3 upstream. ext3 users of data=journal mode with blocksize < pagesize were occasionally hitting assertion failure in journal_commit_transaction() checking whether the transaction has at least as many credits reserved as buffers attached. The core of the problem is that when a file gets truncated, buffers that still need checkpointing or that are attached to the committing transaction are left with buffer_mapped set. When this happens to buffers beyond i_size attached to a page stradding i_size, subsequent write extending the file will see these buffers and as they are mapped (but underlying blocks were freed) things go awry from here. The assertion failure just coincidentally (and in this case luckily as we would start corrupting filesystem) triggers due to journal_head not being properly cleaned up as well. Under some rare circumstances this bug could even hit data=ordered mode users. There the assertion won't trigger and we would end up corrupting the filesystem. We fix the problem by unmapping buffers if possible (in lots of cases we just need a buffer attached to a transaction as a place holder but it must not be written out anyway). And in one case, we just have to bite the bullet and wait for transaction commit to finish. Reviewed-by: Josef Bacik Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/jbd/commit.c | 45 +++++++++++++++++++++++++++--------- fs/jbd/transaction.c | 64 ++++++++++++++++++++++++++++++++++++---------------- 2 files changed, 78 insertions(+), 31 deletions(-) (limited to 'fs') diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c index 72ffa97..dcd23f8 100644 --- a/fs/jbd/commit.c +++ b/fs/jbd/commit.c @@ -85,7 +85,12 @@ nope: static void release_data_buffer(struct buffer_head *bh) { if (buffer_freed(bh)) { + WARN_ON_ONCE(buffer_dirty(bh)); clear_buffer_freed(bh); + clear_buffer_mapped(bh); + clear_buffer_new(bh); + clear_buffer_req(bh); + bh->b_bdev = NULL; release_buffer_page(bh); } else put_bh(bh); @@ -840,17 +845,35 @@ restart_loop: * there's no point in keeping a checkpoint record for * it. */ - /* A buffer which has been freed while still being - * journaled by a previous transaction may end up still - * being dirty here, but we want to avoid writing back - * that buffer in the future after the "add to orphan" - * operation been committed, That's not only a performance - * gain, it also stops aliasing problems if the buffer is - * left behind for writeback and gets reallocated for another - * use in a different page. */ - if (buffer_freed(bh) && !jh->b_next_transaction) { - clear_buffer_freed(bh); - clear_buffer_jbddirty(bh); + /* + * A buffer which has been freed while still being journaled by + * a previous transaction. + */ + if (buffer_freed(bh)) { + /* + * If the running transaction is the one containing + * "add to orphan" operation (b_next_transaction != + * NULL), we have to wait for that transaction to + * commit before we can really get rid of the buffer. + * So just clear b_modified to not confuse transaction + * credit accounting and refile the buffer to + * BJ_Forget of the running transaction. If the just + * committed transaction contains "add to orphan" + * operation, we can completely invalidate the buffer + * now. We are rather throughout in that since the + * buffer may be still accessible when blocksize < + * pagesize and it is attached to the last partial + * page. + */ + jh->b_modified = 0; + if (!jh->b_next_transaction) { + clear_buffer_freed(bh); + clear_buffer_jbddirty(bh); + clear_buffer_mapped(bh); + clear_buffer_new(bh); + clear_buffer_req(bh); + bh->b_bdev = NULL; + } } if (buffer_jbddirty(bh)) { diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index f7ee81a..b0161a6 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -1837,15 +1837,16 @@ static int __dispose_buffer(struct journal_head *jh, transaction_t *transaction) * We're outside-transaction here. Either or both of j_running_transaction * and j_committing_transaction may be NULL. */ -static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) +static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh, + int partial_page) { transaction_t *transaction; struct journal_head *jh; int may_free = 1; - int ret; BUFFER_TRACE(bh, "entry"); +retry: /* * It is safe to proceed here without the j_list_lock because the * buffers cannot be stolen by try_to_free_buffers as long as we are @@ -1873,10 +1874,18 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) * clear the buffer dirty bit at latest at the moment when the * transaction marking the buffer as freed in the filesystem * structures is committed because from that moment on the - * buffer can be reallocated and used by a different page. + * block can be reallocated and used by a different page. * Since the block hasn't been freed yet but the inode has * already been added to orphan list, it is safe for us to add * the buffer to BJ_Forget list of the newest transaction. + * + * Also we have to clear buffer_mapped flag of a truncated buffer + * because the buffer_head may be attached to the page straddling + * i_size (can happen only when blocksize < pagesize) and thus the + * buffer_head can be reused when the file is extended again. So we end + * up keeping around invalidated buffers attached to transactions' + * BJ_Forget list just to stop checkpointing code from cleaning up + * the transaction this buffer was modified in. */ transaction = jh->b_transaction; if (transaction == NULL) { @@ -1903,13 +1912,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) * committed, the buffer won't be needed any * longer. */ JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget"); - ret = __dispose_buffer(jh, + may_free = __dispose_buffer(jh, journal->j_running_transaction); - journal_put_journal_head(jh); - spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); - spin_unlock(&journal->j_state_lock); - return ret; + goto zap_buffer; } else { /* There is no currently-running transaction. So the * orphan record which we wrote for this file must have @@ -1917,13 +1922,9 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) * the committing transaction, if it exists. */ if (journal->j_committing_transaction) { JBUFFER_TRACE(jh, "give to committing trans"); - ret = __dispose_buffer(jh, + may_free = __dispose_buffer(jh, journal->j_committing_transaction); - journal_put_journal_head(jh); - spin_unlock(&journal->j_list_lock); - jbd_unlock_bh_state(bh); - spin_unlock(&journal->j_state_lock); - return ret; + goto zap_buffer; } else { /* The orphan record's transaction has * committed. We can cleanse this buffer */ @@ -1944,10 +1945,24 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) } /* * The buffer is committing, we simply cannot touch - * it. So we just set j_next_transaction to the - * running transaction (if there is one) and mark - * buffer as freed so that commit code knows it should - * clear dirty bits when it is done with the buffer. + * it. If the page is straddling i_size we have to wait + * for commit and try again. + */ + if (partial_page) { + tid_t tid = journal->j_committing_transaction->t_tid; + + journal_put_journal_head(jh); + spin_unlock(&journal->j_list_lock); + jbd_unlock_bh_state(bh); + spin_unlock(&journal->j_state_lock); + log_wait_commit(journal, tid); + goto retry; + } + /* + * OK, buffer won't be reachable after truncate. We just set + * j_next_transaction to the running transaction (if there is + * one) and mark buffer as freed so that commit code knows it + * should clear dirty bits when it is done with the buffer. */ set_buffer_freed(bh); if (journal->j_running_transaction && buffer_jbddirty(bh)) @@ -1970,6 +1985,14 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh) } zap_buffer: + /* + * This is tricky. Although the buffer is truncated, it may be reused + * if blocksize < pagesize and it is attached to the page straddling + * EOF. Since the buffer might have been added to BJ_Forget list of the + * running transaction, journal_get_write_access() won't clear + * b_modified and credit accounting gets confused. So clear b_modified + * here. */ + jh->b_modified = 0; journal_put_journal_head(jh); zap_buffer_no_jh: spin_unlock(&journal->j_list_lock); @@ -2018,7 +2041,8 @@ void journal_invalidatepage(journal_t *journal, if (offset <= curr_off) { /* This block is wholly outside the truncation point */ lock_buffer(bh); - may_free &= journal_unmap_buffer(journal, bh); + may_free &= journal_unmap_buffer(journal, bh, + offset > 0); unlock_buffer(bh); } curr_off = next_off; -- cgit v1.1 From b08d7dbc33f4821d23de1ec921146aca004f46ee Mon Sep 17 00:00:00 2001 From: Nikola Pajkovsky Date: Wed, 15 Aug 2012 00:38:08 +0200 Subject: udf: fix retun value on error path in udf_load_logicalvol commit 68766a2edcd5cd744262a70a2f67a320ac944760 upstream. In case we detect a problem and bail out, we fail to set "ret" to a nonzero value, and udf_load_logicalvol will mistakenly report success. Signed-off-by: Nikola Pajkovsky Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/super.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/udf/super.c b/fs/udf/super.c index a8e867a..b0c7b53 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -1316,6 +1316,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, udf_error(sb, __func__, "error loading logical volume descriptor: " "Partition table too long (%u > %lu)\n", table_len, sb->s_blocksize - sizeof(*lvd)); + ret = 1; goto out_bh; } @@ -1360,8 +1361,10 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block, UDF_ID_SPARABLE, strlen(UDF_ID_SPARABLE))) { if (udf_load_sparable_map(sb, map, - (struct sparablePartitionMap *)gpm) < 0) + (struct sparablePartitionMap *)gpm) < 0) { + ret = 1; goto out_bh; + } } else if (!strncmp(upm2->partIdent.ident, UDF_ID_METADATA, strlen(UDF_ID_METADATA))) { -- cgit v1.1 From c303f82bbe250dc60ef94d1f282a49118fabcaed Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 13 Oct 2012 00:30:28 -0400 Subject: NLM: nlm_lookup_file() may return NLMv4-specific error codes commit cd0b16c1c3cda12dbed1f8de8f1a9b0591990724 upstream. If the filehandle is stale, or open access is denied for some reason, nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh or nlm4_failed. These get passed right through nlm_lookup_file(), and so when nlmsvc_retrieve_args() calls the latter, it needs to filter the result through the cast_status() machinery. Failure to do so, will trigger the BUG_ON() in encode_nlm_stat... Signed-off-by: Trond Myklebust Reported-by: Larry McVoy Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/lockd/clntxdr.c | 2 +- fs/lockd/svcproc.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/lockd/clntxdr.c b/fs/lockd/clntxdr.c index 36057ce..6e2a2d5 100644 --- a/fs/lockd/clntxdr.c +++ b/fs/lockd/clntxdr.c @@ -223,7 +223,7 @@ static void encode_nlm_stat(struct xdr_stream *xdr, { __be32 *p; - BUG_ON(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD); + WARN_ON_ONCE(be32_to_cpu(stat) > NLM_LCK_DENIED_GRACE_PERIOD); p = xdr_reserve_space(xdr, 4); *p = stat; } diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index d27aab1..d413af3 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -67,7 +67,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0) + error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh)); + if (error != 0) goto no_locks; *filp = file; -- cgit v1.1 From 0fc01fa3b5f98b7add96834efc56a09b398429b3 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 18 Sep 2012 13:37:18 +0400 Subject: Revert: lockd: use rpc client's cl_nodename for id encoding This reverts 12d63702c53bc2230dfc997e91ca891f39cb6446 which was commit 303a7ce92064c285a04c870f2dc0192fdb2968cb upstream. Taking hostname from uts namespace if not safe, because this cuold be performind during umount operation on child reaper death. And in this case current->nsproxy is NULL already. Signed-off-by: Greg Kroah-Hartman Cc: Stanislav Kinsbursky Cc: Trond Myklebust --- fs/lockd/mon.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c index df753a1..23d7451 100644 --- a/fs/lockd/mon.c +++ b/fs/lockd/mon.c @@ -40,7 +40,6 @@ struct nsm_args { u32 proc; char *mon_name; - char *nodename; }; struct nsm_res { @@ -94,7 +93,6 @@ static int nsm_mon_unmon(struct nsm_handle *nsm, u32 proc, struct nsm_res *res) .vers = 3, .proc = NLMPROC_NSM_NOTIFY, .mon_name = nsm->sm_mon_name, - .nodename = utsname()->nodename, }; struct rpc_message msg = { .rpc_argp = &args, @@ -431,7 +429,7 @@ static void encode_my_id(struct xdr_stream *xdr, const struct nsm_args *argp) { __be32 *p; - encode_nsm_string(xdr, argp->nodename); + encode_nsm_string(xdr, utsname()->nodename); p = xdr_reserve_space(xdr, 4 + 4 + 4); *p++ = cpu_to_be32(argp->prog); *p++ = cpu_to_be32(argp->vers); -- cgit v1.1 From ab41bb2e47540a714271dfa2943566b1fd4afa9c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 25 Oct 2012 13:38:16 -0700 Subject: fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check commit 12176503366885edd542389eed3aaf94be163fdb upstream. The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check while converting ioctl arguments. This could lead to leaking kernel stack contents into userspace. Patch extracted from existing fix in grsecurity. Signed-off-by: Kees Cook Cc: David Miller Cc: Brad Spengler Cc: PaX Team Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/compat_ioctl.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index 61abb63..3deb58d 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -208,6 +208,8 @@ static int do_video_set_spu_palette(unsigned int fd, unsigned int cmd, err = get_user(palp, &up->palette); err |= get_user(length, &up->length); + if (err) + return -EFAULT; up_native = compat_alloc_user_space(sizeof(struct video_spu_palette)); err = put_user(compat_ptr(palp), &up_native->palette); -- cgit v1.1 From e17ce2ec38fd766e8f9707701e47f4332d8bb630 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Sat, 29 Sep 2012 22:23:19 +0200 Subject: sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat() commit 66081a72517a131430dcf986775f3268aafcb546 upstream. The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- fs/sysfs/dir.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'fs') diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c index ea9120a..567b3db 100644 --- a/fs/sysfs/dir.c +++ b/fs/sysfs/dir.c @@ -404,20 +404,18 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd) /** * sysfs_pathname - return full path to sysfs dirent * @sd: sysfs_dirent whose path we want - * @path: caller allocated buffer + * @path: caller allocated buffer of size PATH_MAX * * Gives the name "/" to the sysfs_root entry; any path returned * is relative to wherever sysfs is mounted. - * - * XXX: does no error checking on @path size */ static char *sysfs_pathname(struct sysfs_dirent *sd, char *path) { if (sd->s_parent) { sysfs_pathname(sd->s_parent, path); - strcat(path, "/"); + strlcat(path, "/", PATH_MAX); } - strcat(path, sd->s_name); + strlcat(path, sd->s_name, PATH_MAX); return path; } @@ -450,9 +448,11 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd) char *path = kzalloc(PATH_MAX, GFP_KERNEL); WARN(1, KERN_WARNING "sysfs: cannot create duplicate filename '%s'\n", - (path == NULL) ? sd->s_name : - strcat(strcat(sysfs_pathname(acxt->parent_sd, path), "/"), - sd->s_name)); + (path == NULL) ? sd->s_name + : (sysfs_pathname(acxt->parent_sd, path), + strlcat(path, "/", PATH_MAX), + strlcat(path, sd->s_name, PATH_MAX), + path)); kfree(path); } -- cgit v1.1 From e4648b149c10256dbc4d4ce5d1c052e733d224f5 Mon Sep 17 00:00:00 2001 From: Scott Mayhew Date: Tue, 16 Oct 2012 13:22:19 -0400 Subject: nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts commit acce94e68a0f346115fd41cdc298197d2d5a59ad upstream. In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/mount_clnt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c index d4c2d6b..3d93216 100644 --- a/fs/nfs/mount_clnt.c +++ b/fs/nfs/mount_clnt.c @@ -181,7 +181,7 @@ int nfs_mount(struct nfs_mount_request *info) else msg.rpc_proc = &mnt_clnt->cl_procinfo[MOUNTPROC_MNT]; - status = rpc_call_sync(mnt_clnt, &msg, 0); + status = rpc_call_sync(mnt_clnt, &msg, RPC_TASK_SOFT|RPC_TASK_TIMEOUT); rpc_shutdown_client(mnt_clnt); if (status < 0) -- cgit v1.1 From 6fbd3cdb9365377e3d0fb5b4c86e16d71678103b Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sun, 21 Oct 2012 19:23:52 +0100 Subject: nfs: Show original device name verbatim in /proc/*/mount{s,info} commit 97a54868262da1629a3e65121e65b8e8c4419d9f upstream. Since commit c7f404b ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings Signed-off-by: Jonathan Nieder Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/internal.h | 5 +++-- fs/nfs/namespace.c | 19 ++++++++++++++----- fs/nfs/nfs4namespace.c | 3 ++- fs/nfs/super.c | 2 +- 4 files changed, 20 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 4f10d81..399a505 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -274,8 +274,9 @@ extern void nfs_sb_active(struct super_block *sb); extern void nfs_sb_deactive(struct super_block *sb); /* namespace.c */ +#define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, - char *buffer, ssize_t buflen); + char *buffer, ssize_t buflen, unsigned flags); extern struct vfsmount *nfs_d_automount(struct path *path); /* getroot.c */ @@ -349,7 +350,7 @@ static inline char *nfs_devname(struct dentry *dentry, char *buffer, ssize_t buflen) { char *dummy; - return nfs_path(&dummy, dentry, buffer, buflen); + return nfs_path(&dummy, dentry, buffer, buflen, NFS_PATH_CANONICAL); } /* diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c index 1f063ba..d6122ef 100644 --- a/fs/nfs/namespace.c +++ b/fs/nfs/namespace.c @@ -37,6 +37,7 @@ static struct vfsmount *nfs_do_submount(struct dentry *dentry, * @dentry - pointer to dentry * @buffer - result buffer * @buflen - length of buffer + * @flags - options (see below) * * Helper function for constructing the server pathname * by arbitrary hashed dentry. @@ -44,8 +45,14 @@ static struct vfsmount *nfs_do_submount(struct dentry *dentry, * This is mainly for use in figuring out the path on the * server side when automounting on top of an existing partition * and in generating /proc/mounts and friends. + * + * Supported flags: + * NFS_PATH_CANONICAL: ensure there is exactly one slash after + * the original device (export) name + * (if unset, the original name is returned verbatim) */ -char *nfs_path(char **p, struct dentry *dentry, char *buffer, ssize_t buflen) +char *nfs_path(char **p, struct dentry *dentry, char *buffer, ssize_t buflen, + unsigned flags) { char *end; int namelen; @@ -78,7 +85,7 @@ rename_retry: rcu_read_unlock(); goto rename_retry; } - if (*end != '/') { + if ((flags & NFS_PATH_CANONICAL) && *end != '/') { if (--buflen < 0) { spin_unlock(&dentry->d_lock); rcu_read_unlock(); @@ -95,9 +102,11 @@ rename_retry: return end; } namelen = strlen(base); - /* Strip off excess slashes in base string */ - while (namelen > 0 && base[namelen - 1] == '/') - namelen--; + if (flags & NFS_PATH_CANONICAL) { + /* Strip off excess slashes in base string */ + while (namelen > 0 && base[namelen - 1] == '/') + namelen--; + } buflen -= namelen; if (buflen < 0) { spin_unlock(&dentry->d_lock); diff --git a/fs/nfs/nfs4namespace.c b/fs/nfs/nfs4namespace.c index bb80c49..96f2b67 100644 --- a/fs/nfs/nfs4namespace.c +++ b/fs/nfs/nfs4namespace.c @@ -57,7 +57,8 @@ Elong: static char *nfs4_path(struct dentry *dentry, char *buffer, ssize_t buflen) { char *limit; - char *path = nfs_path(&limit, dentry, buffer, buflen); + char *path = nfs_path(&limit, dentry, buffer, buflen, + NFS_PATH_CANONICAL); if (!IS_ERR(path)) { char *colon = strchr(path, ':'); if (colon && colon < limit) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index a1f3d6e..23f0223 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -763,7 +763,7 @@ static int nfs_show_devname(struct seq_file *m, struct vfsmount *mnt) int err = 0; if (!page) return -ENOMEM; - devname = nfs_path(&dummy, mnt->mnt_root, page, PAGE_SIZE); + devname = nfs_path(&dummy, mnt->mnt_root, page, PAGE_SIZE, 0); if (IS_ERR(devname)) err = PTR_ERR(devname); else -- cgit v1.1 From f354d0c0ca37522c2efcced954ade7cdca479a98 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 29 Oct 2012 18:53:23 -0400 Subject: NFSv4: nfs4_locku_done must release the sequence id commit 2b1bc308f492589f7d49012ed24561534ea2be8c upstream. If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index c722905..d5d5400 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4116,6 +4116,7 @@ static void nfs4_locku_done(struct rpc_task *task, void *data) nfs_restart_rpc(task, calldata->server->nfs_client); } + nfs_release_seqid(calldata->arg.seqid); } static void nfs4_locku_prepare(struct rpc_task *task, void *data) -- cgit v1.1 From 110d3a25cc089af6a8b7ddee3f01e65aeb55ae49 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 12 Jun 2012 16:54:16 -0400 Subject: nfsd: add get_uint for u32's commit a007c4c3e943ecc054a806c259d95420a188754b upstream. I don't think there's a practical difference for the range of values these interfaces should see, but it would be safer to be unambiguous. Signed-off-by: J. Bruce Fields Cc: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/export.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 4b470f6..d7c4f02 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -403,7 +403,7 @@ fsloc_parse(char **mesg, char *buf, struct nfsd4_fs_locations *fsloc) int migrated, i, err; /* listsize */ - err = get_int(mesg, &fsloc->locations_count); + err = get_uint(mesg, &fsloc->locations_count); if (err) return err; if (fsloc->locations_count > MAX_FS_LOCATIONS) @@ -461,7 +461,7 @@ static int secinfo_parse(char **mesg, char *buf, struct svc_export *exp) return -EINVAL; for (f = exp->ex_flavors; f < exp->ex_flavors + listsize; f++) { - err = get_int(mesg, &f->pseudoflavor); + err = get_uint(mesg, &f->pseudoflavor); if (err) return err; /* @@ -470,7 +470,7 @@ static int secinfo_parse(char **mesg, char *buf, struct svc_export *exp) * problem at export time instead of when a client fails * to authenticate. */ - err = get_int(mesg, &f->flags); + err = get_uint(mesg, &f->flags); if (err) return err; /* Only some flags are allowed to differ between flavors: */ -- cgit v1.1 From 095a51831f6e6956b80bd5f921f412a6530f1bc2 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 31 Oct 2012 12:16:01 +1100 Subject: NFS: fix bug in legacy DNS resolver. commit 8d96b10639fb402357b75b055b1e82a65ff95050 upstream. The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever Tested-by: Chuck Lever Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/dns_resolve.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/dns_resolve.c b/fs/nfs/dns_resolve.c index a6e711a..ee02db5 100644 --- a/fs/nfs/dns_resolve.c +++ b/fs/nfs/dns_resolve.c @@ -213,7 +213,7 @@ static int nfs_dns_parse(struct cache_detail *cd, char *buf, int buflen) { char buf1[NFS_DNS_HOSTNAME_MAXLEN+1]; struct nfs_dns_ent key, *item; - unsigned long ttl; + unsigned int ttl; ssize_t len; int ret = -EINVAL; @@ -236,7 +236,8 @@ static int nfs_dns_parse(struct cache_detail *cd, char *buf, int buflen) key.namelen = len; memset(&key.h, 0, sizeof(key.h)); - ttl = get_expiry(&buf); + if (get_uint(&buf, &ttl) < 0) + goto out; if (ttl == 0) goto out; key.h.expiry_time = ttl + seconds_since_boot(); -- cgit v1.1 From 4d02840c82b479b4a4c316bd279e1a83649e59ed Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 22 Aug 2012 16:08:17 -0400 Subject: NFS: Fix Oopses in nfs_lookup_revalidate and nfs4_lookup_revalidate [Fixed upstream as part of 0b728e1911c, but that's a much larger patch, this is only the nfs portion backported as needed.] Fix the following Oops in 3.5.1: BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [] nfs_lookup_revalidate+0x2d/0x480 [nfs] PGD 337c63067 PUD 0 Oops: 0000 [#1] SMP CPU 5 Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc af_packet binfmt_misc cpufreq_conservative cpufreq_userspace cpufreq_powersave dm_mod acpi_cpufreq mperf coretemp gpio_ich kvm_intel joydev kvm ioatdma hid_generic igb lpc_ich i7core_edac edac_core ptp serio_raw dca pcspkr i2c_i801 mfd_core sg pps_core usbhid crc32c_intel microcode button autofs4 uhci_hcd ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea ehci_hcd usbcore usb_common scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh edd fan ata_piix thermal processor thermal_sys Pid: 30431, comm: java Not tainted 3.5.1-2-default #1 Supermicro X8DTT/X8DTT RIP: 0010:[] [] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP: 0018:ffff8801b418bd38 EFLAGS: 00010292 RAX: 00000000fffffff6 RBX: ffff88032016d800 RCX: 0000000000000020 RDX: ffffffff00000000 RSI: 0000000000000000 RDI: ffff8801824a7b00 RBP: ffff8801b418bdf8 R08: 7fffff0034323030 R09: fffffffff04c03ed R10: ffff8801824a7b00 R11: 0000000000000002 R12: ffff8801824a7b00 R13: ffff8801824a7b00 R14: 0000000000000000 R15: ffff8803201725d0 FS: 00002b53a46cb700(0000) GS:ffff88033fc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000038 CR3: 000000020a426000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process java (pid: 30431, threadinfo ffff8801b418a000, task ffff8801b5d20600) Stack: ffff8801b418be44 ffff88032016d800 ffff8801b418bdf8 0000000000000000 ffff8801824a7b00 ffff8801b418bdd7 ffff8803201725d0 ffffffff8116a9c0 ffff8801b5c38dc0 0000000000000007 ffff88032016d800 0000000000000000 Call Trace: [] lookup_dcache+0x80/0xe0 [] __lookup_hash+0x23/0x90 [] lookup_one_len+0xc5/0x100 [] nfs_sillyrename+0xe3/0x210 [nfs] [] vfs_unlink.part.25+0x7f/0xe0 [] do_unlinkat+0x1ac/0x1d0 [] system_call_fastpath+0x16/0x1b [<00002b5348b5f527>] 0x2b5348b5f526 Code: ec 38 b8 f6 ff ff ff 4c 89 64 24 18 4c 89 74 24 28 49 89 fc 48 89 5c 24 08 48 89 6c 24 10 49 89 f6 4c 89 6c 24 20 4c 89 7c 24 30 46 38 40 0f 85 d1 00 00 00 e8 c4 c4 df e0 48 8b 58 30 49 89 RIP [] nfs_lookup_revalidate+0x2d/0x480 [nfs] RSP CR2: 0000000000000038 ---[ end trace 845113ed191985dd ]--- This Oops affects 3.5 kernels and older, and is due to lookup_one_len() calling down to the dentry revalidation code with a NULL pointer to struct nameidata. It is fixed upstream by commit 0b728e1911c (stop passing nameidata * to ->d_revalidate()) Reported-by: Richard Ems Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/dir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 462a006..9515bc5 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1100,7 +1100,7 @@ static int nfs_lookup_revalidate(struct dentry *dentry, struct nameidata *nd) struct nfs_fattr *fattr = NULL; int error; - if (nd->flags & LOOKUP_RCU) + if (nd && (nd->flags & LOOKUP_RCU)) return -ECHILD; parent = dget_parent(dentry); @@ -1498,7 +1498,7 @@ static int nfs_open_revalidate(struct dentry *dentry, struct nameidata *nd) struct nfs_open_context *ctx; int openflags, ret = 0; - if (nd->flags & LOOKUP_RCU) + if (nd && (nd->flags & LOOKUP_RCU)) return -ECHILD; inode = dentry->d_inode; -- cgit v1.1 From 11b34bb13aebb075ba47e3a292443ab3cd9638f7 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 2 Nov 2012 11:38:44 +1100 Subject: xfs: fix reading of wrapped log data commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream. Commit 4439647 ("xfs: reset buffer pointers before freeing them") in 3.0-rc1 introduced a regression when recovering log buffers that wrapped around the end of log. The second part of the log buffer at the start of the physical log was being read into the header buffer rather than the data buffer, and hence recovery was seeing garbage in the data buffer when it got to the region of the log buffer that was incorrectly read. Reported-by: Torsten Kaiser Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Mark Tinguely Signed-off-by: Ben Myers Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_log_recover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index b75fd67..6a5a1af 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3512,7 +3512,7 @@ xlog_do_recovery_pass( * - order is important. */ error = xlog_bread_offset(log, 0, - bblks - split_bblks, hbp, + bblks - split_bblks, dbp, offset + BBTOB(split_bblks)); if (error) goto bread_err2; -- cgit v1.1 From 5a5ffbe9623739953af7cbec40ae8d33472cb638 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Thu, 8 Nov 2012 15:53:37 -0800 Subject: fanotify: fix missing break commit 848561d368751a1c0f679b9f045a02944506a801 upstream. Anders Blomdell noted in 2010 that Fanotify lost events and provided a test case. Eric Paris confirmed it was a bug and posted a fix to the list https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE but never applied it. Repeated attempts over time to actually get him to apply it have never had a reply from anyone who has raised it So apply it anyway Signed-off-by: Alan Cox Reported-by: Anders Blomdell Cc: Eric Paris Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/notify/fanotify/fanotify.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f35794b..a506360 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -21,6 +21,7 @@ static bool should_merge(struct fsnotify_event *old, struct fsnotify_event *new) if ((old->path.mnt == new->path.mnt) && (old->path.dentry == new->path.dentry)) return true; + break; case (FSNOTIFY_EVENT_NONE): return true; default: -- cgit v1.1 From ad33dc05bfe7c600d898e1f7857d1fd51308ba09 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Tue, 9 Oct 2012 16:20:15 +0300 Subject: UBIFS: fix mounting problems after power cuts commit a28ad42a4a0c6f302f488f26488b8b37c9b30024 upstream. This is a bugfix for a problem with the following symptoms: 1. A power cut happens 2. After reboot, we try to mount UBIFS 3. Mount fails with "No space left on device" error message UBIFS complains like this: UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB The root cause of this problem is that when we mount, not all LEBs are categorized. Only those which were read are. However, the 'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were categorized and 'c->freeable_cnt' is valid, which is a false assumption. This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()' to always fall back to LPT scanning if no freeable LEBs were found. This problem was reported by few people in the past, but Brent Taylor was able to reproduce it and send me a flash image which cannot be mounted, which made it easy to hunt the bug. Kudos to Brent. Reported-by: Brent Taylor Signed-off-by: Artem Bityutskiy Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/find.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ubifs/find.c b/fs/ubifs/find.c index 2559d17..5dc48ca 100644 --- a/fs/ubifs/find.c +++ b/fs/ubifs/find.c @@ -681,8 +681,16 @@ int ubifs_find_free_leb_for_idx(struct ubifs_info *c) if (!lprops) { lprops = ubifs_fast_find_freeable(c); if (!lprops) { - ubifs_assert(c->freeable_cnt == 0); - if (c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) { + /* + * The first condition means the following: go scan the + * LPT if there are uncategorized lprops, which means + * there may be freeable LEBs there (UBIFS does not + * store the information about freeable LEBs in the + * master node). + */ + if (c->in_a_category_cnt != c->main_lebs || + c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) { + ubifs_assert(c->freeable_cnt == 0); lprops = scan_for_leb_for_idx(c); if (IS_ERR(lprops)) { err = PTR_ERR(lprops); -- cgit v1.1 From f174b32dc1a6912576d75d77afd00096f44878d4 Mon Sep 17 00:00:00 2001 From: Artem Bityutskiy Date: Wed, 10 Oct 2012 10:55:28 +0300 Subject: UBIFS: introduce categorized lprops counter commit 98a1eebda3cb2a84ecf1f219bb3a95769033d1bf upstream. This commit is a preparation for a subsequent bugfix. We introduce a counter for categorized lprops. Signed-off-by: Artem Bityutskiy Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/lprops.c | 6 ++++++ fs/ubifs/ubifs.h | 3 +++ 2 files changed, 9 insertions(+) (limited to 'fs') diff --git a/fs/ubifs/lprops.c b/fs/ubifs/lprops.c index 667884f..ae6f74a 100644 --- a/fs/ubifs/lprops.c +++ b/fs/ubifs/lprops.c @@ -300,8 +300,11 @@ void ubifs_add_to_cat(struct ubifs_info *c, struct ubifs_lprops *lprops, default: ubifs_assert(0); } + lprops->flags &= ~LPROPS_CAT_MASK; lprops->flags |= cat; + c->in_a_category_cnt += 1; + ubifs_assert(c->in_a_category_cnt <= c->main_lebs); } /** @@ -334,6 +337,9 @@ static void ubifs_remove_from_cat(struct ubifs_info *c, default: ubifs_assert(0); } + + c->in_a_category_cnt -= 1; + ubifs_assert(c->in_a_category_cnt >= 0); } /** diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h index f79983d..62d50a9 100644 --- a/fs/ubifs/ubifs.h +++ b/fs/ubifs/ubifs.h @@ -1187,6 +1187,8 @@ struct ubifs_debug_info; * @freeable_list: list of freeable non-index LEBs (free + dirty == @leb_size) * @frdi_idx_list: list of freeable index LEBs (free + dirty == @leb_size) * @freeable_cnt: number of freeable LEBs in @freeable_list + * @in_a_category_cnt: count of lprops which are in a certain category, which + * basically meants that they were loaded from the flash * * @ltab_lnum: LEB number of LPT's own lprops table * @ltab_offs: offset of LPT's own lprops table @@ -1416,6 +1418,7 @@ struct ubifs_info { struct list_head freeable_list; struct list_head frdi_idx_list; int freeable_cnt; + int in_a_category_cnt; int ltab_lnum; int ltab_offs; -- cgit v1.1 From 34396c6b75ce93cf1ee38ddd492b542a6c8c63eb Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Mon, 11 Jun 2012 15:42:32 -0700 Subject: eCryptfs: Copy up POSIX ACL and read-only flags from lower mount commit 069ddcda37b2cf5bb4b6031a944c0e9359213262 upstream. When the eCryptfs mount options do not include '-o acl', but the lower filesystem's mount options do include 'acl', the MS_POSIXACL flag is not flipped on in the eCryptfs super block flags. This flag is what the VFS checks in do_last() when deciding if the current umask should be applied to a newly created inode's mode or not. When a default POSIX ACL mask is set on a directory, the current umask is incorrectly applied to new inodes created in the directory. This patch ignores the MS_POSIXACL flag passed into ecryptfs_mount() and sets the flag on the eCryptfs super block depending on the flag's presence on the lower super block. Additionally, it is incorrect to allow a writeable eCryptfs mount on top of a read-only lower mount. This missing check did not allow writes to the read-only lower mount because permissions checks are still performed on the lower filesystem's objects but it is best to simply not allow a rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw mount is valid and still allowed. https://launchpad.net/bugs/1009207 Signed-off-by: Tyler Hicks Reported-by: Stefan Beller Cc: John Johansen Cc: Herton Ronaldo Krzesinski Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index b4a6bef..a6c4d01 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -505,7 +505,6 @@ static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags goto out; } - s->s_flags = flags; rc = bdi_setup_and_register(&sbi->bdi, "ecryptfs", BDI_CAP_MAP_COPY); if (rc) goto out1; @@ -541,6 +540,15 @@ static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags } ecryptfs_set_superblock_lower(s, path.dentry->d_sb); + + /** + * Set the POSIX ACL flag based on whether they're enabled in the lower + * mount. Force a read-only eCryptfs mount if the lower mount is ro. + * Allow a ro eCryptfs mount even when the lower mount is rw. + */ + s->s_flags = flags & ~MS_POSIXACL; + s->s_flags |= path.dentry->d_sb->s_flags & (MS_RDONLY | MS_POSIXACL); + s->s_maxbytes = path.dentry->d_sb->s_maxbytes; s->s_blocksize = path.dentry->d_sb->s_blocksize; s->s_magic = ECRYPTFS_SUPER_MAGIC; -- cgit v1.1 From 6c0e7633180a62ae295075e044e0a821c0a934b5 Mon Sep 17 00:00:00 2001 From: Tim Sally Date: Thu, 12 Jul 2012 19:10:24 -0400 Subject: eCryptfs: check for eCryptfs cipher support at mount commit 5f5b331d5c21228a6519dcb793fc1629646c51a6 upstream. The issue occurs when eCryptfs is mounted with a cipher supported by the crypto subsystem but not by eCryptfs. The mount succeeds and an error does not occur until a write. This change checks for eCryptfs cipher support at mount time. Resolves Launchpad issue #338914, reported by Tyler Hicks in 03/2009. https://bugs.launchpad.net/ecryptfs/+bug/338914 Signed-off-by: Tim Sally Signed-off-by: Tyler Hicks Cc: Herton Ronaldo Krzesinski Signed-off-by: Greg Kroah-Hartman --- fs/ecryptfs/main.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'fs') diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index a6c4d01..d1125e3 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -279,6 +279,7 @@ static int ecryptfs_parse_options(struct ecryptfs_sb_info *sbi, char *options, char *fnek_src; char *cipher_key_bytes_src; char *fn_cipher_key_bytes_src; + u8 cipher_code; *check_ruid = 0; @@ -420,6 +421,18 @@ static int ecryptfs_parse_options(struct ecryptfs_sb_info *sbi, char *options, && !fn_cipher_key_bytes_set) mount_crypt_stat->global_default_fn_cipher_key_bytes = mount_crypt_stat->global_default_cipher_key_size; + + cipher_code = ecryptfs_code_for_cipher_string( + mount_crypt_stat->global_default_cipher_name, + mount_crypt_stat->global_default_cipher_key_size); + if (!cipher_code) { + ecryptfs_printk(KERN_ERR, + "eCryptfs doesn't support cipher: %s", + mount_crypt_stat->global_default_cipher_name); + rc = -EINVAL; + goto out; + } + mutex_lock(&key_tfm_list_mutex); if (!ecryptfs_tfm_exists(mount_crypt_stat->global_default_cipher_name, NULL)) { -- cgit v1.1 From 31fcdd0d4e11b4fe065277932ab7e3fe60a9ed6e Mon Sep 17 00:00:00 2001 From: Bryan Schumaker Date: Tue, 30 Oct 2012 16:06:35 -0400 Subject: NFS: Wait for session recovery to finish before returning commit 399f11c3d872bd748e1575574de265a6304c7c43 upstream. Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker Signed-off-by: Trond Myklebust [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index d5d5400..3720caa 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -300,8 +300,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc dprintk("%s ERROR: %d Reset session\n", __func__, errorcode); nfs4_schedule_session_recovery(clp->cl_session); - exception->retry = 1; - break; + goto wait_on_recovery; #endif /* defined(CONFIG_NFS_V4_1) */ case -NFS4ERR_FILE_OPEN: if (exception->timeout > HZ) { -- cgit v1.1 From daa88cb107f76cf9b563a2c0c25bbf2f47c67abf Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 13 Nov 2012 14:55:52 +0100 Subject: reiserfs: Fix lock ordering during remount commit 3bb3e1fc47aca554e7e2cc4deeddc24750987ac2 upstream. When remounting reiserfs dquot_suspend() or dquot_resume() can be called. These functions take dqonoff_mutex which ranks above write lock so we have to drop it before calling into quota code. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/super.c | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index f19dfbf..115e444 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -1207,7 +1207,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) kfree(qf_names[i]); #endif err = -EINVAL; - goto out_err; + goto out_unlock; } #ifdef CONFIG_QUOTA handle_quota_files(s, qf_names, &qfmt); @@ -1250,7 +1250,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) if (blocks) { err = reiserfs_resize(s, blocks); if (err != 0) - goto out_err; + goto out_unlock; } if (*mount_flags & MS_RDONLY) { @@ -1260,9 +1260,15 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) /* it is read-only already */ goto out_ok; + /* + * Drop write lock. Quota will retake it when needed and lock + * ordering requires calling dquot_suspend() without it. + */ + reiserfs_write_unlock(s); err = dquot_suspend(s, -1); if (err < 0) goto out_err; + reiserfs_write_lock(s); /* try to remount file system with read-only permissions */ if (sb_umount_state(rs) == REISERFS_VALID_FS @@ -1272,7 +1278,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) err = journal_begin(&th, s, 10); if (err) - goto out_err; + goto out_unlock; /* Mounting a rw partition read-only. */ reiserfs_prepare_for_journal(s, SB_BUFFER_WITH_SB(s), 1); @@ -1287,7 +1293,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) if (reiserfs_is_journal_aborted(journal)) { err = journal->j_errno; - goto out_err; + goto out_unlock; } handle_data_mode(s, mount_options); @@ -1296,7 +1302,7 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) s->s_flags &= ~MS_RDONLY; /* now it is safe to call journal_begin */ err = journal_begin(&th, s, 10); if (err) - goto out_err; + goto out_unlock; /* Mount a partition which is read-only, read-write */ reiserfs_prepare_for_journal(s, SB_BUFFER_WITH_SB(s), 1); @@ -1313,11 +1319,17 @@ static int reiserfs_remount(struct super_block *s, int *mount_flags, char *arg) SB_JOURNAL(s)->j_must_wait = 1; err = journal_end(&th, s, 10); if (err) - goto out_err; + goto out_unlock; s->s_dirt = 0; if (!(*mount_flags & MS_RDONLY)) { + /* + * Drop write lock. Quota will retake it when needed and lock + * ordering requires calling dquot_resume() without it. + */ + reiserfs_write_unlock(s); dquot_resume(s, -1); + reiserfs_write_lock(s); finish_unfinished(s); reiserfs_xattr_init(s, *mount_flags); } @@ -1327,9 +1339,10 @@ out_ok: reiserfs_write_unlock(s); return 0; +out_unlock: + reiserfs_write_unlock(s); out_err: kfree(new_opts); - reiserfs_write_unlock(s); return err; } -- cgit v1.1 From a7b8408f3c42adc601513bb7b8dae9ef05a1719e Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 13 Nov 2012 16:34:17 +0100 Subject: reiserfs: Protect reiserfs_quota_on() with write lock commit b9e06ef2e8706fe669b51f4364e3aeed58639eb2 upstream. In reiserfs_quota_on() we do quite some work - for example unpacking tail of a quota file. Thus we have to hold write lock until a moment we call back into the quota code. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/super.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 115e444..407e499 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -2072,8 +2072,11 @@ static int reiserfs_quota_on(struct super_block *sb, int type, int format_id, struct inode *inode; struct reiserfs_transaction_handle th; - if (!(REISERFS_SB(sb)->s_mount_opt & (1 << REISERFS_QUOTA))) - return -EINVAL; + reiserfs_write_lock(sb); + if (!(REISERFS_SB(sb)->s_mount_opt & (1 << REISERFS_QUOTA))) { + err = -EINVAL; + goto out; + } /* Quotafile not on the same filesystem? */ if (path->mnt->mnt_sb != sb) { @@ -2115,8 +2118,10 @@ static int reiserfs_quota_on(struct super_block *sb, int type, int format_id, if (err) goto out; } - err = dquot_quota_on(sb, type, format_id, path); + reiserfs_write_unlock(sb); + return dquot_quota_on(sb, type, format_id, path); out: + reiserfs_write_unlock(sb); return err; } -- cgit v1.1 From 2f21676d17f0c67263e87f13859682712b643323 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 13 Nov 2012 17:05:14 +0100 Subject: reiserfs: Move quota calls out of write lock commit 7af11686933726e99af22901d622f9e161404e6b upstream. Calls into highlevel quota code cannot happen under the write lock. These calls take dqio_mutex which ranks above write lock. So drop write lock before calling back into quota code. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/inode.c | 10 +++++++--- fs/reiserfs/stree.c | 4 ++++ fs/reiserfs/super.c | 18 ++++++++++++++---- 3 files changed, 25 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c index 0363aa4..ebe8db4 100644 --- a/fs/reiserfs/inode.c +++ b/fs/reiserfs/inode.c @@ -1783,8 +1783,9 @@ int reiserfs_new_inode(struct reiserfs_transaction_handle *th, BUG_ON(!th->t_trans_id); - dquot_initialize(inode); + reiserfs_write_unlock(inode->i_sb); err = dquot_alloc_inode(inode); + reiserfs_write_lock(inode->i_sb); if (err) goto out_end_trans; if (!dir->i_nlink) { @@ -1980,8 +1981,10 @@ int reiserfs_new_inode(struct reiserfs_transaction_handle *th, out_end_trans: journal_end(th, th->t_super, th->t_blocks_allocated); + reiserfs_write_unlock(inode->i_sb); /* Drop can be outside and it needs more credits so it's better to have it outside */ dquot_drop(inode); + reiserfs_write_lock(inode->i_sb); inode->i_flags |= S_NOQUOTA; make_bad_inode(inode); @@ -3105,10 +3108,9 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) /* must be turned off for recursive notify_change calls */ ia_valid = attr->ia_valid &= ~(ATTR_KILL_SUID|ATTR_KILL_SGID); - depth = reiserfs_write_lock_once(inode->i_sb); if (is_quota_modification(inode, attr)) dquot_initialize(inode); - + depth = reiserfs_write_lock_once(inode->i_sb); if (attr->ia_valid & ATTR_SIZE) { /* version 2 items will be caught by the s_maxbytes check ** done for us in vmtruncate @@ -3169,7 +3171,9 @@ int reiserfs_setattr(struct dentry *dentry, struct iattr *attr) error = journal_begin(&th, inode->i_sb, jbegin_count); if (error) goto out; + reiserfs_write_unlock_once(inode->i_sb, depth); error = dquot_transfer(inode, attr); + depth = reiserfs_write_lock_once(inode->i_sb); if (error) { journal_end(&th, inode->i_sb, jbegin_count); goto out; diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c index 313d39d..3ae9926 100644 --- a/fs/reiserfs/stree.c +++ b/fs/reiserfs/stree.c @@ -1968,7 +1968,9 @@ int reiserfs_paste_into_item(struct reiserfs_transaction_handle *th, struct tree key2type(&(key->on_disk_key))); #endif + reiserfs_write_unlock(inode->i_sb); retval = dquot_alloc_space_nodirty(inode, pasted_size); + reiserfs_write_lock(inode->i_sb); if (retval) { pathrelse(search_path); return retval; @@ -2061,9 +2063,11 @@ int reiserfs_insert_item(struct reiserfs_transaction_handle *th, "reiserquota insert_item(): allocating %u id=%u type=%c", quota_bytes, inode->i_uid, head2type(ih)); #endif + reiserfs_write_unlock(inode->i_sb); /* We can't dirty inode here. It would be immediately written but * appropriate stat item isn't inserted yet... */ retval = dquot_alloc_space_nodirty(inode, quota_bytes); + reiserfs_write_lock(inode->i_sb); if (retval) { pathrelse(path); return retval; diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 407e499..1c677aa 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -254,7 +254,9 @@ static int finish_unfinished(struct super_block *s) retval = remove_save_link_only(s, &save_link_key, 0); continue; } + reiserfs_write_unlock(s); dquot_initialize(inode); + reiserfs_write_lock(s); if (truncate && S_ISDIR(inode->i_mode)) { /* We got a truncate request for a dir which is impossible. @@ -1965,13 +1967,15 @@ static int reiserfs_write_dquot(struct dquot *dquot) REISERFS_QUOTA_TRANS_BLOCKS(dquot->dq_sb)); if (ret) goto out; + reiserfs_write_unlock(dquot->dq_sb); ret = dquot_commit(dquot); + reiserfs_write_lock(dquot->dq_sb); err = journal_end(&th, dquot->dq_sb, REISERFS_QUOTA_TRANS_BLOCKS(dquot->dq_sb)); if (!ret && err) ret = err; - out: +out: reiserfs_write_unlock(dquot->dq_sb); return ret; } @@ -1987,13 +1991,15 @@ static int reiserfs_acquire_dquot(struct dquot *dquot) REISERFS_QUOTA_INIT_BLOCKS(dquot->dq_sb)); if (ret) goto out; + reiserfs_write_unlock(dquot->dq_sb); ret = dquot_acquire(dquot); + reiserfs_write_lock(dquot->dq_sb); err = journal_end(&th, dquot->dq_sb, REISERFS_QUOTA_INIT_BLOCKS(dquot->dq_sb)); if (!ret && err) ret = err; - out: +out: reiserfs_write_unlock(dquot->dq_sb); return ret; } @@ -2007,19 +2013,21 @@ static int reiserfs_release_dquot(struct dquot *dquot) ret = journal_begin(&th, dquot->dq_sb, REISERFS_QUOTA_DEL_BLOCKS(dquot->dq_sb)); + reiserfs_write_unlock(dquot->dq_sb); if (ret) { /* Release dquot anyway to avoid endless cycle in dqput() */ dquot_release(dquot); goto out; } ret = dquot_release(dquot); + reiserfs_write_lock(dquot->dq_sb); err = journal_end(&th, dquot->dq_sb, REISERFS_QUOTA_DEL_BLOCKS(dquot->dq_sb)); if (!ret && err) ret = err; - out: reiserfs_write_unlock(dquot->dq_sb); +out: return ret; } @@ -2044,11 +2052,13 @@ static int reiserfs_write_info(struct super_block *sb, int type) ret = journal_begin(&th, sb, 2); if (ret) goto out; + reiserfs_write_unlock(sb); ret = dquot_commit_info(sb, type); + reiserfs_write_lock(sb); err = journal_end(&th, sb, 2); if (!ret && err) ret = err; - out: +out: reiserfs_write_unlock(sb); return ret; } -- cgit v1.1 From 7eebd4fb8bac3b4b69783e9333cf6e383376702d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 13 Nov 2012 18:25:38 +0100 Subject: reiserfs: Protect reiserfs_quota_write() with write lock commit 361d94a338a3fd0cee6a4ea32bbc427ba228e628 upstream. Calls into reiserfs journalling code and reiserfs_get_block() need to be protected with write lock. We remove write lock around calls to high level quota code in the next patch so these paths would suddently become unprotected. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/super.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'fs') diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 1c677aa..7527623 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -2205,7 +2205,9 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type, tocopy = sb->s_blocksize - offset < towrite ? sb->s_blocksize - offset : towrite; tmp_bh.b_state = 0; + reiserfs_write_lock(sb); err = reiserfs_get_block(inode, blk, &tmp_bh, GET_BLOCK_CREATE); + reiserfs_write_unlock(sb); if (err) goto out; if (offset || tocopy != sb->s_blocksize) @@ -2221,10 +2223,12 @@ static ssize_t reiserfs_quota_write(struct super_block *sb, int type, flush_dcache_page(bh->b_page); set_buffer_uptodate(bh); unlock_buffer(bh); + reiserfs_write_lock(sb); reiserfs_prepare_for_journal(sb, bh, 1); journal_mark_dirty(current->journal_info, sb, bh); if (!journal_quota) reiserfs_add_ordered_list(inode, bh); + reiserfs_write_unlock(sb); brelse(bh); offset = 0; towrite -= tocopy; -- cgit v1.1 From 1c4e871038f17769d943d4473c339171fed70f45 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 23 Nov 2012 14:03:04 +0100 Subject: jbd: Fix lock ordering bug in journal_unmap_buffer() commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream. Commit 09e05d48 introduced a wait for transaction commit into journal_unmap_buffer() in the case we are truncating a buffer undergoing commit in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly we forgot to drop buffer lock before waiting for transaction commit and thus deadlock is possible when kjournald wants to lock the buffer. Fix the problem by dropping the buffer lock before waiting for transaction commit. Since we are still holding page lock (and that is OK), buffer cannot disappear under us. Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/jbd/transaction.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index b0161a6..d7ab092 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -1955,7 +1955,9 @@ retry: spin_unlock(&journal->j_list_lock); jbd_unlock_bh_state(bh); spin_unlock(&journal->j_state_lock); + unlock_buffer(bh); log_wait_commit(journal, tid); + lock_buffer(bh); goto retry; } /* -- cgit v1.1 From 28278b332bc73f45189939a6c1797cba0eb5879f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 20 Dec 2012 15:05:16 -0800 Subject: exec: do not leave bprm->interp on stack commit b66c5984017533316fd1951770302649baf1aa33 upstream. If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook Cc: halfdog Cc: P J P Cc: Alexander Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/binfmt_misc.c | 5 ++++- fs/binfmt_script.c | 4 +++- fs/exec.c | 15 +++++++++++++++ 3 files changed, 22 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 1befe2e..5463952 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -176,7 +176,10 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs) goto _error; bprm->argc ++; - bprm->interp = iname; /* for binfmt_script */ + /* Update interp in case binfmt_script needs it. */ + retval = bprm_change_interp(iname, bprm); + if (retval < 0) + goto _error; interp_file = open_exec (iname); retval = PTR_ERR (interp_file); diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index 396a988..e39c18a 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -82,7 +82,9 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs) retval = copy_strings_kernel(1, &i_name, bprm); if (retval) return retval; bprm->argc++; - bprm->interp = interp; + retval = bprm_change_interp(interp, bprm); + if (retval < 0) + return retval; /* * OK, now restart the process with the interpreter's dentry. diff --git a/fs/exec.c b/fs/exec.c index 044c13f..08f3e4e 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1192,9 +1192,24 @@ void free_bprm(struct linux_binprm *bprm) mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } + /* If a binfmt changed the interp, free it. */ + if (bprm->interp != bprm->filename) + kfree(bprm->interp); kfree(bprm); } +int bprm_change_interp(char *interp, struct linux_binprm *bprm) +{ + /* If a binfmt changed the interp, free it first. */ + if (bprm->interp != bprm->filename) + kfree(bprm->interp); + bprm->interp = kstrdup(interp, GFP_KERNEL); + if (!bprm->interp) + return -ENOMEM; + return 0; +} +EXPORT_SYMBOL(bprm_change_interp); + /* * install the new credentials for this executable */ -- cgit v1.1 From 64b45c804286cb18c50b387169b7da64279bc1ed Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 13 Dec 2012 15:14:36 +1100 Subject: NFS: avoid NULL dereference in nfs_destroy_server commit f259613a1e4b44a0cf85a5dafd931be96ee7c9e5 upstream. In rare circumstances, nfs_clone_server() of a v2 or v3 server can get an error between setting server->destory (to nfs_destroy_server), and calling nfs_start_lockd (which will set server->nlm_host). If this happens, nfs_clone_server will call nfs_free_server which will call nfs_destroy_server and thence nlmclnt_done(NULL). This causes the NULL to be dereferenced. So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL. The other guards there are irrelevant as nlm_host can only be non-NULL if one of these flags are set - so remove those tests. (Thanks to Trond for this suggestion). This is suitable for any stable kernel since 2.6.25. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/client.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/nfs/client.c b/fs/nfs/client.c index b3dc2b8..0cb731f 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -673,8 +673,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, */ static void nfs_destroy_server(struct nfs_server *server) { - if (!(server->flags & NFS_MOUNT_LOCAL_FLOCK) || - !(server->flags & NFS_MOUNT_LOCAL_FCNTL)) + if (server->nlm_host) nlmclnt_done(server->nlm_host); } -- cgit v1.1 From 9d53441a95261c71d18938a797eb96f469b9dc09 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 14 Dec 2012 16:38:46 -0500 Subject: NFS: Fix calls to drop_nlink() commit 1f018458b30b0d5c535c94e577aa0acbb92e1395 upstream. It is almost always wrong for NFS to call drop_nlink() after removing a file. What we really want is to mark the inode's attributes for revalidation, and we want to ensure that the VFS drops it if we're reasonably sure that this is the final unlink(). Do the former using the usual cache validity flags, and the latter by testing if inode->i_nlink == 1, and clearing it in that case. This also fixes the following warning reported by Neil Brown and Jeff Layton (among others). [634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442] Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm: bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace: [634155.004630] [] dump_trace+0xaa/0x2b0 [634155.004641] [] dump_stack+0x69/0x6f [634155.004653] [] warn_slowpath_common+0x7b/0xc0 [634155.004662] [] drop_nlink+0x34/0x40 [634155.004687] [] nfs_dentry_iput+0x33/0x70 [nfs] [634155.004714] [] dput+0x12e/0x230 [634155.004726] [] __fput+0x170/0x230 [634155.004735] [] filp_close+0x5f/0x90 [634155.004743] [] sys_close+0x97/0x100 [634155.004754] [] system_call_fastpath+0x16/0x1b [634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/dir.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 9515bc5..4033264 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1216,11 +1216,14 @@ static int nfs_dentry_delete(const struct dentry *dentry) } +/* Ensure that we revalidate inode->i_nlink */ static void nfs_drop_nlink(struct inode *inode) { spin_lock(&inode->i_lock); - if (inode->i_nlink > 0) - drop_nlink(inode); + /* drop the inode if we're reasonably sure this is the last link */ + if (inode->i_nlink == 1) + clear_nlink(inode); + NFS_I(inode)->cache_validity |= NFS_INO_INVALID_ATTR; spin_unlock(&inode->i_lock); } @@ -1235,8 +1238,8 @@ static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode) NFS_I(inode)->cache_validity |= NFS_INO_INVALID_DATA; if (dentry->d_flags & DCACHE_NFSFS_RENAMED) { - drop_nlink(inode); nfs_complete_unlink(dentry, inode); + nfs_drop_nlink(inode); } iput(inode); } @@ -1788,10 +1791,8 @@ static int nfs_safe_remove(struct dentry *dentry) if (inode != NULL) { nfs_inode_return_delegation(inode); error = NFS_PROTO(dir)->remove(dir, &dentry->d_name); - /* The VFS may want to delete this inode */ if (error == 0) nfs_drop_nlink(inode); - nfs_mark_for_revalidate(inode); } else error = NFS_PROTO(dir)->remove(dir, &dentry->d_name); if (error == -ENOENT) -- cgit v1.1 From d6e0c42ccbb489d73ebc3973583474257a028d66 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 4 Dec 2012 18:25:10 -0500 Subject: nfsd4: fix oops on unusual readlike compound commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream. If the argument and reply together exceed the maximum payload size, then a reply with a read-like operation can overlow the rq_pages array. Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4xdr.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f91d589..ecdd18a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2682,11 +2682,16 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, len = maxcount; v = 0; while (len > 0) { - pn = resp->rqstp->rq_resused++; + pn = resp->rqstp->rq_resused; + if (!resp->rqstp->rq_respages[pn]) { /* ran out of pages */ + maxcount -= len; + break; + } resp->rqstp->rq_vec[v].iov_base = page_address(resp->rqstp->rq_respages[pn]); resp->rqstp->rq_vec[v].iov_len = len < PAGE_SIZE ? len : PAGE_SIZE; + resp->rqstp->rq_resused++; v++; len -= PAGE_SIZE; } @@ -2734,6 +2739,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd return nfserr; if (resp->xbuf->page_len) return nfserr_resource; + if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused]) + return nfserr_resource; page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]); @@ -2783,6 +2790,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 return nfserr; if (resp->xbuf->page_len) return nfserr_resource; + if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused]) + return nfserr_resource; RESERVE_SPACE(8); /* verifier */ savep = p; -- cgit v1.1 From 31c4e8cd42eb2f6aedccd13670e113d2ca1db39e Mon Sep 17 00:00:00 2001 From: Xi Wang Date: Fri, 4 Jan 2013 03:22:57 -0500 Subject: nfs: fix null checking in nfs_get_option_str() commit e25fbe380c4e3c09afa98bcdcd9d3921443adab8 upstream. The following null pointer check is broken. *option = match_strdup(args); return !option; The pointer `option' must be non-null, and thus `!option' is always false. Use `!*option' instead. The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some cut-and-paste code."). Signed-off-by: Xi Wang Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 23f0223..a5b2419 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -1052,7 +1052,7 @@ static int nfs_get_option_str(substring_t args[], char **option) { kfree(*option); *option = match_strdup(args); - return !option; + return !*option; } static int nfs_get_option_ul(substring_t args[], unsigned long *option) -- cgit v1.1 From 9d4cbf806eaefbf4a8a63de492a060e06b07f5a6 Mon Sep 17 00:00:00 2001 From: Eugene Shatokhin Date: Thu, 8 Nov 2012 15:11:11 -0500 Subject: ext4: fix memory leak in ext4_xattr_set_acl()'s error path commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream. In ext4_xattr_set_acl(), if ext4_journal_start() returns an error, posix_acl_release() will not be called for 'acl' which may result in a memory leak. This patch fixes that. Reviewed-by: Lukas Czerner Signed-off-by: Eugene Shatokhin Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/acl.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c index 21eacd7..4922087 100644 --- a/fs/ext4/acl.c +++ b/fs/ext4/acl.c @@ -450,8 +450,10 @@ ext4_xattr_set_acl(struct dentry *dentry, const char *name, const void *value, retry: handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb)); - if (IS_ERR(handle)) - return PTR_ERR(handle); + if (IS_ERR(handle)) { + error = PTR_ERR(handle); + goto release_and_out; + } error = ext4_set_acl(handle, inode, type, acl); ext4_journal_stop(handle); if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) -- cgit v1.1 From 43f5d330de7b4238cd4c0b3f287218e1734332af Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Mon, 5 Nov 2012 22:40:14 +0400 Subject: jffs2: hold erase_completion_lock on exit commit 2cbba75a56ea78e6876b4e2547a882f10b3fe72b upstream. Users of jffs2_do_reserve_space() expect they still held erase_completion_lock after call to it. But there is a path where jffs2_do_reserve_space() leaves erase_completion_lock unlocked. The patch fixes it. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: Artem Bityutskiy Signed-off-by: Greg Kroah-Hartman --- fs/jffs2/nodemgmt.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/jffs2/nodemgmt.c b/fs/jffs2/nodemgmt.c index 694aa5b..e304795 100644 --- a/fs/jffs2/nodemgmt.c +++ b/fs/jffs2/nodemgmt.c @@ -355,14 +355,16 @@ static int jffs2_do_reserve_space(struct jffs2_sb_info *c, uint32_t minsize, spin_unlock(&c->erase_completion_lock); ret = jffs2_prealloc_raw_node_refs(c, jeb, 1); - if (ret) - return ret; + /* Just lock it again and continue. Nothing much can change because we hold c->alloc_sem anyway. In fact, it's not entirely clear why we hold c->erase_completion_lock in the majority of this function... but that's a question for another (more caffeine-rich) day. */ spin_lock(&c->erase_completion_lock); + if (ret) + return ret; + waste = jeb->free_size; jffs2_link_node_ref(c, jeb, (jeb->offset + c->sector_size - waste) | REF_OBSOLETE, -- cgit v1.1 From 8cbe63802ddac6b242e4a4293fc338fa7808272c Mon Sep 17 00:00:00 2001 From: Forrest Liu Date: Mon, 17 Dec 2012 09:55:39 -0500 Subject: ext4: fix extent tree corruption caused by hole punch commit c36575e663e302dbaa4d16b9c72d2c9a913a9aef upstream. When depth of extent tree is greater than 1, logical start value of interior node is not correctly updated in ext4_ext_rm_idx. Signed-off-by: Forrest Liu Signed-off-by: "Theodore Ts'o" Reviewed-by: Ashish Sangwan Signed-off-by: Greg Kroah-Hartman --- fs/ext4/extents.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 611647b..680df5d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -2140,13 +2140,14 @@ ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block, * last index in the block only. */ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode, - struct ext4_ext_path *path) + struct ext4_ext_path *path, int depth) { int err; ext4_fsblk_t leaf; /* free index block */ - path--; + depth--; + path = path + depth; leaf = ext4_idx_pblock(path->p_idx); if (unlikely(path->p_hdr->eh_entries == 0)) { EXT4_ERROR_INODE(inode, "path->p_hdr->eh_entries == 0"); @@ -2162,6 +2163,19 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode, ext_debug("index is empty, remove it, free block %llu\n", leaf); ext4_free_blocks(handle, inode, NULL, leaf, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); + + while (--depth >= 0) { + if (path->p_idx != EXT_FIRST_INDEX(path->p_hdr)) + break; + path--; + err = ext4_ext_get_access(handle, inode, path); + if (err) + break; + path->p_idx->ei_block = (path+1)->p_idx->ei_block; + err = ext4_ext_dirty(handle, inode, path); + if (err) + break; + } return err; } @@ -2509,7 +2523,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode, /* if this leaf is free, then we should * remove it from index block above */ if (err == 0 && eh->eh_entries == 0 && path[depth].p_bh != NULL) - err = ext4_ext_rm_idx(handle, inode, path + depth); + err = ext4_ext_rm_idx(handle, inode, path, depth); out: return err; @@ -2639,7 +2653,7 @@ again: /* index is empty, remove it; * handle must be already prepared by the * truncatei_leaf() */ - err = ext4_ext_rm_idx(handle, inode, path + i); + err = ext4_ext_rm_idx(handle, inode, path, i); } /* root level has p_bh == NULL, brelse() eats this */ brelse(path[i].p_bh); -- cgit v1.1 From 7c558b7eb837ca7bc014af2f5509be143c3c3670 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 21 Dec 2012 00:15:51 -0500 Subject: jbd2: fix assertion failure in jbd2_journal_flush() commit d7961c7fa4d2e3c3f12be67e21ba8799b5a7238a upstream. The following race is possible between start_this_handle() and someone calling jbd2_journal_flush(). Process A Process B start_this_handle(). if (journal->j_barrier_count) # false if (!journal->j_running_transaction) { #true read_unlock(&journal->j_state_lock); jbd2_journal_lock_updates() jbd2_journal_flush() write_lock(&journal->j_state_lock); if (journal->j_running_transaction) { # false ... wait for committing trans ... write_unlock(&journal->j_state_lock); ... write_lock(&journal->j_state_lock); if (!journal->j_running_transaction) { # true jbd2_get_transaction(journal, new_transaction); write_unlock(&journal->j_state_lock); goto repeat; # eventually blocks on j_barrier_count > 0 ... J_ASSERT(!journal->j_running_transaction); # fails We fix the race by rechecking j_barrier_count after reacquiring j_state_lock in exclusive mode. Reported-by: yjwsignal@empal.com Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/jbd2/transaction.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 9baa39e..4ef2aae 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -178,7 +178,8 @@ repeat: if (!new_transaction) goto alloc_transaction; write_lock(&journal->j_state_lock); - if (!journal->j_running_transaction) { + if (!journal->j_running_transaction && + !journal->j_barrier_count) { jbd2_get_transaction(journal, new_transaction); new_transaction = NULL; } -- cgit v1.1 From 1ba9a27ec24b133799bc1b8a77d733c4015d0abd Mon Sep 17 00:00:00 2001 From: Michael Tokarev Date: Tue, 25 Dec 2012 14:08:16 -0500 Subject: ext4: do not try to write superblock on ro remount w/o journal commit d096ad0f79a782935d2e06ae8fb235e8c5397775 upstream. When a journal-less ext4 filesystem is mounted on a read-only block device (blockdev --setro will do), each remount (for other, unrelated, flags, like suid=>nosuid etc) results in a series of scary messages from kernel telling about I/O errors on the device. This is becauese of the following code ext4_remount(): if (sbi->s_journal == NULL) ext4_commit_super(sb, 1); at the end of remount procedure, which forces writing (flushing) of a superblock regardless whenever it is dirty or not, if the filesystem is readonly or not, and whenever the device itself is readonly or not. We only need call ext4_commit_super when the file system had been previously mounted read/write. Thanks to Eric Sandeen for help in diagnosing this issue. Signed-off-By: Michael Tokarev Signed-off-by: "Theodore Ts'o" Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 489d406..9ca954e 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4438,7 +4438,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data) } ext4_setup_system_zone(sb); - if (sbi->s_journal == NULL) + if (sbi->s_journal == NULL && !(old_sb_flags & MS_RDONLY)) ext4_commit_super(sb, 1); #ifdef CONFIG_QUOTA -- cgit v1.1 From a4202fd727f6bf57827061104a60307e78124ea2 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 27 Dec 2012 01:42:48 -0500 Subject: ext4: lock i_mutex when truncating orphan inodes commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream. Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov Signed-off-by: "Theodore Ts'o" Reviewed-by: Zheng Liu Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9ca954e..f1aa1a2 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2204,7 +2204,9 @@ static void ext4_orphan_cleanup(struct super_block *sb, __func__, inode->i_ino, inode->i_size); jbd_debug(2, "truncating inode %lu to %lld bytes\n", inode->i_ino, inode->i_size); + mutex_lock(&inode->i_mutex); ext4_truncate(inode); + mutex_unlock(&inode->i_mutex); nr_truncates++; } else { ext4_msg(sb, KERN_DEBUG, -- cgit v1.1 From de3ecca3e3d55f2bb3118839a8fd709362d55678 Mon Sep 17 00:00:00 2001 From: Namjae Jeon Date: Wed, 10 Oct 2012 00:08:56 +0900 Subject: udf: fix memory leak while allocating blocks during write commit 2fb7d99d0de3fd8ae869f35ab682581d8455887a upstream. Need to brelse the buffer_head stored in cur_epos and next_epos. Signed-off-by: Namjae Jeon Signed-off-by: Ashish Sangwan Signed-off-by: Jan Kara Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- fs/udf/inode.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'fs') diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 262050f..31946c5 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -738,6 +738,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, goal, err); if (!newblocknum) { brelse(prev_epos.bh); + brelse(cur_epos.bh); + brelse(next_epos.bh); *err = -ENOSPC; return NULL; } @@ -768,6 +770,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, udf_update_extents(inode, laarr, startnum, endnum, &prev_epos); brelse(prev_epos.bh); + brelse(cur_epos.bh); + brelse(next_epos.bh); newblock = udf_get_pblock(inode->i_sb, newblocknum, iinfo->i_location.partitionReferenceNum, 0); -- cgit v1.1 From cdfcbd8d74fdfd7a0d55330a1b60411b153ae71b Mon Sep 17 00:00:00 2001 From: Namjae Jeon Date: Wed, 10 Oct 2012 00:09:12 +0900 Subject: udf: don't increment lenExtents while writing to a hole commit fb719c59bdb4fca86ee1fd1f42ab3735ca12b6b2 upstream. Incrementing lenExtents even while writing to a hole is bad for performance as calls to udf_discard_prealloc and udf_truncate_tail_extent would not return from start if isize != lenExtents Signed-off-by: Namjae Jeon Signed-off-by: Ashish Sangwan Signed-off-by: Jan Kara Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- fs/udf/inode.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 31946c5..957c974 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -575,6 +575,7 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, struct udf_inode_info *iinfo = UDF_I(inode); int goal = 0, pgoal = iinfo->i_location.logicalBlockNum; int lastblock = 0; + bool isBeyondEOF; prev_epos.offset = udf_file_entry_alloc_offset(inode); prev_epos.block = iinfo->i_location; @@ -653,7 +654,7 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, /* Are we beyond EOF? */ if (etype == -1) { int ret; - + isBeyondEOF = 1; if (count) { if (c) laarr[0] = laarr[1]; @@ -696,6 +697,7 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, endnum = c + 1; lastblock = 1; } else { + isBeyondEOF = 0; endnum = startnum = ((count > 2) ? 2 : count); /* if the current extent is in position 0, @@ -743,7 +745,8 @@ static struct buffer_head *inode_getblk(struct inode *inode, sector_t block, *err = -ENOSPC; return NULL; } - iinfo->i_lenExtents += inode->i_sb->s_blocksize; + if (isBeyondEOF) + iinfo->i_lenExtents += inode->i_sb->s_blocksize; } /* if the extent the requsted block is located in contains multiple -- cgit v1.1 From 93b40265d0c287271b19320574a76070d76d5a3b Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 1 Jan 2013 21:20:27 +0000 Subject: epoll: prevent missed events on EPOLL_CTL_MOD commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream. EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong Cc: Hans Verkuil Cc: Jiri Olsa Cc: Jonathan Corbet Cc: Al Viro Cc: Davide Libenzi Cc: Hans de Goede Cc: Mauro Carvalho Chehab Cc: David Miller Cc: Eric Dumazet Cc: Andrew Morton Cc: Andreas Voellmy Tested-by: "Junchang(Jason) Wang" Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- fs/eventpoll.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 35a852a..35a1a61 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1197,10 +1197,30 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * otherwise we might miss an event that happens between the * f_op->poll() call and the new event set registering. */ - epi->event.events = event->events; + epi->event.events = event->events; /* need barrier below */ epi->event.data = event->data; /* protected by mtx */ /* + * The following barrier has two effects: + * + * 1) Flush epi changes above to other CPUs. This ensures + * we do not miss events from ep_poll_callback if an + * event occurs immediately after we call f_op->poll(). + * We need this because we did not take ep->lock while + * changing epi above (but ep_poll_callback does take + * ep->lock). + * + * 2) We also need to ensure we do not miss _past_ events + * when calling f_op->poll(). This barrier also + * pairs with the barrier in wq_has_sleeper (see + * comments for wq_has_sleeper). + * + * This barrier will now guarantee ep_poll_callback or f_op->poll + * (or both) will notice the readiness of an item. + */ + smp_mb(); + + /* * Get current event bits. We can safely use the file* here because * its usage count has been increased by the caller of this function. */ -- cgit v1.1 From d668f92bf357683de292ce7538fc7b1186354004 Mon Sep 17 00:00:00 2001 From: Benjamin Marzinski Date: Wed, 7 Nov 2012 00:38:06 -0600 Subject: GFS2: Test bufdata with buffer locked and gfs2_log_lock held commit 96e5d1d3adf56f1c7eeb07258f6a1a0a7ae9c489 upstream. In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the buffer without having the gfs2_log_lock held. It was then assuming it would stay attached for the rest of the function. However, without either the log lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any time. This patch moves the locking before the test. If there isn't a bd already attached, gfs2 can safely allocate one and attach it before locking. There is no way that the newly allocated bd could be on the ail list, and thus no way for __gfs2_ail_flush() to detach it. Signed-off-by: Benjamin Marzinski Signed-off-by: Steven Whitehouse [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings --- fs/gfs2/lops.c | 18 ++++-------------- fs/gfs2/trans.c | 8 ++++++++ 2 files changed, 12 insertions(+), 14 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index 05bbb12..c133253 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -139,16 +139,14 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) struct gfs2_meta_header *mh; struct gfs2_trans *tr; - lock_buffer(bd->bd_bh); - gfs2_log_lock(sdp); if (!list_empty(&bd->bd_list_tr)) - goto out; + return; tr = current->journal_info; tr->tr_touched = 1; tr->tr_num_buf++; list_add(&bd->bd_list_tr, &tr->tr_list_buf); if (!list_empty(&le->le_list)) - goto out; + return; set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags); set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags); gfs2_meta_check(sdp, bd->bd_bh); @@ -159,9 +157,6 @@ static void buf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) sdp->sd_log_num_buf++; list_add(&le->le_list, &sdp->sd_log_le_buf); tr->tr_num_buf_new++; -out: - gfs2_log_unlock(sdp); - unlock_buffer(bd->bd_bh); } static void buf_lo_before_commit(struct gfs2_sbd *sdp) @@ -528,11 +523,9 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) struct address_space *mapping = bd->bd_bh->b_page->mapping; struct gfs2_inode *ip = GFS2_I(mapping->host); - lock_buffer(bd->bd_bh); - gfs2_log_lock(sdp); if (tr) { if (!list_empty(&bd->bd_list_tr)) - goto out; + return; tr->tr_touched = 1; if (gfs2_is_jdata(ip)) { tr->tr_num_buf++; @@ -540,7 +533,7 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) } } if (!list_empty(&le->le_list)) - goto out; + return; set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags); set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags); @@ -552,9 +545,6 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) } else { list_add_tail(&le->le_list, &sdp->sd_log_le_ordered); } -out: - gfs2_log_unlock(sdp); - unlock_buffer(bd->bd_bh); } static void gfs2_check_magic(struct buffer_head *bh) diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c index 9ec73a8..e6453c3 100644 --- a/fs/gfs2/trans.c +++ b/fs/gfs2/trans.c @@ -145,14 +145,22 @@ void gfs2_trans_add_bh(struct gfs2_glock *gl, struct buffer_head *bh, int meta) struct gfs2_sbd *sdp = gl->gl_sbd; struct gfs2_bufdata *bd; + lock_buffer(bh); + gfs2_log_lock(sdp); bd = bh->b_private; if (bd) gfs2_assert(sdp, bd->bd_gl == gl); else { + gfs2_log_unlock(sdp); + unlock_buffer(bh); gfs2_attach_bufdata(gl, bh, meta); bd = bh->b_private; + lock_buffer(bh); + gfs2_log_lock(sdp); } lops_add(sdp, &bd->bd_le); + gfs2_log_unlock(sdp); + unlock_buffer(bh); } void gfs2_trans_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd) -- cgit v1.1 From 97765f30c49d3023a6dafe7c48c7e520fca441f8 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Wed, 14 Nov 2012 22:22:05 -0500 Subject: ext4: init pagevec in ext4_da_block_invalidatepages commit 66bea92c69477a75a5d37b9bfed5773c92a3c4b4 upstream. ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner Reviewed-by: Carlos Maiolino Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" Signed-off-by: CAI Qian Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 1dbf758..7e56946 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2199,6 +2199,8 @@ static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd) index = mpd->first_page; end = mpd->next_page - 1; + + pagevec_init(&pvec, 0); while (index <= end) { nr_pages = pagevec_lookup(&pvec, mapping, index, PAGEVEC_SIZE); if (nr_pages == 0) -- cgit v1.1 From 833a2e184ad1464cda69d569788e1faa37191f37 Mon Sep 17 00:00:00 2001 From: Cong Ding Date: Tue, 22 Jan 2013 19:20:58 -0500 Subject: fs/cifs/cifs_dfs_ref.c: fix potential memory leakage commit 10b8c7dff5d3633b69e77f57d404dab54ead3787 upstream. When it goes to error through line 144, the memory allocated to *devname is not freed, and the caller doesn't free it either in line 250. So we free the memroy of *devname in function cifs_compose_mount_options() when it goes to error. Signed-off-by: Cong Ding Reviewed-by: Jeff Layton Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifs_dfs_ref.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/cifs/cifs_dfs_ref.c b/fs/cifs/cifs_dfs_ref.c index 8d8f28c..51feb1a 100644 --- a/fs/cifs/cifs_dfs_ref.c +++ b/fs/cifs/cifs_dfs_ref.c @@ -225,6 +225,8 @@ compose_mount_options_out: compose_mount_options_err: kfree(mountdata); mountdata = ERR_PTR(rc); + kfree(*devname); + *devname = NULL; goto compose_mount_options_out; } -- cgit v1.1 From 466affeee2693763724e1ee351788455cfd1ba41 Mon Sep 17 00:00:00 2001 From: Vyacheslav Dubeyko Date: Mon, 4 Feb 2013 14:28:41 -0800 Subject: nilfs2: fix fix very long mount time issue commit a9bae189542e71f91e61a4428adf6e5a7dfe8063 upstream. There exists a situation when GC can work in background alone without any other filesystem activity during significant time. The nilfs_clean_segments() method calls nilfs_segctor_construct() that updates superblocks in the case of NILFS_SC_SUPER_ROOT and THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag. As a result, the update of superblocks doesn't occurred all this time and in the case of SPOR superblocks keep very old values of last super root placement. SYMPTOMS: Trying to mount a NILFS2 volume after SPOR in such environment ends with very long mounting time (it can achieve about several hours in some cases). REPRODUCING PATH: 1. It needs to use external USB HDD, disable automount and doesn't make any additional filesystem activity on the NILFS2 volume. 2. Generate temporary file with size about 100 - 500 GB (for example, dd if=/dev/zero of= bs=1073741824 count=200). The size of file defines duration of GC working. 3. Then it needs to delete file. 4. Start GC manually by means of command "nilfs-clean -p 0". When you start GC by means of such way then, at the end, superblocks is updated by once. So, for simulation of SPOR, it needs to wait sometime (15 - 40 minutes) and simply switch off USB HDD manually. 5. Switch on USB HDD again and try to mount NILFS2 volume. As a result, NILFS2 volume will mount during very long time. REPRODUCIBILITY: 100% FIX: This patch adds checking that superblocks need to update and set THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call. Reported-by: Sergey Alexandrov Signed-off-by: Vyacheslav Dubeyko Tested-by: Vyacheslav Dubeyko Acked-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/ioctl.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 0d1c9bd..cee648e 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -661,8 +661,11 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp, if (ret < 0) printk(KERN_ERR "NILFS: GC failed during preparation: " "cannot read source blocks: err=%d\n", ret); - else + else { + if (nilfs_sb_need_update(nilfs)) + set_nilfs_discontinued(nilfs); ret = nilfs_clean_segments(inode->i_sb, argv, kbufs); + } nilfs_remove_all_gcinodes(nilfs); clear_nilfs_gc_running(nilfs); -- cgit v1.1 From 9f557574821b44a87d95756bf53f1b7fb76eb4d4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 6 Jan 2013 18:21:49 +0000 Subject: tcp: fix MSG_SENDPAGE_NOTLAST logic [ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ] commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once) added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all frags but the last one for a splice() call. The condition used to set the flag in pipe_to_sendpage() relied on splice() user passing the exact number of bytes present in the pipe, or a smaller one. But some programs pass an arbitrary high value, and the test fails. The effect of this bug is a lack of tcp_push() at the end of a splice(pipe -> socket) call, and possibly very slow or erratic TCP sessions. We should both test sd->total_len and fact that another fragment is in the pipe (pipe->nrbufs > 1) Many thanks to Willy for providing very clear bug report, bisection and test programs. Reported-by: Willy Tarreau Bisected-by: Willy Tarreau Tested-by: Willy Tarreau Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- fs/splice.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/splice.c b/fs/splice.c index 9d89008..ea92b7c 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -693,8 +693,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe, return -EINVAL; more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; - if (sd->len < sd->total_len) + + if (sd->len < sd->total_len && pipe->nrbufs > 1) more |= MSG_SENDPAGE_NOTLAST; + return file->f_op->sendpage(file, buf->page, buf->offset, sd->len, &pos, more); } -- cgit v1.1