aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nilfs2
Commit message (Collapse)AuthorAgeFilesLines
* nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi2015-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 upstream. The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix deadlock of segment constructor during recoveryRyusuke Konishi2015-05-091-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 283ee1482f349d6c0c09dfb725db5880afc56813 upstream. According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck during recovery at mount time. The code path that caused the deadlock was as follows: nilfs_fill_super() load_nilfs() nilfs_salvage_orphan_logs() * Do roll-forwarding, attach segment constructor for recovery, and kick it. nilfs_segctor_thread() nilfs_segctor_thread_construct() * A lock is held with nilfs_transaction_lock() nilfs_segctor_do_construct() nilfs_segctor_drop_written_files() iput() iput_final() write_inode_now() writeback_single_inode() __writeback_single_inode() do_writepages() nilfs_writepage() nilfs_construct_dsync_segment() nilfs_transaction_lock() --> deadlock This can happen if commit 7ef3ff2fea8b ("nilfs2: fix deadlock of segment constructor over I_SYNC flag") is applied and roll-forward recovery was performed at mount time. The roll-forward recovery can happen if datasync write is done and the file system crashes immediately after that. For instance, we can reproduce the issue with the following steps: < nilfs2 is mounted on /nilfs (device: /dev/sdb1) > # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k count=1 && reboot -nfh < the system will immediately reboot > # mount -t nilfs2 /dev/sdb1 /nilfs The deadlock occurs because iput() can run segment constructor through writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The above commit changed segment constructor so that it calls iput() asynchronously for inodes with i_nlink == 0, but that change was imperfect. This fixes the another deadlock by deferring iput() in segment constructor even for the case that mount is not finished, that is, for the case that MS_ACTIVE flag is not set. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Yuxuan Shui <yshuiv7@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix potential memory overrun on inodeRyusuke Konishi2015-05-091-3/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 957ed60b53b519064a54988c4e31e0087e47d091 upstream. Each inode of nilfs2 stores a root node of a b-tree, and it turned out to have a memory overrun issue: Each b-tree node of nilfs2 stores a set of key-value pairs and the number of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as a few other "bn_*" members. Since the value of "bn_nchildren" is used for operations on the key-values within the b-tree node, it can cause memory access overrun if a large number is incorrectly set to "bn_nchildren". For instance, nilfs_btree_node_lookup() function determines the range of binary search with it, and too large "bn_nchildren" leads nilfs_btree_node_get_key() in that function to overrun. As for intermediate b-tree nodes, this is prevented by a sanity check performed when each node is read from a drive, however, no sanity check has been done for root nodes stored in inodes. This patch fixes the issue by adding missing sanity check against b-tree root nodes so that it's called when on-memory inodes are read from ifile, inode metadata file. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix deadlock of segment constructor over I_SYNC flagRyusuke Konishi2015-03-063-7/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7ef3ff2fea8bf5e4a21cef47ad87710a3d0fdb52 upstream. Nilfs2 eventually hangs in a stress test with fsstress program. This issue was caused by the following deadlock over I_SYNC flag between nilfs_segctor_thread() and writeback_sb_inodes(): nilfs_segctor_thread() nilfs_segctor_thread_construct() nilfs_segctor_unlock() nilfs_dispose_list() iput() iput_final() evict() inode_wait_for_writeback() * wait for I_SYNC flag writeback_sb_inodes() * set I_SYNC flag on inode->i_state __writeback_single_inode() do_writepages() nilfs_writepages() nilfs_construct_dsync_segment() nilfs_segctor_sync() * wait for completion of segment constructor inode_sync_complete() * clear I_SYNC flag after __writeback_single_inode() completed writeback_sb_inodes() calls do_writepages() for dirty inodes after setting I_SYNC flag on inode->i_state. do_writepages() in turn calls nilfs_writepages(), which can run segment constructor and wait for its completion. On the other hand, segment constructor calls iput(), which can call evict() and wait for the I_SYNC flag on inode_wait_for_writeback(). Since segment constructor doesn't know when I_SYNC will be set, it cannot know whether iput() will block or not unless inode->i_nlink has a non-zero count. We can prevent evict() from being called in iput() by implementing sop->drop_inode(), but it's not preferable to leave inodes with i_nlink == 0 for long periods because it even defers file truncation and inode deallocation. So, this instead resolves the deadlock by calling iput() asynchronously with a workqueue for inodes with i_nlink == 0. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix data loss with mmap()Andreas Rohner2014-11-051-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 56d7acc792c0d98f38f22058671ee715ff197023 upstream. This bug leads to reproducible silent data loss, despite the use of msync(), sync() and a clean unmount of the file system. It is easily reproducible with the following script: ----------------[BEGIN SCRIPT]-------------------- mkfs.nilfs2 -f /dev/sdb mount /dev/sdb /mnt dd if=/dev/zero bs=1M count=30 of=/mnt/testfile umount /mnt mount /dev/sdb /mnt CHECKSUM_BEFORE="$(md5sum /mnt/testfile)" /root/mmaptest/mmaptest /mnt/testfile 30 10 5 sync CHECKSUM_AFTER="$(md5sum /mnt/testfile)" umount /mnt mount /dev/sdb /mnt CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)" umount /mnt echo "BEFORE MMAP:\t$CHECKSUM_BEFORE" echo "AFTER MMAP:\t$CHECKSUM_AFTER" echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT" ----------------[END SCRIPT]-------------------- The mmaptest tool looks something like this (very simplified, with error checking removed): ----------------[BEGIN mmaptest]-------------------- data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE, MAP_SHARED, fd, file_offset); for (i = 0; i < write_count; ++i) { memcpy(data + i * 4096, buf, sizeof(buf)); msync(data, file_size - file_offset, MS_SYNC)) } ----------------[END mmaptest]-------------------- The output of the script looks something like this: BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile So it is clear, that the changes done using mmap() do not survive a remount. This can be reproduced a 100% of the time. The problem was introduced in commit 136e8770cd5d ("nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary"). If the page was read with mpage_readpage() or mpage_readpages() for example, then it has no buffers attached to it. In that case page_has_buffers(page) in nilfs_set_page_dirty() will be false. Therefore nilfs_set_file_dirty() is never called and the pages are never collected and never written to disk. This patch fixes the problem by also calling nilfs_set_file_dirty() if the page has no buffers attached to it. [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix segctor bug that causes file system corruptionAndreas Rohner2014-02-151-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 70f2fe3a26248724d8a5019681a869abdaf3e89a upstream. There is a bug in the function nilfs_segctor_collect, which results in active data being written to a segment, that is marked as clean. It is possible, that this segment is selected for a later segment construction, whereby the old data is overwritten. The problem shows itself with the following kernel log message: nilfs_sufile_do_cancel_free: segment 6533 must be clean Usually a few hours later the file system gets corrupted: NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0 NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660) The issue can be reproduced with a file system that is nearly full and with the cleaner running, while some IO intensive task is running. Although it is quite hard to reproduce. This is what happens: 1. The cleaner starts the segment construction 2. nilfs_segctor_collect is called 3. sc_stage is on NILFS_ST_SUFILE and segments are freed 4. sc_stage is on NILFS_ST_DAT current segment is full 5. nilfs_segctor_extend_segments is called, which allocates a new segment 6. The new segment is one of the segments freed in step 3 7. nilfs_sufile_cancel_freev is called and produces an error message 8. Loop around and the collection starts again 9. sc_stage is on NILFS_ST_SUFILE and segments are freed including the newly allocated segment, which will contain active data and can be allocated at a later time 10. A few hours later another segment construction allocates the segment and causes file system corruption This can be prevented by simply reordering the statements. If nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments the freed segments are marked as dirty and cannot be allocated any more. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix issue with race condition of competition between segments for ↵Vyacheslav Dubeyko2013-10-262-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dirty blocks commit 7f42ec3941560f0902fe3671e36f2c20ffd3af0a upstream. Many NILFS2 users were reported about strange file system corruption (for example): NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768 NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540) But such error messages are consequence of file system's issue that takes place more earlier. Fortunately, Jerome Poulin <jeromepoulin@gmail.com> and Anton Eliasson <devel@antoneliasson.se> were reported about another issue not so recently. These reports describe the issue with segctor thread's crash: BUG: unable to handle kernel paging request at 0000000000004c83 IP: nilfs_end_page_io+0x12/0xd0 [nilfs2] Call Trace: nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2] nilfs_segctor_construct+0x17b/0x290 [nilfs2] nilfs_segctor_thread+0x122/0x3b0 [nilfs2] kthread+0xc0/0xd0 ret_from_fork+0x7c/0xb0 These two issues have one reason. This reason can raise third issue too. Third issue results in hanging of segctor thread with eating of 100% CPU. REPRODUCING PATH: One of the possible way or the issue reproducing was described by Jermoe me Poulin <jeromepoulin@gmail.com>: 1. init S to get to single user mode. 2. sysrq+E to make sure only my shell is running 3. start network-manager to get my wifi connection up 4. login as root and launch "screen" 5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies. 6. lscp | xz -9e > lscp.txt.xz 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs 8. start a screen to dump /proc/kmsg to text file since rsyslog is killed 9. start a screen and launch strace -f -o find-cat.log -t find /mnt/nilfs -type f -exec cat {} > /dev/null \; 10. start a screen and launch strace -f -o apt-get.log -t apt-get update 11. launch the last command again as it did not crash the first time 12. apt-get crashes 13. ps aux > ps-aux-crashed.log 13. sysrq+W 14. sysrq+E wait for everything to terminate 15. sysrq+SUSB Simplified way of the issue reproducing is starting kernel compilation task and "apt-get update" in parallel. REPRODUCIBILITY: The issue is reproduced not stable [60% - 80%]. It is very important to have proper environment for the issue reproducing. The critical conditions for successful reproducing: (1) It should have big modified file by mmap() way. (2) This file should have the count of dirty blocks are greater that several segments in size (for example, two or three) from time to time during processing. (3) It should be intensive background activity of files modification in another thread. INVESTIGATION: First of all, it is possible to see that the reason of crash is not valid page address: NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783 Moreover, value of b_page (0x1a82) is 6786. This value looks like segment number. And b_blocknr with b_size values look like block numbers. So, buffer_head's pointer points on not proper address value. Detailed investigation of the issue is discovered such picture: [-----------------------------SEGMENT 6783-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783 [-----------------------------SEGMENT 6784-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8 NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15 [-----------------------------SEGMENT 6785-------------------------------] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88 NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824 NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785 NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0 [----------] ditto NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12 NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784 NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785 NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82 BUG: unable to handle kernel paging request at 0000000000001a82 IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2] Usually, for every segment we collect dirty files in list. Then, dirty blocks are gathered for every dirty file, prepared for write and submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes place complete write phase after calling nilfs_end_bio_write() on the block layer. Buffers/pages are marked as not dirty on final phase and processed files removed from the list of dirty files. It is possible to see that we had three prepare_write and submit_bio phases before segbuf_wait and complete_write phase. Moreover, segments compete between each other for dirty blocks because on every iteration of segments processing dirty buffer_heads are added in several lists of payload_buffers: [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50 [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8 The next pointer is the same but prev pointer has changed. It means that buffer_head has next pointer from one list but prev pointer from another. Such modification can be made several times. And, finally, it can be resulted in various issues: (1) segctor hanging, (2) segctor crashing, (3) file system metadata corruption. FIX: This patch adds: (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write() for every proccessed dirty block; (2) checking of BH_Async_Write flag in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_dirty_node_buffers(); (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(), nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page(). Reported-by: Jerome Poulin <jeromepoulin@gmail.com> Reported-by: Anton Eliasson <devel@antoneliasson.se> Cc: Paul Fertser <fercerpav@gmail.com> Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Cc: Piotr Szymaniak <szarpaj@grubelek.pl> Cc: Juan Barry Manuel Canham <Linux@riotingpacifist.net> Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com> Cc: Elmer Zhang <freeboy6716@gmail.com> Cc: Kenneth Langga <klangga@gmail.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: nilfs_clear_dirty_page() has not been separated from nilfs_clear_dirty_pages()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko2013-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error detection commit 4bf93b50fd04118ac7f33a3c2b8a0a1f9fa80bc9 upstream. Fix the issue with improper counting number of flying bio requests for BIO_EOPNOTSUPP error detection case. The sb_nbio must be incremented exactly the same number of times as complete() function was called (or will be called) because nilfs_segbuf_wait() will call wail_for_completion() for the number of times set to sb_nbio: do { wait_for_completion(&segbuf->sb_bio_event); } while (--segbuf->sb_nbio > 0); Two functions complete() and wait_for_completion() must be called the same number of times for the same sb_bio_event. Otherwise, wait_for_completion() will hang or leak. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP ↵Vyacheslav Dubeyko2013-09-101-2/+1
| | | | | | | | | | | | | | | | | | error commit 2df37a19c686c2d7c4e9b4ce1505b5141e3e5552 upstream. Remove double call of bio_put() in nilfs_end_bio_write() for the case of BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter and he suggests first version of the fix too. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundaryRyusuke Konishi2013-05-301-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 136e8770cd5d1fe38b3c613100dd6dc4db6d4fa6 upstream. nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary DESCRIPTION: There are use-cases when NILFS2 file system (formatted with block size lesser than 4 KB) can be remounted in RO mode because of encountering of "broken bmap" issue. The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>: "The machine I've been trialling nilfs on is running Debian Testing, Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've also reproduced it (identically) with Debian Unstable amd64 and Debian Experimental (using the 3.8-trunk kernel). The problematic partitions were formatted with "mkfs.nilfs2 -b 1024 -B 8192"." SYMPTOMS: (1) System log contains error messages likewise: [63102.496756] nilfs_direct_assign: invalid pointer: 0 [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28) [63102.496798] [63102.524403] Remounting filesystem read-only (2) The NILFS2 file system is remounted in RO mode. REPRODUSING PATH: (1) Create volume group with name "unencrypted" by means of vgcreate utility. (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>): ----------------[BEGIN SCRIPT]-------------------- VG=unencrypted lvcreate --size 2G --name ntest $VG mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest mkdir /var/tmp/n mkdir /var/tmp/n/ntest mount /dev/mapper/$VG-ntest /var/tmp/n/ntest mkdir /var/tmp/n/ntest/thedir cd /var/tmp/n/ntest/thedir sleep 2 date darcs init sleep 2 dmesg|tail -n 5 date darcs whatsnew || true date sleep 2 dmesg|tail -n 5 ----------------[END SCRIPT]-------------------- REPRODUCIBILITY: 100% INVESTIGATION: As it was discovered, the issue takes place during segment construction after executing such sequence of user-space operations: open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 ftruncate(7, 60) The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)" takes place because of trying to get block number for third block of the file with logical offset #3072 bytes. As it is possible to see from above output, the file has 60 bytes of the whole size. So, it is enough one block (1 KB in size) allocation for the whole file. Trying to operate with several blocks instead of one takes place because of discovering several dirty buffers for this file in nilfs_segctor_scan_file() method. The root cause of this issue is in nilfs_set_page_dirty function which is called just before writing to an mmapped page. When nilfs_page_mkwrite function handles a page at EOF boundary, it fills hole blocks only inside EOF through __block_page_mkwrite(). The __block_page_mkwrite() function calls set_page_dirty() after filling hole blocks, thus nilfs_set_page_dirty function (= a_ops->set_page_dirty) is called. However, the current implementation of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page at EOF boundary. As a result, buffers outside EOF are inconsistently marked dirty and queued for write even though they are not mapped with nilfs_get_block function. FIX: This modifies nilfs_set_page_dirty() not to mark hole blocks dirty. Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals for this issue. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk> Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix fix very long mount time issueVyacheslav Dubeyko2013-02-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a9bae189542e71f91e61a4428adf6e5a7dfe8063 upstream. There exists a situation when GC can work in background alone without any other filesystem activity during significant time. The nilfs_clean_segments() method calls nilfs_segctor_construct() that updates superblocks in the case of NILFS_SC_SUPER_ROOT and THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag. As a result, the update of superblocks doesn't occurred all this time and in the case of SPOR superblocks keep very old values of last super root placement. SYMPTOMS: Trying to mount a NILFS2 volume after SPOR in such environment ends with very long mounting time (it can achieve about several hours in some cases). REPRODUCING PATH: 1. It needs to use external USB HDD, disable automount and doesn't make any additional filesystem activity on the NILFS2 volume. 2. Generate temporary file with size about 100 - 500 GB (for example, dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of file defines duration of GC working. 3. Then it needs to delete file. 4. Start GC manually by means of command "nilfs-clean -p 0". When you start GC by means of such way then, at the end, superblocks is updated by once. So, for simulation of SPOR, it needs to wait sometime (15 - 40 minutes) and simply switch off USB HDD manually. 5. Switch on USB HDD again and try to mount NILFS2 volume. As a result, NILFS2 volume will mount during very long time. REPRODUCIBILITY: 100% FIX: This patch adds checking that superblocks need to update and set THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call. Reported-by: Sergey Alexandrov <splavgm@gmail.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix deadlock issue between chcp and thaw ioctlsRyusuke Konishi2012-08-104-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 572d8b3945a31bee7c40d21556803e4807fd9141 upstream. An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command: chcp D ffff88013870f3d0 0 1325 1324 0x00000004 ... Call Trace: nilfs_transaction_begin+0x11c/0x1a0 [nilfs2] wake_up_bit+0x20/0x20 copy_from_user+0x18/0x30 [nilfs2] nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2] nilfs_ioctl+0x252/0x61a [nilfs2] do_page_fault+0x311/0x34c get_unmapped_area+0x132/0x14e do_vfs_ioctl+0x44b/0x490 __set_task_blocked+0x5a/0x61 vm_mmap_pgoff+0x76/0x87 __set_current_blocked+0x30/0x4a sys_ioctl+0x4b/0x6f system_call_fastpath+0x16/0x1b thaw D ffff88013870d890 0 1352 1351 0x00000004 ... Call Trace: rwsem_down_failed_common+0xdb/0x10f call_rwsem_down_write_failed+0x13/0x20 down_write+0x25/0x27 thaw_super+0x13/0x9e do_vfs_ioctl+0x1f5/0x490 vm_mmap_pgoff+0x76/0x87 sys_ioctl+0x4b/0x6f filp_close+0x64/0x6c system_call_fastpath+0x16/0x1b where the thaw ioctl deadlocked at thaw_super() when called while chcp was waiting at nilfs_transaction_begin() called from nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible. This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in read mode and then waits for unfreezing in nilfs_transaction_begin(), whereas thaw_super() locks sb->s_umount in write mode. The locking of sb->s_umount here was intended to make snapshot mounts and the downgrade of snapshots to checkpoints exclusive. This fixes the deadlock issue by replacing the sb->s_umount usage in nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot mounts. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: ensure proper cache clearing for gc-inodesRyusuke Konishi2012-07-042-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fbb24a3a915f105016f1c828476be11aceac8504 upstream. A gc-inode is a pseudo inode used to buffer the blocks to be moved by garbage collection. Block caches of gc-inodes must be cleared every time a garbage collection function (nilfs_clean_segments) completes. Otherwise, stale blocks buffered in the caches may be wrongly reused in successive calls of the GC function. For user files, this is not a problem because their gc-inodes are distinguished by a checkpoint number as well as an inode number. They never buffer different blocks if either an inode number, a checkpoint number, or a block offset differs. However, gc-inodes of sufile, cpfile and DAT file can store different data for the same block offset. Thus, the nilfs_clean_segments function can move incorrect block for these meta-data files if an old block is cached. I found this is really causing meta-data corruption in nilfs. This fixes the issue by ensuring cache clear of gc-inodes and resolves reported GC problems including checkpoint file corruption, b-tree corruption, and the following warning during GC. nilfs_palloc_freev: entry number 307234 already freed. ... Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* nilfs2: fix NULL pointer dereference in nilfs_load_super_block()Ryusuke Konishi2012-03-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d7178c79d9b7c5518f9943188091a75fc6ce0675 upstream. According to the report from Slicky Devil, nilfs caused kernel oops at nilfs_load_super_block function during mount after he shrank the partition without resizing the filesystem: BUG: unable to handle kernel NULL pointer dereference at 00000048 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP ... Call Trace: [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2] [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2] [<c0226636>] mount_fs+0x36/0x180 [<c023d961>] vfs_kern_mount+0x51/0xa0 [<c023ddae>] do_kern_mount+0x3e/0xe0 [<c023f189>] do_mount+0x169/0x700 [<c023fa9b>] sys_mount+0x6b/0xa0 [<c04abd1f>] sysenter_do_call+0x12/0x28 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc CR2: 0000000000000048 This turned out due to a defect in an error path which runs if the calculated location of the secondary super block was invalid. This patch fixes it and eliminates the reported oops. Reported-by: Slicky Devil <slicky.dvl@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Slicky Devil <slicky.dvl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()Haogang Chen2011-12-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | There is a potential integer overflow in nilfs_ioctl_clean_segments(). When a large argv[n].v_nmembs is passed from the userspace, the subsequent call to vmalloc() will allocate a buffer smaller than expected, which leads to out-of-bound access in nilfs_ioctl_move_blocks() and lfs_clean_segments(). The following check does not prevent the overflow because nsegs is also controlled by the userspace and could be very large. if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment) goto out_free; This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and returns -EINVAL when overflow. Signed-off-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nilfs2: unbreak compat ioctlThomas Meyer2011-12-201-0/+13
| | | | | | | | | | | | commit 828b1c50ae ("nilfs2: add compat ioctl") incidentally broke all other NILFS compat ioctls. Make them work again. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* filesystems: add set_nlink()Miklos Szeredi2011-11-022-2/+2
| | | | | | | | | Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* filesystems: add missing nlink wrappersMiklos Szeredi2011-11-021-1/+1
| | | | | | | Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* treewide: use __printf not __attribute__((format(printf,...)))Joe Perches2011-10-311-4/+4
| | | | | | | | | | | | | | | | | Standardize the style for compiler based printf format verification. Standardized the location of __printf too. Done via script and a little typing. $ grep -rPl --include=*.[ch] -w "__attribute__" * | \ grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \ xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }' [akpm@linux-foundation.org: revert arch bits] Signed-off-by: Joe Perches <joe@perches.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlersJosef Bacik2011-07-202-3/+11
| | | | | | | | | | | | | | | Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: simplify the blockdev_direct_IO prototypeChristoph Hellwig2011-07-201-2/+2
| | | | | | | | | | | | | | Simple filesystems always pass inode->i_sb_bdev as the block device argument, and never need a end_io handler. Let's simply things for them and for my grepping activity by dropping these arguments. The only thing not falling into that scheme is ext4, which passes and end_io handler without needing special flags (yet), but given how messy the direct I/O code there is use of __blockdev_direct_IO in one instead of two out of three cases isn't going to make a large difference anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: move inode_dio_wait calls into ->setattrChristoph Hellwig2011-07-201-0/+2
| | | | | | | | | | Let filesystems handle waiting for direct I/O requests themselves instead of doing it beforehand. This means filesystem-specific locks to prevent new dio referenes from appearing can be held. This is important to allow generalizing i_dio_count to non-DIO_LOCKING filesystems. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)Al Viro2011-07-201-6/+1
| | | | | | ... and simplify the living hell out of callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ->permission() sanitizing: don't pass flags to ->permission()Al Viro2011-07-202-2/+2
| | | | | | not used by the instances anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ->permission() sanitizing: don't pass flags to generic_permission()Al Viro2011-07-201-1/+1
| | | | | | | redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill check_acl callback of generic_permission()Al Viro2011-07-201-1/+1
| | | | | | | its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nilfs2_permission() doesn't need to bail out in RCU modeAl Viro2011-06-201-6/+1
| | | | | | Nothing blocking except for generic_permission(). Which will DTRT. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* nilfs2: fix problem in setting checkpoint intervalRyusuke Konishi2011-06-111-1/+1
| | | | | | | | | | | | | | | | Checkpoint generation interval of nilfs goes wrong after user has changed the interval parameter with nilfs-tune tool. segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds segctord starting. Construction interval = 0 seconds, CP frequency < 30 seconds This turned out to be caused by a trivial bug in initialization code of log writer. This will fix it. Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix missing block address termination in btree node shrinkingRyusuke Konishi2011-06-111-7/+14
| | | | | | | | | | | nilfs_btree_delete function does not terminate part of virtual block addresses when shrinking the last remaining child node into the root node. The missing address termination causes that dead btree node blocks persist and chip away free disk space. This fixes the leak bug on the btree node deletion. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: fix incorrect block address termination in node concatenationRyusuke Konishi2011-06-111-5/+13
| | | | | | | | | | | | | nilfs_btree_delete function wrongly terminates virtual block address of the btree node held by its parent at index 0. When concatenating the index-0 node with its right sibling node, nilfs_btree_delete terminates the block address of index-0 node instead of the right sibling node which should be deleted. This bug not only wears disk space in the long run, but also causes file system corruption. This will fix it. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* nilfs2: remove unnecessary dentry_unhash from rmdir, dir renameSage Weil2011-05-281-5/+0
| | | | | | | | | nilfs2 does not have problems with references to unlinked directories. CC: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> CC: linux-nilfs@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: pass exact type of data dirties to ->dirty_inodeChristoph Hellwig2011-05-272-2/+2
| | | | | | | | | | | | | | | | | Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-05-261-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (25 commits) cifs: remove unnecessary dentry_unhash on rmdir/rename_dir ocfs2: remove unnecessary dentry_unhash on rmdir/rename_dir exofs: remove unnecessary dentry_unhash on rmdir/rename_dir nfs: remove unnecessary dentry_unhash on rmdir/rename_dir ext2: remove unnecessary dentry_unhash on rmdir/rename_dir ext3: remove unnecessary dentry_unhash on rmdir/rename_dir ext4: remove unnecessary dentry_unhash on rmdir/rename_dir btrfs: remove unnecessary dentry_unhash in rmdir/rename_dir ceph: remove unnecessary dentry_unhash calls vfs: clean up vfs_rename_other vfs: clean up vfs_rename_dir vfs: clean up vfs_rmdir vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems libfs: drop unneeded dentry_unhash vfs: update dentry_unhash() comment vfs: push dentry_unhash on rename_dir into file systems vfs: push dentry_unhash on rmdir into file systems vfs: remove dget() from dentry_unhash() vfs: dentry_unhash immediately prior to rmdir vfs: Block mmapped writes while the fs is frozen ...
| * vfs: push dentry_unhash on rename_dir into file systemsSage Weil2011-05-261-0/+3
| | | | | | | | | | | | | | | | | | | | Only a few file systems need this. Start by pushing it down into each rename method (except gfs2 and xfs) so that it can be dealt with on a per-fs basis. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * vfs: push dentry_unhash on rmdir into file systemsSage Weil2011-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | nilfs2: use mark_buffer_dirty to mark btnode or meta data dirtyRyusuke Konishi2011-05-1015-86/+63
| | | | | | | | | | | | | | | | This replaces nilfs_mdt_mark_buffer_dirty and nilfs_btnode_mark_dirty macros with mark_buffer_dirty and gets rid of nilfs_mark_buffer_dirty, an own mark buffer dirty function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: always set back pointer to host inode in mapping->hostRyusuke Konishi2011-05-108-36/+14
| | | | | | | | | | | | | | | | | | | | In the current nilfs, page cache for btree nodes and meta data files do not set a valid back pointer to the host inode in mapping->host. This will change it so that every address space in nilfs uses mapping->host to hold its host inode. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: get rid of NILFS_I_NILFSRyusuke Konishi2011-05-107-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | This replaces all references of NILFS_I_NILFS(inode)->ns_bdev with inode->i_sb->s_bdev and unfolds remaining uses of NILFS_I_NILFS inline function. Before 2.6.37, referring to a nilfs object from inodes needed a conditional judgement, and NILFS_I_NILFS was helpful to simplify it. But now we can simply do it by going through a super block instance like inode->i_sb->s_fs_info. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: use list_first_entryRyusuke Konishi2011-05-102-8/+8
| | | | | | | | | | | | | | This uses list_first_entry macro instead of list_entry if it's used to get the first entry. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: use empty_aops for gc-inodesRyusuke Konishi2011-05-101-4/+1
| | | | | | | | | | | | Applies empty_aops for address space operations of gc-inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: implement resize ioctlRyusuke Konishi2011-05-107-5/+189
| | | | | | | | | | | | This adds resize ioctl which makes online resize possible. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: add truncation routine of segment usage fileRyusuke Konishi2011-05-101-0/+112
| | | | | | | | | | | | | | | | When shrinking the filesystem, segments to be truncated must be test if they are busy or not, and unneeded sufile block should be deleted. This adds routines for the truncation. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: add routine to move secondary super blockRyusuke Konishi2011-05-101-0/+57
| | | | | | | | | | | | | | After resizing the filesystem, the secondary super block must be moved to a new location. This adds a helper function for this. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: add ioctl which limits range of segment to be allocatedRyusuke Konishi2011-05-103-10/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new ioctl command which limits range of segment to be allocated. This is intended to gather data whithin a range of the partition before shrinking the filesystem, or to control new log location for some purpose. If a range is specified by the ioctl, segment allocator of nilfs tries to allocate new segments from the range unless no free segments are available there. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: zero fill unused portion of super root blockRyusuke Konishi2011-05-102-2/+15
| | | | | | | | | | | | | | The super root block is newly-allocated each time it is written back to disk, so unused portion of the block should be cleared. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: super root size should change depending on inode sizeRyusuke Konishi2011-05-102-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | The size of super root structure depends on inode size, so NILFS_SR_BYTES macro should be a function of the inode size. This fixes the issue. Even though a different size value will be written for a possible future filesystem with extended inode, but fortunately this does not break disk format compatibility. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: get rid of private page allocatorRyusuke Konishi2011-05-106-207/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, nilfs was cloning pages for mmapped region to freeze their data and ensure consistency of checksum during writeback cycles. A private page allocator was used for this page cloning. But, we no longer need to do that since clear_page_dirty_for_io function sets up pte so that vm_ops->page_mkwrite function is called right before the mmapped pages are modified and nilfs_page_mkwrite function can safely wait for the pages to be written back to disk. So, this stops making a copy of mmapped pages during writeback, and eliminates the private page allocation and deallocation functions from nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: merge list_del()/list_add_tail() to list_move_tail()Nicolas Kaiser2011-05-102-6/+3
| | | | | | | | | | | | | | Merge list_del() + list_add_tail() to list_move_tail(). Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* | nilfs2: fix infinite loop in nilfs_palloc_freev functionRyusuke Konishi2011-05-101-1/+1
|/ | | | | | | | | | | | | | | | | | After having applied commit 9954e7af14868b8b ("nilfs2: add free entries count only if clear bit operation succeeded"), a free routine of nilfs came to fall into an infinite loop, outputting the same message endlessly: nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed ... That patch broke the routine so that a loop counter is never updated in an abnormal state. This fixes the regression. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
* fs: export empty_aopsJens Axboe2011-04-051-2/+0
| | | | | | | | | | | | With the ->sync_page() hook gone, we have a few users that add their own static address_space_operations without any functions defined. fs/inode.c already has an empty_aops that it uses for init purposes. Lets export that and use it in the places where an otherwise empty aops was defined. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>