aboutsummaryrefslogtreecommitdiffstats
path: root/fs/btrfs/super.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge branches 'for-linus' and 'for-linus-3.2' of ↵Linus Torvalds2011-12-161-7/+25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: unplug every once and a while Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code Btrfs: only set cache_generation if we setup the block group Btrfs: don't panic if orphan item already exists Btrfs: fix leaked space in truncate Btrfs: fix how we do delalloc reservations and how we free reservations on error Btrfs: deal with enospc from dirtying inodes properly Btrfs: fix num_workers_starting bug and other bugs in async thread BTRFS: Establish i_ops before calling d_instantiate Btrfs: add a cond_resched() into the worker loop Btrfs: fix ctime update of on-disk inode btrfs: keep orphans for subvolume deletion Btrfs: fix inaccurate available space on raid0 profile Btrfs: fix wrong disk space information of the files Btrfs: fix wrong i_size when truncating a file to a larger size Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror * 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: lower the dirty balance poll interval
| * Merge branch 'for-chris' of ↵Chris Mason2011-12-151-1/+12
| |\ | | | | | | | | | | | | | | | | | | | | | | | | http://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into integration Conflicts: fs/btrfs/inode.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * Btrfs: deal with enospc from dirtying inodes properlyJosef Bacik2011-12-151-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas 1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one appropriately. 2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty. 3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr. 4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it. With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
| * | Btrfs: fix inaccurate available space on raid0 profileMiao Xie2011-12-151-6/+13
| |/ | | | | | | | | | | | | | | | | | | | | When we use raid0 as the data profile, df command may show us a very inaccurate value of the available space, which may be much less than the real one. It may make the users puzzled. Fix it by changing the calculation of the available space, and making it be more similar to a fake chunk allocation. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-12-011-3/+3
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix meta data raid-repair merge problem Btrfs: skip allocation attempt from empty cluster Btrfs: skip block groups without enough space for a cluster Btrfs: start search for new cluster at the beginning Btrfs: reset cluster's max_size when creating bitmap Btrfs: initialize new bitmaps' list Btrfs: fix oops when calling statfs on readonly device Btrfs: Don't error on resizing FS to same size Btrfs: fix deadlock on metadata reservation when evicting a inode Fix URL of btrfs-progs git repository in docs btrfs scrub: handle -ENOMEM from init_ipath()
| * Btrfs: fix oops when calling statfs on readonly deviceLi Zefan2011-11-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reproduce this bug: # dd if=/dev/zero of=img bs=1M count=256 # mkfs.btrfs img # losetup -r /dev/loop1 img # mount /dev/loop1 /mnt OOPS!! It triggered BUG_ON(!nr_devices) in btrfs_calc_avail_data_space(). To fix this, instead of checking write-only devices, we check all open deivces: # df -h /dev/loop1 Filesystem Size Used Avail Use% Mounted on /dev/loop1 250M 28K 238M 1% /mnt Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
* | new helper: mount_subtree()Al Viro2011-11-161-29/+6
| | | | | | | | | | | | | | | | | | | | | | | | takes vfsmount and relative path, does lookup within that vfsmount (possibly triggering automounts) and returns the result as root of subtree suitable for return by ->mount() (i.e. a reference to dentry and an active reference to its superblock grabbed, superblock locked exclusive). btrfs and nfs switched to it instead of open-coding the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfsAl Viro2011-11-161-3/+1
| | | | | | | | | | | | | | | | Life is much saner if create_mnt_ns(mnt) drops mnt in case of error... Switch it to such calling conventions, switch callers, fix double mntput() in fs/nfs/super.c one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | btrfs: fix double mntput() in mount_subvol()Al Viro2011-11-161-1/+0
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* btrfs: rename the option to nospace_cacheDavid Sterba2011-11-111-2/+2
| | | | | | | | | | | Rename no_space_cache option to nospace_cache to be more consistent with the rest, where the simple prefix 'no' is used to negate an option. The option has been introduced during the -rc1 cycle and there are has not been widely used, so it's safe. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: rework error handling in btrfs_mount()Ilya Dryomov2011-11-091-21/+21
| | | | | | | | | | | | Commits 6c41761f and 45ea6095 introduced the possibility of NULL pointer dereference on error paths, also we would leave all devices busy and leak fs_info with all sub-structures on error when trying to mount an already mounted fs to a different directory. Fix this by doing all allocations before trying to open any of the devices, adjust error path for mount-already-mounted-fs case. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Btrfs: fix subvol_name leak on error in btrfs_mount()Ilya Dryomov2011-11-091-1/+3
| | | | | | | | | | | btrfs_parse_early_options() can fail due to error while scanning devices (-o device= option), but still strdup() subvol_name string: mount -o subvol=SUBV,device=BAD_DEVICE <dev> <mnt> So free subvol_name string on error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Btrfs: fix memory leak in btrfs_parse_early_options()Ilya Dryomov2011-11-091-0/+1
| | | | | | | | Don't leak subvol_name string in case multiple subvol= options are given. "The lastest option is effective" behavior (consistent with subvolid= and subvolrootid= options) is preserved. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* btrfs: fix double-free 'tree_root' in 'btrfs_mount()'slyich@gmail.com2011-11-071-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On error path 'tree_root' is treed in 'free_fs_info()'. No need to free it explicitely. Noticed by SLUB in debug mode: Complete reproducer under usermode linux (discovered on real machine): bdev=/dev/ubda btr_root=/btr /mkfs.btrfs $bdev mount $bdev $btr_root mkdir $btr_root/subvols/ cd $btr_root/subvols/ /btrfs su cr foo /btrfs su cr bar mount $bdev -osubvol=subvols/foo $btr_root/subvols/bar umount $btr_root/subvols/bar which gives device fsid 4d55aa28-45b1-474b-b4ec-da912322195e devid 1 transid 7 /dev/ubda ============================================================================= BUG kmalloc-2048: Object already free ----------------------------------------------------------------------------- INFO: Allocated in btrfs_mount+0x389/0x7f0 age=0 cpu=0 pid=277 INFO: Freed in btrfs_mount+0x51c/0x7f0 age=0 cpu=0 pid=277 INFO: Slab 0x0000000062886200 objects=15 used=9 fp=0x0000000070b4d2d0 flags=0x4081 INFO: Object 0x0000000070b4d2d0 @offset=21200 fp=0x0000000070b4a968 ... Call Trace: 70b31948: [<6008c522>] print_trailer+0xe2/0x130 70b31978: [<6008c5aa>] object_err+0x3a/0x50 70b319a8: [<6008e242>] free_debug_processing+0x142/0x2a0 70b319e0: [<600ebf6f>] btrfs_mount+0x55f/0x7f0 70b319f8: [<6008e5c1>] __slab_free+0x221/0x2d0 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Cc: Arne Jansen <sensille@gmx.net> Cc: Chris Mason <chris.mason@oracle.com> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add a log of past tree rootsChris Mason2011-11-061-1/+6
| | | | | | | | | | | | This takes some of the free space in the btrfs super block to record information about most of the roots in the last four commits. It also adds a -o recovery to use the root history log when we're not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: separate superblock items out of fs_infoDavid Sterba2011-11-061-6/+13
| | | | | | | | | | | | fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of it's dynamically allocated members. Signed-off-by: David Sterba <dsterba@suse.cz>
* btrfs: do not allow mounting non-subvolumes via subvol optionDavid Sterba2011-10-241-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a missing test whether the path passed to subvol=path option during mount is a real subvolume, allowing any directory located in default subovlume to be passed and accepted for mount. (current btrfs progs prevent this early) $ btrfs subvol snapshot . p1-snap ERROR: '.' is not a subvolume (with "is subvolume?" test bypassed) $ btrfs subvol snapshot . p1-snap Create a snapshot of '.' in './p1-snap' $ btrfs subvol list -p . ID 258 parent 5 top level 5 path subvol ID 259 parent 5 top level 5 path subvol1 ID 260 parent 5 top level 5 path default-subvol1 ID 262 parent 5 top level 5 path p1/p1-snapshot ID 263 parent 259 top level 5 path subvol1/subvol1-snap The problem I see is that this makes a false impression of snapshotting the given subvolume but in fact snapshots the default one: a user expects outcome like ID 263 but in fact gets ID 262 . This patch makes mount fail with EINVAL with a message in syslog. Signed-off-by: David Sterba <dsterba@suse.cz>
* Btrfs: fix a bug when opening seed devicesIlya Dryomov2011-10-201-1/+1
| | | | | | | Initialize fs_info->bdev_holder a bit earlier to be able to pass a correct holder id to blkdev_get() when opening seed devices with O_EXCL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* btrfs: trivial fix, a potential memory leak in btrfs_parse_early_options()Jeff Liu2011-10-201-2/+8
| | | | Signed-off-by: Jie Liu <jeff.liu@oracle.com>
* Btrfs: introduce mount option no_space_cacheJosef Bacik2011-10-191-4/+17
| | | | | | | | | | | | | Some users have requested this and I've found I needed a way to disable cache loading without actually clearing the cache, so introduce the no_space_cache option. Before we check the super blocks cache generation field and if it was populated we always turned space caching on. Now we check this and set the space cache option on, and then parse the mount options so that if we want it off it get's turned off. Then we check the mount option all the places we do the caching work instead of checking the super's cache generation. This makes things more consistent and lets us turn space caching off. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: fix how we mount subvol=<whatever>Josef Bacik2011-10-191-64/+135
| | | | | | | | We've only been able to mount with subvol=<whatever> where whatever was a subvol within whatever root we had as the default. This allows us to mount -o subvol=path/to/subvol/you/want relative from the normal fs_tree root. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Btrfs: use d_obtain_alias when mounting subvol/subvolidJosef Bacik2011-10-191-24/+1
| | | | | | | | | | | | | | | | | | | | | Currently what we do is just wrong. We either 1) Alloc a new "root" dentry with sb->s_root as it's parent which is just wrong as we could walk into this subvol later on via another path and hilarity could ensue. Also we don't check the return value of d_splice_alias which isn't good either. or 2) Do a d_find_alias() which we could have lost our dentry from cache at this point and found nothing. So use d_obtain_alias(). In the case that we already have the inode/dentry in cache we will get the correct dentry. If not we will get a disconnected dentry tree so if we walk into it later on everything will be connected up properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-07-081-0/+6
|\ | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: btrfs: fix oops when doing space balance Btrfs: don't panic if we get an error while balancing V2 btrfs: add missing options displayed in mount output
| * btrfs: add missing options displayed in mount outputDavid Sterba2011-07-061-0/+6
| | | | | | | | | | | | | | | | There are three missed mount options settable by user which are not currently displayed in mount output. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge branch 'for-linus-2' of ↵Linus Torvalds2011-06-071-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: make unlink() and rmdir() return ENOENT in preference to EROFS lmLogOpen() broken failure exit usb: remove bad dput after dentry_unhash more conservative S_NOSEC handling
| * | more conservative S_NOSEC handlingAl Viro2011-06-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is * new superblock flag; unless set, S_NOSEC is not going to be set. * local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than local block ones clear it) * if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked from other clients. It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-06-051-1/+7
|\ \ \ | |/ / |/| / | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) btrfs: fix uninitialized variable warning btrfs: add helper for fs_info->closing Btrfs: add mount -o inode_cache btrfs: scrub: add explicit plugging btrfs: use btrfs_ino to access inode number Btrfs: don't save the inode cache if we are deleting this root btrfs: false BUG_ON when degraded Btrfs: don't save the inode cache in non-FS roots Btrfs: make sure we don't overflow the free space cache crc page Btrfs: fix uninit variable in the delayed inode code btrfs: scrub: don't reuse bios and pages Btrfs: leave spinning on lookup and map the leaf Btrfs: check for duplicate entries in the free space cache Btrfs: don't try to allocate from a block group that doesn't have enough space Btrfs: don't always do readahead Btrfs: try not to sleep as much when doing slow caching Btrfs: kill BTRFS_I(inode)->block_group Btrfs: don't look at the extent buffer level 3 times in a row Btrfs: map the node block when looking for readahead targets Btrfs: set range_start to the right start in count_range_bits ...
| * Btrfs: add mount -o inode_cacheChris Mason2011-06-041-1/+7
| | | | | | | | | | | | | | | | This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-05-271-5/+46
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (58 commits) Btrfs: use the device_list_mutex during write_dev_supers Btrfs: setup free ino caching in a more asynchronous way btrfs scrub: don't coalesce pages that are logically discontiguous Btrfs: return -ENOMEM in clear_extent_bit Btrfs: add mount -o auto_defrag Btrfs: using rcu lock in the reader side of devices list Btrfs: drop unnecessary device lock Btrfs: fix the race between remove dev and alloc chunk Btrfs: fix the race between reading and updating devices Btrfs: fix bh leak on __btrfs_open_devices path Btrfs: fix unsafe usage of merge_state Btrfs: allocate extent state and check the result properly fs/btrfs: Add missing btrfs_free_path Btrfs: check return value of btrfs_inc_extent_ref() Btrfs: return error to caller if read_one_inode() fails Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item Btrfs: return error code to caller when btrfs_del_item fails Btrfs: return error code to caller when btrfs_previous_item fails btrfs: fix typo 'testeing' -> 'testing' btrfs: typo: 'btrfS' -> 'btrfs' ...
| * Btrfs: add mount -o auto_defragChris Mason2011-05-261-1/+6
| | | | | | | | | | | | | | | | | | This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Merge branch 'cleanups_and_fixes' into inode_numbersChris Mason2011-05-231-1/+3
| |\ | | | | | | | | | | | | | | | | | | | | | Conflicts: fs/btrfs/tree-log.c fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * fs/btrfs: Add missing btrfs_free_pathJulia Lawall2011-05-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) \(return <+...x...+>; \| return@p2...; \) } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * | Merge branch 'allocator' of ↵Chris Mason2011-05-221-0/+26
| |\ \ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arne/btrfs-unstable-arne into inode_numbers Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | btrfs: move btrfs_cmp_device_free_bytes to super.cArne Jansen2011-05-131-0/+26
| | |/ | | | | | | | | | | | | this function won't be used here anymore, so move it super.c where it is used for df-calculation
| * | Merge branch 'cleanups' of git://repo.or.cz/linux-2.6/btrfs-unstable into ↵Chris Mason2011-05-221-2/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode_numbers Conflicts: fs/btrfs/extent-tree.c fs/btrfs/free-space-cache.c fs/btrfs/inode.c fs/btrfs/tree-log.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | btrfs: rename variables clashing with global function namesDavid Sterba2011-05-021-2/+2
| | |/ | | | | | | | | | | | | | | | | | | reported by gcc -Wshadow: page_index, page_offset, new_inode, dev_name Signed-off-by: David Sterba <dsterba@suse.cz>
| * | btrfs: implement delayed inode items operationMiao Xie2011-05-211-1/+9
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: David Sterba <dave@jikos.cz> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | btrfs: add cleancache supportDan Magenheimer2011-05-261-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This sixth patch of eight in this cleancache series "opts-in" cleancache for btrfs. Filesystems must explicitly enable cleancache by calling cleancache_init_fs anytime an instance of the filesystem is mounted. Btrfs uses its own readpage which must be hooked, but all other cleancache hooks are in the VFS layer including the matching cleancache_flush_fs hook which must be called on unmount. Details and a FAQ can be found in Documentation/vm/cleancache.txt [v6-v8: no changes] [v5: jeremy@goop.org: simplify init hook and any future fs init changes] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>
* Btrfs: fix subvolume mount by name problem when default mount subvolume is setXin Zhong2011-04-111-9/+33
| | | | | | | | | | | | | | | | | | | | We create two subvolumes (meego_root and meego_home) in btrfs root directory. And set meego_root as default mount subvolume. After we remount btrfs, meego_root is mounted to top directory by default. Then when we try to mount meego_home (subvol=meego_home) to a subdirectory, it failed. The problem is when default mount subvolume is set to meego_root, we search meego_home in meego_root but can not find it. So the solution is to add a new mount option (subvolrootid) to specify subvol id of root and search subvol name in it. For our case, now we can use "-o subvolrootid=0,subvol=meego_home) to mount meego_home. Detail information can be found in meego bugzilla: https://bugs.meego.com/show_bug.cgi?id=15055 Signed-off-by: Zhong, Xin <xin.zhong@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: fix /proc/mounts info.Tsutomu Itoh2011-04-051-2/+17
| | | | | | | | | | | | | | | | | | | | | Some mount options are not displayed by /proc/mounts. This patch displays the option such as compress_type by /proc/mounts. Ex. [before] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress 0 0 [after] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress=lzo,space_cache 0 0 Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add initial tracepoint support for btrfsliubo2011-03-281-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2011-02-251-1/+6
|\ | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix fiemap bugs with delalloc Btrfs: set FMODE_EXCL in btrfs_device->mode Btrfs: make btrfs_rm_device() fail gracefully Btrfs: Avoid accessing unmapped kernel address Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs: put ENOSPC debugging under a mount option
| * Btrfs: put ENOSPC debugging under a mount optionChris Mason2011-02-161-1/+6
| | | | | | | | | | | | | | | | ENOSPC in btrfs is getting to the point where the extra debugging isn't required. I've put it under mount -o enospc_debug just in case someone is having difficult problems. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds2011-02-071-2/+7
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits) Btrfs: Fix page count calculation btrfs: Drop __exit attribute on btrfs_exit_compress btrfs: cleanup error handling in btrfs_unlink_inode() Btrfs: exclude super blocks when we read in block groups Btrfs: make sure search_bitmap finds something in remove_from_bitmap btrfs: fix return value check of btrfs_start_transaction() btrfs: checking NULL or not in some functions Btrfs: avoid uninit variable warnings in ordered-data.c Btrfs: catch errors from btrfs_sync_log Btrfs: make shrink_delalloc a little friendlier Btrfs: handle no memory properly in prepare_pages Btrfs: do error checking in btrfs_del_csums Btrfs: use the global block reserve if we cannot reserve space Btrfs: do not release more reserved bytes to the global_block_rsv than we need Btrfs: fix check_path_shared so it returns the right value btrfs: check return value of btrfs_start_ioctl_transaction() properly btrfs: fix return value check of btrfs_join_transaction() fs/btrfs/inode.c: Add missing IS_ERR test btrfs: fix missing break in switch phrase btrfs: fix several uncheck memory allocations ...
| * btrfs: fix return value check of btrfs_start_transaction()Tsutomu Itoh2011-02-011-0/+2
| | | | | | | | | | | | | | | | The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Merge branch 'bug-fixes' of git://repo.or.cz/linux-btrfs-devel into btrfs-38Chris Mason2011-01-281-2/+5
| |\
| | * Btrfs: Free correct pointer after using strsepTero Roponen2011-01-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | We must save and free the original kstrdup()'ed pointer because strsep() modifies its first argument. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | * Btrfs: Fix memory leak on finding existing superIan Kent2011-01-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We missed a memory deallocation in commit 450ba0ea. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-01-171-18/+263
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...
| * | Btrfs: forced readonly mounts on errorsliubo2011-01-171-0/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>