aboutsummaryrefslogtreecommitdiffstats
path: root/fs/aio.c
Commit message (Collapse)AuthorAgeFilesLines
* vfs: make AIO use the proper rw_verify_area() area helpersLinus Torvalds2012-06-011-16/+14
| | | | | | | | | | | | | | | | | | | | | commit a70b52ec1aaeaf60f4739edb1b422827cb6f3893 upstream. We had for some reason overlooked the AIO interface, and it didn't use the proper rw_verify_area() helper function that checks (for example) mandatory locking on the file, and that the size of the access doesn't cause us to overflow the provided offset limits etc. Instead, AIO did just the security_file_permission() thing (that rw_verify_area() also does) directly. This fixes it to do all the proper helper functions, which not only means that now mandatory file locking works with AIO too, we can actually remove lines of code. Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* aio: fix the "too late munmap()" raceAl Viro2012-03-191-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c7b285550544c22bc005ec20978472c9ac7138c6 upstream. Current code has put_ioctx() called asynchronously from aio_fput_routine(); that's done *after* we have killed the request that used to pin ioctx, so there's nothing to stop io_destroy() waiting in wait_for_all_aios() from progressing. As the result, we can end up with async call of put_ioctx() being the last one and possibly happening during exit_mmap() or elf_core_dump(), neither of which expects stray munmap() being done to them... We do need to prevent _freeing_ ioctx until aio_fput_routine() is done with that, but that's all we care about - neither io_destroy() nor exit_aio() will progress past wait_for_all_aios() until aio_fput_routine() does really_put_req(), so the ioctx teardown won't be done until then and we don't care about the contents of ioctx past that point. Since actual freeing of these suckers is RCU-delayed, we don't need to bump ioctx refcount when request goes into list for async removal. All we need is rcu_read_lock held just over the ->ctx_lock-protected area in aio_fput_routine(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* aio: fix io_setup/io_destroy raceAl Viro2012-03-191-4/+4
| | | | | | | | | | | | | | | | | | | commit 86b62a2cb4fc09037bbce2959d2992962396fd7f upstream. Have ioctx_alloc() return an extra reference, so that caller would drop it on success and not bother with re-grabbing it on failure exit. The current code is obviously broken - io_destroy() from another thread that managed to guess the address io_setup() would've returned would free ioctx right under us; gets especially interesting if aio_context_t * we pass to io_setup() points to PROT_READ mapping, so put_user() fails and we end up doing io_destroy() on kioctx another thread has just got freed... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-03-241-70/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
| * Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe2011-03-101-70/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * aio: remove request submission batchingJens Axboe2011-03-101-72/+3
| | | | | | | | | | | | | | | | | | | | | This should be useless now that we have on-stack plugging. So lets just kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * fs: make aio plugShaohua Li2011-03-101-0/+4
| | | | | | | | | | | | | | | Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * block: remove per-queue pluggingJens Axboe2011-03-101-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | aio: wake all waiters when destroying ctxRoland Dreier2011-03-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test program below will hang because io_getevents() uses add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only wakes up one of the threads. Fix this by using wake_up_all() in the aio code paths where we want to make sure no one gets stuck. // t.c -- compile with gcc -lpthread -laio t.c #include <libaio.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> static const int nthr = 2; void *getev(void *ctx) { struct io_event ev; io_getevents(ctx, 1, 1, &ev, NULL); printf("io_getevents returned\n"); return NULL; } int main(int argc, char *argv[]) { io_context_t ctx = 0; pthread_t thread[nthr]; int i; io_setup(1024, &ctx); for (i = 0; i < nthr; ++i) pthread_create(&thread[i], NULL, getev, ctx); sleep(1); io_destroy(ctx); for (i = 0; i < nthr; ++i) pthread_join(thread[i], NULL); return 0; } Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2011-03-161-2/+2
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
| * | fs/aio: aio_wq isn't used in memory reclaim pathTejun Heo2011-01-261-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | aio_wq isn't used during memory reclaim. Convert to alloc_workqueue() without WQ_MEM_RECLAIM. It's possible to use system_wq but given that the number of work items is determined from userland and the work item may block, enforcing strict concurrency limit would be a good idea. Also, move fput_work to system_wq so that aio_wq is used soley to throttle the max concurrency of aio work items and fput_work doesn't interact with other work items. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: linux-aio@kvack.org
* | aio: fix race between io_destroy() and io_submit()Jan Kara2011-02-251-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A race can occur when io_submit() races with io_destroy(): CPU1 CPU2 io_submit() do_io_submit() ... ctx = lookup_ioctx(ctx_id); io_destroy() Now do_io_submit() holds the last reference to ctx. ... queue new AIO put_ioctx(ctx) - frees ctx with active AIOs We solve this issue by checking whether ctx is being destroyed in AIO submission path after adding new AIO to ctx. Then we are guaranteed that either io_destroy() waits for new AIO or we see that ctx is being destroyed and bail out. Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | aio: fix rcu ioctx lookupNick Piggin2011-02-251-11/+24
|/ | | | | | | | | | | | | | | | | aio-dio-invalidate-failure GPFs in aio_put_req from io_submit. lookup_ioctx doesn't implement the rcu lookup pattern properly. rcu_read_lock does not prevent refcount going to zero, so we might take a refcount on a zero count ioctx. Fix the bug by atomically testing for zero refcount before incrementing. [jack@suse.cz: added comment into the code] Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: check return value of create_workqueue()Namhyung Kim2011-01-171-1/+1
| | | | | Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: remove unused aio_run_iocbs()Jeff Moyer2011-01-131-22/+5
| | | | | | | | | aio_run_iocbs() is not used at all, so get rid of it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: remove unnecessary checkNamhyung Kim2011-01-131-1/+1
| | | | | | | | | 'nr >= min_nr >= 0' always satisfies 'nr >= 0' so the check is unnecesary. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* new helper: ihold()Al Viro2010-10-251-3/+2
| | | | | | Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: bump i_count instead of using igrabChris Mason2010-10-251-1/+14
| | | | | | | | | | | | | | | | The aio batching code is using igrab to get an extra reference on the inode so it can safely batch. igrab will go ahead and take the global inode spinlock, which can be a bottleneck on large machines doing lots of AIO. In this case, igrab isn't required because we already have a reference on the file handle. It is safe to just bump the i_count directly on the inode. Benchmarking shows this patch brings IOP/s on tons of flash up by about 2.5X. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* aio: do not return ERESTARTSYS as a result of AIOJan Kara2010-09-221-1/+9
| | | | | | | | | | | | | | | | | | | OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: check for multiplication overflow in do_io_submitJeff Moyer2010-09-141-0/+3
| | | | | | | | | | | | | | | | | | | | | Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array:        if (unlikely(nr < 0))                return -EINVAL;        if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp)))))                return -EFAULT;                      ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long.  This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: fix wrong subsystem commentsSatoru Takeuchi2010-08-051-10/+11
| | | | | | | | | | | | | | - sys_io_destroy(): acutually return -EINVAL if the context pointed to is invalidIndex: linux-2.6.33-rc4/fs/aio.c - sys_io_getevents(): An argument specifying timeout is not `when', but `timeout'. - sys_io_getevents(): Should describe what is returned if this syscall succeeds. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* get rid of the magic around f_count in aioAl Viro2010-05-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | __aio_put_req() plays sick games with file refcount. What it wants is fput() from atomic context; it's almost always done with f_count > 1, so they only have to deal with delayed work in rare cases when their reference happens to be the last one. Current code decrements f_count and if it hasn't hit 0, everything is fine. Otherwise it keeps a pointer to struct file (with zero f_count!) around and has delayed work do __fput() on it. Better way to do it: use atomic_long_add_unless( , -1, 1) instead of !atomic_long_dec_and_test(). IOW, decrement it only if it's not the last reference, leave refcount alone if it was. And use normal fput() in delayed work. I've made that atomic_long_add_unless call a new helper - fput_atomic(). Drops a reference to file if it's safe to do in atomic (i.e. if that's not the last one), tells if it had been able to do that. aio.c converted to it, __fput() use is gone. req->ki_file *always* contributes to refcount now. And __fput() became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* aio: fix the compat vectored operationsJeff Moyer2010-05-271-24/+41
| | | | | | | | | | | | | | | | | | | | | | | | The aio compat code was not converting the struct iovecs from 32bit to 64bit pointers, causing either EINVAL to be returned from io_getevents, or EFAULT as the result of the I/O. This patch passes a compat flag to io_submit to signal that pointer conversion is necessary for a given iocb array. A variant of this was tested by Michael Tokarev. I have also updated the libaio test harness to exercise this code path with good success. Further, I grabbed a copy of ltp and ran the testcases/kernel/syscall/readv and writev tests there (compiled with -m32 on my 64bit system). All seems happy, but extra eyes on this would be welcome. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix CONFIG_COMPAT=n build] Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> [2.6.35.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: remove unused fieldShaohua Li2009-12-161-38/+2
| | | | | | | | | | | Don't know the reason, but it appears ki_wait field of iocb never gets used. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: move bdi/address_space unplug functions to backing-dev.hJens Axboe2009-10-291-0/+1
| | | | | | | | There's nothing block related about them, the backing device is used by things like NFS etc as well. This gets rid of the need to protect such calls by CONFIG_BLOCK. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* aio: implement request batchingJeff Moyer2009-10-281-2/+59
| | | | | | | | | | | | | | Hi, Some workloads issue batches of small I/O, and the performance is poor due to the call to blk_run_address_space for every single iocb. Nathan Roberts pointed this out, and suggested that by deferring this call until all I/Os in the iocb array are submitted to the block layer, we can realize some impressive performance gains (up to 30% for sequential 4k reads in batches of 16). Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* aio.c: move EXPORT* macros to line after functionH Hartley Sweeten2009-09-231-6/+4
| | | | | | | | | | | | | As mentioned in Documentation/CodingStyle, move EXPORT* macro's to the line immediately after the closing function brace line. Also, move the __initcall() similarly. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: move use_mm/unuse_mm from aio.c to mm/Michael S. Tsirkin2009-09-221-46/+1
| | | | | | | | | | | | Anyone who wants to do copy to/from user from a kernel thread, needs use_mm (like what fs/aio has). Move that into mm/, to make reusing and exporting easier down the line, and make aio use it. Next intended user, besides aio, will be vhost-net. Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* eventfd: revised interface and cleanupsDavide Libenzi2009-06-301-17/+7
| | | | | | | | | | | | | | | | | | | | | Change the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. This patch is required by KVM's IRQfd code, which is still under development. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Gregory Haskins <ghaskins@novell.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: lookup_ioctx can return the wrong value when looking up a bogus contextJeff Moyer2009-03-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | The libaio test harness turned up a problem whereby lookup_ioctx on a bogus io context was returning the 1 valid io context from the list (harness/cases/3.p). Because of that, an extra put_iocontext was done, and when the process exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio (since we expect a users count of 1 and instead get 0). The problem was introduced by "aio: make the lookup_ioctx() lockless" (commit abf137dd7712132ee56d5b3143c2ff61a72a5faa). Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not return with a NULL tpos at the end of the loop, even if the entry was not found. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Zach Brown <zach.brown@oracle.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* eventfd: remove fput() call from possible IRQ contextDavide Libenzi2009-03-191-10/+27
| | | | | | | | | | | | | | | | | | | | | Remove a source of fput() call from inside IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was able to, with the attached test program. Independently from this, the bug is conceptually there, so we might be better off fixing it. This patch adds an optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty in general, but the alternative here would be to add a brand new delayed fput() infrastructure, that I'm not sure is worth it. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [CVE-2009-0029] System call wrappers part 16Heiko Carstens2009-01-141-11/+11
| | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* aio: make the lookup_ioctx() locklessJens Axboe2008-12-291-44/+56
| | | | | | | | | | | | | | The mm->ioctx_list is currently protected by a reader-writer lock, so we always grab that lock on the read side for doing ioctx lookups. As the workload is extremely reader biased, turn this into an rcu hlist so we can make lookup_ioctx() lockless. Get rid of the rwlock and use a spinlock for providing update side exclusion. There's usually only 1 entry on this list, so it doesn't make sense to look into fancier data structures. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* [PATCH] f_count may wrap aroundAl Viro2008-07-261-3/+3
| | | | | | | make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* kill PF_BORROWED_MM in favour of PF_KTHREADOleg Nesterov2008-07-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kill PF_BORROWED_MM. Change use_mm/unuse_mm to not play with ->flags, and do s/PF_BORROWED_MM/PF_KTHREAD/ for a couple of other users. No functional changes yet. But this allows us to do further fixes/cleanups. oom_kill/ptrace/etc often check "p->mm != NULL" to filter out the kthreads, this is wrong because of use_mm(). The problem with PF_BORROWED_MM is that we need task_lock() to avoid races. With this patch we can check PF_KTHREAD directly, or use a simple lockless helper: /* The result must not be dereferenced !!! */ struct mm_struct *__get_task_mm(struct task_struct *tsk) { if (tsk->flags & PF_KTHREAD) return NULL; return tsk->mm; } Note also ecard_task(). It runs with ->mm != NULL, but it's the kernel thread without PF_BORROWED_MM. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* uml: activate_mm: remove the dead PF_BORROWED_MM checkOleg Nesterov2008-06-061-4/+0
| | | | | | | | | | | | | | use_mm() was changed to use switch_mm() instead of activate_mm(), since then nobody calls (and nobody should call) activate_mm() with PF_BORROWED_MM bit set. As Jeff Dike pointed out, we can also remove the "old != new" check, it is always true. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* debugobjects: add timer specific object debugging codeThomas Gleixner2008-04-301-3/+2
| | | | | | | | | | | | | | Add calls to the generic object debugging infrastructure and provide fixup functions which allow to keep the system alive when recoverable problems have been detected by the object debugging core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: fix misleading commentsJeff Moyer2008-04-291-4/+1
| | | | | | | | | | | The FIXME comments are inaccurate. The locking comment over lookup_ioctx() is wrong. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove duplicated unlikely() in IS_ERR()Hirofumi Nakagawa2008-04-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Some drivers have duplicated unlikely() macros. IS_ERR() already has unlikely() in itself. This patch cleans up such pointless code. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/aio.c: make 3 functions staticAdrian Bunk2008-04-291-28/+39
| | | | | | | | | | | | | | | Make the following needlessly global functions static: - __put_ioctx() - lookup_ioctx() - io_submit_one() Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: io_getevents() should return if io_destroy() is invokedJeff Moyer2008-04-281-1/+11
| | | | | | | | | | | | | | | | This patch wakes up a thread waiting in io_getevents if another thread destroys the context. This was tested using a small program that spawns a thread to wait in io_getevents while the parent thread destroys the io context and then waits for the getevents thread to exit. Without this patch, the program hangs indefinitely. With the patch, the program exits as expected. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Christopher Smith <x@xman.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* eventfd/kaio integration fixDavide Libenzi2008-04-111-8/+9
| | | | | | | | | | | | | | | Jeff Roberson discovered a race when using kaio eventfd based notifications. When it occurs it can lead tomissed wakeups and hung userspace. This patch fixes the race by moving the notification inside the spinlocked section of kaio. The operation is safe since eventfd spinlock and kaio one are unrelated. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jeff Roberson <jroberson@chesapeake.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* asmlinkage_protect sys_io_geteventsRoland McGrath2008-04-101-0/+1
| | | | | | | | | | | Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the stack, i.e. the user struct pt_regs. Here the problem is not a tail call, but just the compiler's use of the stack when it inlines and optimizes the body of the called function. This seems to avoid it. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: bad AIO race in aio_complete() leads to process hangQuentin Barnes2008-03-191-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My group ran into a AIO process hang on a 2.6.24 kernel with the process sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come and it never would. We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box ("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't shared between all eight processors, but is L2 is shared between between all two processors on the Core2Duo we use. My analysis of the hang is if you go down to the second while-loop in read_events(), what happens on processor #1: 1) add_wait_queue_exclusive() adds thread to ctx->wait 2) aio_read_evt() to check tail 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep In aio_complete() with processor #2: A) info->tail = tail; B) waitqueue_active(&ctx->wait) C) if waitqueue_active() returned non-0, call wake_up() The way the code is written, step 1 must be seen by all other processors before processor 1 checks for pending events in step 2 (that were recorded by step A) and step A by processor 2 must be seen by all other processors (checked in step 2) before step B is done. The race I believed I was seeing is that steps 1 and 2 were effectively swapped due to the __list_add() being delayed by the L2 cache not shared by some of the other processors. Imagine: proc 2: just before step A proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet proc 1, step 2: checks tail and sees no pending events proc 2, step A: updates tail proc 1, step 3: calls [io_]schedule() and sleeps proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup so proc 1 sleeps indefinitely My patch adds a memory barrier between steps A and B. It ensures that the update in step 1 gets seen on processor 2 before continuing. If processor 1 was just before step 1, the memory barrier makes sure that step A (update tail) gets seen by the time processor 1 makes it to step 2 (check tail). Before the patch our AIO process would hang virtually 100% of the time. After the patch, we have yet to see the process ever hang. Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com> Reviewed-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ We should probably disallow that "if (waitqueue_active()) wake_up()" coding pattern, because it's so often buggy wrt memory ordering ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: negative offset should return -EINVALRusty Russell2008-02-081-0/+4
| | | | | | | | | | | | An AIO read or write should return -EINVAL if the offset is negative. This check matches the one in pread and pwrite. This was found by the libaio test suite. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aio: partial write should not return error codeRusty Russell2008-02-081-0/+7
| | | | | | | | | | | | | When an AIO write gets an error after writing some data (eg. ENOSPC), it should return the amount written already, not the error. Just like write() is supposed to. This was found by the libaio test suite. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-By: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: remove fastcall, it is always emptyHarvey Harrison2008-02-081-9/+8
| | | | | | | [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* core: remove last users of empty FASTCALL macroHarvey Harrison2008-01-301-1/+1
| | | | | | | | FASTCALL is always empty after the x86 removal. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* aio: only account I/O wait time in read_events if there are active requestsJeff Moyer2007-12-051-1/+6
| | | | | | | | | | | | | | | | | On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was running (but completely idle). The UML code sits in io_getevents waiting for an event to be submitted and completed. Fix this by checking ctx->reqs_active before scheduling to determine whether or not we are waiting for I/O. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Jeff Dike <jdike@addtoit.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove struct task_struct::io_waitAlexey Dobriyan2007-10-181-14/+2
| | | | | | | | | | | | Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b during 2.6.9 development. Commit introduced io_wait field which remained write-only than and still remains write-only. Also garbage collect macros which "use" io_wait. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>