aboutsummaryrefslogtreecommitdiffstats
path: root/fs/splice.c
diff options
context:
space:
mode:
authorEric Dumazet <eric.dumazet@gmail.com>2012-04-24 22:12:06 -0400
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2012-04-27 09:51:18 -0700
commit8d2228dd95c656e5fc9af2e8776f9c95e269806f (patch)
tree0656babbb4c3a0de40f97f06723e32433f18d95b /fs/splice.c
parentd2491ed1e1c5bf2db6ab44eb5bfe38b48043b2d5 (diff)
downloadkernel_samsung_smdk4412-8d2228dd95c656e5fc9af2e8776f9c95e269806f.zip
kernel_samsung_smdk4412-8d2228dd95c656e5fc9af2e8776f9c95e269806f.tar.gz
kernel_samsung_smdk4412-8d2228dd95c656e5fc9af2e8776f9c95e269806f.tar.bz2
tcp: allow splice() to build full TSO packets
[ This combines upstream commit 2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ] vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'fs/splice.c')
-rw-r--r--fs/splice.c5
1 files changed, 4 insertions, 1 deletions
diff --git a/fs/splice.c b/fs/splice.c
index aa866d3..9d89008 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -31,6 +31,7 @@
#include <linux/uio.h>
#include <linux/security.h>
#include <linux/gfp.h>
+#include <linux/socket.h>
/*
* Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -691,7 +692,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
if (!likely(file->f_op && file->f_op->sendpage))
return -EINVAL;
- more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+ more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
+ if (sd->len < sd->total_len)
+ more |= MSG_SENDPAGE_NOTLAST;
return file->f_op->sendpage(file, buf->page, buf->offset,
sd->len, &pos, more);
}