From 4b9d9be839fdb7dcd7ce7619a623fd9015a50cda Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 8 Jun 2011 13:35:34 +0000 Subject: inetpeer: remove unused list Andi Kleen and Tim Chen reported huge contention on inetpeer unused_peers.lock, on memcached workload on a 40 core machine, with disabled route cache. It appears we constantly flip peers refcnt between 0 and 1 values, and we must insert/remove peers from unused_peers.list, holding a contended spinlock. Remove this list completely and perform a garbage collection on-the-fly, at lookup time, using the expired nodes we met during the tree traversal. This removes a lot of code, makes locking more standard, and obsoletes two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also removes two pointers in inet_peer structure. There is still a false sharing effect because refcnt is in first cache line of object [were the links and keys used by lookups are located], we might move it at the end of inet_peer structure to let this first cache line mostly read by cpus. Signed-off-by: Eric Dumazet CC: Andi Kleen CC: Tim Chen Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index d3d653a..3dcb26c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -106,16 +106,6 @@ inet_peer_maxttl - INTEGER when the number of entries in the pool is very small). Measured in seconds. -inet_peer_gc_mintime - INTEGER - Minimum interval between garbage collection passes. This interval is - in effect under high memory pressure on the pool. - Measured in seconds. - -inet_peer_gc_maxtime - INTEGER - Minimum interval between garbage collection passes. This interval is - in effect under low (or absent) memory pressure on the pool. - Measured in seconds. - TCP variables: somaxconn - INTEGER -- cgit v1.1 From a6e1204b82d3e8d7ba1baca35bdb595c722105eb Mon Sep 17 00:00:00 2001 From: Max Matveev Date: Sun, 19 Jun 2011 22:08:10 +0000 Subject: Update description of net.sctp.sctp_rmem and net.sctp.sctp_wmem tunables sctp does not use second and third ("default" and "max") values of sctp_rmem tunable. The format is the same as tcp_rmem but the meaning is different so make the documentation explicit to avoid confusion. sctp_wmem is not used at all. Acked-by: Neil Horman Signed-off-by: Max Matveev Reviewed-by: Shan Wei Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 3dcb26c..1a6de5b 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1455,10 +1455,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max Default is calculated at boot time from amount of available memory. sctp_rmem - vector of 3 INTEGERs: min, default, max - See tcp_rmem for a description. + Only the first value ("min") is used, "default" and "max" are + ignored. + + min: Minimal size of receive buffer used by SCTP socket. + It is guaranteed to each SCTP socket (but not association) even + under moderate memory pressure. + + Default: 1 page sctp_wmem - vector of 3 INTEGERs: min, default, max - See tcp_wmem for a description. + Currently this tunable has no effect. addr_scope_policy - INTEGER Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 -- cgit v1.1 From 6539fefd9bec5faf948b03aec9f9078691d0be6b Mon Sep 17 00:00:00 2001 From: Max Matveev Date: Tue, 21 Jun 2011 21:18:13 +0000 Subject: Update documented default values for various TCP/UDP tunables tcp_rmem and tcp_wmem use 1 page as default value for the minimum amount of memory to be used, same as udp_wmem_min and udp_rmem_min. Pages are different size on different architectures - use the right units when describing the defaults. Reviewed-by: Shan Wei Acked-by: Neil Horman Signed-off-by: Max Matveev Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 1a6de5b..e3df119 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -384,7 +384,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max min: Minimal size of receive buffer used by TCP sockets. It is guaranteed to each TCP socket, even under moderate memory pressure. - Default: 8K + Default: 1 page default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. @@ -473,7 +473,7 @@ tcp_window_scaling - BOOLEAN tcp_wmem - vector of 3 INTEGERs: min, default, max min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. - Default: 4K + Default: 1 page default: initial size of send buffer used by TCP sockets. This value overrides net.core.wmem_default used by other protocols. @@ -543,13 +543,13 @@ udp_rmem_min - INTEGER Minimal size of receive buffer used by UDP sockets in moderation. Each UDP socket is able to use the size for receiving data, even if total pages of UDP sockets exceed udp_mem pressure. The unit is byte. - Default: 4096 + Default: 1 page udp_wmem_min - INTEGER Minimal size of send buffer used by UDP sockets in moderation. Each UDP socket is able to use the size for sending data, even if total pages of UDP sockets exceed udp_mem pressure. The unit is byte. - Default: 4096 + Default: 1 page CIPSOv4 Variables: -- cgit v1.1 From 14205aa21c8041d7e940ee9bcde87824dc00a08a Mon Sep 17 00:00:00 2001 From: Aloisio Almeida Jr Date: Fri, 1 Jul 2011 19:31:38 -0300 Subject: NFC: add Documentation/networking/nfc.txt Signed-off-by: Aloisio Almeida Jr Signed-off-by: Lauro Ramos Venancio Signed-off-by: John W. Linville --- Documentation/networking/nfc.txt | 128 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 128 insertions(+) create mode 100644 Documentation/networking/nfc.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/nfc.txt b/Documentation/networking/nfc.txt new file mode 100644 index 0000000..b24c29b --- /dev/null +++ b/Documentation/networking/nfc.txt @@ -0,0 +1,128 @@ +Linux NFC subsystem +=================== + +The Near Field Communication (NFC) subsystem is required to standardize the +NFC device drivers development and to create an unified userspace interface. + +This document covers the architecture overview, the device driver interface +description and the userspace interface description. + +Architecture overview +--------------------- + +The NFC subsystem is responsible for: + - NFC adapters management; + - Polling for targets; + - Low-level data exchange; + +The subsystem is divided in some parts. The 'core' is responsible for +providing the device driver interface. On the other side, it is also +responsible for providing an interface to control operations and low-level +data exchange. + +The control operations are available to userspace via generic netlink. + +The low-level data exchange interface is provided by the new socket family +PF_NFC. The NFC_SOCKPROTO_RAW performs raw communication with NFC targets. + + + +--------------------------------------+ + | USER SPACE | + +--------------------------------------+ + ^ ^ + | low-level | control + | data exchange | operations + | | + | v + | +-----------+ + | AF_NFC | netlink | + | socket +-----------+ + | raw ^ + | | + v v + +---------+ +-----------+ + | rawsock | <--------> | core | + +---------+ +-----------+ + ^ + | + v + +-----------+ + | driver | + +-----------+ + +Device Driver Interface +----------------------- + +When registering on the NFC subsystem, the device driver must inform the core +of the set of supported NFC protocols and the set of ops callbacks. The ops +callbacks that must be implemented are the following: + +* start_poll - setup the device to poll for targets +* stop_poll - stop on progress polling operation +* activate_target - select and initialize one of the targets found +* deactivate_target - deselect and deinitialize the selected target +* data_exchange - send data and receive the response (transceive operation) + +Userspace interface +-------------------- + +The userspace interface is divided in control operations and low-level data +exchange operation. + +CONTROL OPERATIONS: + +Generic netlink is used to implement the interface to the control operations. +The operations are composed by commands and events, all listed below: + +* NFC_CMD_GET_DEVICE - get specific device info or dump the device list +* NFC_CMD_START_POLL - setup a specific device to polling for targets +* NFC_CMD_STOP_POLL - stop the polling operation in a specific device +* NFC_CMD_GET_TARGET - dump the list of targets found by a specific device + +* NFC_EVENT_DEVICE_ADDED - reports an NFC device addition +* NFC_EVENT_DEVICE_REMOVED - reports an NFC device removal +* NFC_EVENT_TARGETS_FOUND - reports START_POLL results when 1 or more targets +are found + +The user must call START_POLL to poll for NFC targets, passing the desired NFC +protocols through NFC_ATTR_PROTOCOLS attribute. The device remains in polling +state until it finds any target. However, the user can stop the polling +operation by calling STOP_POLL command. In this case, it will be checked if +the requester of STOP_POLL is the same of START_POLL. + +If the polling operation finds one or more targets, the event TARGETS_FOUND is +sent (including the device id). The user must call GET_TARGET to get the list of +all targets found by such device. Each reply message has target attributes with +relevant information such as the supported NFC protocols. + +All polling operations requested through one netlink socket are stopped when +it's closed. + +LOW-LEVEL DATA EXCHANGE: + +The userspace must use PF_NFC sockets to perform any data communication with +targets. All NFC sockets use AF_NFC: + +struct sockaddr_nfc { + sa_family_t sa_family; + __u32 dev_idx; + __u32 target_idx; + __u32 nfc_protocol; +}; + +To establish a connection with one target, the user must create an +NFC_SOCKPROTO_RAW socket and call the 'connect' syscall with the sockaddr_nfc +struct correctly filled. All information comes from NFC_EVENT_TARGETS_FOUND +netlink event. As a target can support more than one NFC protocol, the user +must inform which protocol it wants to use. + +Internally, 'connect' will result in an activate_target call to the driver. +When the socket is closed, the target is deactivated. + +The data format exchanged through the sockets is NFC protocol dependent. For +instance, when communicating with MIFARE tags, the data exchanged are MIFARE +commands and their responses. + +The first received package is the response to the first sent package and so +on. In order to allow valid "empty" responses, every data received has a NULL +header of 1 byte. -- cgit v1.1 From d804c6f2122ae4ffac21757c7774007e38093390 Mon Sep 17 00:00:00 2001 From: Shan Wei Date: Thu, 7 Jul 2011 00:27:49 -0700 Subject: net: doc: fix compile warning of no format arguments in ifenslave.c Fix following warning in ifenslave.c with gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4). Documentation/networking/ifenslave.c:263:4: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:271:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:277:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:285:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:291:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:292:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:312:4: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:323:3: warning: format not a string literal and no format arguments Documentation/networking/ifenslave.c:342:4: warning: format not a string literal and no format arguments Signed-off-by: Shan Wei Signed-off-by: David S. Miller --- Documentation/networking/ifenslave.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c index 2bac961..65968fb 100644 --- a/Documentation/networking/ifenslave.c +++ b/Documentation/networking/ifenslave.c @@ -260,7 +260,7 @@ int main(int argc, char *argv[]) case 'V': opt_V++; exclusive++; break; case '?': - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -268,13 +268,13 @@ int main(int argc, char *argv[]) /* options check */ if (exclusive > 1) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } if (opt_v || opt_V) { - printf(version); + printf("%s", version); if (opt_V) { res = 0; goto out; @@ -282,14 +282,14 @@ int main(int argc, char *argv[]) } if (opt_u) { - printf(usage_msg); + printf("%s", usage_msg); res = 0; goto out; } if (opt_h) { - printf(usage_msg); - printf(help_msg); + printf("%s", usage_msg); + printf("%s", help_msg); res = 0; goto out; } @@ -309,7 +309,7 @@ int main(int argc, char *argv[]) goto out; } else { /* Just show usage */ - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -320,7 +320,7 @@ int main(int argc, char *argv[]) master_ifname = *spp++; if (master_ifname == NULL) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } @@ -339,7 +339,7 @@ int main(int argc, char *argv[]) if (slave_ifname == NULL) { if (opt_d || opt_c) { - fprintf(stderr, usage_msg); + fprintf(stderr, "%s", usage_msg); res = 2; goto out; } -- cgit v1.1 From e5b1de1f5ebe0200e988e195fefb6c7396de6e20 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= Date: Tue, 12 Jul 2011 22:27:00 -0700 Subject: net: Add documentation for netdev features handling MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit v2: incorporated suggestions from Randy Dunlap Signed-off-by: Michał Mirosław Signed-off-by: David S. Miller --- Documentation/networking/netdev-features.txt | 154 +++++++++++++++++++++++++++ 1 file changed, 154 insertions(+) create mode 100644 Documentation/networking/netdev-features.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt new file mode 100644 index 0000000..4b1c0dc --- /dev/null +++ b/Documentation/networking/netdev-features.txt @@ -0,0 +1,154 @@ +Netdev features mess and how to get out from it alive +===================================================== + +Author: + Michał Mirosław + + + + Part I: Feature sets +====================== + +Long gone are the days when a network card would just take and give packets +verbatim. Today's devices add multiple features and bugs (read: offloads) +that relieve an OS of various tasks like generating and checking checksums, +splitting packets, classifying them. Those capabilities and their state +are commonly referred to as netdev features in Linux kernel world. + +There are currently three sets of features relevant to the driver, and +one used internally by network core: + + 1. netdev->hw_features set contains features whose state may possibly + be changed (enabled or disabled) for a particular device by user's + request. This set should be initialized in ndo_init callback and not + changed later. + + 2. netdev->features set contains features which are currently enabled + for a device. This should be changed only by network core or in + error paths of ndo_set_features callback. + + 3. netdev->vlan_features set contains features whose state is inherited + by child VLAN devices (limits netdev->features set). This is currently + used for all VLAN devices whether tags are stripped or inserted in + hardware or software. + + 4. netdev->wanted_features set contains feature set requested by user. + This set is filtered by ndo_fix_features callback whenever it or + some device-specific conditions change. This set is internal to + networking core and should not be referenced in drivers. + + + + Part II: Controlling enabled features +======================================= + +When current feature set (netdev->features) is to be changed, new set +is calculated and filtered by calling ndo_fix_features callback +and netdev_fix_features(). If the resulting set differs from current +set, it is passed to ndo_set_features callback and (if the callback +returns success) replaces value stored in netdev->features. +NETDEV_FEAT_CHANGE notification is issued after that whenever current +set might have changed. + +The following events trigger recalculation: + 1. device's registration, after ndo_init returned success + 2. user requested changes in features state + 3. netdev_update_features() is called + +ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks +are treated as always returning success. + +A driver that wants to trigger recalculation must do so by calling +netdev_update_features() while holding rtnl_lock. This should not be done +from ndo_*_features callbacks. netdev->features should not be modified by +driver except by means of ndo_fix_features callback. + + + + Part III: Implementation hints +================================ + + * ndo_fix_features: + +All dependencies between features should be resolved here. The resulting +set can be reduced further by networking core imposed limitations (as coded +in netdev_fix_features()). For this reason it is safer to disable a feature +when its dependencies are not met instead of forcing the dependency on. + +This callback should not modify hardware nor driver state (should be +stateless). It can be called multiple times between successive +ndo_set_features calls. + +Callback must not alter features contained in NETIF_F_SOFT_FEATURES or +NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but +care must be taken as the change won't affect already configured VLANs. + + * ndo_set_features: + +Hardware should be reconfigured to match passed feature set. The set +should not be altered unless some error condition happens that can't +be reliably detected in ndo_fix_features. In this case, the callback +should update netdev->features to match resulting hardware state. +Errors returned are not (and cannot be) propagated anywhere except dmesg. +(Note: successful return is zero, >0 means silent error.) + + + + Part IV: Features +=================== + +For current list of features, see include/linux/netdev_features.h. +This section describes semantics of some of them. + + * Transmit checksumming + +For complete description, see comments near the top of include/linux/skbuff.h. + +Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM. +It means that device can fill TCP/UDP-like checksum anywhere in the packets +whatever headers there might be. + + * Transmit TCP segmentation offload + +NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit +set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6). + + * Transmit DMA from high memory + +On platforms where this is relevant, NETIF_F_HIGHDMA signals that +ndo_start_xmit can handle skbs with frags in high memory. + + * Transmit scatter-gather + +Those features say that ndo_start_xmit can handle fragmented skbs: +NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST --- +chained skbs (skb->next/prev list). + + * Software features + +Features contained in NETIF_F_SOFT_FEATURES are features of networking +stack. Driver should not change behaviour based on them. + + * LLTX driver (deprecated for hardware drivers) + +NETIF_F_LLTX should be set in drivers that implement their own locking in +transmit path or don't need locking at all (e.g. software tunnels). +In ndo_start_xmit, it is recommended to use a try_lock and return +NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly +protect against other callbacks (the rules you need to find out). + +Don't use it for new drivers. + + * netns-local device + +NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between +network namespaces (e.g. loopback). + +Don't use it in drivers. + + * VLAN challenged + +NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN +headers. Some drivers set this because the cards can't handle the bigger MTU. +[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU +VLANs. This may be not useful, though.] -- cgit v1.1 From 557e2a394de0d142ba930ff3cdb2909419414e06 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Wed, 20 Jul 2011 00:05:24 +0000 Subject: stmmac: improve and up-to-date the documentation This patch adds new information for the driver especially about its platform structure fields. Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 200 ++++++++++++++++++++++++------------ 1 file changed, 136 insertions(+), 64 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 80a7a34..57a2410 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -7,7 +7,7 @@ This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers (Synopsys IP blocks); it has been fully tested on STLinux platforms. Currently this network device driver is for all STM embedded MAC/GMAC -(7xxx SoCs). Other platforms start using it i.e. ARM SPEAr. +(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr. DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100 Universal version 4.0 have been used for developing the first code @@ -71,7 +71,7 @@ Several performance tests on STM platforms showed this optimisation allows to sp the CPU while having the maximum throughput. 4.4) WOL -Wake up on Lan feature through Magic Frame is only supported for the GMAC +Wake up on Lan feature through Magic and Unicast frames are supported for the GMAC core. 4.5) DMA descriptors @@ -91,11 +91,15 @@ LRO is not supported. The driver is compatible with PAL to work with PHY and GPHY devices. 4.9) Platform information -Several information came from the platform; please refer to the -driver's Header file in include/linux directory. +Several driver's information can be passed through the platform +These are included in the include/linux/stmmac.h header file +and detailed below as well: -struct plat_stmmacenet_data { + struct plat_stmmacenet_data { int bus_id; + int phy_addr; + int interface; + struct stmmac_mdio_bus_data *mdio_bus_data; int pbl; int clk_csr; int has_gmac; @@ -103,67 +107,135 @@ struct plat_stmmacenet_data { int tx_coe; int bugged_jumbo; int pmt; - void (*fix_mac_speed)(void *priv, unsigned int speed); - void (*bus_setup)(unsigned long ioaddr); -#ifdef CONFIG_STM_DRIVERS - struct stm_pad_config *pad_config; -#endif - void *bsp_priv; -}; + int force_sf_dma_mode; + void (*fix_mac_speed)(void *priv, unsigned int speed); + void (*bus_setup)(void __iomem *ioaddr); + int (*init)(struct platform_device *pdev); + void (*exit)(struct platform_device *pdev); + void *bsp_priv; + }; Where: -- pbl (Programmable Burst Length) is maximum number of - beats to be transferred in one DMA transaction. - GMAC also enables the 4xPBL by default. -- fix_mac_speed and bus_setup are used to configure internal target - registers (on STM platforms); -- has_gmac: GMAC core is on board (get it at run-time in the next step); -- bus_id: bus identifier. -- tx_coe: core is able to perform the tx csum in HW. -- enh_desc: if sets the MAC will use the enhanced descriptor structure. -- clk_csr: CSR Clock range selection. -- bugged_jumbo: some HWs are not able to perform the csum in HW for - over-sized frames due to limited buffer sizes. Setting this - flag the csum will be done in SW on JUMBO frames. - -struct plat_stmmacphy_data { - int bus_id; - int phy_addr; - unsigned int phy_mask; - int interface; - int (*phy_reset)(void *priv); - void *priv; -}; + o bus_id: bus identifier. + o phy_addr: the physical address can be passed from the platform. + If it is set to -1 the driver will automatically + detect it at run-time by probing all the 32 addresses. + o interface: PHY device's interface. + o mdio_bus_data: specific platform fields for the MDIO bus. + o pbl: the Programmable Burst Length is maximum number of beats to + be transferred in one DMA transaction. + GMAC also enables the 4xPBL by default. + o clk_csr: CSR Clock range selection. + o has_gmac: uses the GMAC core. + o enh_desc: if sets the MAC will use the enhanced descriptor structure. + o tx_coe: core is able to perform the tx csum in HW. + o bugged_jumbo: some HWs are not able to perform the csum in HW for + over-sized frames due to limited buffer sizes. + Setting this flag the csum will be done in SW on + JUMBO frames. + o pmt: core has the embedded power module (optional). + o force_sf_dma_mode: force DMA to use the Store and Forward mode + instead of the Threshold. + o fix_mac_speed: this callback is used for modifying some syscfg registers + (on ST SoCs) according to the link speed negotiated by the + physical layer . + o bus_setup: perform HW setup of the bus. For example, on some ST platforms + this field is used to configure the AMBA bridge to generate more + efficient STBus traffic. + o init/exit: callbacks used for calling a custom initialisation; + this is sometime necessary on some platforms (e.g. ST boxes) + where the HW needs to have set some PIO lines or system cfg + registers. + o custom_cfg: this is a custom configuration that can be passed while + initialising the resources. + +The we have: + + struct stmmac_mdio_bus_data { + int bus_id; + int (*phy_reset)(void *priv); + unsigned int phy_mask; + int *irqs; + int probed_phy_irq; + }; Where: -- bus_id: bus identifier; -- phy_addr: physical address used for the attached phy device; - set it to -1 to get it at run-time; -- interface: physical MII interface mode; -- phy_reset: hook to reset HW function. - -SOURCES: -- Kconfig -- Makefile -- stmmac_main.c: main network device driver; -- stmmac_mdio.c: mdio functions; -- stmmac_ethtool.c: ethtool support; -- stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts - Only tested on ST40 platforms based. -- stmmac.h: private driver structure; -- common.h: common definitions and VFTs; -- descs.h: descriptor structure definitions; -- dwmac1000_core.c: GMAC core functions; -- dwmac1000_dma.c: dma functions for the GMAC chip; -- dwmac1000.h: specific header file for the GMAC; -- dwmac100_core: MAC 100 core and dma code; -- dwmac100_dma.c: dma funtions for the MAC chip; -- dwmac1000.h: specific header file for the MAC; -- dwmac_lib.c: generic DMA functions shared among chips -- enh_desc.c: functions for handling enhanced descriptors -- norm_desc.c: functions for handling normal descriptors - -TODO: -- XGMAC controller is not supported. -- Review the timer optimisation code to use an embedded device that seems to be + o bus_id: bus identifier; + o phy_reset: hook to reset the phy device attached to the bus. + o phy_mask: phy mask passed when register the MDIO bus within the driver. + o irqs: list of IRQs, one per PHY. + o probed_phy_irq: if irqs is NULL, use this for probed PHY. + +Below an example how the structures above are using on ST platforms. + + static struct plat_stmmacenet_data stxYYY_ethernet_platform_data = { + .pbl = 32, + .has_gmac = 0, + .enh_desc = 0, + .fix_mac_speed = stxYYY_ethernet_fix_mac_speed, + | + |-> to write an internal syscfg + | on this platform when the + | link speed changes from 10 to + | 100 and viceversa + .init = &stmmac_claim_resource, + | + |-> On ST SoC this calls own "PAD" + | manager framework to claim + | all the resources necessary + | (GPIO ...). The .custom_cfg field + | is used to pass a custom config. +}; + +Below the usage of the stmmac_mdio_bus_data: on this SoC, in fact, +there are two MAC cores: one MAC is for MDIO Bus/PHY emulation +with fixed_link support. + +static struct stmmac_mdio_bus_data stmmac1_mdio_bus = { + .bus_id = 1, + | + |-> phy device on the bus_id 1 + .phy_reset = phy_reset; + | + |-> function to provide the phy_reset on this board + .phy_mask = 0, +}; + +static struct fixed_phy_status stmmac0_fixed_phy_status = { + .link = 1, + .speed = 100, + .duplex = 1, +}; + +During the board's device_init we can configure the first +MAC for fixed_link by calling: + fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status));) +and the second one, with a real PHY device attached to the bus, +by using the stmmac_mdio_bus_data structure (to provide the id, the +reset procedure etc). + +4.10) List of source files: + o Kconfig + o Makefile + o stmmac_main.c: main network device driver; + o stmmac_mdio.c: mdio functions; + o stmmac_ethtool.c: ethtool support; + o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts + Only tested on ST40 platforms based. + o stmmac.h: private driver structure; + o common.h: common definitions and VFTs; + o descs.h: descriptor structure definitions; + o dwmac1000_core.c: GMAC core functions; + o dwmac1000_dma.c: dma functions for the GMAC chip; + o dwmac1000.h: specific header file for the GMAC; + o dwmac100_core: MAC 100 core and dma code; + o dwmac100_dma.c: dma funtions for the MAC chip; + o dwmac1000.h: specific header file for the MAC; + o dwmac_lib.c: generic DMA functions shared among chips + o enh_desc.c: functions for handling enhanced descriptors + o norm_desc.c: functions for handling normal descriptors + +5) TODO: + o XGMAC is not supported. + o Review the timer optimisation code to use an embedded device that will be available in new chip generations. -- cgit v1.1 From 8fb4e1399137dbbf7f1071bc723b5eff8ad46d1a Mon Sep 17 00:00:00 2001 From: Jesper Juhl Date: Mon, 1 Aug 2011 17:59:44 -0700 Subject: Documentation/bonding.txt: Update to 3.x version numbers Signed-off-by: Jesper Juhl Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 675612f..5dd960d 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -599,7 +599,7 @@ num_unsol_na affect only the active-backup mode. These options were added for bonding versions 3.3.0 and 3.4.0 respectively. - From Linux 2.6.40 and bonding version 3.7.1, these notifications + From Linux 3.0 and bonding version 3.7.1, these notifications are generated by the ipv4 and ipv6 code and the numbers of repetitions cannot be set independently. -- cgit v1.1 From 025890b4ed433b9c9e0f221bb806d42d049c87fe Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nicolas=20de=20Peslo=C3=BCan?= Date: Sat, 6 Aug 2011 07:06:39 +0000 Subject: bonding: document two undocumented options. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 655f8919d549ad1872e24d826b6ce42530516d2e bonding: add min links parameter to 802.3ad and commit ebd8e4977a87cb81d93c62a9bff0102a9713722f bonding: add all_slaves_active parameter introduced new options to bonding, but didn't provide the documentation for those options. V2: add the default value for both options. V3: document the exact behavior of min_links default value. Signed-off-by: Nicolas de Pesloüan Signed-off-by: David S. Miller --- Documentation/networking/bonding.txt | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 5dd960d..91df678 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -238,6 +238,18 @@ ad_select This option was added in bonding version 3.4.0. +all_slaves_active + + Specifies that duplicate frames (received on inactive ports) should be + dropped (0) or delivered (1). + + Normally, bonding will drop duplicate frames (received on inactive + ports), which is desirable for most users. But there are some times + it is nice to allow duplicate frames to be delivered. + + The default value is 0 (drop duplicate frames received on inactive + ports). + arp_interval Specifies the ARP link monitoring frequency in milliseconds. @@ -433,6 +445,23 @@ miimon determined. See the High Availability section for additional information. The default value is 0. +min_links + + Specifies the minimum number of links that must be active before + asserting carrier. It is similar to the Cisco EtherChannel min-links + feature. This allows setting the minimum number of member ports that + must be up (link-up state) before marking the bond device as up + (carrier on). This is useful for situations where higher level services + such as clustering want to ensure a minimum number of low bandwidth + links are active before switchover. This option only affect 802.3ad + mode. + + The default value is 0. This will cause carrier to be asserted (for + 802.3ad mode) whenever there is an active aggregator, regardless of the + number of available links in that aggregator. Note that, because an + aggregator cannot be active without at least one available link, + setting this option to 0 or to 1 has the exact same effect. + mode Specifies one of the bonding policies. The default is -- cgit v1.1 From 56c07271307b4a20802005692b2b70dfe13d72e8 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Tue, 9 Aug 2011 04:20:48 +0000 Subject: net: add Documentation/networking/scaling.txt Describes RSS, RPS, RFS, accelerated RFS, and XPS. This version incorporates comments by Randy Dunlap and Rick Jones. Besides text cleanup, it adds an explicit "Suggested Configuration" heading to each section. Signed-off-by: Willem de Bruijn Acked-By: Rick Jones Signed-off-by: David S. Miller --- Documentation/networking/scaling.txt | 371 +++++++++++++++++++++++++++++++++++ 1 file changed, 371 insertions(+) create mode 100644 Documentation/networking/scaling.txt (limited to 'Documentation/networking') diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt new file mode 100644 index 0000000..7254b4b --- /dev/null +++ b/Documentation/networking/scaling.txt @@ -0,0 +1,371 @@ +Scaling in the Linux Networking Stack + + +Introduction +============ + +This document describes a set of complementary techniques in the Linux +networking stack to increase parallelism and improve performance for +multi-processor systems. + +The following technologies are described: + + RSS: Receive Side Scaling + RPS: Receive Packet Steering + RFS: Receive Flow Steering + Accelerated Receive Flow Steering + XPS: Transmit Packet Steering + + +RSS: Receive Side Scaling +========================= + +Contemporary NICs support multiple receive and transmit descriptor queues +(multi-queue). On reception, a NIC can send different packets to different +queues to distribute processing among CPUs. The NIC distributes packets by +applying a filter to each packet that assigns it to one of a small number +of logical flows. Packets for each flow are steered to a separate receive +queue, which in turn can be processed by separate CPUs. This mechanism is +generally known as “Receive-side Scaling” (RSS). The goal of RSS and +the other scaling techniques to increase performance uniformly. +Multi-queue distribution can also be used for traffic prioritization, but +that is not the focus of these techniques. + +The filter used in RSS is typically a hash function over the network +and/or transport layer headers-- for example, a 4-tuple hash over +IP addresses and TCP ports of a packet. The most common hardware +implementation of RSS uses a 128-entry indirection table where each entry +stores a queue number. The receive queue for a packet is determined +by masking out the low order seven bits of the computed hash for the +packet (usually a Toeplitz hash), taking this number as a key into the +indirection table and reading the corresponding value. + +Some advanced NICs allow steering packets to queues based on +programmable filters. For example, webserver bound TCP port 80 packets +can be directed to their own receive queue. Such “n-tuple” filters can +be configured from ethtool (--config-ntuple). + +==== RSS Configuration + +The driver for a multi-queue capable NIC typically provides a kernel +module parameter for specifying the number of hardware queues to +configure. In the bnx2x driver, for instance, this parameter is called +num_queues. A typical RSS configuration would be to have one receive queue +for each CPU if the device supports enough queues, or otherwise at least +one for each cache domain at a particular cache level (L1, L2, etc.). + +The indirection table of an RSS device, which resolves a queue by masked +hash, is usually programmed by the driver at initialization. The +default mapping is to distribute the queues evenly in the table, but the +indirection table can be retrieved and modified at runtime using ethtool +commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the +indirection table could be done to give different queues different +relative weights. + +== RSS IRQ Configuration + +Each receive queue has a separate IRQ associated with it. The NIC triggers +this to notify a CPU when new packets arrive on the given queue. The +signaling path for PCIe devices uses message signaled interrupts (MSI-X), +that can route each interrupt to a particular CPU. The active mapping +of queues to IRQs can be determined from /proc/interrupts. By default, +an IRQ may be handled on any CPU. Because a non-negligible part of packet +processing takes place in receive interrupt handling, it is advantageous +to spread receive interrupts between CPUs. To manually adjust the IRQ +affinity of each interrupt see Documentation/IRQ-affinity. Some systems +will be running irqbalance, a daemon that dynamically optimizes IRQ +assignments and as a result may override any manual settings. + +== Suggested Configuration + +RSS should be enabled when latency is a concern or whenever receive +interrupt processing forms a bottleneck. Spreading load between CPUs +decreases queue length. For low latency networking, the optimal setting +is to allocate as many queues as there are CPUs in the system (or the +NIC maximum, if lower). Because the aggregate number of interrupts grows +with each additional queue, the most efficient high-rate configuration +is likely the one with the smallest number of receive queues where no +CPU that processes receive interrupts reaches 100% utilization. Per-cpu +load can be observed using the mpstat utility. + + +RPS: Receive Packet Steering +============================ + +Receive Packet Steering (RPS) is logically a software implementation of +RSS. Being in software, it is necessarily called later in the datapath. +Whereas RSS selects the queue and hence CPU that will run the hardware +interrupt handler, RPS selects the CPU to perform protocol processing +above the interrupt handler. This is accomplished by placing the packet +on the desired CPU’s backlog queue and waking up the CPU for processing. +RPS has some advantages over RSS: 1) it can be used with any NIC, +2) software filters can easily be added to hash over new protocols, +3) it does not increase hardware device interrupt rate (although it does +introduce inter-processor interrupts (IPIs)). + +RPS is called during bottom half of the receive interrupt handler, when +a driver sends a packet up the network stack with netif_rx() or +netif_receive_skb(). These call the get_rps_cpu() function, which +selects the queue that should process a packet. + +The first step in determining the target CPU for RPS is to calculate a +flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash +depending on the protocol). This serves as a consistent hash of the +associated flow of the packet. The hash is either provided by hardware +or will be computed in the stack. Capable hardware can pass the hash in +the receive descriptor for the packet; this would usually be the same +hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in +skb->rx_hash and can be used elsewhere in the stack as a hash of the +packet’s flow. + +Each receive hardware queue has an associated list of CPUs to which +RPS may enqueue packets for processing. For each received packet, +an index into the list is computed from the flow hash modulo the size +of the list. The indexed CPU is the target for processing the packet, +and the packet is queued to the tail of that CPU’s backlog queue. At +the end of the bottom half routine, IPIs are sent to any CPUs for which +packets have been queued to their backlog queue. The IPI wakes backlog +processing on the remote CPU, and any queued packets are then processed +up the networking stack. + +==== RPS Configuration + +RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on +by default for SMP). Even when compiled in, RPS remains disabled until +explicitly configured. The list of CPUs to which RPS may forward traffic +can be configured for each receive queue using a sysfs file entry: + + /sys/class/net//queues/rx-/rps_cpus + +This file implements a bitmap of CPUs. RPS is disabled when it is zero +(the default), in which case packets are processed on the interrupting +CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to +the bitmap. + +== Suggested Configuration + +For a single queue device, a typical RPS configuration would be to set +the rps_cpus to the CPUs in the same cache domain of the interrupting +CPU. If NUMA locality is not an issue, this could also be all CPUs in +the system. At high interrupt rate, it might be wise to exclude the +interrupting CPU from the map since that already performs much work. + +For a multi-queue system, if RSS is configured so that a hardware +receive queue is mapped to each CPU, then RPS is probably redundant +and unnecessary. If there are fewer hardware queues than CPUs, then +RPS might be beneficial if the rps_cpus for each queue are the ones that +share the same cache domain as the interrupting CPU for that queue. + + +RFS: Receive Flow Steering +========================== + +While RPS steers packets solely based on hash, and thus generally +provides good load distribution, it does not take into account +application locality. This is accomplished by Receive Flow Steering +(RFS). The goal of RFS is to increase datacache hitrate by steering +kernel processing of packets to the CPU where the application thread +consuming the packet is running. RFS relies on the same RPS mechanisms +to enqueue packets onto the backlog of another CPU and to wake up that +CPU. + +In RFS, packets are not forwarded directly by the value of their hash, +but the hash is used as index into a flow lookup table. This table maps +flows to the CPUs where those flows are being processed. The flow hash +(see RPS section above) is used to calculate the index into this table. +The CPU recorded in each entry is the one which last processed the flow. +If an entry does not hold a valid CPU, then packets mapped to that entry +are steered using plain RPS. Multiple table entries may point to the +same CPU. Indeed, with many flows and few CPUs, it is very likely that +a single application thread handles flows with many different flow hashes. + +rps_sock_table is a global flow table that contains the *desired* CPU for +flows: the CPU that is currently processing the flow in userspace. Each +table value is a CPU index that is updated during calls to recvmsg and +sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() +and tcp_splice_read()). + +When the scheduler moves a thread to a new CPU while it has outstanding +receive packets on the old CPU, packets may arrive out of order. To +avoid this, RFS uses a second flow table to track outstanding packets +for each flow: rps_dev_flow_table is a table specific to each hardware +receive queue of each device. Each table value stores a CPU index and a +counter. The CPU index represents the *current* CPU onto which packets +for this flow are enqueued for further kernel processing. Ideally, kernel +and userspace processing occur on the same CPU, and hence the CPU index +in both tables is identical. This is likely false if the scheduler has +recently migrated a userspace thread while the kernel still has packets +enqueued for kernel processing on the old CPU. + +The counter in rps_dev_flow_table values records the length of the current +CPU's backlog when a packet in this flow was last enqueued. Each backlog +queue has a head counter that is incremented on dequeue. A tail counter +is computed as head counter + queue length. In other words, the counter +in rps_dev_flow_table[i] records the last element in flow i that has +been enqueued onto the currently designated CPU for flow i (of course, +entry i is actually selected by hash and multiple flows may hash to the +same entry i). + +And now the trick for avoiding out of order packets: when selecting the +CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table +and the rps_dev_flow table of the queue that the packet was received on +are compared. If the desired CPU for the flow (found in the +rps_sock_flow table) matches the current CPU (found in the rps_dev_flow +table), the packet is enqueued onto that CPU’s backlog. If they differ, +the current CPU is updated to match the desired CPU if one of the +following is true: + +- The current CPU's queue head counter >= the recorded tail counter + value in rps_dev_flow[i] +- The current CPU is unset (equal to NR_CPUS) +- The current CPU is offline + +After this check, the packet is sent to the (possibly updated) current +CPU. These rules aim to ensure that a flow only moves to a new CPU when +there are no packets outstanding on the old CPU, as the outstanding +packets could arrive later than those about to be processed on the new +CPU. + +==== RFS Configuration + +RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on +by default for SMP). The functionality remains disabled until explicitly +configured. The number of entries in the global flow table is set through: + + /proc/sys/net/core/rps_sock_flow_entries + +The number of entries in the per-queue flow table are set through: + + /sys/class/net//queues/tx-/rps_flow_cnt + +== Suggested Configuration + +Both of these need to be set before RFS is enabled for a receive queue. +Values for both are rounded up to the nearest power of two. The +suggested flow count depends on the expected number of active connections +at any given time, which may be significantly less than the number of open +connections. We have found that a value of 32768 for rps_sock_flow_entries +works fairly well on a moderately loaded server. + +For a single queue device, the rps_flow_cnt value for the single queue +would normally be configured to the same value as rps_sock_flow_entries. +For a multi-queue device, the rps_flow_cnt for each queue might be +configured as rps_sock_flow_entries / N, where N is the number of +queues. So for instance, if rps_flow_entries is set to 32768 and there +are 16 configured receive queues, rps_flow_cnt for each queue might be +configured as 2048. + + +Accelerated RFS +=============== + +Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load +balancing mechanism that uses soft state to steer flows based on where +the application thread consuming the packets of each flow is running. +Accelerated RFS should perform better than RFS since packets are sent +directly to a CPU local to the thread consuming the data. The target CPU +will either be the same CPU where the application runs, or at least a CPU +which is local to the application thread’s CPU in the cache hierarchy. + +To enable accelerated RFS, the networking stack calls the +ndo_rx_flow_steer driver function to communicate the desired hardware +queue for packets matching a particular flow. The network stack +automatically calls this function every time a flow entry in +rps_dev_flow_table is updated. The driver in turn uses a device specific +method to program the NIC to steer the packets. + +The hardware queue for a flow is derived from the CPU recorded in +rps_dev_flow_table. The stack consults a CPU to hardware queue map which +is maintained by the NIC driver. This is an auto-generated reverse map of +the IRQ affinity table shown by /proc/interrupts. Drivers can use +functions in the cpu_rmap (“CPU affinity reverse map”) kernel library +to populate the map. For each CPU, the corresponding queue in the map is +set to be one whose processing CPU is closest in cache locality. + +==== Accelerated RFS Configuration + +Accelerated RFS is only available if the kernel is compiled with +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver. +It also requires that ntuple filtering is enabled via ethtool. The map +of CPU to queues is automatically deduced from the IRQ affinities +configured for each receive queue by the driver, so no additional +configuration should be necessary. + +== Suggested Configuration + +This technique should be enabled whenever one wants to use RFS and the +NIC supports hardware acceleration. + +XPS: Transmit Packet Steering +============================= + +Transmit Packet Steering is a mechanism for intelligently selecting +which transmit queue to use when transmitting a packet on a multi-queue +device. To accomplish this, a mapping from CPU to hardware queue(s) is +recorded. The goal of this mapping is usually to assign queues +exclusively to a subset of CPUs, where the transmit completions for +these queues are processed on a CPU within this set. This choice +provides two benefits. First, contention on the device queue lock is +significantly reduced since fewer CPUs contend for the same queue +(contention can be eliminated completely if each CPU has its own +transmit queue). Secondly, cache miss rate on transmit completion is +reduced, in particular for data cache lines that hold the sk_buff +structures. + +XPS is configured per transmit queue by setting a bitmap of CPUs that +may use that queue to transmit. The reverse mapping, from CPUs to +transmit queues, is computed and maintained for each network device. +When transmitting the first packet in a flow, the function +get_xps_queue() is called to select a queue. This function uses the ID +of the running CPU as a key into the CPU-to-queue lookup table. If the +ID matches a single queue, that is used for transmission. If multiple +queues match, one is selected by using the flow hash to compute an index +into the set. + +The queue chosen for transmitting a particular flow is saved in the +corresponding socket structure for the flow (e.g. a TCP connection). +This transmit queue is used for subsequent packets sent on the flow to +prevent out of order (ooo) packets. The choice also amortizes the cost +of calling get_xps_queues() over all packets in the connection. To avoid +ooo packets, the queue for a flow can subsequently only be changed if +skb->ooo_okay is set for a packet in the flow. This flag indicates that +there are no outstanding packets in the flow, so the transmit queue can +change without the risk of generating out of order packets. The +transport layer is responsible for setting ooo_okay appropriately. TCP, +for instance, sets the flag when all data for a connection has been +acknowledged. + +==== XPS Configuration + +XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by +default for SMP). The functionality remains disabled until explicitly +configured. To enable XPS, the bitmap of CPUs that may use a transmit +queue is configured using the sysfs file entry: + +/sys/class/net//queues/tx-/xps_cpus + +== Suggested Configuration + +For a network device with a single transmission queue, XPS configuration +has no effect, since there is no choice in this case. In a multi-queue +system, XPS is preferably configured so that each CPU maps onto one queue. +If there are as many queues as there are CPUs in the system, then each +queue can also map onto one CPU, resulting in exclusive pairings that +experience no contention. If there are fewer queues than CPUs, then the +best CPUs to share a given queue are probably those that share the cache +with the CPU that processes transmit completions for that queue +(transmit interrupts). + + +Further Information +=================== +RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into +2.6.38. Original patches were submitted by Tom Herbert +(therbert@google.com) + +Accelerated RFS was introduced in 2.6.35. Original patches were +submitted by Ben Hutchings (bhutchings@solarflare.com) + +Authors: +Tom Herbert (therbert@google.com) +Willem de Bruijn (willemb@google.com) -- cgit v1.1 From b88cf73d9278a5838e3ac2b670ab3b4ff533ea17 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Thu, 11 Aug 2011 14:39:59 +0000 Subject: net: add missing entries to Documentation/networking/00-INDEX A simple janitor duty patch that adds a one sentence overview to 00-INDEX for all files that lacked it. - does not add entries for subdirectories - does not modify existing entries. Signed-off-by: David S. Miller --- Documentation/networking/00-INDEX | 116 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 4edd78d..811252b 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -1,13 +1,21 @@ 00-INDEX - this file +3c359.txt + - information on the 3Com TokenLink Velocity XL (3c5359) driver. 3c505.txt - information on the 3Com EtherLink Plus (3c505) driver. +3c509.txt + - information on the 3Com Etherlink III Series Ethernet cards. 6pack.txt - info on the 6pack protocol, an alternative to KISS for AX.25 DLINK.txt - info on the D-Link DE-600/DE-620 parallel port pocket adapters PLIP.txt - PLIP: The Parallel Line Internet Protocol device driver +README.ipw2100 + - README for the Intel PRO/Wireless 2100 driver. +README.ipw2200 + - README for the Intel PRO/Wireless 2915ABG and 2200BG driver. README.sb1000 - info on General Instrument/NextLevel SURFboard1000 cable modem. alias.txt @@ -20,8 +28,12 @@ atm.txt - info on where to get ATM programs and support for Linux. ax25.txt - info on using AX.25 and NET/ROM code for Linux +batman-adv.txt + - B.A.T.M.A.N routing protocol on top of layer 2 Ethernet Frames. baycom.txt - info on the driver for Baycom style amateur radio modems +bonding.txt + - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux. bridge.txt - where to get user space programs for ethernet bridging with Linux. can.txt @@ -34,32 +46,60 @@ cxacru.txt - Conexant AccessRunner USB ADSL Modem cxacru-cf.py - Conexant AccessRunner USB ADSL Modem configuration file parser +cxgb.txt + - Release Notes for the Chelsio N210 Linux device driver. +dccp.txt + - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42). de4x5.txt - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver decnet.txt - info on using the DECnet networking layer in Linux. depca.txt - the Digital DEPCA/EtherWORKS DE1?? and DE2?? LANCE Ethernet driver +dl2k.txt + - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko). +dm9000.txt + - README for the Simtec DM9000 Network driver. dmfe.txt - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. +dns_resolver.txt + - The DNS resolver module allows kernel servies to make DNS queries. +driver.txt + - Softnet driver issues. e100.txt - info on Intel's EtherExpress PRO/100 line of 10/100 boards e1000.txt - info on Intel's E1000 line of gigabit ethernet boards +e1000e.txt + - README for the Intel Gigabit Ethernet Driver (e1000e). eql.txt - serial IP load balancing ewrk3.txt - the Digital EtherWORKS 3 DE203/4/5 Ethernet driver +fib_trie.txt + - Level Compressed Trie (LC-trie) notes: a structure for routing. filter.txt - Linux Socket Filtering fore200e.txt - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. framerelay.txt - info on using Frame Relay/Data Link Connection Identifier (DLCI). +gen_stats.txt + - Generic networking statistics for netlink users. +generic_hdlc.txt + - The generic High Level Data Link Control (HDLC) layer. generic_netlink.txt - info on Generic Netlink +gianfar.txt + - Gianfar Ethernet Driver. ieee802154.txt - Linux IEEE 802.15.4 implementation, API and drivers +ifenslave.c + - Configure network interfaces for parallel routing (bonding). +igb.txt + - README for the Intel Gigabit Ethernet Driver (igb). +igbvf.txt + - README for the Intel Gigabit Ethernet Driver (igbvf). ip-sysctl.txt - /proc/sys/net/ipv4/* variables ip_dynaddr.txt @@ -68,41 +108,117 @@ ipddp.txt - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation iphase.txt - Interphase PCI ATM (i)Chip IA Linux driver info. +ipv6.txt + - Options to the ipv6 kernel module. +ipvs-sysctl.txt + - Per-inode explanation of the /proc/sys/net/ipv4/vs interface. irda.txt - where to get IrDA (infrared) utilities and info for Linux. +ixgb.txt + - README for the Intel 10 Gigabit Ethernet Driver (ixgb). +ixgbe.txt + - README for the Intel 10 Gigabit Ethernet Driver (ixgbe). +ixgbevf.txt + - README for the Intel Virtual Function (VF) Driver (ixgbevf). +l2tp.txt + - User guide to the L2TP tunnel protocol. lapb-module.txt - programming information of the LAPB module. ltpc.txt - the Apple or Farallon LocalTalk PC card driver +mac80211-injection.txt + - HOWTO use packet injection with mac80211 multicast.txt - Behaviour of cards under Multicast +multiqueue.txt + - HOWTO for multiqueue network device support. +netconsole.txt + - The network console module netconsole.ko: configuration and notes. +netdev-features.txt + - Network interface "feature mess and how to get out from it alive". netdevices.txt - info on network device driver functions exported to the kernel. +netif-msg.txt + - Design of the network interface message level setting (NETIF_MSG_*). +nfc.txt + - The Linux Near Field Communication (NFS) subsystem. olympic.txt - IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info. +operstates.txt + - Overview of network interface operational states. +packet_mmap.txt + - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING). +phonet.txt + - The Phonet packet protocol used in Nokia cellular modems. +phy.txt + - The PHY abstraction layer. +pktgen.txt + - User guide to the kernel packet generator (pktgen.ko). policy-routing.txt - IP policy-based routing +ppp_generic.txt + - Information about the generic PPP driver. +proc_net_tcp.txt + - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces. +radiotap-headers.txt + - Background on radiotap headers. ray_cs.txt - Raylink Wireless LAN card driver info. +rds.txt + - Background on the reliable, ordered datagram delivery method RDS. +regulatory.txt + - Overview of the Linux wireless regulatory infrastructure. +rxrpc.txt + - Guide to the RxRPC protocol. +s2io.txt + - Release notes for Neterion Xframe I/II 10GbE driver. +scaling.txt + - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS. +sctp.txt + - Notes on the Linux kernel implementation of the SCTP protocol. +secid.txt + - Explanation of the secid member in flow structures. skfp.txt - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. smc9.txt - the driver for SMC's 9000 series of Ethernet cards smctr.txt - SMC TokenCard TokenRing Linux driver info. +spider-net.txt + - README for the Spidernet Driver (as found in PS3 / Cell BE). +stmmac.txt + - README for the STMicro Synopsys Ethernet driver. +tc-actions-env-rules.txt + - rules for traffic control (tc) actions. +timestamping.txt + - overview of network packet timestamping variants. tcp.txt - short blurb on how TCP output takes place. +tcp-thin.txt + - kernel tuning options for low rate 'thin' TCP streams. tlan.txt - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. tms380tr.txt - SysKonnect Token Ring ISA/PCI adapter driver info. +tproxy.txt + - Transparent proxy support user guide. tuntap.txt - TUN/TAP device driver, allowing user space Rx/Tx of packets. +udplite.txt + - UDP-Lite protocol (RFC 3828) introduction. vortex.txt - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. +vxge.txt + - README for the Neterion X3100 PCIe Server Adapter. x25.txt - general info on X.25 development. x25-iface.txt - description of the X.25 Packet Layer to LAPB device interface. +xfrm_proc.txt + - description of the statistics package for XFRM. +xfrm_sync.txt + - sync patches for XFRM enable migration of an SA between hosts. +xfrm_sysctl.txt + - description of the XFRM configuration options. z8530drv.txt - info about Linux driver for Z8530 based HDLC cards for AX.25 -- cgit v1.1 From 320f24e482e6b390c608c6afec253405f9ab7436 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Thu, 11 Aug 2011 14:41:48 +0000 Subject: net: minor update to Documentation/networking/scaling.txt Incorporate last comments about hyperthreading, interrupt coalescing and the definition of cache domains into the network scaling document scaling.txt Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/scaling.txt | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index 7254b4b..58fd741 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -52,7 +52,8 @@ module parameter for specifying the number of hardware queues to configure. In the bnx2x driver, for instance, this parameter is called num_queues. A typical RSS configuration would be to have one receive queue for each CPU if the device supports enough queues, or otherwise at least -one for each cache domain at a particular cache level (L1, L2, etc.). +one for each memory domain, where a memory domain is a set of CPUs that +share a particular memory level (L1, L2, NUMA node, etc.). The indirection table of an RSS device, which resolves a queue by masked hash, is usually programmed by the driver at initialization. The @@ -82,11 +83,17 @@ RSS should be enabled when latency is a concern or whenever receive interrupt processing forms a bottleneck. Spreading load between CPUs decreases queue length. For low latency networking, the optimal setting is to allocate as many queues as there are CPUs in the system (or the -NIC maximum, if lower). Because the aggregate number of interrupts grows -with each additional queue, the most efficient high-rate configuration +NIC maximum, if lower). The most efficient high-rate configuration is likely the one with the smallest number of receive queues where no -CPU that processes receive interrupts reaches 100% utilization. Per-cpu -load can be observed using the mpstat utility. +receive queue overflows due to a saturated CPU, because in default +mode with interrupt coalescing enabled, the aggregate number of +interrupts (and thus work) grows with each additional queue. + +Per-cpu load can be observed using the mpstat utility, but note that on +processors with hyperthreading (HT), each hyperthread is represented as +a separate CPU. For interrupt handling, HT has shown no benefit in +initial tests, so limit the number of queues to the number of CPU cores +in the system. RPS: Receive Packet Steering @@ -145,7 +152,7 @@ the bitmap. == Suggested Configuration For a single queue device, a typical RPS configuration would be to set -the rps_cpus to the CPUs in the same cache domain of the interrupting +the rps_cpus to the CPUs in the same memory domain of the interrupting CPU. If NUMA locality is not an issue, this could also be all CPUs in the system. At high interrupt rate, it might be wise to exclude the interrupting CPU from the map since that already performs much work. @@ -154,7 +161,7 @@ For a multi-queue system, if RSS is configured so that a hardware receive queue is mapped to each CPU, then RPS is probably redundant and unnecessary. If there are fewer hardware queues than CPUs, then RPS might be beneficial if the rps_cpus for each queue are the ones that -share the same cache domain as the interrupting CPU for that queue. +share the same memory domain as the interrupting CPU for that queue. RFS: Receive Flow Steering @@ -326,7 +333,7 @@ The queue chosen for transmitting a particular flow is saved in the corresponding socket structure for the flow (e.g. a TCP connection). This transmit queue is used for subsequent packets sent on the flow to prevent out of order (ooo) packets. The choice also amortizes the cost -of calling get_xps_queues() over all packets in the connection. To avoid +of calling get_xps_queues() over all packets in the flow. To avoid ooo packets, the queue for a flow can subsequently only be changed if skb->ooo_okay is set for a packet in the flow. This flag indicates that there are no outstanding packets in the flow, so the transmit queue can -- cgit v1.1 From b81693d9149c598302e8eb9c20cb20330d922c8e Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Tue, 16 Aug 2011 06:29:02 +0000 Subject: net: remove ndo_set_multicast_list callback Remove no longer used operation. Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller --- Documentation/networking/netdevices.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index 87b3d15..8935834 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -73,7 +73,7 @@ dev->hard_start_xmit: has to lock by itself when needed. It is recommended to use a try lock for this and return NETDEV_TX_LOCKED when the spin lock fails. The locking there should also properly protect against - set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated. + set_rx_mode. Note that the use of NETIF_F_LLTX is deprecated. Don't use it for new drivers. Context: Process with BHs disabled or BH (timer), @@ -92,7 +92,7 @@ dev->tx_timeout: Context: BHs disabled Notes: netif_queue_stopped() is guaranteed true -dev->set_multicast_list: +dev->set_rx_mode: Synchronization: netif_tx_lock spinlock. Context: BHs disabled -- cgit v1.1 From 2d5b2c5ca0d3ebe707386b3add365496460cf918 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Thu, 18 Aug 2011 21:30:37 -0700 Subject: net: netdev-features.txt update to Documentation/networking/00-INDEX MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Update netdev-features.txt entry in 00-INDEX to incorporate feedback by Michał Mirosław. v2: restored tabs that were inadvertently changed to spaces in v1. sorry for the error. Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/00-INDEX | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 811252b..bbce121 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -135,7 +135,7 @@ multiqueue.txt netconsole.txt - The network console module netconsole.ko: configuration and notes. netdev-features.txt - - Network interface "feature mess and how to get out from it alive". + - Network interface features API description. netdevices.txt - info on network device driver functions exported to the kernel. netif-msg.txt -- cgit v1.1 From d5c073caf050bc713271a02e016b1672d9b7b935 Mon Sep 17 00:00:00 2001 From: Geoffrey Thomas Date: Mon, 22 Aug 2011 11:28:57 -0700 Subject: net: Documentation: RFC 2553bis is now RFC 3493 Signed-off-by: Geoffrey Thomas Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index db2a406..8154699 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -992,7 +992,7 @@ bindv6only - BOOLEAN TRUE: disable IPv4-mapped address feature FALSE: enable IPv4-mapped address feature - Default: FALSE (as specified in RFC2553bis) + Default: FALSE (as specified in RFC3493) IPv6 Fragmentation: -- cgit v1.1 From 5f30a4ab4ac40a71ce7e2aaaab782284553b21a4 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Sun, 21 Aug 2011 15:19:08 +0200 Subject: batman-adv: update README (date & ap isolation sysfs file) Signed-off-by: Simon Wunderlich Signed-off-by: Marek Lindner --- Documentation/networking/batman-adv.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index 88d4afb..c86d03f 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -1,4 +1,4 @@ -[state: 17-04-2011] +[state: 21-08-2011] BATMAN-ADV ---------- @@ -68,9 +68,9 @@ All mesh wide settings can be found in batman's own interface folder: # ls /sys/class/net/bat0/mesh/ -# aggregated_ogms gw_bandwidth hop_penalty -# bonding gw_mode orig_interval -# fragmentation gw_sel_class vis_mode +# aggregated_ogms fragmentation gw_sel_class vis_mode +# ap_isolation gw_bandwidth hop_penalty +# bonding gw_mode orig_interval There is a special folder for debugging information: -- cgit v1.1 From 6b59e3191daade2b975eeec1c71c591eb5c86b7b Mon Sep 17 00:00:00 2001 From: Marcos Paulo de Souza Date: Tue, 30 Aug 2011 05:33:57 +0000 Subject: Documentation: networking: dmfe.txt: Remove the maintainer of orphan networking driver The dmfe module is a orphan driver, and with this was removed the maintainer of the documentation. Signed-off-by: Marcos Paulo de Souza Signed-off-by: David S. Miller --- Documentation/networking/dmfe.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/dmfe.txt index 8006c22..25320bf 100644 --- a/Documentation/networking/dmfe.txt +++ b/Documentation/networking/dmfe.txt @@ -1,3 +1,5 @@ +Note: This driver doesn't have a maintainer. + Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux. This program is free software; you can redistribute it and/or @@ -55,7 +57,6 @@ Test and make sure PCI latency is now correct for all cases. Authors: Sten Wang : Original Author -Tobias Ringstrom : Current Maintainer Contributors: -- cgit v1.1 From 4f2f25f9f04a92aab31e3bc1dcb84bec33acc773 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Thu, 1 Sep 2011 21:51:42 +0000 Subject: stmmac: update the doc with new info about the driver's debug (v3) Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 57a2410..40ec92c 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -235,7 +235,38 @@ reset procedure etc). o enh_desc.c: functions for handling enhanced descriptors o norm_desc.c: functions for handling normal descriptors -5) TODO: +5) Debug Information + +The driver exports many information i.e. internal statistics, +debug information, MAC and DMA registers etc. + +These can be read in several ways depending on the +type of the information actually needed. + +For example a user can be use the ethtool support +to get statistics: e.g. using: ethtool -S ethX +(that shows the Management counters (MMC) if supported) +or sees the MAC/DMA registers: e.g. using: ethtool -d ethX + +Compiling the Kernel with CONFIG_DEBUG_FS and enabling the +STMMAC_DEBUG_FS option the driver will export the following +debugfs entries: + +/sys/kernel/debug/stmmaceth/descriptors_status + To show the DMA TX/RX descriptor rings + +Developer can also use the "debug" module parameter to get +further debug information. + +In the end, there are other macros (that cannot be enabled +via menuconfig) to turn-on the RX/TX DMA debugging, +specific MAC core debug printk etc. Others to enable the +debug in the TX and RX processes. +All these are only useful during the developing stage +and should never enabled inside the code for general usage. +In fact, these can generate an huge amount of debug messages. + +6) TODO: o XGMAC is not supported. o Review the timer optimisation code to use an embedded device that will be available in new chip generations. -- cgit v1.1 From 026359bc6eddfdc2d2e684bf0b51691649b90f33 Mon Sep 17 00:00:00 2001 From: Tore Anderson Date: Sun, 28 Aug 2011 23:47:33 +0000 Subject: ipv6: Send ICMPv6 RSes only when RAs are accepted This patch improves the logic determining when to send ICMPv6 Router Solicitations, so that they are 1) always sent when the kernel is accepting Router Advertisements, and 2) never sent when the kernel is not accepting RAs. In other words, the operational setting of the "accept_ra" sysctl is used. The change also makes the special "Hybrid Router" forwarding mode ("forwarding" sysctl set to 2) operate exactly the same as the standard Router mode (forwarding=1). The only difference between the two was that RSes was being sent in the Hybrid Router mode only. The sysctl documentation describing the special Hybrid Router mode has therefore been removed. Rationale for the change: Currently, the value of forwarding sysctl is the only thing determining whether or not to send RSes. If it has the value 0 or 2, they are sent, otherwise they are not. This leads to inconsistent behaviour in the following cases: * accept_ra=0, forwarding=0 * accept_ra=0, forwarding=2 * accept_ra=1, forwarding=2 * accept_ra=2, forwarding=1 In the first three cases, the kernel will send RSes, even though it will not accept any RAs received in reply. In the last case, it will not send any RSes, even though it will accept and process any RAs received. (Most routers will send unsolicited RAs periodically, so suppressing RSes in the last case will merely delay auto-configuration, not prevent it.) Also, it is my opinion that having the forwarding sysctl control RS sending behaviour (completely independent of whether RAs are being accepted or not) is simply not what most users would intuitively expect to be the case. Signed-off-by: Tore Anderson Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index db2a406..f2716df 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1045,6 +1045,11 @@ conf/interface/*: accept_ra - BOOLEAN Accept Router Advertisements; autoconfigure using them. + It also determines whether or not to transmit Router + Solicitations. If and only if the functional setting is to + accept Router Advertisements, Router Solicitations will be + transmitted. + Possible values are: 0 Do not accept Router Advertisements. 1 Accept Router Advertisements if forwarding is disabled. @@ -1115,14 +1120,14 @@ forwarding - BOOLEAN Possible values are: 0 Forwarding disabled 1 Forwarding enabled - 2 Forwarding enabled (Hybrid Mode) FALSE (0): By default, Host behaviour is assumed. This means: 1. IsRouter flag is not set in Neighbour Advertisements. - 2. Router Solicitations are being sent when necessary. + 2. If accept_ra is TRUE (default), transmit Router + Solicitations. 3. If accept_ra is TRUE (default), accept Router Advertisements (and do autoconfiguration). 4. If accept_redirects is TRUE (default), accept Redirects. @@ -1133,16 +1138,10 @@ forwarding - BOOLEAN This means exactly the reverse from the above: 1. IsRouter flag is set in Neighbour Advertisements. - 2. Router Solicitations are not sent. + 2. Router Solicitations are not sent unless accept_ra is 2. 3. Router Advertisements are ignored unless accept_ra is 2. 4. Redirects are ignored. - TRUE (2): - - Hybrid mode. Same behaviour as TRUE, except for: - - 2. Router Solicitations are being sent when necessary. - Default: 0 (disabled) if global forwarding is disabled (default), otherwise 1 (enabled). -- cgit v1.1 From 395cf9691d72173d8cdaa613c5f0255f993af94b Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Mon, 15 Aug 2011 02:02:26 +0200 Subject: doc: fix broken references There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typo's in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle Signed-off-by: Jiri Kosina --- Documentation/networking/scaling.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index 58fd741..729985e 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -73,7 +73,7 @@ of queues to IRQs can be determined from /proc/interrupts. By default, an IRQ may be handled on any CPU. Because a non-negligible part of packet processing takes place in receive interrupt handling, it is advantageous to spread receive interrupts between CPUs. To manually adjust the IRQ -affinity of each interrupt see Documentation/IRQ-affinity. Some systems +affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems will be running irqbalance, a daemon that dynamically optimizes IRQ assignments and as a result may override any manual settings. -- cgit v1.1 From e451e61b56b0401442f7a306cd309c3b0c56c285 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Tue, 27 Sep 2011 13:26:27 -0400 Subject: net: fix a typo in Documentation/networking/scaling.txt Signed-off-by: Jason Wang Signed-off-by: David S. Miller --- Documentation/networking/scaling.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index 58fd741..8ce7c30 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -243,7 +243,7 @@ configured. The number of entries in the global flow table is set through: The number of entries in the per-queue flow table are set through: - /sys/class/net//queues/tx-/rps_flow_cnt + /sys/class/net//queues/rx-/rps_flow_cnt == Suggested Configuration -- cgit v1.1 From 605b91c8f619047ed6d88e24c531d4bdddad46f4 Mon Sep 17 00:00:00 2001 From: "Roy.Li" Date: Wed, 28 Sep 2011 19:51:54 +0000 Subject: net: Documentation: Fix type of variables Signed-off-by: Roy.Li Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 8154699..ca5cdcd 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1042,7 +1042,7 @@ conf/interface/*: The functional behaviour for certain settings is different depending on whether local forwarding is enabled or not. -accept_ra - BOOLEAN +accept_ra - INTEGER Accept Router Advertisements; autoconfigure using them. Possible values are: @@ -1106,7 +1106,7 @@ dad_transmits - INTEGER The amount of Duplicate Address Detection probes to send. Default: 1 -forwarding - BOOLEAN +forwarding - INTEGER Configure interface-specific Host/Router behaviour. Note: It is recommended to have the same setting on all -- cgit v1.1 From 186c6bbced722cfeff041d2a1264c95f5d042050 Mon Sep 17 00:00:00 2001 From: Benjamin Poirier Date: Tue, 4 Oct 2011 04:00:30 +0000 Subject: net: fix typos in Documentation/networking/scaling.txt The second hunk fixes rps_sock_flow_table but has to re-wrap the paragraph. Signed-off-by: Benjamin Poirier Signed-off-by: David S. Miller --- Documentation/networking/scaling.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt index 8ce7c30..fe67b5c 100644 --- a/Documentation/networking/scaling.txt +++ b/Documentation/networking/scaling.txt @@ -27,7 +27,7 @@ applying a filter to each packet that assigns it to one of a small number of logical flows. Packets for each flow are steered to a separate receive queue, which in turn can be processed by separate CPUs. This mechanism is generally known as “Receive-side Scaling” (RSS). The goal of RSS and -the other scaling techniques to increase performance uniformly. +the other scaling techniques is to increase performance uniformly. Multi-queue distribution can also be used for traffic prioritization, but that is not the focus of these techniques. @@ -186,10 +186,10 @@ are steered using plain RPS. Multiple table entries may point to the same CPU. Indeed, with many flows and few CPUs, it is very likely that a single application thread handles flows with many different flow hashes. -rps_sock_table is a global flow table that contains the *desired* CPU for -flows: the CPU that is currently processing the flow in userspace. Each -table value is a CPU index that is updated during calls to recvmsg and -sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() +rps_sock_flow_table is a global flow table that contains the *desired* CPU +for flows: the CPU that is currently processing the flow in userspace. +Each table value is a CPU index that is updated during calls to recvmsg +and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() and tcp_splice_read()). When the scheduler moves a thread to a new CPU while it has outstanding -- cgit v1.1 From d9cd48f95c5ba9e5f1d5287ed74630607471031c Mon Sep 17 00:00:00 2001 From: Helmut Schaa Date: Fri, 7 Oct 2011 11:53:41 +0200 Subject: mac80211: Update injection documentation Add documentation about NOACK tx flag usage. Signed-off-by: Helmut Schaa Signed-off-by: John W. Linville --- Documentation/networking/mac80211-injection.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt index b30e81a..3a93007 100644 --- a/Documentation/networking/mac80211-injection.txt +++ b/Documentation/networking/mac80211-injection.txt @@ -23,6 +23,10 @@ radiotap headers and used to control injection: IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the current fragmentation threshold. + * IEEE80211_RADIOTAP_TX_FLAGS + + IEEE80211_RADIOTAP_F_TX_NOACK: frame should be sent without waiting for + an ACK even if it is a unicast frame The injection code can also skip all other currently defined radiotap fields facilitating replay of captured radiotap headers directly. -- cgit v1.1 From 51e3137b9b6113c7e12cf0a0dc82238854a86712 Mon Sep 17 00:00:00 2001 From: Giuseppe CAVALLARO Date: Tue, 18 Oct 2011 00:01:20 +0000 Subject: stmmac: update the driver version and doc (V4) Signed-off-by: Giuseppe Cavallaro Signed-off-by: David S. Miller --- Documentation/networking/stmmac.txt | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt index 40ec92c..8d67980 100644 --- a/Documentation/networking/stmmac.txt +++ b/Documentation/networking/stmmac.txt @@ -76,7 +76,16 @@ core. 4.5) DMA descriptors Driver handles both normal and enhanced descriptors. The latter has been only -tested on DWC Ether MAC 10/100/1000 Universal version 3.41a. +tested on DWC Ether MAC 10/100/1000 Universal version 3.41a and later. + +STMMAC supports DMA descriptor to operate both in dual buffer (RING) +and linked-list(CHAINED) mode. In RING each descriptor points to two +data buffer pointers whereas in CHAINED mode they point to only one data +buffer pointer. RING mode is the default. + +In CHAINED mode each descriptor will have pointer to next descriptor in +the list, hence creating the explicit chaining in the descriptor itself, +whereas such explicit chaining is not possible in RING mode. 4.6) Ethtool support Ethtool is supported. Driver statistics and internal errors can be taken using: -- cgit v1.1 From 445b62dfd934394807c12aee06d3f20cf70c93b4 Mon Sep 17 00:00:00 2001 From: Sritej Velaga Date: Fri, 28 Oct 2011 12:57:14 +0000 Subject: qlcnic: Updated License file Updated qlcnic's license file. Signed-off-by: Sritej Velaga Signed-off-by: Anirban Chakraborty Signed-off-by: David S. Miller --- Documentation/networking/LICENSE.qlcnic | 51 ++++----------------------------- 1 file changed, 6 insertions(+), 45 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/LICENSE.qlcnic b/Documentation/networking/LICENSE.qlcnic index 29ad4b1..e7fb2c6 100644 --- a/Documentation/networking/LICENSE.qlcnic +++ b/Documentation/networking/LICENSE.qlcnic @@ -1,61 +1,22 @@ -Copyright (c) 2009-2010 QLogic Corporation +Copyright (c) 2009-2011 QLogic Corporation QLogic Linux qlcnic NIC Driver -This program includes a device driver for Linux 2.6 that may be -distributed with QLogic hardware specific firmware binary file. You may modify and redistribute the device driver code under the GNU General Public License (a copy of which is attached hereto as Exhibit A) published by the Free Software Foundation (version 2). -You may redistribute the hardware specific firmware binary file -under the following terms: - - 1. Redistribution of source code (only if applicable), - must retain the above copyright notice, this list of - conditions and the following disclaimer. - - 2. Redistribution in binary form must reproduce the above - copyright notice, this list of conditions and the - following disclaimer in the documentation and/or other - materials provided with the distribution. - - 3. The name of QLogic Corporation may not be used to - endorse or promote products derived from this software - without specific prior written permission - -REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE, -THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY -EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A -PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR -BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, -EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED -TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON -ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY -OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. - -USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT -CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR -OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT, -TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN -ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN -COMBINATION WITH THIS PROGRAM. - EXHIBIT A - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. - Preamble + Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public @@ -105,7 +66,7 @@ patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. - GNU GENERAL PUBLIC LICENSE + GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains @@ -304,7 +265,7 @@ make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. - NO WARRANTY + NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN -- cgit v1.1 From 7e777dd43d55a78c41c3498afaf3ef7edf157120 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Fri, 9 Sep 2011 17:07:43 +0900 Subject: ipvs: Add documentation for new sysctl entries Add missing documentation for conntrack, snat_reroute and sync_version. Also fix up a typo, IPVS_DEBUG should be IP_VS_DEBUG. Acked-by: Julian Anastasov Acked-by Hans Schillstrom Signed-off-by: Simon Horman Signed-off-by: Pablo Neira Ayuso --- Documentation/networking/ipvs-sysctl.txt | 52 +++++++++++++++++++++++++++++++- 1 file changed, 51 insertions(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt index 4ccdbca..1dcdd49 100644 --- a/Documentation/networking/ipvs-sysctl.txt +++ b/Documentation/networking/ipvs-sysctl.txt @@ -15,6 +15,23 @@ amemthresh - INTEGER enabled and the variable is automatically set to 2, otherwise the strategy is disabled and the variable is set to 1. +conntrack - BOOLEAN + 0 - disabled (default) + not 0 - enabled + + If set, maintain connection tracking entries for + connections handled by IPVS. + + This should be enabled if connections handled by IPVS are to be + also handled by stateful firewall rules. That is, iptables rules + that make use of connection tracking. It is a performance + optimisation to disable this setting otherwise. + + Connections handled by the IPVS FTP application module + will have connection tracking entries regardless of this setting. + + Only available when IPVS is compiled with the CONFIG_IP_VS_NFCT + cache_bypass - BOOLEAN 0 - disabled (default) not 0 - enabled @@ -39,7 +56,7 @@ debug_level - INTEGER 11 - IPVS packet handling (ip_vs_in/ip_vs_out) 12 or more - packet traversal - Only available when IPVS is compiled with the CONFIG_IPVS_DEBUG + Only available when IPVS is compiled with the CONFIG_IP_VS_DEBUG Higher debugging levels include the messages for lower debugging levels, so setting debug level 2, includes level 0, 1 and 2 @@ -141,3 +158,36 @@ sync_threshold - INTEGER synchronized, every time the number of its incoming packets modulus 50 equals the threshold. The range of the threshold is from 0 to 49. + +snat_reroute - BOOLEAN + 0 - disabled + not 0 - enabled (default) + + If enabled, recalculate the route of SNATed packets from + realservers so that they are routed as if they originate from the + director. Otherwise they are routed as if they are forwarded by the + director. + + If policy routing is in effect then it is possible that the route + of a packet originating from a director is routed differently to a + packet being forwarded by the director. + + If policy routing is not in effect then the recalculated route will + always be the same as the original route so it is an optimisation + to disable snat_reroute and avoid the recalculation. + +sync_version - INTEGER + default 1 + + The version of the synchronisation protocol used when sending + synchronisation messages. + + 0 selects the original synchronisation protocol (version 0). This + should be used when sending synchronisation messages to a legacy + system that only understands the original synchronisation protocol. + + 1 selects the current synchronisation protocol (version 1). This + should be used where possible. + + Kernels with this sync_version entry are able to receive messages + of both version 1 and version 2 of the synchronisation protocol. -- cgit v1.1 From 325aadc8483e4fc3bbd4acfa7e471e3a032bc941 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Thu, 29 Sep 2011 16:14:51 +0900 Subject: ipvs: secure_tcp does provide alternate state timeouts Also reword the test to make it read more easily (to me) Signed-off-by: Simon Horman Signed-off-by: Pablo Neira Ayuso --- Documentation/networking/ipvs-sysctl.txt | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt index 1dcdd49..13610e3 100644 --- a/Documentation/networking/ipvs-sysctl.txt +++ b/Documentation/networking/ipvs-sysctl.txt @@ -140,13 +140,11 @@ nat_icmp_send - BOOLEAN secure_tcp - INTEGER 0 - disabled (default) - The secure_tcp defense is to use a more complicated state - transition table and some possible short timeouts of each - state. In the VS/NAT, it delays the entering the ESTABLISHED - until the real server starts to send data and ACK packet - (after 3-way handshake). + The secure_tcp defense is to use a more complicated TCP state + transition table. For VS/NAT, it also delays entering the + TCP ESTABLISHED state until the three way handshake is completed. - The value definition is the same as that of drop_entry or + The value definition is the same as that of drop_entry and drop_packet. sync_threshold - INTEGER -- cgit v1.1 From 40cb1f9bc52186a1a9ef56f0d976482863516ce1 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Thu, 29 Sep 2011 16:27:37 +0900 Subject: ipvs: Enhance grammar used to refer to Kconfig options Reported-by: Randy Dunlap Signed-off-by: Simon Horman Signed-off-by: Pablo Neira Ayuso --- Documentation/networking/ipvs-sysctl.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt index 13610e3..f2a2488 100644 --- a/Documentation/networking/ipvs-sysctl.txt +++ b/Documentation/networking/ipvs-sysctl.txt @@ -30,7 +30,7 @@ conntrack - BOOLEAN Connections handled by the IPVS FTP application module will have connection tracking entries regardless of this setting. - Only available when IPVS is compiled with the CONFIG_IP_VS_NFCT + Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled. cache_bypass - BOOLEAN 0 - disabled (default) @@ -56,7 +56,7 @@ debug_level - INTEGER 11 - IPVS packet handling (ip_vs_in/ip_vs_out) 12 or more - packet traversal - Only available when IPVS is compiled with the CONFIG_IP_VS_DEBUG + Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled. Higher debugging levels include the messages for lower debugging levels, so setting debug level 2, includes level 0, 1 and 2 -- cgit v1.1 From 20db93c34095553a01a9c31136658917bf1fa5d5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 8 Nov 2011 14:21:44 -0500 Subject: net: min_pmtu default is 552 Small fix in Documentation, since min_pmtu is 512 + 20 + 20 = 552 Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index cb7f314..f049a1c 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -20,7 +20,7 @@ ip_no_pmtu_disc - BOOLEAN default FALSE min_pmtu - INTEGER - default 562 - minimum discovered Path MTU + default 552 - minimum discovered Path MTU route/max_size - INTEGER Maximum number of routes allowed in the kernel. Increase -- cgit v1.1 From 99b53bdd810611cc178e1a86bc112d8f4f56a1e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Peter=20Pan=28=E6=BD=98=E5=8D=AB=E5=B9=B3=29?= Date: Mon, 5 Dec 2011 21:39:41 +0000 Subject: ipv4:correct description for tcp_max_syn_backlog Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value is 128, and it will increase in proportion to the memory of machine. The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog are out of date. Changelog: V2: update description for sysctl_max_syn_backlog Signed-off-by: Weiping Pan Reviewed-by: Shan Wei Acked-by: Neil Horman Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index f049a1c..589f2da 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -282,11 +282,11 @@ tcp_max_ssthresh - INTEGER Default: 0 (off) tcp_max_syn_backlog - INTEGER - Maximal number of remembered connection requests, which are - still did not receive an acknowledgment from connecting client. - Default value is 1024 for systems with more than 128Mb of memory, - and 128 for low memory machines. If server suffers of overload, - try to increase this number. + Maximal number of remembered connection requests, which have not + received an acknowledgment from connecting client. + The minimal value is 128 for low memory machines, and it will + increase in proportion to the memory of machine. + If server suffers from overload, try increasing this number. tcp_max_tw_buckets - INTEGER Maximal number of timewait sockets held by system simultaneously. -- cgit v1.1 From 4b9b05fd95c502521eaef111ba0f83c58b391587 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 May 2012 02:28:41 +0000 Subject: tcp: change tcp_adv_win_scale and tcp_rmem[2] [ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ] tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet Cc: Neal Cardwell Cc: Tom Herbert Cc: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings --- Documentation/networking/ip-sysctl.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 589f2da..a4399f5 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -137,7 +137,7 @@ tcp_adv_win_scale - INTEGER (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if it is <= 0. Possible values are [-31, 31], inclusive. - Default: 2 + Default: 1 tcp_allowed_congestion_control - STRING Show/set the congestion control choices available to non-privileged @@ -397,7 +397,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket's receive buffer size, in which case this value is ignored. - Default: between 87380B and 4MB, depending on RAM size. + Default: between 87380B and 6MB, depending on RAM size. tcp_sack - BOOLEAN Enable select acknowledgments (SACKS). -- cgit v1.1 From 61f69dc4e40e41b0018f00fa4aeb23d3239556fb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 17 Jul 2012 10:13:05 +0200 Subject: tcp: implement RFC 5961 3.2 [ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ] Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet Cc: Kiran Kumar Kella Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings --- Documentation/networking/ip-sysctl.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation/networking') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index a4399f5..3b979c6 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -524,6 +524,11 @@ tcp_thin_dupack - BOOLEAN Documentation/networking/tcp-thin.txt Default: 0 +tcp_challenge_ack_limit - INTEGER + Limits number of Challenge ACK sent per second, as recommended + in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) + Default: 100 + UDP variables: udp_mem - vector of 3 INTEGERs: min, pressure, max -- cgit v1.1 From 74f9ee201a77eaad7192e24d1a37d3ea018dc45c Mon Sep 17 00:00:00 2001 From: Sowmini Varadhan Date: Wed, 8 Apr 2015 12:33:45 -0400 Subject: RDS: Documentation: Document AF_RDS, PF_RDS and SOL_RDS correctly. commit ebe96e641dee2cbd135ee802ae7e40c361640088 upstream. AF_RDS, PF_RDS and SOL_RDS are available in header files, and there is no need to get their values from /proc. Document this correctly. Fixes: 0c5f9b8830aa ("RDS: Documentation") Signed-off-by: Sowmini Varadhan Signed-off-by: David S. Miller Signed-off-by: Ben Hutchings --- Documentation/networking/rds.txt | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'Documentation/networking') diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt index c67077c..e1a3d59 100644 --- a/Documentation/networking/rds.txt +++ b/Documentation/networking/rds.txt @@ -62,11 +62,10 @@ Socket Interface ================ AF_RDS, PF_RDS, SOL_RDS - These constants haven't been assigned yet, because RDS isn't in - mainline yet. Currently, the kernel module assigns some constant - and publishes it to user space through two sysctl files - /proc/sys/net/rds/pf_rds - /proc/sys/net/rds/sol_rds + AF_RDS and PF_RDS are the domain type to be used with socket(2) + to create RDS sockets. SOL_RDS is the socket-level to be used + with setsockopt(2) and getsockopt(2) for RDS specific socket + options. fd = socket(PF_RDS, SOCK_SEQPACKET, 0); This creates a new, unbound RDS socket. -- cgit v1.1