aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/00-INDEX116
-rw-r--r--Documentation/networking/LICENSE.qlcnic51
-rw-r--r--Documentation/networking/batman-adv.txt8
-rw-r--r--Documentation/networking/bonding.txt31
-rw-r--r--Documentation/networking/dmfe.txt3
-rw-r--r--Documentation/networking/ifenslave.c18
-rw-r--r--Documentation/networking/ip-sysctl.txt64
-rw-r--r--Documentation/networking/ipvs-sysctl.txt62
-rw-r--r--Documentation/networking/mac80211-injection.txt4
-rw-r--r--Documentation/networking/netdev-features.txt154
-rw-r--r--Documentation/networking/netdevices.txt4
-rw-r--r--Documentation/networking/nfc.txt128
-rw-r--r--Documentation/networking/rds.txt9
-rw-r--r--Documentation/networking/scaling.txt378
-rw-r--r--Documentation/networking/stmmac.txt242
15 files changed, 1099 insertions, 173 deletions
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 4edd78d..bbce121 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -1,13 +1,21 @@
00-INDEX
- this file
+3c359.txt
+ - information on the 3Com TokenLink Velocity XL (3c5359) driver.
3c505.txt
- information on the 3Com EtherLink Plus (3c505) driver.
+3c509.txt
+ - information on the 3Com Etherlink III Series Ethernet cards.
6pack.txt
- info on the 6pack protocol, an alternative to KISS for AX.25
DLINK.txt
- info on the D-Link DE-600/DE-620 parallel port pocket adapters
PLIP.txt
- PLIP: The Parallel Line Internet Protocol device driver
+README.ipw2100
+ - README for the Intel PRO/Wireless 2100 driver.
+README.ipw2200
+ - README for the Intel PRO/Wireless 2915ABG and 2200BG driver.
README.sb1000
- info on General Instrument/NextLevel SURFboard1000 cable modem.
alias.txt
@@ -20,8 +28,12 @@ atm.txt
- info on where to get ATM programs and support for Linux.
ax25.txt
- info on using AX.25 and NET/ROM code for Linux
+batman-adv.txt
+ - B.A.T.M.A.N routing protocol on top of layer 2 Ethernet Frames.
baycom.txt
- info on the driver for Baycom style amateur radio modems
+bonding.txt
+ - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux.
bridge.txt
- where to get user space programs for ethernet bridging with Linux.
can.txt
@@ -34,32 +46,60 @@ cxacru.txt
- Conexant AccessRunner USB ADSL Modem
cxacru-cf.py
- Conexant AccessRunner USB ADSL Modem configuration file parser
+cxgb.txt
+ - Release Notes for the Chelsio N210 Linux device driver.
+dccp.txt
+ - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42).
de4x5.txt
- the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver
decnet.txt
- info on using the DECnet networking layer in Linux.
depca.txt
- the Digital DEPCA/EtherWORKS DE1?? and DE2?? LANCE Ethernet driver
+dl2k.txt
+ - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko).
+dm9000.txt
+ - README for the Simtec DM9000 Network driver.
dmfe.txt
- info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver.
+dns_resolver.txt
+ - The DNS resolver module allows kernel servies to make DNS queries.
+driver.txt
+ - Softnet driver issues.
e100.txt
- info on Intel's EtherExpress PRO/100 line of 10/100 boards
e1000.txt
- info on Intel's E1000 line of gigabit ethernet boards
+e1000e.txt
+ - README for the Intel Gigabit Ethernet Driver (e1000e).
eql.txt
- serial IP load balancing
ewrk3.txt
- the Digital EtherWORKS 3 DE203/4/5 Ethernet driver
+fib_trie.txt
+ - Level Compressed Trie (LC-trie) notes: a structure for routing.
filter.txt
- Linux Socket Filtering
fore200e.txt
- FORE Systems PCA-200E/SBA-200E ATM NIC driver info.
framerelay.txt
- info on using Frame Relay/Data Link Connection Identifier (DLCI).
+gen_stats.txt
+ - Generic networking statistics for netlink users.
+generic_hdlc.txt
+ - The generic High Level Data Link Control (HDLC) layer.
generic_netlink.txt
- info on Generic Netlink
+gianfar.txt
+ - Gianfar Ethernet Driver.
ieee802154.txt
- Linux IEEE 802.15.4 implementation, API and drivers
+ifenslave.c
+ - Configure network interfaces for parallel routing (bonding).
+igb.txt
+ - README for the Intel Gigabit Ethernet Driver (igb).
+igbvf.txt
+ - README for the Intel Gigabit Ethernet Driver (igbvf).
ip-sysctl.txt
- /proc/sys/net/ipv4/* variables
ip_dynaddr.txt
@@ -68,41 +108,117 @@ ipddp.txt
- AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation
iphase.txt
- Interphase PCI ATM (i)Chip IA Linux driver info.
+ipv6.txt
+ - Options to the ipv6 kernel module.
+ipvs-sysctl.txt
+ - Per-inode explanation of the /proc/sys/net/ipv4/vs interface.
irda.txt
- where to get IrDA (infrared) utilities and info for Linux.
+ixgb.txt
+ - README for the Intel 10 Gigabit Ethernet Driver (ixgb).
+ixgbe.txt
+ - README for the Intel 10 Gigabit Ethernet Driver (ixgbe).
+ixgbevf.txt
+ - README for the Intel Virtual Function (VF) Driver (ixgbevf).
+l2tp.txt
+ - User guide to the L2TP tunnel protocol.
lapb-module.txt
- programming information of the LAPB module.
ltpc.txt
- the Apple or Farallon LocalTalk PC card driver
+mac80211-injection.txt
+ - HOWTO use packet injection with mac80211
multicast.txt
- Behaviour of cards under Multicast
+multiqueue.txt
+ - HOWTO for multiqueue network device support.
+netconsole.txt
+ - The network console module netconsole.ko: configuration and notes.
+netdev-features.txt
+ - Network interface features API description.
netdevices.txt
- info on network device driver functions exported to the kernel.
+netif-msg.txt
+ - Design of the network interface message level setting (NETIF_MSG_*).
+nfc.txt
+ - The Linux Near Field Communication (NFS) subsystem.
olympic.txt
- IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info.
+operstates.txt
+ - Overview of network interface operational states.
+packet_mmap.txt
+ - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING).
+phonet.txt
+ - The Phonet packet protocol used in Nokia cellular modems.
+phy.txt
+ - The PHY abstraction layer.
+pktgen.txt
+ - User guide to the kernel packet generator (pktgen.ko).
policy-routing.txt
- IP policy-based routing
+ppp_generic.txt
+ - Information about the generic PPP driver.
+proc_net_tcp.txt
+ - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces.
+radiotap-headers.txt
+ - Background on radiotap headers.
ray_cs.txt
- Raylink Wireless LAN card driver info.
+rds.txt
+ - Background on the reliable, ordered datagram delivery method RDS.
+regulatory.txt
+ - Overview of the Linux wireless regulatory infrastructure.
+rxrpc.txt
+ - Guide to the RxRPC protocol.
+s2io.txt
+ - Release notes for Neterion Xframe I/II 10GbE driver.
+scaling.txt
+ - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS.
+sctp.txt
+ - Notes on the Linux kernel implementation of the SCTP protocol.
+secid.txt
+ - Explanation of the secid member in flow structures.
skfp.txt
- SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info.
smc9.txt
- the driver for SMC's 9000 series of Ethernet cards
smctr.txt
- SMC TokenCard TokenRing Linux driver info.
+spider-net.txt
+ - README for the Spidernet Driver (as found in PS3 / Cell BE).
+stmmac.txt
+ - README for the STMicro Synopsys Ethernet driver.
+tc-actions-env-rules.txt
+ - rules for traffic control (tc) actions.
+timestamping.txt
+ - overview of network packet timestamping variants.
tcp.txt
- short blurb on how TCP output takes place.
+tcp-thin.txt
+ - kernel tuning options for low rate 'thin' TCP streams.
tlan.txt
- ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info.
tms380tr.txt
- SysKonnect Token Ring ISA/PCI adapter driver info.
+tproxy.txt
+ - Transparent proxy support user guide.
tuntap.txt
- TUN/TAP device driver, allowing user space Rx/Tx of packets.
+udplite.txt
+ - UDP-Lite protocol (RFC 3828) introduction.
vortex.txt
- info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards.
+vxge.txt
+ - README for the Neterion X3100 PCIe Server Adapter.
x25.txt
- general info on X.25 development.
x25-iface.txt
- description of the X.25 Packet Layer to LAPB device interface.
+xfrm_proc.txt
+ - description of the statistics package for XFRM.
+xfrm_sync.txt
+ - sync patches for XFRM enable migration of an SA between hosts.
+xfrm_sysctl.txt
+ - description of the XFRM configuration options.
z8530drv.txt
- info about Linux driver for Z8530 based HDLC cards for AX.25
diff --git a/Documentation/networking/LICENSE.qlcnic b/Documentation/networking/LICENSE.qlcnic
index 29ad4b1..e7fb2c6 100644
--- a/Documentation/networking/LICENSE.qlcnic
+++ b/Documentation/networking/LICENSE.qlcnic
@@ -1,61 +1,22 @@
-Copyright (c) 2009-2010 QLogic Corporation
+Copyright (c) 2009-2011 QLogic Corporation
QLogic Linux qlcnic NIC Driver
-This program includes a device driver for Linux 2.6 that may be
-distributed with QLogic hardware specific firmware binary file.
You may modify and redistribute the device driver code under the
GNU General Public License (a copy of which is attached hereto as
Exhibit A) published by the Free Software Foundation (version 2).
-You may redistribute the hardware specific firmware binary file
-under the following terms:
-
- 1. Redistribution of source code (only if applicable),
- must retain the above copyright notice, this list of
- conditions and the following disclaimer.
-
- 2. Redistribution in binary form must reproduce the above
- copyright notice, this list of conditions and the
- following disclaimer in the documentation and/or other
- materials provided with the distribution.
-
- 3. The name of QLogic Corporation may not be used to
- endorse or promote products derived from this software
- without specific prior written permission
-
-REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE,
-THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY
-EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
-PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR
-BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
-TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
-ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-
-USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT
-CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR
-OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT,
-TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN
-ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN
-COMBINATION WITH THIS PROGRAM.
-
EXHIBIT A
- GNU GENERAL PUBLIC LICENSE
- Version 2, June 1991
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
- Preamble
+ Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
@@ -105,7 +66,7 @@ patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
- GNU GENERAL PUBLIC LICENSE
+ GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
@@ -304,7 +265,7 @@ make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
- NO WARRANTY
+ NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index 88d4afb..c86d03f 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -1,4 +1,4 @@
-[state: 17-04-2011]
+[state: 21-08-2011]
BATMAN-ADV
----------
@@ -68,9 +68,9 @@ All mesh wide settings can be found in batman's own interface
folder:
# ls /sys/class/net/bat0/mesh/
-# aggregated_ogms gw_bandwidth hop_penalty
-# bonding gw_mode orig_interval
-# fragmentation gw_sel_class vis_mode
+# aggregated_ogms fragmentation gw_sel_class vis_mode
+# ap_isolation gw_bandwidth hop_penalty
+# bonding gw_mode orig_interval
There is a special folder for debugging information:
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 675612f..91df678 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -238,6 +238,18 @@ ad_select
This option was added in bonding version 3.4.0.
+all_slaves_active
+
+ Specifies that duplicate frames (received on inactive ports) should be
+ dropped (0) or delivered (1).
+
+ Normally, bonding will drop duplicate frames (received on inactive
+ ports), which is desirable for most users. But there are some times
+ it is nice to allow duplicate frames to be delivered.
+
+ The default value is 0 (drop duplicate frames received on inactive
+ ports).
+
arp_interval
Specifies the ARP link monitoring frequency in milliseconds.
@@ -433,6 +445,23 @@ miimon
determined. See the High Availability section for additional
information. The default value is 0.
+min_links
+
+ Specifies the minimum number of links that must be active before
+ asserting carrier. It is similar to the Cisco EtherChannel min-links
+ feature. This allows setting the minimum number of member ports that
+ must be up (link-up state) before marking the bond device as up
+ (carrier on). This is useful for situations where higher level services
+ such as clustering want to ensure a minimum number of low bandwidth
+ links are active before switchover. This option only affect 802.3ad
+ mode.
+
+ The default value is 0. This will cause carrier to be asserted (for
+ 802.3ad mode) whenever there is an active aggregator, regardless of the
+ number of available links in that aggregator. Note that, because an
+ aggregator cannot be active without at least one available link,
+ setting this option to 0 or to 1 has the exact same effect.
+
mode
Specifies one of the bonding policies. The default is
@@ -599,7 +628,7 @@ num_unsol_na
affect only the active-backup mode. These options were added for
bonding versions 3.3.0 and 3.4.0 respectively.
- From Linux 2.6.40 and bonding version 3.7.1, these notifications
+ From Linux 3.0 and bonding version 3.7.1, these notifications
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
diff --git a/Documentation/networking/dmfe.txt b/Documentation/networking/dmfe.txt
index 8006c22..25320bf 100644
--- a/Documentation/networking/dmfe.txt
+++ b/Documentation/networking/dmfe.txt
@@ -1,3 +1,5 @@
+Note: This driver doesn't have a maintainer.
+
Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver for Linux.
This program is free software; you can redistribute it and/or
@@ -55,7 +57,6 @@ Test and make sure PCI latency is now correct for all cases.
Authors:
Sten Wang <sten_wang@davicom.com.tw > : Original Author
-Tobias Ringstrom <tori@unhappy.mine.nu> : Current Maintainer
Contributors:
diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
index 50f1dc4..ac5debb 100644
--- a/Documentation/networking/ifenslave.c
+++ b/Documentation/networking/ifenslave.c
@@ -260,7 +260,7 @@ int main(int argc, char *argv[])
case 'V': opt_V++; exclusive++; break;
case '?':
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -268,13 +268,13 @@ int main(int argc, char *argv[])
/* options check */
if (exclusive > 1) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
if (opt_v || opt_V) {
- printf(version);
+ printf("%s", version);
if (opt_V) {
res = 0;
goto out;
@@ -282,14 +282,14 @@ int main(int argc, char *argv[])
}
if (opt_u) {
- printf(usage_msg);
+ printf("%s", usage_msg);
res = 0;
goto out;
}
if (opt_h) {
- printf(usage_msg);
- printf(help_msg);
+ printf("%s", usage_msg);
+ printf("%s", help_msg);
res = 0;
goto out;
}
@@ -309,7 +309,7 @@ int main(int argc, char *argv[])
goto out;
} else {
/* Just show usage */
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -320,7 +320,7 @@ int main(int argc, char *argv[])
master_ifname = *spp++;
if (master_ifname == NULL) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
@@ -339,7 +339,7 @@ int main(int argc, char *argv[])
if (slave_ifname == NULL) {
if (opt_d || opt_c) {
- fprintf(stderr, usage_msg);
+ fprintf(stderr, "%s", usage_msg);
res = 2;
goto out;
}
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 890fce9..3b979c6 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -20,7 +20,7 @@ ip_no_pmtu_disc - BOOLEAN
default FALSE
min_pmtu - INTEGER
- default 562 - minimum discovered Path MTU
+ default 552 - minimum discovered Path MTU
route/max_size - INTEGER
Maximum number of routes allowed in the kernel. Increase
@@ -106,16 +106,6 @@ inet_peer_maxttl - INTEGER
when the number of entries in the pool is very small).
Measured in seconds.
-inet_peer_gc_mintime - INTEGER
- Minimum interval between garbage collection passes. This interval is
- in effect under high memory pressure on the pool.
- Measured in seconds.
-
-inet_peer_gc_maxtime - INTEGER
- Minimum interval between garbage collection passes. This interval is
- in effect under low (or absent) memory pressure on the pool.
- Measured in seconds.
-
TCP variables:
somaxconn - INTEGER
@@ -292,11 +282,11 @@ tcp_max_ssthresh - INTEGER
Default: 0 (off)
tcp_max_syn_backlog - INTEGER
- Maximal number of remembered connection requests, which are
- still did not receive an acknowledgment from connecting client.
- Default value is 1024 for systems with more than 128Mb of memory,
- and 128 for low memory machines. If server suffers of overload,
- try to increase this number.
+ Maximal number of remembered connection requests, which have not
+ received an acknowledgment from connecting client.
+ The minimal value is 128 for low memory machines, and it will
+ increase in proportion to the memory of machine.
+ If server suffers from overload, try increasing this number.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously.
@@ -394,7 +384,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets.
It is guaranteed to each TCP socket, even under moderate memory
pressure.
- Default: 8K
+ Default: 1 page
default: initial size of receive buffer used by TCP sockets.
This value overrides net.core.rmem_default used by other protocols.
@@ -483,7 +473,7 @@ tcp_window_scaling - BOOLEAN
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets.
Each TCP socket has rights to use it due to fact of its birth.
- Default: 4K
+ Default: 1 page
default: initial size of send buffer used by TCP sockets. This
value overrides net.core.wmem_default used by other protocols.
@@ -558,13 +548,13 @@ udp_rmem_min - INTEGER
Minimal size of receive buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for receiving data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
- Default: 4096
+ Default: 1 page
udp_wmem_min - INTEGER
Minimal size of send buffer used by UDP sockets in moderation.
Each UDP socket is able to use the size for sending data, even if
total pages of UDP sockets exceed udp_mem pressure. The unit is byte.
- Default: 4096
+ Default: 1 page
CIPSOv4 Variables:
@@ -1007,7 +997,7 @@ bindv6only - BOOLEAN
TRUE: disable IPv4-mapped address feature
FALSE: enable IPv4-mapped address feature
- Default: FALSE (as specified in RFC2553bis)
+ Default: FALSE (as specified in RFC3493)
IPv6 Fragmentation:
@@ -1057,9 +1047,14 @@ conf/interface/*:
The functional behaviour for certain settings is different
depending on whether local forwarding is enabled or not.
-accept_ra - BOOLEAN
+accept_ra - INTEGER
Accept Router Advertisements; autoconfigure using them.
+ It also determines whether or not to transmit Router
+ Solicitations. If and only if the functional setting is to
+ accept Router Advertisements, Router Solicitations will be
+ transmitted.
+
Possible values are:
0 Do not accept Router Advertisements.
1 Accept Router Advertisements if forwarding is disabled.
@@ -1121,7 +1116,7 @@ dad_transmits - INTEGER
The amount of Duplicate Address Detection probes to send.
Default: 1
-forwarding - BOOLEAN
+forwarding - INTEGER
Configure interface-specific Host/Router behaviour.
Note: It is recommended to have the same setting on all
@@ -1130,14 +1125,14 @@ forwarding - BOOLEAN
Possible values are:
0 Forwarding disabled
1 Forwarding enabled
- 2 Forwarding enabled (Hybrid Mode)
FALSE (0):
By default, Host behaviour is assumed. This means:
1. IsRouter flag is not set in Neighbour Advertisements.
- 2. Router Solicitations are being sent when necessary.
+ 2. If accept_ra is TRUE (default), transmit Router
+ Solicitations.
3. If accept_ra is TRUE (default), accept Router
Advertisements (and do autoconfiguration).
4. If accept_redirects is TRUE (default), accept Redirects.
@@ -1148,16 +1143,10 @@ forwarding - BOOLEAN
This means exactly the reverse from the above:
1. IsRouter flag is set in Neighbour Advertisements.
- 2. Router Solicitations are not sent.
+ 2. Router Solicitations are not sent unless accept_ra is 2.
3. Router Advertisements are ignored unless accept_ra is 2.
4. Redirects are ignored.
- TRUE (2):
-
- Hybrid mode. Same behaviour as TRUE, except for:
-
- 2. Router Solicitations are being sent when necessary.
-
Default: 0 (disabled) if global forwarding is disabled (default),
otherwise 1 (enabled).
@@ -1470,10 +1459,17 @@ sctp_mem - vector of 3 INTEGERs: min, pressure, max
Default is calculated at boot time from amount of available memory.
sctp_rmem - vector of 3 INTEGERs: min, default, max
- See tcp_rmem for a description.
+ Only the first value ("min") is used, "default" and "max" are
+ ignored.
+
+ min: Minimal size of receive buffer used by SCTP socket.
+ It is guaranteed to each SCTP socket (but not association) even
+ under moderate memory pressure.
+
+ Default: 1 page
sctp_wmem - vector of 3 INTEGERs: min, default, max
- See tcp_wmem for a description.
+ Currently this tunable has no effect.
addr_scope_policy - INTEGER
Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
index 4ccdbca..f2a2488 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -15,6 +15,23 @@ amemthresh - INTEGER
enabled and the variable is automatically set to 2, otherwise
the strategy is disabled and the variable is set to 1.
+conntrack - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ If set, maintain connection tracking entries for
+ connections handled by IPVS.
+
+ This should be enabled if connections handled by IPVS are to be
+ also handled by stateful firewall rules. That is, iptables rules
+ that make use of connection tracking. It is a performance
+ optimisation to disable this setting otherwise.
+
+ Connections handled by the IPVS FTP application module
+ will have connection tracking entries regardless of this setting.
+
+ Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.
+
cache_bypass - BOOLEAN
0 - disabled (default)
not 0 - enabled
@@ -39,7 +56,7 @@ debug_level - INTEGER
11 - IPVS packet handling (ip_vs_in/ip_vs_out)
12 or more - packet traversal
- Only available when IPVS is compiled with the CONFIG_IPVS_DEBUG
+ Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.
Higher debugging levels include the messages for lower debugging
levels, so setting debug level 2, includes level 0, 1 and 2
@@ -123,13 +140,11 @@ nat_icmp_send - BOOLEAN
secure_tcp - INTEGER
0 - disabled (default)
- The secure_tcp defense is to use a more complicated state
- transition table and some possible short timeouts of each
- state. In the VS/NAT, it delays the entering the ESTABLISHED
- until the real server starts to send data and ACK packet
- (after 3-way handshake).
+ The secure_tcp defense is to use a more complicated TCP state
+ transition table. For VS/NAT, it also delays entering the
+ TCP ESTABLISHED state until the three way handshake is completed.
- The value definition is the same as that of drop_entry or
+ The value definition is the same as that of drop_entry and
drop_packet.
sync_threshold - INTEGER
@@ -141,3 +156,36 @@ sync_threshold - INTEGER
synchronized, every time the number of its incoming packets
modulus 50 equals the threshold. The range of the threshold is
from 0 to 49.
+
+snat_reroute - BOOLEAN
+ 0 - disabled
+ not 0 - enabled (default)
+
+ If enabled, recalculate the route of SNATed packets from
+ realservers so that they are routed as if they originate from the
+ director. Otherwise they are routed as if they are forwarded by the
+ director.
+
+ If policy routing is in effect then it is possible that the route
+ of a packet originating from a director is routed differently to a
+ packet being forwarded by the director.
+
+ If policy routing is not in effect then the recalculated route will
+ always be the same as the original route so it is an optimisation
+ to disable snat_reroute and avoid the recalculation.
+
+sync_version - INTEGER
+ default 1
+
+ The version of the synchronisation protocol used when sending
+ synchronisation messages.
+
+ 0 selects the original synchronisation protocol (version 0). This
+ should be used when sending synchronisation messages to a legacy
+ system that only understands the original synchronisation protocol.
+
+ 1 selects the current synchronisation protocol (version 1). This
+ should be used where possible.
+
+ Kernels with this sync_version entry are able to receive messages
+ of both version 1 and version 2 of the synchronisation protocol.
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt
index b30e81a..3a93007 100644
--- a/Documentation/networking/mac80211-injection.txt
+++ b/Documentation/networking/mac80211-injection.txt
@@ -23,6 +23,10 @@ radiotap headers and used to control injection:
IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the
current fragmentation threshold.
+ * IEEE80211_RADIOTAP_TX_FLAGS
+
+ IEEE80211_RADIOTAP_F_TX_NOACK: frame should be sent without waiting for
+ an ACK even if it is a unicast frame
The injection code can also skip all other currently defined radiotap fields
facilitating replay of captured radiotap headers directly.
diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
new file mode 100644
index 0000000..4b1c0dc
--- /dev/null
+++ b/Documentation/networking/netdev-features.txt
@@ -0,0 +1,154 @@
+Netdev features mess and how to get out from it alive
+=====================================================
+
+Author:
+ Michał Mirosław <mirq-linux@rere.qmqm.pl>
+
+
+
+ Part I: Feature sets
+======================
+
+Long gone are the days when a network card would just take and give packets
+verbatim. Today's devices add multiple features and bugs (read: offloads)
+that relieve an OS of various tasks like generating and checking checksums,
+splitting packets, classifying them. Those capabilities and their state
+are commonly referred to as netdev features in Linux kernel world.
+
+There are currently three sets of features relevant to the driver, and
+one used internally by network core:
+
+ 1. netdev->hw_features set contains features whose state may possibly
+ be changed (enabled or disabled) for a particular device by user's
+ request. This set should be initialized in ndo_init callback and not
+ changed later.
+
+ 2. netdev->features set contains features which are currently enabled
+ for a device. This should be changed only by network core or in
+ error paths of ndo_set_features callback.
+
+ 3. netdev->vlan_features set contains features whose state is inherited
+ by child VLAN devices (limits netdev->features set). This is currently
+ used for all VLAN devices whether tags are stripped or inserted in
+ hardware or software.
+
+ 4. netdev->wanted_features set contains feature set requested by user.
+ This set is filtered by ndo_fix_features callback whenever it or
+ some device-specific conditions change. This set is internal to
+ networking core and should not be referenced in drivers.
+
+
+
+ Part II: Controlling enabled features
+=======================================
+
+When current feature set (netdev->features) is to be changed, new set
+is calculated and filtered by calling ndo_fix_features callback
+and netdev_fix_features(). If the resulting set differs from current
+set, it is passed to ndo_set_features callback and (if the callback
+returns success) replaces value stored in netdev->features.
+NETDEV_FEAT_CHANGE notification is issued after that whenever current
+set might have changed.
+
+The following events trigger recalculation:
+ 1. device's registration, after ndo_init returned success
+ 2. user requested changes in features state
+ 3. netdev_update_features() is called
+
+ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
+are treated as always returning success.
+
+A driver that wants to trigger recalculation must do so by calling
+netdev_update_features() while holding rtnl_lock. This should not be done
+from ndo_*_features callbacks. netdev->features should not be modified by
+driver except by means of ndo_fix_features callback.
+
+
+
+ Part III: Implementation hints
+================================
+
+ * ndo_fix_features:
+
+All dependencies between features should be resolved here. The resulting
+set can be reduced further by networking core imposed limitations (as coded
+in netdev_fix_features()). For this reason it is safer to disable a feature
+when its dependencies are not met instead of forcing the dependency on.
+
+This callback should not modify hardware nor driver state (should be
+stateless). It can be called multiple times between successive
+ndo_set_features calls.
+
+Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
+NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
+care must be taken as the change won't affect already configured VLANs.
+
+ * ndo_set_features:
+
+Hardware should be reconfigured to match passed feature set. The set
+should not be altered unless some error condition happens that can't
+be reliably detected in ndo_fix_features. In this case, the callback
+should update netdev->features to match resulting hardware state.
+Errors returned are not (and cannot be) propagated anywhere except dmesg.
+(Note: successful return is zero, >0 means silent error.)
+
+
+
+ Part IV: Features
+===================
+
+For current list of features, see include/linux/netdev_features.h.
+This section describes semantics of some of them.
+
+ * Transmit checksumming
+
+For complete description, see comments near the top of include/linux/skbuff.h.
+
+Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
+It means that device can fill TCP/UDP-like checksum anywhere in the packets
+whatever headers there might be.
+
+ * Transmit TCP segmentation offload
+
+NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
+set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
+
+ * Transmit DMA from high memory
+
+On platforms where this is relevant, NETIF_F_HIGHDMA signals that
+ndo_start_xmit can handle skbs with frags in high memory.
+
+ * Transmit scatter-gather
+
+Those features say that ndo_start_xmit can handle fragmented skbs:
+NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
+chained skbs (skb->next/prev list).
+
+ * Software features
+
+Features contained in NETIF_F_SOFT_FEATURES are features of networking
+stack. Driver should not change behaviour based on them.
+
+ * LLTX driver (deprecated for hardware drivers)
+
+NETIF_F_LLTX should be set in drivers that implement their own locking in
+transmit path or don't need locking at all (e.g. software tunnels).
+In ndo_start_xmit, it is recommended to use a try_lock and return
+NETDEV_TX_LOCKED when the spin lock fails. The locking should also properly
+protect against other callbacks (the rules you need to find out).
+
+Don't use it for new drivers.
+
+ * netns-local device
+
+NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
+network namespaces (e.g. loopback).
+
+Don't use it in drivers.
+
+ * VLAN challenged
+
+NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
+headers. Some drivers set this because the cards can't handle the bigger MTU.
+[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
+VLANs. This may be not useful, though.]
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index 87b3d15..8935834 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -73,7 +73,7 @@ dev->hard_start_xmit:
has to lock by itself when needed. It is recommended to use a try lock
for this and return NETDEV_TX_LOCKED when the spin lock fails.
The locking there should also properly protect against
- set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated.
+ set_rx_mode. Note that the use of NETIF_F_LLTX is deprecated.
Don't use it for new drivers.
Context: Process with BHs disabled or BH (timer),
@@ -92,7 +92,7 @@ dev->tx_timeout:
Context: BHs disabled
Notes: netif_queue_stopped() is guaranteed true
-dev->set_multicast_list:
+dev->set_rx_mode:
Synchronization: netif_tx_lock spinlock.
Context: BHs disabled
diff --git a/Documentation/networking/nfc.txt b/Documentation/networking/nfc.txt
new file mode 100644
index 0000000..b24c29b
--- /dev/null
+++ b/Documentation/networking/nfc.txt
@@ -0,0 +1,128 @@
+Linux NFC subsystem
+===================
+
+The Near Field Communication (NFC) subsystem is required to standardize the
+NFC device drivers development and to create an unified userspace interface.
+
+This document covers the architecture overview, the device driver interface
+description and the userspace interface description.
+
+Architecture overview
+---------------------
+
+The NFC subsystem is responsible for:
+ - NFC adapters management;
+ - Polling for targets;
+ - Low-level data exchange;
+
+The subsystem is divided in some parts. The 'core' is responsible for
+providing the device driver interface. On the other side, it is also
+responsible for providing an interface to control operations and low-level
+data exchange.
+
+The control operations are available to userspace via generic netlink.
+
+The low-level data exchange interface is provided by the new socket family
+PF_NFC. The NFC_SOCKPROTO_RAW performs raw communication with NFC targets.
+
+
+ +--------------------------------------+
+ | USER SPACE |
+ +--------------------------------------+
+ ^ ^
+ | low-level | control
+ | data exchange | operations
+ | |
+ | v
+ | +-----------+
+ | AF_NFC | netlink |
+ | socket +-----------+
+ | raw ^
+ | |
+ v v
+ +---------+ +-----------+
+ | rawsock | <--------> | core |
+ +---------+ +-----------+
+ ^
+ |
+ v
+ +-----------+
+ | driver |
+ +-----------+
+
+Device Driver Interface
+-----------------------
+
+When registering on the NFC subsystem, the device driver must inform the core
+of the set of supported NFC protocols and the set of ops callbacks. The ops
+callbacks that must be implemented are the following:
+
+* start_poll - setup the device to poll for targets
+* stop_poll - stop on progress polling operation
+* activate_target - select and initialize one of the targets found
+* deactivate_target - deselect and deinitialize the selected target
+* data_exchange - send data and receive the response (transceive operation)
+
+Userspace interface
+--------------------
+
+The userspace interface is divided in control operations and low-level data
+exchange operation.
+
+CONTROL OPERATIONS:
+
+Generic netlink is used to implement the interface to the control operations.
+The operations are composed by commands and events, all listed below:
+
+* NFC_CMD_GET_DEVICE - get specific device info or dump the device list
+* NFC_CMD_START_POLL - setup a specific device to polling for targets
+* NFC_CMD_STOP_POLL - stop the polling operation in a specific device
+* NFC_CMD_GET_TARGET - dump the list of targets found by a specific device
+
+* NFC_EVENT_DEVICE_ADDED - reports an NFC device addition
+* NFC_EVENT_DEVICE_REMOVED - reports an NFC device removal
+* NFC_EVENT_TARGETS_FOUND - reports START_POLL results when 1 or more targets
+are found
+
+The user must call START_POLL to poll for NFC targets, passing the desired NFC
+protocols through NFC_ATTR_PROTOCOLS attribute. The device remains in polling
+state until it finds any target. However, the user can stop the polling
+operation by calling STOP_POLL command. In this case, it will be checked if
+the requester of STOP_POLL is the same of START_POLL.
+
+If the polling operation finds one or more targets, the event TARGETS_FOUND is
+sent (including the device id). The user must call GET_TARGET to get the list of
+all targets found by such device. Each reply message has target attributes with
+relevant information such as the supported NFC protocols.
+
+All polling operations requested through one netlink socket are stopped when
+it's closed.
+
+LOW-LEVEL DATA EXCHANGE:
+
+The userspace must use PF_NFC sockets to perform any data communication with
+targets. All NFC sockets use AF_NFC:
+
+struct sockaddr_nfc {
+ sa_family_t sa_family;
+ __u32 dev_idx;
+ __u32 target_idx;
+ __u32 nfc_protocol;
+};
+
+To establish a connection with one target, the user must create an
+NFC_SOCKPROTO_RAW socket and call the 'connect' syscall with the sockaddr_nfc
+struct correctly filled. All information comes from NFC_EVENT_TARGETS_FOUND
+netlink event. As a target can support more than one NFC protocol, the user
+must inform which protocol it wants to use.
+
+Internally, 'connect' will result in an activate_target call to the driver.
+When the socket is closed, the target is deactivated.
+
+The data format exchanged through the sockets is NFC protocol dependent. For
+instance, when communicating with MIFARE tags, the data exchanged are MIFARE
+commands and their responses.
+
+The first received package is the response to the first sent package and so
+on. In order to allow valid "empty" responses, every data received has a NULL
+header of 1 byte.
diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt
index c67077c..e1a3d59 100644
--- a/Documentation/networking/rds.txt
+++ b/Documentation/networking/rds.txt
@@ -62,11 +62,10 @@ Socket Interface
================
AF_RDS, PF_RDS, SOL_RDS
- These constants haven't been assigned yet, because RDS isn't in
- mainline yet. Currently, the kernel module assigns some constant
- and publishes it to user space through two sysctl files
- /proc/sys/net/rds/pf_rds
- /proc/sys/net/rds/sol_rds
+ AF_RDS and PF_RDS are the domain type to be used with socket(2)
+ to create RDS sockets. SOL_RDS is the socket-level to be used
+ with setsockopt(2) and getsockopt(2) for RDS specific socket
+ options.
fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
This creates a new, unbound RDS socket.
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
new file mode 100644
index 0000000..a177de2
--- /dev/null
+++ b/Documentation/networking/scaling.txt
@@ -0,0 +1,378 @@
+Scaling in the Linux Networking Stack
+
+
+Introduction
+============
+
+This document describes a set of complementary techniques in the Linux
+networking stack to increase parallelism and improve performance for
+multi-processor systems.
+
+The following technologies are described:
+
+ RSS: Receive Side Scaling
+ RPS: Receive Packet Steering
+ RFS: Receive Flow Steering
+ Accelerated Receive Flow Steering
+ XPS: Transmit Packet Steering
+
+
+RSS: Receive Side Scaling
+=========================
+
+Contemporary NICs support multiple receive and transmit descriptor queues
+(multi-queue). On reception, a NIC can send different packets to different
+queues to distribute processing among CPUs. The NIC distributes packets by
+applying a filter to each packet that assigns it to one of a small number
+of logical flows. Packets for each flow are steered to a separate receive
+queue, which in turn can be processed by separate CPUs. This mechanism is
+generally known as “Receive-side Scaling” (RSS). The goal of RSS and
+the other scaling techniques is to increase performance uniformly.
+Multi-queue distribution can also be used for traffic prioritization, but
+that is not the focus of these techniques.
+
+The filter used in RSS is typically a hash function over the network
+and/or transport layer headers-- for example, a 4-tuple hash over
+IP addresses and TCP ports of a packet. The most common hardware
+implementation of RSS uses a 128-entry indirection table where each entry
+stores a queue number. The receive queue for a packet is determined
+by masking out the low order seven bits of the computed hash for the
+packet (usually a Toeplitz hash), taking this number as a key into the
+indirection table and reading the corresponding value.
+
+Some advanced NICs allow steering packets to queues based on
+programmable filters. For example, webserver bound TCP port 80 packets
+can be directed to their own receive queue. Such “n-tuple” filters can
+be configured from ethtool (--config-ntuple).
+
+==== RSS Configuration
+
+The driver for a multi-queue capable NIC typically provides a kernel
+module parameter for specifying the number of hardware queues to
+configure. In the bnx2x driver, for instance, this parameter is called
+num_queues. A typical RSS configuration would be to have one receive queue
+for each CPU if the device supports enough queues, or otherwise at least
+one for each memory domain, where a memory domain is a set of CPUs that
+share a particular memory level (L1, L2, NUMA node, etc.).
+
+The indirection table of an RSS device, which resolves a queue by masked
+hash, is usually programmed by the driver at initialization. The
+default mapping is to distribute the queues evenly in the table, but the
+indirection table can be retrieved and modified at runtime using ethtool
+commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the
+indirection table could be done to give different queues different
+relative weights.
+
+== RSS IRQ Configuration
+
+Each receive queue has a separate IRQ associated with it. The NIC triggers
+this to notify a CPU when new packets arrive on the given queue. The
+signaling path for PCIe devices uses message signaled interrupts (MSI-X),
+that can route each interrupt to a particular CPU. The active mapping
+of queues to IRQs can be determined from /proc/interrupts. By default,
+an IRQ may be handled on any CPU. Because a non-negligible part of packet
+processing takes place in receive interrupt handling, it is advantageous
+to spread receive interrupts between CPUs. To manually adjust the IRQ
+affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems
+will be running irqbalance, a daemon that dynamically optimizes IRQ
+assignments and as a result may override any manual settings.
+
+== Suggested Configuration
+
+RSS should be enabled when latency is a concern or whenever receive
+interrupt processing forms a bottleneck. Spreading load between CPUs
+decreases queue length. For low latency networking, the optimal setting
+is to allocate as many queues as there are CPUs in the system (or the
+NIC maximum, if lower). The most efficient high-rate configuration
+is likely the one with the smallest number of receive queues where no
+receive queue overflows due to a saturated CPU, because in default
+mode with interrupt coalescing enabled, the aggregate number of
+interrupts (and thus work) grows with each additional queue.
+
+Per-cpu load can be observed using the mpstat utility, but note that on
+processors with hyperthreading (HT), each hyperthread is represented as
+a separate CPU. For interrupt handling, HT has shown no benefit in
+initial tests, so limit the number of queues to the number of CPU cores
+in the system.
+
+
+RPS: Receive Packet Steering
+============================
+
+Receive Packet Steering (RPS) is logically a software implementation of
+RSS. Being in software, it is necessarily called later in the datapath.
+Whereas RSS selects the queue and hence CPU that will run the hardware
+interrupt handler, RPS selects the CPU to perform protocol processing
+above the interrupt handler. This is accomplished by placing the packet
+on the desired CPU’s backlog queue and waking up the CPU for processing.
+RPS has some advantages over RSS: 1) it can be used with any NIC,
+2) software filters can easily be added to hash over new protocols,
+3) it does not increase hardware device interrupt rate (although it does
+introduce inter-processor interrupts (IPIs)).
+
+RPS is called during bottom half of the receive interrupt handler, when
+a driver sends a packet up the network stack with netif_rx() or
+netif_receive_skb(). These call the get_rps_cpu() function, which
+selects the queue that should process a packet.
+
+The first step in determining the target CPU for RPS is to calculate a
+flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash
+depending on the protocol). This serves as a consistent hash of the
+associated flow of the packet. The hash is either provided by hardware
+or will be computed in the stack. Capable hardware can pass the hash in
+the receive descriptor for the packet; this would usually be the same
+hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in
+skb->rx_hash and can be used elsewhere in the stack as a hash of the
+packet’s flow.
+
+Each receive hardware queue has an associated list of CPUs to which
+RPS may enqueue packets for processing. For each received packet,
+an index into the list is computed from the flow hash modulo the size
+of the list. The indexed CPU is the target for processing the packet,
+and the packet is queued to the tail of that CPU’s backlog queue. At
+the end of the bottom half routine, IPIs are sent to any CPUs for which
+packets have been queued to their backlog queue. The IPI wakes backlog
+processing on the remote CPU, and any queued packets are then processed
+up the networking stack.
+
+==== RPS Configuration
+
+RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on
+by default for SMP). Even when compiled in, RPS remains disabled until
+explicitly configured. The list of CPUs to which RPS may forward traffic
+can be configured for each receive queue using a sysfs file entry:
+
+ /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
+
+This file implements a bitmap of CPUs. RPS is disabled when it is zero
+(the default), in which case packets are processed on the interrupting
+CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
+the bitmap.
+
+== Suggested Configuration
+
+For a single queue device, a typical RPS configuration would be to set
+the rps_cpus to the CPUs in the same memory domain of the interrupting
+CPU. If NUMA locality is not an issue, this could also be all CPUs in
+the system. At high interrupt rate, it might be wise to exclude the
+interrupting CPU from the map since that already performs much work.
+
+For a multi-queue system, if RSS is configured so that a hardware
+receive queue is mapped to each CPU, then RPS is probably redundant
+and unnecessary. If there are fewer hardware queues than CPUs, then
+RPS might be beneficial if the rps_cpus for each queue are the ones that
+share the same memory domain as the interrupting CPU for that queue.
+
+
+RFS: Receive Flow Steering
+==========================
+
+While RPS steers packets solely based on hash, and thus generally
+provides good load distribution, it does not take into account
+application locality. This is accomplished by Receive Flow Steering
+(RFS). The goal of RFS is to increase datacache hitrate by steering
+kernel processing of packets to the CPU where the application thread
+consuming the packet is running. RFS relies on the same RPS mechanisms
+to enqueue packets onto the backlog of another CPU and to wake up that
+CPU.
+
+In RFS, packets are not forwarded directly by the value of their hash,
+but the hash is used as index into a flow lookup table. This table maps
+flows to the CPUs where those flows are being processed. The flow hash
+(see RPS section above) is used to calculate the index into this table.
+The CPU recorded in each entry is the one which last processed the flow.
+If an entry does not hold a valid CPU, then packets mapped to that entry
+are steered using plain RPS. Multiple table entries may point to the
+same CPU. Indeed, with many flows and few CPUs, it is very likely that
+a single application thread handles flows with many different flow hashes.
+
+rps_sock_flow_table is a global flow table that contains the *desired* CPU
+for flows: the CPU that is currently processing the flow in userspace.
+Each table value is a CPU index that is updated during calls to recvmsg
+and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
+and tcp_splice_read()).
+
+When the scheduler moves a thread to a new CPU while it has outstanding
+receive packets on the old CPU, packets may arrive out of order. To
+avoid this, RFS uses a second flow table to track outstanding packets
+for each flow: rps_dev_flow_table is a table specific to each hardware
+receive queue of each device. Each table value stores a CPU index and a
+counter. The CPU index represents the *current* CPU onto which packets
+for this flow are enqueued for further kernel processing. Ideally, kernel
+and userspace processing occur on the same CPU, and hence the CPU index
+in both tables is identical. This is likely false if the scheduler has
+recently migrated a userspace thread while the kernel still has packets
+enqueued for kernel processing on the old CPU.
+
+The counter in rps_dev_flow_table values records the length of the current
+CPU's backlog when a packet in this flow was last enqueued. Each backlog
+queue has a head counter that is incremented on dequeue. A tail counter
+is computed as head counter + queue length. In other words, the counter
+in rps_dev_flow_table[i] records the last element in flow i that has
+been enqueued onto the currently designated CPU for flow i (of course,
+entry i is actually selected by hash and multiple flows may hash to the
+same entry i).
+
+And now the trick for avoiding out of order packets: when selecting the
+CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
+and the rps_dev_flow table of the queue that the packet was received on
+are compared. If the desired CPU for the flow (found in the
+rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
+table), the packet is enqueued onto that CPU’s backlog. If they differ,
+the current CPU is updated to match the desired CPU if one of the
+following is true:
+
+- The current CPU's queue head counter >= the recorded tail counter
+ value in rps_dev_flow[i]
+- The current CPU is unset (equal to NR_CPUS)
+- The current CPU is offline
+
+After this check, the packet is sent to the (possibly updated) current
+CPU. These rules aim to ensure that a flow only moves to a new CPU when
+there are no packets outstanding on the old CPU, as the outstanding
+packets could arrive later than those about to be processed on the new
+CPU.
+
+==== RFS Configuration
+
+RFS is only available if the kconfig symbol CONFIG_RFS is enabled (on
+by default for SMP). The functionality remains disabled until explicitly
+configured. The number of entries in the global flow table is set through:
+
+ /proc/sys/net/core/rps_sock_flow_entries
+
+The number of entries in the per-queue flow table are set through:
+
+ /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
+
+== Suggested Configuration
+
+Both of these need to be set before RFS is enabled for a receive queue.
+Values for both are rounded up to the nearest power of two. The
+suggested flow count depends on the expected number of active connections
+at any given time, which may be significantly less than the number of open
+connections. We have found that a value of 32768 for rps_sock_flow_entries
+works fairly well on a moderately loaded server.
+
+For a single queue device, the rps_flow_cnt value for the single queue
+would normally be configured to the same value as rps_sock_flow_entries.
+For a multi-queue device, the rps_flow_cnt for each queue might be
+configured as rps_sock_flow_entries / N, where N is the number of
+queues. So for instance, if rps_flow_entries is set to 32768 and there
+are 16 configured receive queues, rps_flow_cnt for each queue might be
+configured as 2048.
+
+
+Accelerated RFS
+===============
+
+Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load
+balancing mechanism that uses soft state to steer flows based on where
+the application thread consuming the packets of each flow is running.
+Accelerated RFS should perform better than RFS since packets are sent
+directly to a CPU local to the thread consuming the data. The target CPU
+will either be the same CPU where the application runs, or at least a CPU
+which is local to the application thread’s CPU in the cache hierarchy.
+
+To enable accelerated RFS, the networking stack calls the
+ndo_rx_flow_steer driver function to communicate the desired hardware
+queue for packets matching a particular flow. The network stack
+automatically calls this function every time a flow entry in
+rps_dev_flow_table is updated. The driver in turn uses a device specific
+method to program the NIC to steer the packets.
+
+The hardware queue for a flow is derived from the CPU recorded in
+rps_dev_flow_table. The stack consults a CPU to hardware queue map which
+is maintained by the NIC driver. This is an auto-generated reverse map of
+the IRQ affinity table shown by /proc/interrupts. Drivers can use
+functions in the cpu_rmap (“CPU affinity reverse map”) kernel library
+to populate the map. For each CPU, the corresponding queue in the map is
+set to be one whose processing CPU is closest in cache locality.
+
+==== Accelerated RFS Configuration
+
+Accelerated RFS is only available if the kernel is compiled with
+CONFIG_RFS_ACCEL and support is provided by the NIC device and driver.
+It also requires that ntuple filtering is enabled via ethtool. The map
+of CPU to queues is automatically deduced from the IRQ affinities
+configured for each receive queue by the driver, so no additional
+configuration should be necessary.
+
+== Suggested Configuration
+
+This technique should be enabled whenever one wants to use RFS and the
+NIC supports hardware acceleration.
+
+XPS: Transmit Packet Steering
+=============================
+
+Transmit Packet Steering is a mechanism for intelligently selecting
+which transmit queue to use when transmitting a packet on a multi-queue
+device. To accomplish this, a mapping from CPU to hardware queue(s) is
+recorded. The goal of this mapping is usually to assign queues
+exclusively to a subset of CPUs, where the transmit completions for
+these queues are processed on a CPU within this set. This choice
+provides two benefits. First, contention on the device queue lock is
+significantly reduced since fewer CPUs contend for the same queue
+(contention can be eliminated completely if each CPU has its own
+transmit queue). Secondly, cache miss rate on transmit completion is
+reduced, in particular for data cache lines that hold the sk_buff
+structures.
+
+XPS is configured per transmit queue by setting a bitmap of CPUs that
+may use that queue to transmit. The reverse mapping, from CPUs to
+transmit queues, is computed and maintained for each network device.
+When transmitting the first packet in a flow, the function
+get_xps_queue() is called to select a queue. This function uses the ID
+of the running CPU as a key into the CPU-to-queue lookup table. If the
+ID matches a single queue, that is used for transmission. If multiple
+queues match, one is selected by using the flow hash to compute an index
+into the set.
+
+The queue chosen for transmitting a particular flow is saved in the
+corresponding socket structure for the flow (e.g. a TCP connection).
+This transmit queue is used for subsequent packets sent on the flow to
+prevent out of order (ooo) packets. The choice also amortizes the cost
+of calling get_xps_queues() over all packets in the flow. To avoid
+ooo packets, the queue for a flow can subsequently only be changed if
+skb->ooo_okay is set for a packet in the flow. This flag indicates that
+there are no outstanding packets in the flow, so the transmit queue can
+change without the risk of generating out of order packets. The
+transport layer is responsible for setting ooo_okay appropriately. TCP,
+for instance, sets the flag when all data for a connection has been
+acknowledged.
+
+==== XPS Configuration
+
+XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
+default for SMP). The functionality remains disabled until explicitly
+configured. To enable XPS, the bitmap of CPUs that may use a transmit
+queue is configured using the sysfs file entry:
+
+/sys/class/net/<dev>/queues/tx-<n>/xps_cpus
+
+== Suggested Configuration
+
+For a network device with a single transmission queue, XPS configuration
+has no effect, since there is no choice in this case. In a multi-queue
+system, XPS is preferably configured so that each CPU maps onto one queue.
+If there are as many queues as there are CPUs in the system, then each
+queue can also map onto one CPU, resulting in exclusive pairings that
+experience no contention. If there are fewer queues than CPUs, then the
+best CPUs to share a given queue are probably those that share the cache
+with the CPU that processes transmit completions for that queue
+(transmit interrupts).
+
+
+Further Information
+===================
+RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
+2.6.38. Original patches were submitted by Tom Herbert
+(therbert@google.com)
+
+Accelerated RFS was introduced in 2.6.35. Original patches were
+submitted by Ben Hutchings (bhutchings@solarflare.com)
+
+Authors:
+Tom Herbert (therbert@google.com)
+Willem de Bruijn (willemb@google.com)
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 80a7a34..8d67980 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -7,7 +7,7 @@ This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
(Synopsys IP blocks); it has been fully tested on STLinux platforms.
Currently this network device driver is for all STM embedded MAC/GMAC
-(7xxx SoCs). Other platforms start using it i.e. ARM SPEAr.
+(i.e. 7xxx/5xxx SoCs) and it's known working on other platforms i.e. ARM SPEAr.
DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100
Universal version 4.0 have been used for developing the first code
@@ -71,12 +71,21 @@ Several performance tests on STM platforms showed this optimisation allows to sp
the CPU while having the maximum throughput.
4.4) WOL
-Wake up on Lan feature through Magic Frame is only supported for the GMAC
+Wake up on Lan feature through Magic and Unicast frames are supported for the GMAC
core.
4.5) DMA descriptors
Driver handles both normal and enhanced descriptors. The latter has been only
-tested on DWC Ether MAC 10/100/1000 Universal version 3.41a.
+tested on DWC Ether MAC 10/100/1000 Universal version 3.41a and later.
+
+STMMAC supports DMA descriptor to operate both in dual buffer (RING)
+and linked-list(CHAINED) mode. In RING each descriptor points to two
+data buffer pointers whereas in CHAINED mode they point to only one data
+buffer pointer. RING mode is the default.
+
+In CHAINED mode each descriptor will have pointer to next descriptor in
+the list, hence creating the explicit chaining in the descriptor itself,
+whereas such explicit chaining is not possible in RING mode.
4.6) Ethtool support
Ethtool is supported. Driver statistics and internal errors can be taken using:
@@ -91,11 +100,15 @@ LRO is not supported.
The driver is compatible with PAL to work with PHY and GPHY devices.
4.9) Platform information
-Several information came from the platform; please refer to the
-driver's Header file in include/linux directory.
+Several driver's information can be passed through the platform
+These are included in the include/linux/stmmac.h header file
+and detailed below as well:
-struct plat_stmmacenet_data {
+ struct plat_stmmacenet_data {
int bus_id;
+ int phy_addr;
+ int interface;
+ struct stmmac_mdio_bus_data *mdio_bus_data;
int pbl;
int clk_csr;
int has_gmac;
@@ -103,67 +116,166 @@ struct plat_stmmacenet_data {
int tx_coe;
int bugged_jumbo;
int pmt;
- void (*fix_mac_speed)(void *priv, unsigned int speed);
- void (*bus_setup)(unsigned long ioaddr);
-#ifdef CONFIG_STM_DRIVERS
- struct stm_pad_config *pad_config;
-#endif
- void *bsp_priv;
-};
+ int force_sf_dma_mode;
+ void (*fix_mac_speed)(void *priv, unsigned int speed);
+ void (*bus_setup)(void __iomem *ioaddr);
+ int (*init)(struct platform_device *pdev);
+ void (*exit)(struct platform_device *pdev);
+ void *bsp_priv;
+ };
Where:
-- pbl (Programmable Burst Length) is maximum number of
- beats to be transferred in one DMA transaction.
- GMAC also enables the 4xPBL by default.
-- fix_mac_speed and bus_setup are used to configure internal target
- registers (on STM platforms);
-- has_gmac: GMAC core is on board (get it at run-time in the next step);
-- bus_id: bus identifier.
-- tx_coe: core is able to perform the tx csum in HW.
-- enh_desc: if sets the MAC will use the enhanced descriptor structure.
-- clk_csr: CSR Clock range selection.
-- bugged_jumbo: some HWs are not able to perform the csum in HW for
- over-sized frames due to limited buffer sizes. Setting this
- flag the csum will be done in SW on JUMBO frames.
-
-struct plat_stmmacphy_data {
- int bus_id;
- int phy_addr;
- unsigned int phy_mask;
- int interface;
- int (*phy_reset)(void *priv);
- void *priv;
-};
+ o bus_id: bus identifier.
+ o phy_addr: the physical address can be passed from the platform.
+ If it is set to -1 the driver will automatically
+ detect it at run-time by probing all the 32 addresses.
+ o interface: PHY device's interface.
+ o mdio_bus_data: specific platform fields for the MDIO bus.
+ o pbl: the Programmable Burst Length is maximum number of beats to
+ be transferred in one DMA transaction.
+ GMAC also enables the 4xPBL by default.
+ o clk_csr: CSR Clock range selection.
+ o has_gmac: uses the GMAC core.
+ o enh_desc: if sets the MAC will use the enhanced descriptor structure.
+ o tx_coe: core is able to perform the tx csum in HW.
+ o bugged_jumbo: some HWs are not able to perform the csum in HW for
+ over-sized frames due to limited buffer sizes.
+ Setting this flag the csum will be done in SW on
+ JUMBO frames.
+ o pmt: core has the embedded power module (optional).
+ o force_sf_dma_mode: force DMA to use the Store and Forward mode
+ instead of the Threshold.
+ o fix_mac_speed: this callback is used for modifying some syscfg registers
+ (on ST SoCs) according to the link speed negotiated by the
+ physical layer .
+ o bus_setup: perform HW setup of the bus. For example, on some ST platforms
+ this field is used to configure the AMBA bridge to generate more
+ efficient STBus traffic.
+ o init/exit: callbacks used for calling a custom initialisation;
+ this is sometime necessary on some platforms (e.g. ST boxes)
+ where the HW needs to have set some PIO lines or system cfg
+ registers.
+ o custom_cfg: this is a custom configuration that can be passed while
+ initialising the resources.
+
+The we have:
+
+ struct stmmac_mdio_bus_data {
+ int bus_id;
+ int (*phy_reset)(void *priv);
+ unsigned int phy_mask;
+ int *irqs;
+ int probed_phy_irq;
+ };
Where:
-- bus_id: bus identifier;
-- phy_addr: physical address used for the attached phy device;
- set it to -1 to get it at run-time;
-- interface: physical MII interface mode;
-- phy_reset: hook to reset HW function.
-
-SOURCES:
-- Kconfig
-- Makefile
-- stmmac_main.c: main network device driver;
-- stmmac_mdio.c: mdio functions;
-- stmmac_ethtool.c: ethtool support;
-- stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
- Only tested on ST40 platforms based.
-- stmmac.h: private driver structure;
-- common.h: common definitions and VFTs;
-- descs.h: descriptor structure definitions;
-- dwmac1000_core.c: GMAC core functions;
-- dwmac1000_dma.c: dma functions for the GMAC chip;
-- dwmac1000.h: specific header file for the GMAC;
-- dwmac100_core: MAC 100 core and dma code;
-- dwmac100_dma.c: dma funtions for the MAC chip;
-- dwmac1000.h: specific header file for the MAC;
-- dwmac_lib.c: generic DMA functions shared among chips
-- enh_desc.c: functions for handling enhanced descriptors
-- norm_desc.c: functions for handling normal descriptors
-
-TODO:
-- XGMAC controller is not supported.
-- Review the timer optimisation code to use an embedded device that seems to be
+ o bus_id: bus identifier;
+ o phy_reset: hook to reset the phy device attached to the bus.
+ o phy_mask: phy mask passed when register the MDIO bus within the driver.
+ o irqs: list of IRQs, one per PHY.
+ o probed_phy_irq: if irqs is NULL, use this for probed PHY.
+
+Below an example how the structures above are using on ST platforms.
+
+ static struct plat_stmmacenet_data stxYYY_ethernet_platform_data = {
+ .pbl = 32,
+ .has_gmac = 0,
+ .enh_desc = 0,
+ .fix_mac_speed = stxYYY_ethernet_fix_mac_speed,
+ |
+ |-> to write an internal syscfg
+ | on this platform when the
+ | link speed changes from 10 to
+ | 100 and viceversa
+ .init = &stmmac_claim_resource,
+ |
+ |-> On ST SoC this calls own "PAD"
+ | manager framework to claim
+ | all the resources necessary
+ | (GPIO ...). The .custom_cfg field
+ | is used to pass a custom config.
+};
+
+Below the usage of the stmmac_mdio_bus_data: on this SoC, in fact,
+there are two MAC cores: one MAC is for MDIO Bus/PHY emulation
+with fixed_link support.
+
+static struct stmmac_mdio_bus_data stmmac1_mdio_bus = {
+ .bus_id = 1,
+ |
+ |-> phy device on the bus_id 1
+ .phy_reset = phy_reset;
+ |
+ |-> function to provide the phy_reset on this board
+ .phy_mask = 0,
+};
+
+static struct fixed_phy_status stmmac0_fixed_phy_status = {
+ .link = 1,
+ .speed = 100,
+ .duplex = 1,
+};
+
+During the board's device_init we can configure the first
+MAC for fixed_link by calling:
+ fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status));)
+and the second one, with a real PHY device attached to the bus,
+by using the stmmac_mdio_bus_data structure (to provide the id, the
+reset procedure etc).
+
+4.10) List of source files:
+ o Kconfig
+ o Makefile
+ o stmmac_main.c: main network device driver;
+ o stmmac_mdio.c: mdio functions;
+ o stmmac_ethtool.c: ethtool support;
+ o stmmac_timer.[ch]: timer code used for mitigating the driver dma interrupts
+ Only tested on ST40 platforms based.
+ o stmmac.h: private driver structure;
+ o common.h: common definitions and VFTs;
+ o descs.h: descriptor structure definitions;
+ o dwmac1000_core.c: GMAC core functions;
+ o dwmac1000_dma.c: dma functions for the GMAC chip;
+ o dwmac1000.h: specific header file for the GMAC;
+ o dwmac100_core: MAC 100 core and dma code;
+ o dwmac100_dma.c: dma funtions for the MAC chip;
+ o dwmac1000.h: specific header file for the MAC;
+ o dwmac_lib.c: generic DMA functions shared among chips
+ o enh_desc.c: functions for handling enhanced descriptors
+ o norm_desc.c: functions for handling normal descriptors
+
+5) Debug Information
+
+The driver exports many information i.e. internal statistics,
+debug information, MAC and DMA registers etc.
+
+These can be read in several ways depending on the
+type of the information actually needed.
+
+For example a user can be use the ethtool support
+to get statistics: e.g. using: ethtool -S ethX
+(that shows the Management counters (MMC) if supported)
+or sees the MAC/DMA registers: e.g. using: ethtool -d ethX
+
+Compiling the Kernel with CONFIG_DEBUG_FS and enabling the
+STMMAC_DEBUG_FS option the driver will export the following
+debugfs entries:
+
+/sys/kernel/debug/stmmaceth/descriptors_status
+ To show the DMA TX/RX descriptor rings
+
+Developer can also use the "debug" module parameter to get
+further debug information.
+
+In the end, there are other macros (that cannot be enabled
+via menuconfig) to turn-on the RX/TX DMA debugging,
+specific MAC core debug printk etc. Others to enable the
+debug in the TX and RX processes.
+All these are only useful during the developing stage
+and should never enabled inside the code for general usage.
+In fact, these can generate an huge amount of debug messages.
+
+6) TODO:
+ o XGMAC is not supported.
+ o Review the timer optimisation code to use an embedded device that will be
available in new chip generations.