aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorStefan Richter <stefanr@s5r6.in-berlin.de>2006-09-17 18:17:19 +0200
committerStefan Richter <stefanr@s5r6.in-berlin.de>2006-09-17 18:19:31 +0200
commit9b4f2e9576658c4e52d95dc8d309f51b2e2db096 (patch)
tree7b1902b0f931783fccc6fee45c6f9c16b4fde5ce /Documentation
parent3c6c65f5ed5a6d307bd607aecd06d658c0934d88 (diff)
parent803db244b9f71102e366fd689000c1417b9a7508 (diff)
downloadkernel_samsung_smdk4412-9b4f2e9576658c4e52d95dc8d309f51b2e2db096.zip
kernel_samsung_smdk4412-9b4f2e9576658c4e52d95dc8d309f51b2e2db096.tar.gz
kernel_samsung_smdk4412-9b4f2e9576658c4e52d95dc8d309f51b2e2db096.tar.bz2
ieee1394: merge from Linus
Conflicts: drivers/ieee1394/hosts.c Patch "lockdep: annotate ieee1394 skb-queue-head locking" was meddling with patch "ieee1394: fix kerneldoc of hpsb_alloc_host". Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/DMA-mapping.txt8
-rw-r--r--Documentation/DocBook/kernel-api.tmpl9
-rw-r--r--Documentation/DocBook/mtdnand.tmpl11
-rw-r--r--Documentation/RCU/whatisRCU.txt5
-rw-r--r--Documentation/SubmitChecklist76
-rw-r--r--Documentation/SubmittingPatches12
-rw-r--r--Documentation/accounting/delay-accounting.txt112
-rw-r--r--Documentation/accounting/getdelays.c396
-rw-r--r--Documentation/accounting/taskstats.txt181
-rw-r--r--Documentation/cciss.txt1
-rw-r--r--Documentation/connector/ucon.c206
-rw-r--r--Documentation/cpu-freq/user-guide.txt5
-rw-r--r--Documentation/cpu-hotplug.txt12
-rw-r--r--Documentation/cpusets.txt6
-rw-r--r--Documentation/devices.txt8
-rw-r--r--Documentation/drivers/edac/edac.txt152
-rw-r--r--Documentation/fb/imacfb.txt31
-rw-r--r--Documentation/feature-removal-schedule.txt53
-rw-r--r--Documentation/filesystems/00-INDEX4
-rw-r--r--Documentation/filesystems/Locking4
-rw-r--r--Documentation/filesystems/relay.txt479
-rw-r--r--Documentation/filesystems/relayfs.txt442
-rw-r--r--Documentation/filesystems/vfs.txt4
-rw-r--r--Documentation/hwmon/abituguru32
-rw-r--r--Documentation/i2c/busses/i2c-sis96x4
-rw-r--r--Documentation/i386/boot.txt1
-rw-r--r--Documentation/i386/zero-page.txt4
-rw-r--r--Documentation/infiniband/ipoib.txt2
-rw-r--r--Documentation/initrd.txt16
-rw-r--r--Documentation/input/joystick.txt1
-rw-r--r--Documentation/irqflags-tracing.txt57
-rw-r--r--Documentation/kbuild/makefiles.txt14
-rw-r--r--Documentation/kernel-parameters.txt13
-rw-r--r--Documentation/kobject.txt2
-rw-r--r--Documentation/lockdep-design.txt197
-rw-r--r--Documentation/memory-barriers.txt5
-rw-r--r--Documentation/mips/time.README10
-rw-r--r--Documentation/networking/ip-sysctl.txt6
-rw-r--r--Documentation/networking/ipvs-sysctl.txt143
-rw-r--r--Documentation/nfsroot.txt275
-rw-r--r--Documentation/powerpc/booting-without-of.txt20
-rw-r--r--Documentation/ramdisk.txt12
-rw-r--r--Documentation/scsi/ChangeLog.megaraid123
-rw-r--r--Documentation/scsi/ChangeLog.megaraid_sas16
-rw-r--r--Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl4
-rw-r--r--Documentation/sysctl/fs.txt20
-rw-r--r--Documentation/sysctl/kernel.txt25
-rw-r--r--Documentation/sysctl/vm.txt14
-rw-r--r--Documentation/usb/proc_usb_info.txt2
-rw-r--r--Documentation/usb/usb-help.txt3
-rw-r--r--Documentation/usb/usb-serial.txt4
-rw-r--r--Documentation/x86_64/boot-options.txt7
52 files changed, 2413 insertions, 836 deletions
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 7c71769..63392c9 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -698,12 +698,12 @@ these interfaces. Remember that, as defined, consistent mappings are
always going to be SAC addressable.
The first thing your driver needs to do is query the PCI platform
-layer with your devices DAC addressing capabilities:
+layer if it is capable of handling your devices DAC addressing
+capabilities:
- int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);
+ int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask);
-This routine behaves identically to pci_set_dma_mask. You may not
-use the following interfaces if this routine fails.
+You may not use the following interfaces if this routine fails.
Next, DMA addresses using this API are kept track of using the
dma64_addr_t type. It is guaranteed to be big enough to hold any
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 1ae4dc0f..f8fe882 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -59,6 +59,9 @@
!Iinclude/linux/hrtimer.h
!Ekernel/hrtimer.c
</sect1>
+ <sect1><title>Workqueues and Kevents</title>
+!Ekernel/workqueue.c
+ </sect1>
<sect1><title>Internal Functions</title>
!Ikernel/exit.c
!Ikernel/signal.c
@@ -300,7 +303,7 @@ X!Ekernel/module.c
</sect1>
<sect1><title>Resources Management</title>
-!Ekernel/resource.c
+!Ikernel/resource.c
</sect1>
<sect1><title>MTRR Handling</title>
@@ -312,9 +315,7 @@ X!Ekernel/module.c
!Edrivers/pci/pci-driver.c
!Edrivers/pci/remove.c
!Edrivers/pci/pci-acpi.c
-<!-- kerneldoc does not understand __devinit
-X!Edrivers/pci/search.c
- -->
+!Edrivers/pci/search.c
!Edrivers/pci/msi.c
!Edrivers/pci/bus.c
<!-- FIXME: Removed for now since no structured comments in source
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 999afe1..a8c8cce 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -109,7 +109,7 @@
for most of the implementations. These functions can be replaced by the
board driver if neccecary. Those functions are called via pointers in the
NAND chip description structure. The board driver can set the functions which
- should be replaced by board dependend functions before calling nand_scan().
+ should be replaced by board dependent functions before calling nand_scan().
If the function pointer is NULL on entry to nand_scan() then the pointer
is set to the default function which is suitable for the detected chip type.
</para></listitem>
@@ -133,7 +133,7 @@
[REPLACEABLE]</para><para>
Replaceable members hold hardware related functions which can be
provided by the board driver. The board driver can set the functions which
- should be replaced by board dependend functions before calling nand_scan().
+ should be replaced by board dependent functions before calling nand_scan().
If the function pointer is NULL on entry to nand_scan() then the pointer
is set to the default function which is suitable for the detected chip type.
</para></listitem>
@@ -156,9 +156,8 @@
<title>Basic board driver</title>
<para>
For most boards it will be sufficient to provide just the
- basic functions and fill out some really board dependend
+ basic functions and fill out some really board dependent
members in the nand chip description structure.
- See drivers/mtd/nand/skeleton for reference.
</para>
<sect1>
<title>Basic defines</title>
@@ -1295,7 +1294,9 @@ in this page</entry>
</para>
!Idrivers/mtd/nand/nand_base.c
!Idrivers/mtd/nand/nand_bbt.c
-!Idrivers/mtd/nand/nand_ecc.c
+<!-- No internal functions for kernel-doc:
+X!Idrivers/mtd/nand/nand_ecc.c
+-->
</chapter>
<chapter id="credits">
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 4f41a60..318df44 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -687,8 +687,9 @@ diff shows how closely related RCU and reader-writer locking can be.
+ spin_lock(&listmutex);
list_for_each_entry(p, head, lp) {
if (p->key == key) {
- list_del(&p->list);
+ - list_del(&p->list);
- write_unlock(&listmutex);
+ + list_del_rcu(&p->list);
+ spin_unlock(&listmutex);
+ synchronize_rcu();
kfree(p);
@@ -736,7 +737,7 @@ Or, for those who prefer a side-by-side listing:
5 write_lock(&listmutex); 5 spin_lock(&listmutex);
6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
7 if (p->key == key) { 7 if (p->key == key) {
- 8 list_del(&p->list); 8 list_del(&p->list);
+ 8 list_del(&p->list); 8 list_del_rcu(&p->list);
9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
10 synchronize_rcu();
10 kfree(p); 11 kfree(p);
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 8230098..a10bfb6 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -1,57 +1,63 @@
Linux Kernel patch sumbittal checklist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Here are some basic things that developers should do if they
-want to see their kernel patch submittals accepted quicker.
+Here are some basic things that developers should do if they want to see their
+kernel patch submissions accepted more quickly.
-These are all above and beyond the documentation that is provided
-in Documentation/SubmittingPatches and elsewhere about submitting
-Linux kernel patches.
+These are all above and beyond the documentation that is provided in
+Documentation/SubmittingPatches and elsewhere regarding submitting Linux
+kernel patches.
-- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n.
- No gcc warnings/errors, no linker warnings/errors.
+1: Builds cleanly with applicable or modified CONFIG options =y, =m, and
+ =n. No gcc warnings/errors, no linker warnings/errors.
-- Passes allnoconfig, allmodconfig
+2: Passes allnoconfig, allmodconfig
-- Builds on multiple CPU arch-es by using local cross-compile tools
- or something like PLM at OSDL.
+3: Builds on multiple CPU architectures by using local cross-compile tools
+ or something like PLM at OSDL.
-- ppc64 is a good architecture for cross-compilation checking because it
- tends to use `unsigned long' for 64-bit quantities.
+4: ppc64 is a good architecture for cross-compilation checking because it
+ tends to use `unsigned long' for 64-bit quantities.
-- Matches kernel coding style(!)
+5: Matches kernel coding style(!)
-- Any new or modified CONFIG options don't muck up the config menu.
+6: Any new or modified CONFIG options don't muck up the config menu.
-- All new Kconfig options have help text.
+7: All new Kconfig options have help text.
-- Has been carefully reviewed with respect to relevant Kconfig
- combinations. This is very hard to get right with testing --
- brainpower pays off here.
+8: Has been carefully reviewed with respect to relevant Kconfig
+ combinations. This is very hard to get right with testing -- brainpower
+ pays off here.
-- Check cleanly with sparse.
+9: Check cleanly with sparse.
-- Use 'make checkstack' and 'make namespacecheck' and fix any
- problems that they find. Note: checkstack does not point out
- problems explicitly, but any one function that uses more than
- 512 bytes on the stack is a candidate for change.
+10: Use 'make checkstack' and 'make namespacecheck' and fix any problems
+ that they find. Note: checkstack does not point out problems explicitly,
+ but any one function that uses more than 512 bytes on the stack is a
+ candidate for change.
-- Include kernel-doc to document global kernel APIs. (Not required
- for static functions, but OK there also.) Use 'make htmldocs'
- or 'make mandocs' to check the kernel-doc and fix any issues.
+11: Include kernel-doc to document global kernel APIs. (Not required for
+ static functions, but OK there also.) Use 'make htmldocs' or 'make
+ mandocs' to check the kernel-doc and fix any issues.
-- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
- CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
- CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
- enabled.
+12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
+ CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
+ CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
+ enabled.
-- Has been build- and runtime tested with and without CONFIG_SMP and
- CONFIG_PREEMPT.
+13: Has been build- and runtime tested with and without CONFIG_SMP and
+ CONFIG_PREEMPT.
-- If the patch affects IO/Disk, etc: has been tested with and without
- CONFIG_LBD.
+14: If the patch affects IO/Disk, etc: has been tested with and without
+ CONFIG_LBD.
+15: All codepaths have been exercised with all lockdep features enabled.
-2006-APR-27
+16: All new /proc entries are documented under Documentation/
+
+17: All new kernel boot parameters are documented in
+ Documentation/kernel-parameters.txt.
+
+18: All new module parameters are documented with MODULE_PARM_DESC()
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index c2c85bc..d42ab4c 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -10,7 +10,9 @@ kernel, the process can sometimes be daunting if you're not familiar
with "the system." This text is a collection of suggestions which
can greatly increase the chances of your change being accepted.
-If you are submitting a driver, also read Documentation/SubmittingDrivers.
+Read Documentation/SubmitChecklist for a list of items to check
+before submitting code. If you are submitting a driver, also read
+Documentation/SubmittingDrivers.
@@ -74,9 +76,6 @@ There are a number of scripts which can aid in this:
Quilt:
http://savannah.nongnu.org/projects/quilt
-Randy Dunlap's patch scripts:
-http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz
-
Andrew Morton's patch scripts:
http://www.zip.com.au/~akpm/linux/patches/
Instead of these scripts, quilt is the recommended patch management
@@ -309,6 +308,8 @@ then you just add a line saying
Signed-off-by: Random J Developer <random@developer.example.org>
+using your real name (sorry, no pseudonyms or anonymous contributions.)
+
Some people also put extra tags at the end. They'll just be ignored for
now, but you can do this to mark internal company procedures or just
point out some special detail about the sign-off.
@@ -484,7 +485,7 @@ Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer".
<http://www.kroah.com/log/2005/10/19/>
<http://www.kroah.com/log/2006/01/11/>
-NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!.
+NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
<http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
Kernel Documentation/CodingStyle
@@ -493,4 +494,3 @@ Kernel Documentation/CodingStyle
Linus Torvald's mail on the canonical patch format:
<http://lkml.org/lkml/2005/4/7/183>
--
-Last updated on 17 Nov 2005.
diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt
new file mode 100644
index 0000000..1443cd7
--- /dev/null
+++ b/Documentation/accounting/delay-accounting.txt
@@ -0,0 +1,112 @@
+Delay accounting
+----------------
+
+Tasks encounter delays in execution when they wait
+for some kernel resource to become available e.g. a
+runnable task may wait for a free CPU to run on.
+
+The per-task delay accounting functionality measures
+the delays experienced by a task while
+
+a) waiting for a CPU (while being runnable)
+b) completion of synchronous block I/O initiated by the task
+c) swapping in pages
+
+and makes these statistics available to userspace through
+the taskstats interface.
+
+Such delays provide feedback for setting a task's cpu priority,
+io priority and rss limit values appropriately. Long delays for
+important tasks could be a trigger for raising its corresponding priority.
+
+The functionality, through its use of the taskstats interface, also provides
+delay statistics aggregated for all tasks (or threads) belonging to a
+thread group (corresponding to a traditional Unix process). This is a commonly
+needed aggregation that is more efficiently done by the kernel.
+
+Userspace utilities, particularly resource management applications, can also
+aggregate delay statistics into arbitrary groups. To enable this, delay
+statistics of a task are available both during its lifetime as well as on its
+exit, ensuring continuous and complete monitoring can be done.
+
+
+Interface
+---------
+
+Delay accounting uses the taskstats interface which is described
+in detail in a separate document in this directory. Taskstats returns a
+generic data structure to userspace corresponding to per-pid and per-tgid
+statistics. The delay accounting functionality populates specific fields of
+this structure. See
+ include/linux/taskstats.h
+for a description of the fields pertaining to delay accounting.
+It will generally be in the form of counters returning the cumulative
+delay seen for cpu, sync block I/O, swapin etc.
+
+Taking the difference of two successive readings of a given
+counter (say cpu_delay_total) for a task will give the delay
+experienced by the task waiting for the corresponding resource
+in that interval.
+
+When a task exits, records containing the per-task statistics
+are sent to userspace without requiring a command. If it is the last exiting
+task of a thread group, the per-tgid statistics are also sent. More details
+are given in the taskstats interface description.
+
+The getdelays.c userspace utility in this directory allows simple commands to
+be run and the corresponding delay statistics to be displayed. It also serves
+as an example of using the taskstats interface.
+
+Usage
+-----
+
+Compile the kernel with
+ CONFIG_TASK_DELAY_ACCT=y
+ CONFIG_TASKSTATS=y
+
+Delay accounting is enabled by default at boot up.
+To disable, add
+ nodelayacct
+to the kernel boot options. The rest of the instructions
+below assume this has not been done.
+
+After the system has booted up, use a utility
+similar to getdelays.c to access the delays
+seen by a given task or a task group (tgid).
+The utility also allows a given command to be
+executed and the corresponding delays to be
+seen.
+
+General format of the getdelays command
+
+getdelays [-t tgid] [-p pid] [-c cmd...]
+
+
+Get delays, since system boot, for pid 10
+# ./getdelays -p 10
+(output similar to next case)
+
+Get sum of delays, since system boot, for all pids with tgid 5
+# ./getdelays -t 5
+
+
+CPU count real total virtual total delay total
+ 7876 92005750 100000000 24001500
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
+
+Get delays seen in executing a given simple command
+# ./getdelays -c ls /
+
+bin data1 data3 data5 dev home media opt root srv sys usr
+boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
+
+
+CPU count real total virtual total delay total
+ 6 4000250 4000000 0
+IO count delay total
+ 0 0
+MEM count delay total
+ 0 0
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
new file mode 100644
index 0000000..795ca39
--- /dev/null
+++ b/Documentation/accounting/getdelays.c
@@ -0,0 +1,396 @@
+/* getdelays.c
+ *
+ * Utility to get per-pid and per-tgid delay accounting statistics
+ * Also illustrates usage of the taskstats interface
+ *
+ * Copyright (C) Shailabh Nagar, IBM Corp. 2005
+ * Copyright (C) Balbir Singh, IBM Corp. 2006
+ * Copyright (c) Jay Lan, SGI. 2006
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <poll.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+#include <signal.h>
+
+#include <linux/genetlink.h>
+#include <linux/taskstats.h>
+
+/*
+ * Generic macros for dealing with netlink sockets. Might be duplicated
+ * elsewhere. It is recommended that commercial grade applications use
+ * libnl or libnetlink and use the interfaces provided by the library
+ */
+#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
+#define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN)
+#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
+#define NLA_PAYLOAD(len) (len - NLA_HDRLEN)
+
+#define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0)
+int done = 0;
+int rcvbufsz=0;
+
+ char name[100];
+int dbg=0, print_delays=0;
+__u64 stime, utime;
+#define PRINTF(fmt, arg...) { \
+ if (dbg) { \
+ printf(fmt, ##arg); \
+ } \
+ }
+
+/* Maximum size of response requested or message sent */
+#define MAX_MSG_SIZE 256
+/* Maximum number of cpus expected to be specified in a cpumask */
+#define MAX_CPUS 32
+/* Maximum length of pathname to log file */
+#define MAX_FILENAME 256
+
+struct msgtemplate {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[MAX_MSG_SIZE];
+};
+
+char cpumask[100+6*MAX_CPUS];
+
+/*
+ * Create a raw netlink socket and bind
+ */
+static int create_nl_socket(int protocol)
+{
+ int fd;
+ struct sockaddr_nl local;
+
+ fd = socket(AF_NETLINK, SOCK_RAW, protocol);
+ if (fd < 0)
+ return -1;
+
+ if (rcvbufsz)
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
+ &rcvbufsz, sizeof(rcvbufsz)) < 0) {
+ printf("Unable to set socket rcv buf size to %d\n",
+ rcvbufsz);
+ return -1;
+ }
+
+ memset(&local, 0, sizeof(local));
+ local.nl_family = AF_NETLINK;
+
+ if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
+ goto error;
+
+ return fd;
+error:
+ close(fd);
+ return -1;
+}
+
+
+int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
+ __u8 genl_cmd, __u16 nla_type,
+ void *nla_data, int nla_len)
+{
+ struct nlattr *na;
+ struct sockaddr_nl nladdr;
+ int r, buflen;
+ char *buf;
+
+ struct msgtemplate msg;
+
+ msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
+ msg.n.nlmsg_type = nlmsg_type;
+ msg.n.nlmsg_flags = NLM_F_REQUEST;
+ msg.n.nlmsg_seq = 0;
+ msg.n.nlmsg_pid = nlmsg_pid;
+ msg.g.cmd = genl_cmd;
+ msg.g.version = 0x1;
+ na = (struct nlattr *) GENLMSG_DATA(&msg);
+ na->nla_type = nla_type;
+ na->nla_len = nla_len + 1 + NLA_HDRLEN;
+ memcpy(NLA_DATA(na), nla_data, nla_len);
+ msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
+
+ buf = (char *) &msg;
+ buflen = msg.n.nlmsg_len ;
+ memset(&nladdr, 0, sizeof(nladdr));
+ nladdr.nl_family = AF_NETLINK;
+ while ((r = sendto(sd, buf, buflen, 0, (struct sockaddr *) &nladdr,
+ sizeof(nladdr))) < buflen) {
+ if (r > 0) {
+ buf += r;
+ buflen -= r;
+ } else if (errno != EAGAIN)
+ return -1;
+ }
+ return 0;
+}
+
+
+/*
+ * Probe the controller in genetlink to find the family id
+ * for the TASKSTATS family
+ */
+int get_family_id(int sd)
+{
+ struct {
+ struct nlmsghdr n;
+ struct genlmsghdr g;
+ char buf[256];
+ } ans;
+
+ int id, rc;
+ struct nlattr *na;
+ int rep_len;
+
+ strcpy(name, TASKSTATS_GENL_NAME);
+ rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY,
+ CTRL_ATTR_FAMILY_NAME, (void *)name,
+ strlen(TASKSTATS_GENL_NAME)+1);
+
+ rep_len = recv(sd, &ans, sizeof(ans), 0);
+ if (ans.n.nlmsg_type == NLMSG_ERROR ||
+ (rep_len < 0) || !NLMSG_OK((&ans.n), rep_len))
+ return 0;
+
+ na = (struct nlattr *) GENLMSG_DATA(&ans);
+ na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
+ if (na->nla_type == CTRL_ATTR_FAMILY_ID) {
+ id = *(__u16 *) NLA_DATA(na);
+ }
+ return id;
+}
+
+void print_delayacct(struct taskstats *t)
+{
+ printf("\n\nCPU %15s%15s%15s%15s\n"
+ " %15llu%15llu%15llu%15llu\n"
+ "IO %15s%15s\n"
+ " %15llu%15llu\n"
+ "MEM %15s%15s\n"
+ " %15llu%15llu\n\n",
+ "count", "real total", "virtual total", "delay total",
+ t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
+ t->cpu_delay_total,
+ "count", "delay total",
+ t->blkio_count, t->blkio_delay_total,
+ "count", "delay total", t->swapin_count, t->swapin_delay_total);
+}
+
+int main(int argc, char *argv[])
+{
+ int c, rc, rep_len, aggr_len, len2, cmd_type;
+ __u16 id;
+ __u32 mypid;
+
+ struct nlattr *na;
+ int nl_sd = -1;
+ int len = 0;
+ pid_t tid = 0;
+ pid_t rtid = 0;
+
+ int fd = 0;
+ int count = 0;
+ int write_file = 0;
+ int maskset = 0;
+ char logfile[128];
+ int loop = 0;
+
+ struct msgtemplate msg;
+
+ while (1) {
+ c = getopt(argc, argv, "dw:r:m:t:p:v:l");
+ if (c < 0)
+ break;
+
+ switch (c) {
+ case 'd':
+ printf("print delayacct stats ON\n");
+ print_delays = 1;
+ break;
+ case 'w':
+ strncpy(logfile, optarg, MAX_FILENAME);
+ printf("write to file %s\n", logfile);
+ write_file = 1;
+ break;
+ case 'r':
+ rcvbufsz = atoi(optarg);
+ printf("receive buf size %d\n", rcvbufsz);
+ if (rcvbufsz < 0)
+ err(1, "Invalid rcv buf size\n");
+ break;
+ case 'm':
+ strncpy(cpumask, optarg, sizeof(cpumask));
+ maskset = 1;
+ printf("cpumask %s maskset %d\n", cpumask, maskset);
+ break;
+ case 't':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid tgid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_TGID;
+ print_delays = 1;
+ break;
+ case 'p':
+ tid = atoi(optarg);
+ if (!tid)
+ err(1, "Invalid pid\n");
+ cmd_type = TASKSTATS_CMD_ATTR_PID;
+ print_delays = 1;
+ break;
+ case 'v':
+ printf("debug on\n");
+ dbg = 1;
+ break;
+ case 'l':
+ printf("listen forever\n");
+ loop = 1;
+ break;
+ default:
+ printf("Unknown option %d\n", c);
+ exit(-1);
+ }
+ }
+
+ if (write_file) {
+ fd = open(logfile, O_WRONLY | O_CREAT | O_TRUNC,
+ S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
+ if (fd == -1) {
+ perror("Cannot open output file\n");
+ exit(1);
+ }
+ }
+
+ if ((nl_sd = create_nl_socket(NETLINK_GENERIC)) < 0)
+ err(1, "error creating Netlink socket\n");
+
+
+ mypid = getpid();
+ id = get_family_id(nl_sd);
+ if (!id) {
+ printf("Error getting family id, errno %d", errno);
+ goto err;
+ }
+ PRINTF("family id %d\n", id);
+
+ if (maskset) {
+ rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
+ TASKSTATS_CMD_ATTR_REGISTER_CPUMASK,
+ &cpumask, sizeof(cpumask));
+ PRINTF("Sent register cpumask, retval %d\n", rc);
+ if (rc < 0) {
+ printf("error sending register cpumask\n");
+ goto err;
+ }
+ }
+
+ if (tid) {
+ rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
+ cmd_type, &tid, sizeof(__u32));
+ PRINTF("Sent pid/tgid, retval %d\n", rc);
+ if (rc < 0) {
+ printf("error sending tid/tgid cmd\n");
+ goto done;
+ }
+ }
+
+ do {
+ int i;
+
+ rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
+ PRINTF("received %d bytes\n", rep_len);
+
+ if (rep_len < 0) {
+ printf("nonfatal reply error: errno %d\n", errno);
+ continue;
+ }
+ if (msg.n.nlmsg_type == NLMSG_ERROR ||
+ !NLMSG_OK((&msg.n), rep_len)) {
+ printf("fatal reply error, errno %d\n", errno);
+ goto done;
+ }
+
+ PRINTF("nlmsghdr size=%d, nlmsg_len=%d, rep_len=%d\n",
+ sizeof(struct nlmsghdr), msg.n.nlmsg_len, rep_len);
+
+
+ rep_len = GENLMSG_PAYLOAD(&msg.n);
+
+ na = (struct nlattr *) GENLMSG_DATA(&msg);
+ len = 0;
+ i = 0;
+ while (len < rep_len) {
+ len += NLA_ALIGN(na->nla_len);
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_AGGR_TGID:
+ /* Fall through */
+ case TASKSTATS_TYPE_AGGR_PID:
+ aggr_len = NLA_PAYLOAD(na->nla_len);
+ len2 = 0;
+ /* For nested attributes, na follows */
+ na = (struct nlattr *) NLA_DATA(na);
+ done = 0;
+ while (len2 < aggr_len) {
+ switch (na->nla_type) {
+ case TASKSTATS_TYPE_PID:
+ rtid = *(int *) NLA_DATA(na);
+ if (print_delays)
+ printf("PID\t%d\n", rtid);
+ break;
+ case TASKSTATS_TYPE_TGID:
+ rtid = *(int *) NLA_DATA(na);
+ if (print_delays)
+ printf("TGID\t%d\n", rtid);
+ break;
+ case TASKSTATS_TYPE_STATS:
+ count++;
+ if (print_delays)
+ print_delayacct((struct taskstats *) NLA_DATA(na));
+ if (fd) {
+ if (write(fd, NLA_DATA(na), na->nla_len) < 0) {
+ err(1,"write error\n");
+ }
+ }
+ if (!loop)
+ goto done;
+ break;
+ default:
+ printf("Unknown nested nla_type %d\n", na->nla_type);
+ break;
+ }
+ len2 += NLA_ALIGN(na->nla_len);
+ na = (struct nlattr *) ((char *) na + len2);
+ }
+ break;
+
+ default:
+ printf("Unknown nla_type %d\n", na->nla_type);
+ break;
+ }
+ na = (struct nlattr *) (GENLMSG_DATA(&msg) + len);
+ }
+ } while (loop);
+done:
+ if (maskset) {
+ rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
+ TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK,
+ &cpumask, sizeof(cpumask));
+ printf("Sent deregister mask, retval %d\n", rc);
+ if (rc < 0)
+ err(rc, "error sending deregister cpumask\n");
+ }
+err:
+ close(nl_sd);
+ if (fd)
+ close(fd);
+ return 0;
+}
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
new file mode 100644
index 0000000..92ebf29
--- /dev/null
+++ b/Documentation/accounting/taskstats.txt
@@ -0,0 +1,181 @@
+Per-task statistics interface
+-----------------------------
+
+
+Taskstats is a netlink-based interface for sending per-task and
+per-process statistics from the kernel to userspace.
+
+Taskstats was designed for the following benefits:
+
+- efficiently provide statistics during lifetime of a task and on its exit
+- unified interface for multiple accounting subsystems
+- extensibility for use by future accounting patches
+
+Terminology
+-----------
+
+"pid", "tid" and "task" are used interchangeably and refer to the standard
+Linux task defined by struct task_struct. per-pid stats are the same as
+per-task stats.
+
+"tgid", "process" and "thread group" are used interchangeably and refer to the
+tasks that share an mm_struct i.e. the traditional Unix process. Despite the
+use of tgid, there is no special treatment for the task that is thread group
+leader - a process is deemed alive as long as it has any task belonging to it.
+
+Usage
+-----
+
+To get statistics during a task's lifetime, userspace opens a unicast netlink
+socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
+The response contains statistics for a task (if pid is specified) or the sum of
+statistics for all tasks of the process (if tgid is specified).
+
+To obtain statistics for tasks which are exiting, the userspace listener
+sends a register command and specifies a cpumask. Whenever a task exits on
+one of the cpus in the cpumask, its per-pid statistics are sent to the
+registered listener. Using cpumasks allows the data received by one listener
+to be limited and assists in flow control over the netlink interface and is
+explained in more detail below.
+
+If the exiting task is the last thread exiting its thread group,
+an additional record containing the per-tgid stats is also sent to userspace.
+The latter contains the sum of per-pid stats for all threads in the thread
+group, both past and present.
+
+getdelays.c is a simple utility demonstrating usage of the taskstats interface
+for reporting delay accounting statistics. Users can register cpumasks,
+send commands and process responses, listen for per-tid/tgid exit data,
+write the data received to a file and do basic flow control by increasing
+receive buffer sizes.
+
+Interface
+---------
+
+The user-kernel interface is encapsulated in include/linux/taskstats.h
+
+To avoid this documentation becoming obsolete as the interface evolves, only
+an outline of the current version is given. taskstats.h always overrides the
+description here.
+
+struct taskstats is the common accounting structure for both per-pid and
+per-tgid data. It is versioned and can be extended by each accounting subsystem
+that is added to the kernel. The fields and their semantics are defined in the
+taskstats.h file.
+
+The data exchanged between user and kernel space is a netlink message belonging
+to the NETLINK_GENERIC family and using the netlink attributes interface.
+The messages are in the format
+
+ +----------+- - -+-------------+-------------------+
+ | nlmsghdr | Pad | genlmsghdr | taskstats payload |
+ +----------+- - -+-------------+-------------------+
+
+
+The taskstats payload is one of the following three kinds:
+
+1. Commands: Sent from user to kernel. Commands to get data on
+a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
+containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
+the task/process for which userspace wants statistics.
+
+Commands to register/deregister interest in exit data from a set of cpus
+consist of one attribute, of type
+TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
+attribute payload. The cpumask is specified as an ascii string of
+comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
+the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
+in cpus before closing the listening socket, the kernel cleans up its interest
+set over time. However, for the sake of efficiency, an explicit deregistration
+is advisable.
+
+2. Response for a command: sent from the kernel in response to a userspace
+command. The payload is a series of three attributes of type:
+
+a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
+a pid/tgid will be followed by some stats.
+
+b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
+is being returned.
+
+c) TASKSTATS_TYPE_STATS: attribute with a struct taskstsats as payload. The
+same structure is used for both per-pid and per-tgid stats.
+
+3. New message sent by kernel whenever a task exits. The payload consists of a
+ series of attributes of the following type:
+
+a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
+b) TASKSTATS_TYPE_PID: contains exiting task's pid
+c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
+d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
+e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
+f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
+
+
+per-tgid stats
+--------------
+
+Taskstats provides per-process stats, in addition to per-task stats, since
+resource management is often done at a process granularity and aggregating task
+stats in userspace alone is inefficient and potentially inaccurate (due to lack
+of atomicity).
+
+However, maintaining per-process, in addition to per-task stats, within the
+kernel has space and time overheads. To address this, the taskstats code
+accumalates each exiting task's statistics into a process-wide data structure.
+When the last task of a process exits, the process level data accumalated also
+gets sent to userspace (along with the per-task data).
+
+When a user queries to get per-tgid data, the sum of all other live threads in
+the group is added up and added to the accumalated total for previously exited
+threads of the same thread group.
+
+Extending taskstats
+-------------------
+
+There are two ways to extend the taskstats interface to export more
+per-task/process stats as patches to collect them get added to the kernel
+in future:
+
+1. Adding more fields to the end of the existing struct taskstats. Backward
+ compatibility is ensured by the version number within the
+ structure. Userspace will use only the fields of the struct that correspond
+ to the version its using.
+
+2. Defining separate statistic structs and using the netlink attributes
+ interface to return them. Since userspace processes each netlink attribute
+ independently, it can always ignore attributes whose type it does not
+ understand (because it is using an older version of the interface).
+
+
+Choosing between 1. and 2. is a matter of trading off flexibility and
+overhead. If only a few fields need to be added, then 1. is the preferable
+path since the kernel and userspace don't need to incur the overhead of
+processing new netlink attributes. But if the new fields expand the existing
+struct too much, requiring disparate userspace accounting utilities to
+unnecessarily receive large structures whose fields are of no interest, then
+extending the attributes structure would be worthwhile.
+
+Flow control for taskstats
+--------------------------
+
+When the rate of task exits becomes large, a listener may not be able to keep
+up with the kernel's rate of sending per-tid/tgid exit data leading to data
+loss. This possibility gets compounded when the taskstats structure gets
+extended and the number of cpus grows large.
+
+To avoid losing statistics, userspace should do one or more of the following:
+
+- increase the receive buffer sizes for the netlink sockets opened by
+listeners to receive exit data.
+
+- create more listeners and reduce the number of cpus being listened to by
+each listener. In the extreme case, there could be one listener for each cpu.
+Users may also consider setting the cpu affinity of the listener to the subset
+of cpus to which it listens, especially if they are listening to just one cpu.
+
+Despite these measures, if the userspace receives ENOBUFS error messages
+indicated overflow of receive buffers, it should take measures to handle the
+loss of data.
+
+----
diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt
index 1537842..9c629ff 100644
--- a/Documentation/cciss.txt
+++ b/Documentation/cciss.txt
@@ -20,6 +20,7 @@ This driver is known to work with the following cards:
* SA P400i
* SA E200
* SA E200i
+ * SA E500
If nodes are not already created in the /dev/cciss directory, run as root:
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c
new file mode 100644
index 0000000..d738cde
--- /dev/null
+++ b/Documentation/connector/ucon.c
@@ -0,0 +1,206 @@
+/*
+ * ucon.c
+ *
+ * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru>
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <asm/types.h>
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/poll.h>
+
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+
+#include <arpa/inet.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <time.h>
+
+#include <linux/connector.h>
+
+#define DEBUG
+#define NETLINK_CONNECTOR 11
+
+#ifdef DEBUG
+#define ulog(f, a...) fprintf(stdout, f, ##a)
+#else
+#define ulog(f, a...) do {} while (0)
+#endif
+
+static int need_exit;
+static __u32 seq;
+
+static int netlink_send(int s, struct cn_msg *msg)
+{
+ struct nlmsghdr *nlh;
+ unsigned int size;
+ int err;
+ char buf[128];
+ struct cn_msg *m;
+
+ size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len);
+
+ nlh = (struct nlmsghdr *)buf;
+ nlh->nlmsg_seq = seq++;
+ nlh->nlmsg_pid = getpid();
+ nlh->nlmsg_type = NLMSG_DONE;
+ nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
+ nlh->nlmsg_flags = 0;
+
+ m = NLMSG_DATA(nlh);
+#if 0
+ ulog("%s: [%08x.%08x] len=%u, seq=%u, ack=%u.\n",
+ __func__, msg->id.idx, msg->id.val, msg->len, msg->seq, msg->ack);
+#endif
+ memcpy(m, msg, sizeof(*m) + msg->len);
+
+ err = send(s, nlh, size, 0);
+ if (err == -1)
+ ulog("Failed to send: %s [%d].\n",
+ strerror(errno), errno);
+
+ return err;
+}
+
+int main(int argc, char *argv[])
+{
+ int s;
+ char buf[1024];
+ int len;
+ struct nlmsghdr *reply;
+ struct sockaddr_nl l_local;
+ struct cn_msg *data;
+ FILE *out;
+ time_t tm;
+ struct pollfd pfd;
+
+ if (argc < 2)
+ out = stdout;
+ else {
+ out = fopen(argv[1], "a+");
+ if (!out) {
+ ulog("Unable to open %s for writing: %s\n",
+ argv[1], strerror(errno));
+ out = stdout;
+ }
+ }
+
+ memset(buf, 0, sizeof(buf));
+
+ s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
+ if (s == -1) {
+ perror("socket");
+ return -1;
+ }
+
+ l_local.nl_family = AF_NETLINK;
+ l_local.nl_groups = 0x123; /* bitmask of requested groups */
+ l_local.nl_pid = 0;
+
+ if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
+ perror("bind");
+ close(s);
+ return -1;
+ }
+
+#if 0
+ {
+ int on = 0x57; /* Additional group number */
+ setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on));
+ }
+#endif
+ if (0) {
+ int i, j;
+
+ memset(buf, 0, sizeof(buf));
+
+ data = (struct cn_msg *)buf;
+
+ data->id.idx = 0x123;
+ data->id.val = 0x456;
+ data->seq = seq++;
+ data->ack = 0;
+ data->len = 0;
+
+ for (j=0; j<10; ++j) {
+ for (i=0; i<1000; ++i) {
+ len = netlink_send(s, data);
+ }
+
+ ulog("%d messages have been sent to %08x.%08x.\n", i, data->id.idx, data->id.val);
+ }
+
+ return 0;
+ }
+
+
+ pfd.fd = s;
+
+ while (!need_exit) {
+ pfd.events = POLLIN;
+ pfd.revents = 0;
+ switch (poll(&pfd, 1, -1)) {
+ case 0:
+ need_exit = 1;
+ break;
+ case -1:
+ if (errno != EINTR) {
+ need_exit = 1;
+ break;
+ }
+ continue;
+ }
+ if (need_exit)
+ break;
+
+ memset(buf, 0, sizeof(buf));
+ len = recv(s, buf, sizeof(buf), 0);
+ if (len == -1) {
+ perror("recv buf");
+ close(s);
+ return -1;
+ }
+ reply = (struct nlmsghdr *)buf;
+
+ switch (reply->nlmsg_type) {
+ case NLMSG_ERROR:
+ fprintf(out, "Error message received.\n");
+ fflush(out);
+ break;
+ case NLMSG_DONE:
+ data = (struct cn_msg *)NLMSG_DATA(reply);
+
+ time(&tm);
+ fprintf(out, "%.24s : [%x.%x] [%08u.%08u].\n",
+ ctime(&tm), data->id.idx, data->id.val, data->seq, data->ack);
+ fflush(out);
+ break;
+ default:
+ break;
+ }
+ }
+
+ close(s);
+ return 0;
+}
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 7fedc00..555c8cf 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -153,10 +153,13 @@ scaling_governor, and by "echoing" the name of another
that some governors won't load - they only
work on some specific architectures or
processors.
-scaling_min_freq and
+scaling_min_freq and
scaling_max_freq show the current "policy limits" (in
kHz). By echoing new values into these
files, you can change these limits.
+ NOTE: when setting a policy you need to
+ first set scaling_max_freq, then
+ scaling_min_freq.
If you have selected the "userspace" governor which allows you to
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 1bcf699..bc107cb 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -251,16 +251,24 @@ A: This is what you would need in your kernel code to receive notifications.
return NOTIFY_OK;
}
- static struct notifier_block foobar_cpu_notifer =
+ static struct notifier_block __cpuinitdata foobar_cpu_notifer =
{
.notifier_call = foobar_cpu_callback,
};
+You need to call register_cpu_notifier() from your init function.
+Init functions could be of two types:
+1. early init (init function called when only the boot processor is online).
+2. late init (init function called _after_ all the CPUs are online).
-In your init function,
+For the first case, you should add the following to your init function
register_cpu_notifier(&foobar_cpu_notifier);
+For the second case, you should add the following to your init function
+
+ register_hotcpu_notifier(&foobar_cpu_notifier);
+
You can fail PREPARE notifiers if something doesn't work to prepare resources.
This will stop the activity and send a following CANCELED event back.
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 159e2a0..76b4429 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -217,6 +217,12 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs)
to represent the cpuset hierarchy provides for a familiar permission
and name space for cpusets, with a minimum of additional kernel code.
+The cpus file in the root (top_cpuset) cpuset is read-only.
+It automatically tracks the value of cpu_online_map, using a CPU
+hotplug notifier. If and when memory nodes can be hotplugged,
+we expect to make the mems file in the root cpuset read-only
+as well, and have it track the value of node_online_map.
+
1.4 What are exclusive cpusets ?
--------------------------------
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 4aaf68f..66c725f 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -2565,10 +2565,10 @@ Your cooperation is appreciated.
243 = /dev/usb/dabusb3 Fourth dabusb device
180 block USB block devices
- 0 = /dev/uba First USB block device
- 8 = /dev/ubb Second USB block device
- 16 = /dev/ubc Thrid USB block device
- ...
+ 0 = /dev/uba First USB block device
+ 8 = /dev/ubb Second USB block device
+ 16 = /dev/ubc Third USB block device
+ ...
181 char Conrad Electronic parallel port radio clocks
0 = /dev/pcfclock0 First Conrad radio clock
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt
index 70d96a6..7b3d969 100644
--- a/Documentation/drivers/edac/edac.txt
+++ b/Documentation/drivers/edac/edac.txt
@@ -35,15 +35,14 @@ the vendor should tie the parity status bits to 0 if they do not intend
to generate parity. Some vendors do not do this, and thus the parity bit
can "float" giving false positives.
-The PCI Parity EDAC device has the ability to "skip" known flaky
-cards during the parity scan. These are set by the parity "blacklist"
-interface in the sysfs for PCI Parity. (See the PCI section in the sysfs
-section below.) There is also a parity "whitelist" which is used as
-an explicit list of devices to scan, while the blacklist is a list
-of devices to skip.
+[There are patches in the kernel queue which will allow for storage of
+quirks of PCI devices reporting false parity positives. The 2.6.18
+kernel should have those patches included. When that becomes available,
+then EDAC will be patched to utilize that information to "skip" such
+devices.]
-EDAC will have future error detectors that will be added or integrated
-into EDAC in the following list:
+EDAC will have future error detectors that will be integrated with
+EDAC or added to it, in the following list:
MCE Machine Check Exception
MCA Machine Check Architecture
@@ -93,22 +92,24 @@ EDAC lives in the /sys/devices/system/edac directory. Within this directory
there currently reside 2 'edac' components:
mc memory controller(s) system
- pci PCI status system
+ pci PCI control and status system
============================================================================
Memory Controller (mc) Model
First a background on the memory controller's model abstracted in EDAC.
-Each mc device controls a set of DIMM memory modules. These modules are
+Each 'mc' device controls a set of DIMM memory modules. These modules are
laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can
-be multiple csrows and two channels.
+be multiple csrows and multiple channels.
Memory controllers allow for several csrows, with 8 csrows being a typical value.
Yet, the actual number of csrows depends on the electrical "loading"
of a given motherboard, memory controller and DIMM characteristics.
Dual channels allows for 128 bit data transfers to the CPU from memory.
+Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs
+(FB-DIMMs). The following example will assume 2 channels:
Channel 0 Channel 1
@@ -234,23 +235,15 @@ Polling period control file:
The time period, in milliseconds, for polling for error information.
Too small a value wastes resources. Too large a value might delay
necessary handling of errors and might loose valuable information for
- locating the error. 1000 milliseconds (once each second) is about
- right for most uses.
+ locating the error. 1000 milliseconds (once each second) is the current
+ default. Systems which require all the bandwidth they can get, may
+ increase this.
LOAD TIME: module/kernel parameter: poll_msec=[0|1]
RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec
-Module Version read-only attribute file:
-
- 'mc_version'
-
- The EDAC CORE module's version and compile date are shown here to
- indicate what EDAC is running.
-
-
-
============================================================================
'mcX' DIRECTORIES
@@ -284,35 +277,6 @@ Seconds since last counter reset control file:
-DIMM capability attribute file:
-
- 'edac_capability'
-
- The EDAC (Error Detection and Correction) capabilities/modes of
- the memory controller hardware.
-
-
-DIMM Current Capability attribute file:
-
- 'edac_current_capability'
-
- The EDAC capabilities available with the hardware
- configuration. This may not be the same as "EDAC capability"
- if the correct memory is not used. If a memory controller is
- capable of EDAC, but DIMMs without check bits are in use, then
- Parity, SECDED, S4ECD4ED capabilities will not be available
- even though the memory controller might be capable of those
- modes with the proper memory loaded.
-
-
-Memory Type supported on this controller attribute file:
-
- 'supported_mem_type'
-
- This attribute file displays the memory type, usually
- buffered and unbuffered DIMMs.
-
-
Memory Controller name attribute file:
'mc_name'
@@ -321,16 +285,6 @@ Memory Controller name attribute file:
that is being utilized.
-Memory Controller Module name attribute file:
-
- 'module_name'
-
- This attribute file displays the memory controller module name,
- version and date built. The name of the memory controller
- hardware - some drivers work with multiple controllers and
- this field shows which hardware is present.
-
-
Total memory managed by this memory controller attribute file:
'size_mb'
@@ -432,6 +386,9 @@ Memory Type attribute file:
This attribute file will display what type of memory is currently
on this csrow. Normally, either buffered or unbuffered memory.
+ Examples:
+ Registered-DDR
+ Unbuffered-DDR
EDAC Mode of operation attribute file:
@@ -446,8 +403,13 @@ Device type attribute file:
'dev_type'
- This attribute file will display what type of DIMM device is
- being utilized. Example: x4
+ This attribute file will display what type of DRAM device is
+ being utilized on this DIMM.
+ Examples:
+ x1
+ x2
+ x4
+ x8
Channel 0 CE Count attribute file:
@@ -522,10 +484,10 @@ SYSTEM LOGGING
If logging for UEs and CEs are enabled then system logs will have
error notices indicating errors that have been detected:
-MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
+EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
channel 1 "DIMM_B1": amd76x_edac
-MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
+EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
channel 1 "DIMM_B1": amd76x_edac
@@ -610,64 +572,4 @@ Parity Count:
-PCI Device Whitelist:
-
- 'pci_parity_whitelist'
-
- This control file allows for an explicit list of PCI devices to be
- scanned for parity errors. Only devices found on this list will
- be examined. The list is a line of hexadecimal VENDOR and DEVICE
- ID tuples:
-
- 1022:7450,1434:16a6
-
- One or more can be inserted, separated by a comma.
-
- To write the above list doing the following as one command line:
-
- echo "1022:7450,1434:16a6"
- > /sys/devices/system/edac/pci/pci_parity_whitelist
-
-
-
- To display what the whitelist is, simply 'cat' the same file.
-
-
-PCI Device Blacklist:
-
- 'pci_parity_blacklist'
-
- This control file allows for a list of PCI devices to be
- skipped for scanning.
- The list is a line of hexadecimal VENDOR and DEVICE ID tuples:
-
- 1022:7450,1434:16a6
-
- One or more can be inserted, separated by a comma.
-
- To write the above list doing the following as one command line:
-
- echo "1022:7450,1434:16a6"
- > /sys/devices/system/edac/pci/pci_parity_blacklist
-
-
- To display what the whitelist currently contains,
- simply 'cat' the same file.
-
=======================================================================
-
-PCI Vendor and Devices IDs can be obtained with the lspci command. Using
-the -n option lspci will display the vendor and device IDs. The system
-administrator will have to determine which devices should be scanned or
-skipped.
-
-
-
-The two lists (white and black) are prioritized. blacklist is the lower
-priority and will NOT be utilized when a whitelist has been set.
-Turn OFF a whitelist by an empty echo command:
-
- echo > /sys/devices/system/edac/pci/pci_parity_whitelist
-
-and any previous blacklist will be utilized.
-
diff --git a/Documentation/fb/imacfb.txt b/Documentation/fb/imacfb.txt
new file mode 100644
index 0000000..7590285
--- /dev/null
+++ b/Documentation/fb/imacfb.txt
@@ -0,0 +1,31 @@
+
+What is imacfb?
+===============
+
+This is a generic EFI platform driver for Intel based Apple computers.
+Imacfb is only for EFI booted Intel Macs.
+
+Supported Hardware
+==================
+
+iMac 17"/20"
+Macbook
+Macbook Pro 15"/17"
+MacMini
+
+How to use it?
+==============
+
+Imacfb does not have any kind of autodetection of your machine.
+You have to add the fillowing kernel parameters in your elilo.conf:
+ Macbook :
+ video=imacfb:macbook
+ MacMini :
+ video=imacfb:mini
+ Macbook Pro 15", iMac 17" :
+ video=imacfb:i17
+ Macbook Pro 17", iMac 20" :
+ video=imacfb:i20
+
+--
+Edgar Hucek <gimli@dark-green.com>
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 99f219a..d1cd5f9 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -55,14 +55,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
---------------------------
-What: remove EXPORT_SYMBOL(insert_resource)
-When: April 2006
-Files: kernel/resource.c
-Why: No modular usage in the kernel.
-Who: Adrian Bunk <bunk@stusta.de>
-
----------------------------
-
What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
When: November 2005
Files: drivers/pcmcia/: pcmcia_ioctl.c
@@ -128,6 +120,13 @@ Who: Adrian Bunk <bunk@stusta.de>
---------------------------
+What: drivers depending on OSS_OBSOLETE_DRIVER
+When: options in 2.6.20, code in 2.6.22
+Why: OSS drivers with ALSA replacements
+Who: Adrian Bunk <bunk@stusta.de>
+
+---------------------------
+
What: pci_module_init(driver)
When: January 2007
Why: Is replaced by pci_register_driver(pci_driver).
@@ -166,17 +165,6 @@ Who: Arjan van de Ven <arjan@linux.intel.com>
---------------------------
-What: remove EXPORT_SYMBOL(tasklist_lock)
-When: August 2006
-Files: kernel/fork.c
-Why: tasklist_lock protects the kernel internal task list. Modules have
- no business looking at it, and all instances in drivers have been due
- to use of too-lowlevel APIs. Having this symbol exported prevents
- moving to more scalable locking schemes for the task list.
-Who: Christoph Hellwig <hch@lst.de>
-
----------------------------
-
What: mount/umount uevents
When: February 2007
Why: These events are not correct, and do not properly let userspace know
@@ -266,3 +254,30 @@ Why: The interrupt related SA_* flags are replaced by IRQF_* to move them
Who: Thomas Gleixner <tglx@linutronix.de>
---------------------------
+
+What: i2c-ite and i2c-algo-ite drivers
+When: September 2006
+Why: These drivers never compiled since they were added to the kernel
+ tree 5 years ago. This feature removal can be reevaluated if
+ someone shows interest in the drivers, fixes them and takes over
+ maintenance.
+ http://marc.theaimsgroup.com/?l=linux-mips&m=115040510817448
+Who: Jean Delvare <khali@linux-fr.org>
+
+---------------------------
+
+What: Bridge netfilter deferred IPv4/IPv6 output hook calling
+When: January 2007
+Why: The deferred output hooks are a layering violation causing unusual
+ and broken behaviour on bridge devices. Examples of things they
+ break include QoS classifation using the MARK or CLASSIFY targets,
+ the IPsec policy match and connection tracking with VLANs on a
+ bridge. Their only use is to enable bridge output port filtering
+ within iptables with the physdev match, which can also be done by
+ combining iptables and ebtables using netfilter marks. Until it
+ will get removed the hook deferral is disabled by default and is
+ only enabled when needed.
+
+Who: Patrick McHardy <kaber@trash.net>
+
+---------------------------
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 66fdc07..16dec61 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -62,8 +62,8 @@ ramfs-rootfs-initramfs.txt
- info on the 'in memory' filesystems ramfs, rootfs and initramfs.
reiser4.txt
- info on the Reiser4 filesystem based on dancing tree algorithms.
-relayfs.txt
- - info on relayfs, for efficient streaming from kernel to user space.
+relay.txt
+ - info on relay, for efficient streaming from kernel to user space.
romfs.txt
- description of the ROMFS filesystem.
smbfs.txt
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index d31efbb..247d7f6 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -142,8 +142,8 @@ see also dquot_operations section.
--------------------------- file_system_type ---------------------------
prototypes:
- struct int (*get_sb) (struct file_system_type *, int,
- const char *, void *, struct vfsmount *);
+ int (*get_sb) (struct file_system_type *, int,
+ const char *, void *, struct vfsmount *);
void (*kill_sb) (struct super_block *);
locking rules:
may block BKL
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
new file mode 100644
index 0000000..d6788da
--- /dev/null
+++ b/Documentation/filesystems/relay.txt
@@ -0,0 +1,479 @@
+relay interface (formerly relayfs)
+==================================
+
+The relay interface provides a means for kernel applications to
+efficiently log and transfer large quantities of data from the kernel
+to userspace via user-defined 'relay channels'.
+
+A 'relay channel' is a kernel->user data relay mechanism implemented
+as a set of per-cpu kernel buffers ('channel buffers'), each
+represented as a regular file ('relay file') in user space. Kernel
+clients write into the channel buffers using efficient write
+functions; these automatically log into the current cpu's channel
+buffer. User space applications mmap() or read() from the relay files
+and retrieve the data as it becomes available. The relay files
+themselves are files created in a host filesystem, e.g. debugfs, and
+are associated with the channel buffers using the API described below.
+
+The format of the data logged into the channel buffers is completely
+up to the kernel client; the relay interface does however provide
+hooks which allow kernel clients to impose some structure on the
+buffer data. The relay interface doesn't implement any form of data
+filtering - this also is left to the kernel client. The purpose is to
+keep things as simple as possible.
+
+This document provides an overview of the relay interface API. The
+details of the function parameters are documented along with the
+functions in the relay interface code - please see that for details.
+
+Semantics
+=========
+
+Each relay channel has one buffer per CPU, each buffer has one or more
+sub-buffers. Messages are written to the first sub-buffer until it is
+too full to contain a new message, in which case it it is written to
+the next (if available). Messages are never split across sub-buffers.
+At this point, userspace can be notified so it empties the first
+sub-buffer, while the kernel continues writing to the next.
+
+When notified that a sub-buffer is full, the kernel knows how many
+bytes of it are padding i.e. unused space occurring because a complete
+message couldn't fit into a sub-buffer. Userspace can use this
+knowledge to copy only valid data.
+
+After copying it, userspace can notify the kernel that a sub-buffer
+has been consumed.
+
+A relay channel can operate in a mode where it will overwrite data not
+yet collected by userspace, and not wait for it to be consumed.
+
+The relay channel itself does not provide for communication of such
+data between userspace and kernel, allowing the kernel side to remain
+simple and not impose a single interface on userspace. It does
+provide a set of examples and a separate helper though, described
+below.
+
+The read() interface both removes padding and internally consumes the
+read sub-buffers; thus in cases where read(2) is being used to drain
+the channel buffers, special-purpose communication between kernel and
+user isn't necessary for basic operation.
+
+One of the major goals of the relay interface is to provide a low
+overhead mechanism for conveying kernel data to userspace. While the
+read() interface is easy to use, it's not as efficient as the mmap()
+approach; the example code attempts to make the tradeoff between the
+two approaches as small as possible.
+
+klog and relay-apps example code
+================================
+
+The relay interface itself is ready to use, but to make things easier,
+a couple simple utility functions and a set of examples are provided.
+
+The relay-apps example tarball, available on the relay sourceforge
+site, contains a set of self-contained examples, each consisting of a
+pair of .c files containing boilerplate code for each of the user and
+kernel sides of a relay application. When combined these two sets of
+boilerplate code provide glue to easily stream data to disk, without
+having to bother with mundane housekeeping chores.
+
+The 'klog debugging functions' patch (klog.patch in the relay-apps
+tarball) provides a couple of high-level logging functions to the
+kernel which allow writing formatted text or raw data to a channel,
+regardless of whether a channel to write into exists or not, or even
+whether the relay interface is compiled into the kernel or not. These
+functions allow you to put unconditional 'trace' statements anywhere
+in the kernel or kernel modules; only when there is a 'klog handler'
+registered will data actually be logged (see the klog and kleak
+examples for details).
+
+It is of course possible to use the relay interface from scratch,
+i.e. without using any of the relay-apps example code or klog, but
+you'll have to implement communication between userspace and kernel,
+allowing both to convey the state of buffers (full, empty, amount of
+padding). The read() interface both removes padding and internally
+consumes the read sub-buffers; thus in cases where read(2) is being
+used to drain the channel buffers, special-purpose communication
+between kernel and user isn't necessary for basic operation. Things
+such as buffer-full conditions would still need to be communicated via
+some channel though.
+
+klog and the relay-apps examples can be found in the relay-apps
+tarball on http://relayfs.sourceforge.net
+
+The relay interface user space API
+==================================
+
+The relay interface implements basic file operations for user space
+access to relay channel buffer data. Here are the file operations
+that are available and some comments regarding their behavior:
+
+open() enables user to open an _existing_ channel buffer.
+
+mmap() results in channel buffer being mapped into the caller's
+ memory space. Note that you can't do a partial mmap - you
+ must map the entire file, which is NRBUF * SUBBUFSIZE.
+
+read() read the contents of a channel buffer. The bytes read are
+ 'consumed' by the reader, i.e. they won't be available
+ again to subsequent reads. If the channel is being used
+ in no-overwrite mode (the default), it can be read at any
+ time even if there's an active kernel writer. If the
+ channel is being used in overwrite mode and there are
+ active channel writers, results may be unpredictable -
+ users should make sure that all logging to the channel has
+ ended before using read() with overwrite mode. Sub-buffer
+ padding is automatically removed and will not be seen by
+ the reader.
+
+sendfile() transfer data from a channel buffer to an output file
+ descriptor. Sub-buffer padding is automatically removed
+ and will not be seen by the reader.
+
+poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
+ notified when sub-buffer boundaries are crossed.
+
+close() decrements the channel buffer's refcount. When the refcount
+ reaches 0, i.e. when no process or kernel client has the
+ buffer open, the channel buffer is freed.
+
+In order for a user application to make use of relay files, the
+host filesystem must be mounted. For example,
+
+ mount -t debugfs debugfs /debug
+
+NOTE: the host filesystem doesn't need to be mounted for kernel
+ clients to create or use channels - it only needs to be
+ mounted when user space applications need access to the buffer
+ data.
+
+
+The relay interface kernel API
+==============================
+
+Here's a summary of the API the relay interface provides to in-kernel clients:
+
+TBD(curr. line MT:/API/)
+ channel management functions:
+
+ relay_open(base_filename, parent, subbuf_size, n_subbufs,
+ callbacks)
+ relay_close(chan)
+ relay_flush(chan)
+ relay_reset(chan)
+
+ channel management typically called on instigation of userspace:
+
+ relay_subbufs_consumed(chan, cpu, subbufs_consumed)
+
+ write functions:
+
+ relay_write(chan, data, length)
+ __relay_write(chan, data, length)
+ relay_reserve(chan, length)
+
+ callbacks:
+
+ subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
+ buf_mapped(buf, filp)
+ buf_unmapped(buf, filp)
+ create_buf_file(filename, parent, mode, buf, is_global)
+ remove_buf_file(dentry)
+
+ helper functions:
+
+ relay_buf_full(buf)
+ subbuf_start_reserve(buf, length)
+
+
+Creating a channel
+------------------
+
+relay_open() is used to create a channel, along with its per-cpu
+channel buffers. Each channel buffer will have an associated file
+created for it in the host filesystem, which can be and mmapped or
+read from in user space. The files are named basename0...basenameN-1
+where N is the number of online cpus, and by default will be created
+in the root of the filesystem (if the parent param is NULL). If you
+want a directory structure to contain your relay files, you should
+create it using the host filesystem's directory creation function,
+e.g. debugfs_create_dir(), and pass the parent directory to
+relay_open(). Users are responsible for cleaning up any directory
+structure they create, when the channel is closed - again the host
+filesystem's directory removal functions should be used for that,
+e.g. debugfs_remove().
+
+In order for a channel to be created and the host filesystem's files
+associated with its channel buffers, the user must provide definitions
+for two callback functions, create_buf_file() and remove_buf_file().
+create_buf_file() is called once for each per-cpu buffer from
+relay_open() and allows the user to create the file which will be used
+to represent the corresponding channel buffer. The callback should
+return the dentry of the file created to represent the channel buffer.
+remove_buf_file() must also be defined; it's responsible for deleting
+the file(s) created in create_buf_file() and is called during
+relay_close().
+
+Here are some typical definitions for these callbacks, in this case
+using debugfs:
+
+/*
+ * create_buf_file() callback. Creates relay file in debugfs.
+ */
+static struct dentry *create_buf_file_handler(const char *filename,
+ struct dentry *parent,
+ int mode,
+ struct rchan_buf *buf,
+ int *is_global)
+{
+ return debugfs_create_file(filename, mode, parent, buf,
+ &relay_file_operations);
+}
+
+/*
+ * remove_buf_file() callback. Removes relay file from debugfs.
+ */
+static int remove_buf_file_handler(struct dentry *dentry)
+{
+ debugfs_remove(dentry);
+
+ return 0;
+}
+
+/*
+ * relay interface callbacks
+ */
+static struct rchan_callbacks relay_callbacks =
+{
+ .create_buf_file = create_buf_file_handler,
+ .remove_buf_file = remove_buf_file_handler,
+};
+
+And an example relay_open() invocation using them:
+
+ chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks);
+
+If the create_buf_file() callback fails, or isn't defined, channel
+creation and thus relay_open() will fail.
+
+The total size of each per-cpu buffer is calculated by multiplying the
+number of sub-buffers by the sub-buffer size passed into relay_open().
+The idea behind sub-buffers is that they're basically an extension of
+double-buffering to N buffers, and they also allow applications to
+easily implement random-access-on-buffer-boundary schemes, which can
+be important for some high-volume applications. The number and size
+of sub-buffers is completely dependent on the application and even for
+the same application, different conditions will warrant different
+values for these parameters at different times. Typically, the right
+values to use are best decided after some experimentation; in general,
+though, it's safe to assume that having only 1 sub-buffer is a bad
+idea - you're guaranteed to either overwrite data or lose events
+depending on the channel mode being used.
+
+The create_buf_file() implementation can also be defined in such a way
+as to allow the creation of a single 'global' buffer instead of the
+default per-cpu set. This can be useful for applications interested
+mainly in seeing the relative ordering of system-wide events without
+the need to bother with saving explicit timestamps for the purpose of
+merging/sorting per-cpu files in a postprocessing step.
+
+To have relay_open() create a global buffer, the create_buf_file()
+implementation should set the value of the is_global outparam to a
+non-zero value in addition to creating the file that will be used to
+represent the single buffer. In the case of a global buffer,
+create_buf_file() and remove_buf_file() will be called only once. The
+normal channel-writing functions, e.g. relay_write(), can still be
+used - writes from any cpu will transparently end up in the global
+buffer - but since it is a global buffer, callers should make sure
+they use the proper locking for such a buffer, either by wrapping
+writes in a spinlock, or by copying a write function from relay.h and
+creating a local version that internally does the proper locking.
+
+Channel 'modes'
+---------------
+
+relay channels can be used in either of two modes - 'overwrite' or
+'no-overwrite'. The mode is entirely determined by the implementation
+of the subbuf_start() callback, as described below. The default if no
+subbuf_start() callback is defined is 'no-overwrite' mode. If the
+default mode suits your needs, and you plan to use the read()
+interface to retrieve channel data, you can ignore the details of this
+section, as it pertains mainly to mmap() implementations.
+
+In 'overwrite' mode, also known as 'flight recorder' mode, writes
+continuously cycle around the buffer and will never fail, but will
+unconditionally overwrite old data regardless of whether it's actually
+been consumed. In no-overwrite mode, writes will fail, i.e. data will
+be lost, if the number of unconsumed sub-buffers equals the total
+number of sub-buffers in the channel. It should be clear that if
+there is no consumer or if the consumer can't consume sub-buffers fast
+enough, data will be lost in either case; the only difference is
+whether data is lost from the beginning or the end of a buffer.
+
+As explained above, a relay channel is made of up one or more
+per-cpu channel buffers, each implemented as a circular buffer
+subdivided into one or more sub-buffers. Messages are written into
+the current sub-buffer of the channel's current per-cpu buffer via the
+write functions described below. Whenever a message can't fit into
+the current sub-buffer, because there's no room left for it, the
+client is notified via the subbuf_start() callback that a switch to a
+new sub-buffer is about to occur. The client uses this callback to 1)
+initialize the next sub-buffer if appropriate 2) finalize the previous
+sub-buffer if appropriate and 3) return a boolean value indicating
+whether or not to actually move on to the next sub-buffer.
+
+To implement 'no-overwrite' mode, the userspace client would provide
+an implementation of the subbuf_start() callback something like the
+following:
+
+static int subbuf_start(struct rchan_buf *buf,
+ void *subbuf,
+ void *prev_subbuf,
+ unsigned int prev_padding)
+{
+ if (prev_subbuf)
+ *((unsigned *)prev_subbuf) = prev_padding;
+
+ if (relay_buf_full(buf))
+ return 0;
+
+ subbuf_start_reserve(buf, sizeof(unsigned int));
+
+ return 1;
+}
+
+If the current buffer is full, i.e. all sub-buffers remain unconsumed,
+the callback returns 0 to indicate that the buffer switch should not
+occur yet, i.e. until the consumer has had a chance to read the
+current set of ready sub-buffers. For the relay_buf_full() function
+to make sense, the consumer is reponsible for notifying the relay
+interface when sub-buffers have been consumed via
+relay_subbufs_consumed(). Any subsequent attempts to write into the
+buffer will again invoke the subbuf_start() callback with the same
+parameters; only when the consumer has consumed one or more of the
+ready sub-buffers will relay_buf_full() return 0, in which case the
+buffer switch can continue.
+
+The implementation of the subbuf_start() callback for 'overwrite' mode
+would be very similar:
+
+static int subbuf_start(struct rchan_buf *buf,
+ void *subbuf,
+ void *prev_subbuf,
+ unsigned int prev_padding)
+{
+ if (prev_subbuf)
+ *((unsigned *)prev_subbuf) = prev_padding;
+
+ subbuf_start_reserve(buf, sizeof(unsigned int));
+
+ return 1;
+}
+
+In this case, the relay_buf_full() check is meaningless and the
+callback always returns 1, causing the buffer switch to occur
+unconditionally. It's also meaningless for the client to use the
+relay_subbufs_consumed() function in this mode, as it's never
+consulted.
+
+The default subbuf_start() implementation, used if the client doesn't
+define any callbacks, or doesn't define the subbuf_start() callback,
+implements the simplest possible 'no-overwrite' mode, i.e. it does
+nothing but return 0.
+
+Header information can be reserved at the beginning of each sub-buffer
+by calling the subbuf_start_reserve() helper function from within the
+subbuf_start() callback. This reserved area can be used to store
+whatever information the client wants. In the example above, room is
+reserved in each sub-buffer to store the padding count for that
+sub-buffer. This is filled in for the previous sub-buffer in the
+subbuf_start() implementation; the padding value for the previous
+sub-buffer is passed into the subbuf_start() callback along with a
+pointer to the previous sub-buffer, since the padding value isn't
+known until a sub-buffer is filled. The subbuf_start() callback is
+also called for the first sub-buffer when the channel is opened, to
+give the client a chance to reserve space in it. In this case the
+previous sub-buffer pointer passed into the callback will be NULL, so
+the client should check the value of the prev_subbuf pointer before
+writing into the previous sub-buffer.
+
+Writing to a channel
+--------------------
+
+Kernel clients write data into the current cpu's channel buffer using
+relay_write() or __relay_write(). relay_write() is the main logging
+function - it uses local_irqsave() to protect the buffer and should be
+used if you might be logging from interrupt context. If you know
+you'll never be logging from interrupt context, you can use
+__relay_write(), which only disables preemption. These functions
+don't return a value, so you can't determine whether or not they
+failed - the assumption is that you wouldn't want to check a return
+value in the fast logging path anyway, and that they'll always succeed
+unless the buffer is full and no-overwrite mode is being used, in
+which case you can detect a failed write in the subbuf_start()
+callback by calling the relay_buf_full() helper function.
+
+relay_reserve() is used to reserve a slot in a channel buffer which
+can be written to later. This would typically be used in applications
+that need to write directly into a channel buffer without having to
+stage data in a temporary buffer beforehand. Because the actual write
+may not happen immediately after the slot is reserved, applications
+using relay_reserve() can keep a count of the number of bytes actually
+written, either in space reserved in the sub-buffers themselves or as
+a separate array. See the 'reserve' example in the relay-apps tarball
+at http://relayfs.sourceforge.net for an example of how this can be
+done. Because the write is under control of the client and is
+separated from the reserve, relay_reserve() doesn't protect the buffer
+at all - it's up to the client to provide the appropriate
+synchronization when using relay_reserve().
+
+Closing a channel
+-----------------
+
+The client calls relay_close() when it's finished using the channel.
+The channel and its associated buffers are destroyed when there are no
+longer any references to any of the channel buffers. relay_flush()
+forces a sub-buffer switch on all the channel buffers, and can be used
+to finalize and process the last sub-buffers before the channel is
+closed.
+
+Misc
+----
+
+Some applications may want to keep a channel around and re-use it
+rather than open and close a new channel for each use. relay_reset()
+can be used for this purpose - it resets a channel to its initial
+state without reallocating channel buffer memory or destroying
+existing mappings. It should however only be called when it's safe to
+do so, i.e. when the channel isn't currently being written to.
+
+Finally, there are a couple of utility callbacks that can be used for
+different purposes. buf_mapped() is called whenever a channel buffer
+is mmapped from user space and buf_unmapped() is called when it's
+unmapped. The client can use this notification to trigger actions
+within the kernel application, such as enabling/disabling logging to
+the channel.
+
+
+Resources
+=========
+
+For news, example code, mailing list, etc. see the relay interface homepage:
+
+ http://relayfs.sourceforge.net
+
+
+Credits
+=======
+
+The ideas and specs for the relay interface came about as a result of
+discussions on tracing involving the following:
+
+Michel Dagenais <michel.dagenais@polymtl.ca>
+Richard Moore <richardj_moore@uk.ibm.com>
+Bob Wisniewski <bob@watson.ibm.com>
+Karim Yaghmour <karim@opersys.com>
+Tom Zanussi <zanussi@us.ibm.com>
+
+Also thanks to Hubertus Franke for a lot of useful suggestions and bug
+reports.
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
deleted file mode 100644
index 5832377..0000000
--- a/Documentation/filesystems/relayfs.txt
+++ /dev/null
@@ -1,442 +0,0 @@
-
-relayfs - a high-speed data relay filesystem
-============================================
-
-relayfs is a filesystem designed to provide an efficient mechanism for
-tools and facilities to relay large and potentially sustained streams
-of data from kernel space to user space.
-
-The main abstraction of relayfs is the 'channel'. A channel consists
-of a set of per-cpu kernel buffers each represented by a file in the
-relayfs filesystem. Kernel clients write into a channel using
-efficient write functions which automatically log to the current cpu's
-channel buffer. User space applications mmap() the per-cpu files and
-retrieve the data as it becomes available.
-
-The format of the data logged into the channel buffers is completely
-up to the relayfs client; relayfs does however provide hooks which
-allow clients to impose some structure on the buffer data. Nor does
-relayfs implement any form of data filtering - this also is left to
-the client. The purpose is to keep relayfs as simple as possible.
-
-This document provides an overview of the relayfs API. The details of
-the function parameters are documented along with the functions in the
-filesystem code - please see that for details.
-
-Semantics
-=========
-
-Each relayfs channel has one buffer per CPU, each buffer has one or
-more sub-buffers. Messages are written to the first sub-buffer until
-it is too full to contain a new message, in which case it it is
-written to the next (if available). Messages are never split across
-sub-buffers. At this point, userspace can be notified so it empties
-the first sub-buffer, while the kernel continues writing to the next.
-
-When notified that a sub-buffer is full, the kernel knows how many
-bytes of it are padding i.e. unused. Userspace can use this knowledge
-to copy only valid data.
-
-After copying it, userspace can notify the kernel that a sub-buffer
-has been consumed.
-
-relayfs can operate in a mode where it will overwrite data not yet
-collected by userspace, and not wait for it to consume it.
-
-relayfs itself does not provide for communication of such data between
-userspace and kernel, allowing the kernel side to remain simple and
-not impose a single interface on userspace. It does provide a set of
-examples and a separate helper though, described below.
-
-klog and relay-apps example code
-================================
-
-relayfs itself is ready to use, but to make things easier, a couple
-simple utility functions and a set of examples are provided.
-
-The relay-apps example tarball, available on the relayfs sourceforge
-site, contains a set of self-contained examples, each consisting of a
-pair of .c files containing boilerplate code for each of the user and
-kernel sides of a relayfs application; combined these two sets of
-boilerplate code provide glue to easily stream data to disk, without
-having to bother with mundane housekeeping chores.
-
-The 'klog debugging functions' patch (klog.patch in the relay-apps
-tarball) provides a couple of high-level logging functions to the
-kernel which allow writing formatted text or raw data to a channel,
-regardless of whether a channel to write into exists or not, or
-whether relayfs is compiled into the kernel or is configured as a
-module. These functions allow you to put unconditional 'trace'
-statements anywhere in the kernel or kernel modules; only when there
-is a 'klog handler' registered will data actually be logged (see the
-klog and kleak examples for details).
-
-It is of course possible to use relayfs from scratch i.e. without
-using any of the relay-apps example code or klog, but you'll have to
-implement communication between userspace and kernel, allowing both to
-convey the state of buffers (full, empty, amount of padding).
-
-klog and the relay-apps examples can be found in the relay-apps
-tarball on http://relayfs.sourceforge.net
-
-
-The relayfs user space API
-==========================
-
-relayfs implements basic file operations for user space access to
-relayfs channel buffer data. Here are the file operations that are
-available and some comments regarding their behavior:
-
-open() enables user to open an _existing_ buffer.
-
-mmap() results in channel buffer being mapped into the caller's
- memory space. Note that you can't do a partial mmap - you must
- map the entire file, which is NRBUF * SUBBUFSIZE.
-
-read() read the contents of a channel buffer. The bytes read are
- 'consumed' by the reader i.e. they won't be available again
- to subsequent reads. If the channel is being used in
- no-overwrite mode (the default), it can be read at any time
- even if there's an active kernel writer. If the channel is
- being used in overwrite mode and there are active channel
- writers, results may be unpredictable - users should make
- sure that all logging to the channel has ended before using
- read() with overwrite mode.
-
-poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
- notified when sub-buffer boundaries are crossed.
-
-close() decrements the channel buffer's refcount. When the refcount
- reaches 0 i.e. when no process or kernel client has the buffer
- open, the channel buffer is freed.
-
-
-In order for a user application to make use of relayfs files, the
-relayfs filesystem must be mounted. For example,
-
- mount -t relayfs relayfs /mnt/relay
-
-NOTE: relayfs doesn't need to be mounted for kernel clients to create
- or use channels - it only needs to be mounted when user space
- applications need access to the buffer data.
-
-
-The relayfs kernel API
-======================
-
-Here's a summary of the API relayfs provides to in-kernel clients:
-
-
- channel management functions:
-
- relay_open(base_filename, parent, subbuf_size, n_subbufs,
- callbacks)
- relay_close(chan)
- relay_flush(chan)
- relay_reset(chan)
- relayfs_create_dir(name, parent)
- relayfs_remove_dir(dentry)
- relayfs_create_file(name, parent, mode, fops, data)
- relayfs_remove_file(dentry)
-
- channel management typically called on instigation of userspace:
-
- relay_subbufs_consumed(chan, cpu, subbufs_consumed)
-
- write functions:
-
- relay_write(chan, data, length)
- __relay_write(chan, data, length)
- relay_reserve(chan, length)
-
- callbacks:
-
- subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
- buf_mapped(buf, filp)
- buf_unmapped(buf, filp)
- create_buf_file(filename, parent, mode, buf, is_global)
- remove_buf_file(dentry)
-
- helper functions:
-
- relay_buf_full(buf)
- subbuf_start_reserve(buf, length)
-
-
-Creating a channel
-------------------
-
-relay_open() is used to create a channel, along with its per-cpu
-channel buffers. Each channel buffer will have an associated file
-created for it in the relayfs filesystem, which can be opened and
-mmapped from user space if desired. The files are named
-basename0...basenameN-1 where N is the number of online cpus, and by
-default will be created in the root of the filesystem. If you want a
-directory structure to contain your relayfs files, you can create it
-with relayfs_create_dir() and pass the parent directory to
-relay_open(). Clients are responsible for cleaning up any directory
-structure they create when the channel is closed - use
-relayfs_remove_dir() for that.
-
-The total size of each per-cpu buffer is calculated by multiplying the
-number of sub-buffers by the sub-buffer size passed into relay_open().
-The idea behind sub-buffers is that they're basically an extension of
-double-buffering to N buffers, and they also allow applications to
-easily implement random-access-on-buffer-boundary schemes, which can
-be important for some high-volume applications. The number and size
-of sub-buffers is completely dependent on the application and even for
-the same application, different conditions will warrant different
-values for these parameters at different times. Typically, the right
-values to use are best decided after some experimentation; in general,
-though, it's safe to assume that having only 1 sub-buffer is a bad
-idea - you're guaranteed to either overwrite data or lose events
-depending on the channel mode being used.
-
-Channel 'modes'
----------------
-
-relayfs channels can be used in either of two modes - 'overwrite' or
-'no-overwrite'. The mode is entirely determined by the implementation
-of the subbuf_start() callback, as described below. In 'overwrite'
-mode, also known as 'flight recorder' mode, writes continuously cycle
-around the buffer and will never fail, but will unconditionally
-overwrite old data regardless of whether it's actually been consumed.
-In no-overwrite mode, writes will fail i.e. data will be lost, if the
-number of unconsumed sub-buffers equals the total number of
-sub-buffers in the channel. It should be clear that if there is no
-consumer or if the consumer can't consume sub-buffers fast enought,
-data will be lost in either case; the only difference is whether data
-is lost from the beginning or the end of a buffer.
-
-As explained above, a relayfs channel is made of up one or more
-per-cpu channel buffers, each implemented as a circular buffer
-subdivided into one or more sub-buffers. Messages are written into
-the current sub-buffer of the channel's current per-cpu buffer via the
-write functions described below. Whenever a message can't fit into
-the current sub-buffer, because there's no room left for it, the
-client is notified via the subbuf_start() callback that a switch to a
-new sub-buffer is about to occur. The client uses this callback to 1)
-initialize the next sub-buffer if appropriate 2) finalize the previous
-sub-buffer if appropriate and 3) return a boolean value indicating
-whether or not to actually go ahead with the sub-buffer switch.
-
-To implement 'no-overwrite' mode, the userspace client would provide
-an implementation of the subbuf_start() callback something like the
-following:
-
-static int subbuf_start(struct rchan_buf *buf,
- void *subbuf,
- void *prev_subbuf,
- unsigned int prev_padding)
-{
- if (prev_subbuf)
- *((unsigned *)prev_subbuf) = prev_padding;
-
- if (relay_buf_full(buf))
- return 0;
-
- subbuf_start_reserve(buf, sizeof(unsigned int));
-
- return 1;
-}
-
-If the current buffer is full i.e. all sub-buffers remain unconsumed,
-the callback returns 0 to indicate that the buffer switch should not
-occur yet i.e. until the consumer has had a chance to read the current
-set of ready sub-buffers. For the relay_buf_full() function to make
-sense, the consumer is reponsible for notifying relayfs when
-sub-buffers have been consumed via relay_subbufs_consumed(). Any
-subsequent attempts to write into the buffer will again invoke the
-subbuf_start() callback with the same parameters; only when the
-consumer has consumed one or more of the ready sub-buffers will
-relay_buf_full() return 0, in which case the buffer switch can
-continue.
-
-The implementation of the subbuf_start() callback for 'overwrite' mode
-would be very similar:
-
-static int subbuf_start(struct rchan_buf *buf,
- void *subbuf,
- void *prev_subbuf,
- unsigned int prev_padding)
-{
- if (prev_subbuf)
- *((unsigned *)prev_subbuf) = prev_padding;
-
- subbuf_start_reserve(buf, sizeof(unsigned int));
-
- return 1;
-}
-
-In this case, the relay_buf_full() check is meaningless and the
-callback always returns 1, causing the buffer switch to occur
-unconditionally. It's also meaningless for the client to use the
-relay_subbufs_consumed() function in this mode, as it's never
-consulted.
-
-The default subbuf_start() implementation, used if the client doesn't
-define any callbacks, or doesn't define the subbuf_start() callback,
-implements the simplest possible 'no-overwrite' mode i.e. it does
-nothing but return 0.
-
-Header information can be reserved at the beginning of each sub-buffer
-by calling the subbuf_start_reserve() helper function from within the
-subbuf_start() callback. This reserved area can be used to store
-whatever information the client wants. In the example above, room is
-reserved in each sub-buffer to store the padding count for that
-sub-buffer. This is filled in for the previous sub-buffer in the
-subbuf_start() implementation; the padding value for the previous
-sub-buffer is passed into the subbuf_start() callback along with a
-pointer to the previous sub-buffer, since the padding value isn't
-known until a sub-buffer is filled. The subbuf_start() callback is
-also called for the first sub-buffer when the channel is opened, to
-give the client a chance to reserve space in it. In this case the
-previous sub-buffer pointer passed into the callback will be NULL, so
-the client should check the value of the prev_subbuf pointer before
-writing into the previous sub-buffer.
-
-Writing to a channel
---------------------
-
-kernel clients write data into the current cpu's channel buffer using
-relay_write() or __relay_write(). relay_write() is the main logging
-function - it uses local_irqsave() to protect the buffer and should be
-used if you might be logging from interrupt context. If you know
-you'll never be logging from interrupt context, you can use
-__relay_write(), which only disables preemption. These functions
-don't return a value, so you can't determine whether or not they
-failed - the assumption is that you wouldn't want to check a return
-value in the fast logging path anyway, and that they'll always succeed
-unless the buffer is full and no-overwrite mode is being used, in
-which case you can detect a failed write in the subbuf_start()
-callback by calling the relay_buf_full() helper function.
-
-relay_reserve() is used to reserve a slot in a channel buffer which
-can be written to later. This would typically be used in applications
-that need to write directly into a channel buffer without having to
-stage data in a temporary buffer beforehand. Because the actual write
-may not happen immediately after the slot is reserved, applications
-using relay_reserve() can keep a count of the number of bytes actually
-written, either in space reserved in the sub-buffers themselves or as
-a separate array. See the 'reserve' example in the relay-apps tarball
-at http://relayfs.sourceforge.net for an example of how this can be
-done. Because the write is under control of the client and is
-separated from the reserve, relay_reserve() doesn't protect the buffer
-at all - it's up to the client to provide the appropriate
-synchronization when using relay_reserve().
-
-Closing a channel
------------------
-
-The client calls relay_close() when it's finished using the channel.
-The channel and its associated buffers are destroyed when there are no
-longer any references to any of the channel buffers. relay_flush()
-forces a sub-buffer switch on all the channel buffers, and can be used
-to finalize and process the last sub-buffers before the channel is
-closed.
-
-Creating non-relay files
-------------------------
-
-relay_open() automatically creates files in the relayfs filesystem to
-represent the per-cpu kernel buffers; it's often useful for
-applications to be able to create their own files alongside the relay
-files in the relayfs filesystem as well e.g. 'control' files much like
-those created in /proc or debugfs for similar purposes, used to
-communicate control information between the kernel and user sides of a
-relayfs application. For this purpose the relayfs_create_file() and
-relayfs_remove_file() API functions exist. For relayfs_create_file(),
-the caller passes in a set of user-defined file operations to be used
-for the file and an optional void * to a user-specified data item,
-which will be accessible via inode->u.generic_ip (see the relay-apps
-tarball for examples). The file_operations are a required parameter
-to relayfs_create_file() and thus the semantics of these files are
-completely defined by the caller.
-
-See the relay-apps tarball at http://relayfs.sourceforge.net for
-examples of how these non-relay files are meant to be used.
-
-Creating relay files in other filesystems
------------------------------------------
-
-By default of course, relay_open() creates relay files in the relayfs
-filesystem. Because relay_file_operations is exported, however, it's
-also possible to create and use relay files in other pseudo-filesytems
-such as debugfs.
-
-For this purpose, two callback functions are provided,
-create_buf_file() and remove_buf_file(). create_buf_file() is called
-once for each per-cpu buffer from relay_open() to allow the client to
-create a file to be used to represent the corresponding buffer; if
-this callback is not defined, the default implementation will create
-and return a file in the relayfs filesystem to represent the buffer.
-The callback should return the dentry of the file created to represent
-the relay buffer. Note that the parent directory passed to
-relay_open() (and passed along to the callback), if specified, must
-exist in the same filesystem the new relay file is created in. If
-create_buf_file() is defined, remove_buf_file() must also be defined;
-it's responsible for deleting the file(s) created in create_buf_file()
-and is called during relay_close().
-
-The create_buf_file() implementation can also be defined in such a way
-as to allow the creation of a single 'global' buffer instead of the
-default per-cpu set. This can be useful for applications interested
-mainly in seeing the relative ordering of system-wide events without
-the need to bother with saving explicit timestamps for the purpose of
-merging/sorting per-cpu files in a postprocessing step.
-
-To have relay_open() create a global buffer, the create_buf_file()
-implementation should set the value of the is_global outparam to a
-non-zero value in addition to creating the file that will be used to
-represent the single buffer. In the case of a global buffer,
-create_buf_file() and remove_buf_file() will be called only once. The
-normal channel-writing functions e.g. relay_write() can still be used
-- writes from any cpu will transparently end up in the global buffer -
-but since it is a global buffer, callers should make sure they use the
-proper locking for such a buffer, either by wrapping writes in a
-spinlock, or by copying a write function from relayfs_fs.h and
-creating a local version that internally does the proper locking.
-
-See the 'exported-relayfile' examples in the relay-apps tarball for
-examples of creating and using relay files in debugfs.
-
-Misc
-----
-
-Some applications may want to keep a channel around and re-use it
-rather than open and close a new channel for each use. relay_reset()
-can be used for this purpose - it resets a channel to its initial
-state without reallocating channel buffer memory or destroying
-existing mappings. It should however only be called when it's safe to
-do so i.e. when the channel isn't currently being written to.
-
-Finally, there are a couple of utility callbacks that can be used for
-different purposes. buf_mapped() is called whenever a channel buffer
-is mmapped from user space and buf_unmapped() is called when it's
-unmapped. The client can use this notification to trigger actions
-within the kernel application, such as enabling/disabling logging to
-the channel.
-
-
-Resources
-=========
-
-For news, example code, mailing list, etc. see the relayfs homepage:
-
- http://relayfs.sourceforge.net
-
-
-Credits
-=======
-
-The ideas and specs for relayfs came about as a result of discussions
-on tracing involving the following:
-
-Michel Dagenais <michel.dagenais@polymtl.ca>
-Richard Moore <richardj_moore@uk.ibm.com>
-Bob Wisniewski <bob@watson.ibm.com>
-Karim Yaghmour <karim@opersys.com>
-Tom Zanussi <zanussi@us.ibm.com>
-
-Also thanks to Hubertus Franke for a lot of useful suggestions and bug
-reports.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 9d3aed6..1cb7e8b 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -113,8 +113,8 @@ members are defined:
struct file_system_type {
const char *name;
int fs_flags;
- struct int (*get_sb) (struct file_system_type *, int,
- const char *, void *, struct vfsmount *);
+ int (*get_sb) (struct file_system_type *, int,
+ const char *, void *, struct vfsmount *);
void (*kill_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru
index 69cdb52..b2c0d61 100644
--- a/Documentation/hwmon/abituguru
+++ b/Documentation/hwmon/abituguru
@@ -2,13 +2,36 @@ Kernel driver abituguru
=======================
Supported chips:
- * Abit uGuru (Hardware Monitor part only)
+ * Abit uGuru revision 1-3 (Hardware Monitor part only)
Prefix: 'abituguru'
Addresses scanned: ISA 0x0E0
Datasheet: Not available, this driver is based on reverse engineering.
A "Datasheet" has been written based on the reverse engineering it
should be available in the same dir as this file under the name
abituguru-datasheet.
+ Note:
+ The uGuru is a microcontroller with onboard firmware which programs
+ it to behave as a hwmon IC. There are many different revisions of the
+ firmware and thus effectivly many different revisions of the uGuru.
+ Below is an incomplete list with which revisions are used for which
+ Motherboards:
+ uGuru 1.00 ~ 1.24 (AI7, KV8-MAX3, AN7) (1)
+ uGuru 2.0.0.0 ~ 2.0.4.2 (KV8-PRO)
+ uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8)
+ uGuru 2.2.0.0 ~ 2.2.0.6 (AA8 Fatal1ty)
+ uGuru 2.3.0.0 ~ 2.3.0.9 (AN8)
+ uGuru 3.0.0.0 ~ 3.0.1.2 (AW8, AL8, NI8)
+ uGuru 4.xxxxx? (AT8 32X) (2)
+ 1) For revisions 2 and 3 uGuru's the driver can autodetect the
+ sensortype (Volt or Temp) for bank1 sensors, for revision 1 uGuru's
+ this doesnot always work. For these uGuru's the autodection can
+ be overriden with the bank1_types module param. For all 3 known
+ revison 1 motherboards the correct use of this param is:
+ bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1
+ You may also need to specify the fan_sensors option for these boards
+ fan_sensors=5
+ 2) The current version of the abituguru driver is known to NOT work
+ on these Motherboards
Authors:
Hans de Goede <j.w.r.degoede@hhs.nl>,
@@ -22,6 +45,11 @@ Module Parameters
* force: bool Force detection. Note this parameter only causes the
detection to be skipped, if the uGuru can't be read
the module initialization (insmod) will still fail.
+* bank1_types: int[] Bank1 sensortype autodetection override:
+ -1 autodetect (default)
+ 0 volt sensor
+ 1 temp sensor
+ 2 not connected
* fan_sensors: int Tell the driver how many fan speed sensors there are
on your motherboard. Default: 0 (autodetect).
* pwms: int Tell the driver how many fan speed controls (fan
@@ -29,7 +57,7 @@ Module Parameters
* verbose: int How verbose should the driver be? (0-3):
0 normal output
1 + verbose error reporting
- 2 + sensors type probing info\n"
+ 2 + sensors type probing info (default)
3 + retryable error reporting
Default: 2 (the driver is still in the testing phase)
diff --git a/Documentation/i2c/busses/i2c-sis96x b/Documentation/i2c/busses/i2c-sis96x
index 00a009b..08d7b2d 100644
--- a/Documentation/i2c/busses/i2c-sis96x
+++ b/Documentation/i2c/busses/i2c-sis96x
@@ -42,8 +42,8 @@ I suspect that this driver could be made to work for the following SiS
chipsets as well: 635, and 635T. If anyone owns a board with those chips
AND is willing to risk crashing & burning an otherwise well-behaved kernel
in the name of progress... please contact me at <mhoffman@lightlink.com> or
-via the project's mailing list: <lm-sensors@lm-sensors.org>. Please
-send bug reports and/or success stories as well.
+via the project's mailing list: <i2c@lm-sensors.org>. Please send bug
+reports and/or success stories as well.
TO DOs
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 10312be..c51314b 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -181,6 +181,7 @@ filled out, however:
5 ELILO
7 GRuB
8 U-BOOT
+ 9 Xen
Please contact <hpa@zytor.com> if you need a bootloader ID
value assigned.
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt
index df28c74..c04a421 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/i386/zero-page.txt
@@ -63,6 +63,10 @@ Offset Type Description
2 for bootsect-loader
3 for SYSLINUX
4 for ETHERBOOT
+ 5 for ELILO
+ 7 for GRuB
+ 8 for U-BOOT
+ 9 for Xen
V = version
0x211 char loadflags:
bit0 = 1: kernel is loaded high (bzImage)
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 1870355..864ff32 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -51,8 +51,6 @@ Debugging Information
References
- IETF IP over InfiniBand (ipoib) Working Group
- http://ietf.org/html.charters/ipoib-charter.html
Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
http://ietf.org/rfc/rfc4391.txt
IP over InfiniBand (IPoIB) Architecture (RFC 4392)
diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt
index b1b6440..15f1b35 100644
--- a/Documentation/initrd.txt
+++ b/Documentation/initrd.txt
@@ -72,6 +72,22 @@ initrd adds the following new options:
initrd is mounted as root, and the normal boot procedure is followed,
with the RAM disk still mounted as root.
+Compressed cpio images
+----------------------
+
+Recent kernels have support for populating a ramdisk from a compressed cpio
+archive, on such systems, the creation of a ramdisk image doesn't need to
+involve special block devices or loopbacks, you merely create a directory on
+disk with the desired initrd content, cd to that directory, and run (as an
+example):
+
+find . | cpio --quiet -c -o | gzip -9 -n > /boot/imagefile.img
+
+Examining the contents of an existing image file is just as simple:
+
+mkdir /tmp/imagefile
+cd /tmp/imagefile
+gzip -cd /boot/imagefile.img | cpio -imd --quiet
Installation
------------
diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt
index d53b857..841c353 100644
--- a/Documentation/input/joystick.txt
+++ b/Documentation/input/joystick.txt
@@ -39,7 +39,6 @@ them. Bug reports and success stories are also welcome.
The input project website is at:
- http://www.suse.cz/development/input/
http://atrey.karlin.mff.cuni.cz/~vojtech/input/
There is also a mailing list for the driver at:
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt
new file mode 100644
index 0000000..6a44487
--- /dev/null
+++ b/Documentation/irqflags-tracing.txt
@@ -0,0 +1,57 @@
+IRQ-flags state tracing
+
+started by Ingo Molnar <mingo@redhat.com>
+
+the "irq-flags tracing" feature "traces" hardirq and softirq state, in
+that it gives interested subsystems an opportunity to be notified of
+every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
+happens in the kernel.
+
+CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
+and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
+code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
+CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
+are locking APIs that are not used in IRQ context. (the one exception
+for rwsems is worked around)
+
+architecture support for this is certainly not in the "trivial"
+category, because lots of lowlevel assembly code deal with irq-flags
+state changes. But an architecture can be irq-flags-tracing enabled in a
+rather straightforward and risk-free manner.
+
+Architectures that want to support this need to do a couple of
+code-organizational changes first:
+
+- move their irq-flags manipulation code from their asm/system.h header
+ to asm/irqflags.h
+
+- rename local_irq_disable()/etc to raw_local_irq_disable()/etc. so that
+ the linux/irqflags.h code can inject callbacks and can construct the
+ real local_irq_disable()/etc APIs.
+
+- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
+
+and then a couple of functional changes are needed as well to implement
+irq-flags-tracing support:
+
+- in lowlevel entry code add (build-conditional) calls to the
+ trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
+ closely guards whether the 'real' irq-flags matches the 'virtual'
+ irq-flags state, and complains loudly (and turns itself off) if the
+ two do not match. Usually most of the time for arch support for
+ irq-flags-tracing is spent in this state: look at the lockdep
+ complaint, try to figure out the assembly code we did not cover yet,
+ fix and repeat. Once the system has booted up and works without a
+ lockdep complaint in the irq-flags-tracing functions arch support is
+ complete.
+- if the architecture has non-maskable interrupts then those need to be
+ excluded from the irq-tracing [and lock validation] mechanism via
+ lockdep_off()/lockdep_on().
+
+in general there is no risk from having an incomplete irq-flags-tracing
+implementation in an architecture: lockdep will detect that and will
+turn itself off. I.e. the lock validator will still be reliable. There
+should be no crashes due to irq-tracing bugs. (except if the assembly
+changes break other code by modifying conditions or registers that
+shouldnt be)
+
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 14ef3868..0706699 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -407,6 +407,20 @@ more details, with real examples.
The second argument is optional, and if supplied will be used
if first argument is not supported.
+ ld-option
+ ld-option is used to check if $(CC) when used to link object files
+ supports the given option. An optional second option may be
+ specified if first option are not supported.
+
+ Example:
+ #arch/i386/kernel/Makefile
+ vsyscall-flags += $(call ld-option, -Wl$(comma)--hash-style=sysv)
+
+ In the above example vsyscall-flags will be assigned the option
+ -Wl$(comma)--hash-style=sysv if it is supported by $(CC).
+ The second argument is optional, and if supplied will be used
+ if first argument is not supported.
+
cc-option
cc-option is used to check if $(CC) support a given option, and not
supported to use an optional second option.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 86e9282..7947ced 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -435,6 +435,15 @@ running once the system is up.
debug [KNL] Enable kernel debugging (events log level).
+ debug_locks_verbose=
+ [KNL] verbose self-tests
+ Format=<0|1>
+ Print debugging info while doing the locking API
+ self-tests.
+ We default to 0 (no extra messages), setting it to
+ 1 will print _a lot_ more information - normally
+ only useful to kernel developers.
+
decnet= [HW,NET]
Format: <area>[,<node>]
See also Documentation/networking/decnet.txt.
@@ -1020,6 +1029,8 @@ running once the system is up.
nocache [ARM]
+ nodelayacct [KNL] Disable per-task delay accounting
+
nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects.
noexec [IA-64]
@@ -1172,6 +1183,8 @@ running once the system is up.
Mechanism 2.
nommconf [IA-32,X86_64] Disable use of MMCONFIG for PCI
Configuration
+ mmconf [IA-32,X86_64] Force MMCONFIG. This is useful
+ to override the builtin blacklist.
nomsi [MSI] If the PCI_MSI kernel config parameter is
enabled, this kernel boot option can be used to
disable the use of MSI interrupts system-wide.
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index 8d9bffb..949f7b5 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -247,7 +247,7 @@ the object-specific fields, which include:
- default_attrs: Default attributes to be exported via sysfs when the
object is registered.Note that the last attribute has to be
initialized to NULL ! You can find a complete implementation
- in drivers/block/genhd.c
+ in block/genhd.c
Instances of struct kobj_type are not registered; only referenced by
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
new file mode 100644
index 0000000..00d9360
--- /dev/null
+++ b/Documentation/lockdep-design.txt
@@ -0,0 +1,197 @@
+Runtime locking correctness validator
+=====================================
+
+started by Ingo Molnar <mingo@redhat.com>
+additions by Arjan van de Ven <arjan@linux.intel.com>
+
+Lock-class
+----------
+
+The basic object the validator operates upon is a 'class' of locks.
+
+A class of locks is a group of locks that are logically the same with
+respect to locking rules, even if the locks may have multiple (possibly
+tens of thousands of) instantiations. For example a lock in the inode
+struct is one class, while each inode has its own instantiation of that
+lock class.
+
+The validator tracks the 'state' of lock-classes, and it tracks
+dependencies between different lock-classes. The validator maintains a
+rolling proof that the state and the dependencies are correct.
+
+Unlike an lock instantiation, the lock-class itself never goes away: when
+a lock-class is used for the first time after bootup it gets registered,
+and all subsequent uses of that lock-class will be attached to this
+lock-class.
+
+State
+-----
+
+The validator tracks lock-class usage history into 5 separate state bits:
+
+- 'ever held in hardirq context' [ == hardirq-safe ]
+- 'ever held in softirq context' [ == softirq-safe ]
+- 'ever held with hardirqs enabled' [ == hardirq-unsafe ]
+- 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ]
+
+- 'ever used' [ == !unused ]
+
+Single-lock state rules:
+------------------------
+
+A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
+following states are exclusive, and only one of them is allowed to be
+set for any lock-class:
+
+ <hardirq-safe> and <hardirq-unsafe>
+ <softirq-safe> and <softirq-unsafe>
+
+The validator detects and reports lock usage that violate these
+single-lock state rules.
+
+Multi-lock dependency rules:
+----------------------------
+
+The same lock-class must not be acquired twice, because this could lead
+to lock recursion deadlocks.
+
+Furthermore, two locks may not be taken in different order:
+
+ <L1> -> <L2>
+ <L2> -> <L1>
+
+because this could lead to lock inversion deadlocks. (The validator
+finds such dependencies in arbitrary complexity, i.e. there can be any
+other locking sequence between the acquire-lock operations, the
+validator will still track all dependencies between locks.)
+
+Furthermore, the following usage based lock dependencies are not allowed
+between any two lock-classes:
+
+ <hardirq-safe> -> <hardirq-unsafe>
+ <softirq-safe> -> <softirq-unsafe>
+
+The first rule comes from the fact the a hardirq-safe lock could be
+taken by a hardirq context, interrupting a hardirq-unsafe lock - and
+thus could result in a lock inversion deadlock. Likewise, a softirq-safe
+lock could be taken by an softirq context, interrupting a softirq-unsafe
+lock.
+
+The above rules are enforced for any locking sequence that occurs in the
+kernel: when acquiring a new lock, the validator checks whether there is
+any rule violation between the new lock and any of the held locks.
+
+When a lock-class changes its state, the following aspects of the above
+dependency rules are enforced:
+
+- if a new hardirq-safe lock is discovered, we check whether it
+ took any hardirq-unsafe lock in the past.
+
+- if a new softirq-safe lock is discovered, we check whether it took
+ any softirq-unsafe lock in the past.
+
+- if a new hardirq-unsafe lock is discovered, we check whether any
+ hardirq-safe lock took it in the past.
+
+- if a new softirq-unsafe lock is discovered, we check whether any
+ softirq-safe lock took it in the past.
+
+(Again, we do these checks too on the basis that an interrupt context
+could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
+could lead to a lock inversion deadlock - even if that lock scenario did
+not trigger in practice yet.)
+
+Exception: Nested data dependencies leading to nested locking
+-------------------------------------------------------------
+
+There are a few cases where the Linux kernel acquires more than one
+instance of the same lock-class. Such cases typically happen when there
+is some sort of hierarchy within objects of the same type. In these
+cases there is an inherent "natural" ordering between the two objects
+(defined by the properties of the hierarchy), and the kernel grabs the
+locks in this fixed order on each of the objects.
+
+An example of such an object hieararchy that results in "nested locking"
+is that of a "whole disk" block-dev object and a "partition" block-dev
+object; the partition is "part of" the whole device and as long as one
+always takes the whole disk lock as a higher lock than the partition
+lock, the lock ordering is fully correct. The validator does not
+automatically detect this natural ordering, as the locking rule behind
+the ordering is not static.
+
+In order to teach the validator about this correct usage model, new
+versions of the various locking primitives were added that allow you to
+specify a "nesting level". An example call, for the block device mutex,
+looks like this:
+
+enum bdev_bd_mutex_lock_class
+{
+ BD_MUTEX_NORMAL,
+ BD_MUTEX_WHOLE,
+ BD_MUTEX_PARTITION
+};
+
+ mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+
+In this case the locking is done on a bdev object that is known to be a
+partition.
+
+The validator treats a lock that is taken in such a nested fasion as a
+separate (sub)class for the purposes of validation.
+
+Note: When changing code to use the _nested() primitives, be careful and
+check really thoroughly that the hiearchy is correctly mapped; otherwise
+you can get false positives or false negatives.
+
+Proof of 100% correctness:
+--------------------------
+
+The validator achieves perfect, mathematical 'closure' (proof of locking
+correctness) in the sense that for every simple, standalone single-task
+locking sequence that occured at least once during the lifetime of the
+kernel, the validator proves it with a 100% certainty that no
+combination and timing of these locking sequences can cause any class of
+lock related deadlock. [*]
+
+I.e. complex multi-CPU and multi-task locking scenarios do not have to
+occur in practice to prove a deadlock: only the simple 'component'
+locking chains have to occur at least once (anytime, in any
+task/context) for the validator to be able to prove correctness. (For
+example, complex deadlocks that would normally need more than 3 CPUs and
+a very unlikely constellation of tasks, irq-contexts and timings to
+occur, can be detected on a plain, lightly loaded single-CPU system as
+well!)
+
+This radically decreases the complexity of locking related QA of the
+kernel: what has to be done during QA is to trigger as many "simple"
+single-task locking dependencies in the kernel as possible, at least
+once, to prove locking correctness - instead of having to trigger every
+possible combination of locking interaction between CPUs, combined with
+every possible hardirq and softirq nesting scenario (which is impossible
+to do in practice).
+
+[*] assuming that the validator itself is 100% correct, and no other
+ part of the system corrupts the state of the validator in any way.
+ We also assume that all NMI/SMM paths [which could interrupt
+ even hardirq-disabled codepaths] are correct and do not interfere
+ with the validator. We also assume that the 64-bit 'chain hash'
+ value is unique for every lock-chain in the system. Also, lock
+ recursion must not be higher than 20.
+
+Performance:
+------------
+
+The above rules require _massive_ amounts of runtime checking. If we did
+that for every lock taken and for every irqs-enable event, it would
+render the system practically unusably slow. The complexity of checking
+is O(N^2), so even with just a few hundred lock-classes we'd have to do
+tens of thousands of checks for every event.
+
+This problem is solved by checking any given 'locking scenario' (unique
+sequence of locks taken after each other) only once. A simple stack of
+held locks is maintained, and a lightweight 64-bit hash value is
+calculated, which hash is unique for every lock chain. The hash value,
+when the chain is validated for the first time, is then put into a hash
+table, which hash-table can be checked in a lockfree manner. If the
+locking chain occurs again later on, the hash table tells us that we
+dont have to validate the chain again.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 28d1bc3..46b9b38 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1015,10 +1015,9 @@ CPU from reordering them.
There are some more advanced barrier functions:
(*) set_mb(var, value)
- (*) set_wmb(var, value)
- These assign the value to the variable and then insert at least a write
- barrier after it, depending on the function. They aren't guaranteed to
+ This assigns the value to the variable and then inserts at least a write
+ barrier after it, depending on the function. It isn't guaranteed to
insert anything more than a compiler barrier in a UP compilation.
diff --git a/Documentation/mips/time.README b/Documentation/mips/time.README
index 70bc0dd..69ddc5c 100644
--- a/Documentation/mips/time.README
+++ b/Documentation/mips/time.README
@@ -65,7 +65,7 @@ the following functions or values:
1. (optional) set up RTC routines
2. (optional) calibrate and set the mips_counter_frequency
- b) board_timer_setup - a function pointer. Invoked at the end of time_init()
+ b) plat_timer_setup - a function pointer. Invoked at the end of time_init()
1. (optional) over-ride any decisions made in time_init()
2. set up the irqaction for timer interrupt.
3. enable the timer interrupt
@@ -116,19 +116,17 @@ Step 2: the machine setup() function
If you supply board_time_init(), set the function poointer.
- Set the function pointer board_timer_setup() (mandatory)
-
-Step 3: implement rtc routines, board_time_init() and board_timer_setup()
+Step 3: implement rtc routines, board_time_init() and plat_timer_setup()
if needed.
- board_time_init() -
+ board_time_init() -
a) (optional) set up RTC routines,
b) (optional) calibrate and set the mips_counter_frequency
(only needed if you intended to use fixed_rate_gettimeoffset
or use cpu counter as timer interrupt source)
- board_timer_setup() -
+ plat_timer_setup() -
a) (optional) over-write any choices made above by time_init().
b) machine specific code should setup the timer irqaction.
c) enable the timer interrupt
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d46338a..3e0c017 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -294,15 +294,15 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
Default: 87380*2 bytes.
tcp_mem - vector of 3 INTEGERs: min, pressure, max
- low: below this number of pages TCP is not bothered about its
+ min: below this number of pages TCP is not bothered about its
memory appetite.
pressure: when amount of memory allocated by TCP exceeds this number
of pages, TCP moderates its memory consumption and enters memory
pressure mode, which is exited when memory consumption falls
- under "low".
+ under "min".
- high: number of pages allowed for queueing by all TCP sockets.
+ max: number of pages allowed for queueing by all TCP sockets.
Defaults are calculated at boot time from amount of available
memory.
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
new file mode 100644
index 0000000..4ccdbca
--- /dev/null
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -0,0 +1,143 @@
+/proc/sys/net/ipv4/vs/* Variables:
+
+am_droprate - INTEGER
+ default 10
+
+ It sets the always mode drop rate, which is used in the mode 3
+ of the drop_rate defense.
+
+amemthresh - INTEGER
+ default 1024
+
+ It sets the available memory threshold (in pages), which is
+ used in the automatic modes of defense. When there is no
+ enough available memory, the respective strategy will be
+ enabled and the variable is automatically set to 2, otherwise
+ the strategy is disabled and the variable is set to 1.
+
+cache_bypass - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ If it is enabled, forward packets to the original destination
+ directly when no cache server is available and destination
+ address is not local (iph->daddr is RTN_UNICAST). It is mostly
+ used in transparent web cache cluster.
+
+debug_level - INTEGER
+ 0 - transmission error messages (default)
+ 1 - non-fatal error messages
+ 2 - configuration
+ 3 - destination trash
+ 4 - drop entry
+ 5 - service lookup
+ 6 - scheduling
+ 7 - connection new/expire, lookup and synchronization
+ 8 - state transition
+ 9 - binding destination, template checks and applications
+ 10 - IPVS packet transmission
+ 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
+ 12 or more - packet traversal
+
+ Only available when IPVS is compiled with the CONFIG_IPVS_DEBUG
+
+ Higher debugging levels include the messages for lower debugging
+ levels, so setting debug level 2, includes level 0, 1 and 2
+ messages. Thus, logging becomes more and more verbose the higher
+ the level.
+
+drop_entry - INTEGER
+ 0 - disabled (default)
+
+ The drop_entry defense is to randomly drop entries in the
+ connection hash table, just in order to collect back some
+ memory for new connections. In the current code, the
+ drop_entry procedure can be activated every second, then it
+ randomly scans 1/32 of the whole and drops entries that are in
+ the SYN-RECV/SYNACK state, which should be effective against
+ syn-flooding attack.
+
+ The valid values of drop_entry are from 0 to 3, where 0 means
+ that this strategy is always disabled, 1 and 2 mean automatic
+ modes (when there is no enough available memory, the strategy
+ is enabled and the variable is automatically set to 2,
+ otherwise the strategy is disabled and the variable is set to
+ 1), and 3 means that that the strategy is always enabled.
+
+drop_packet - INTEGER
+ 0 - disabled (default)
+
+ The drop_packet defense is designed to drop 1/rate packets
+ before forwarding them to real servers. If the rate is 1, then
+ drop all the incoming packets.
+
+ The value definition is the same as that of the drop_entry. In
+ the automatic mode, the rate is determined by the follow
+ formula: rate = amemthresh / (amemthresh - available_memory)
+ when available memory is less than the available memory
+ threshold. When the mode 3 is set, the always mode drop rate
+ is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
+
+expire_nodest_conn - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ The default value is 0, the load balancer will silently drop
+ packets when its destination server is not available. It may
+ be useful, when user-space monitoring program deletes the
+ destination server (because of server overload or wrong
+ detection) and add back the server later, and the connections
+ to the server can continue.
+
+ If this feature is enabled, the load balancer will expire the
+ connection immediately when a packet arrives and its
+ destination server is not available, then the client program
+ will be notified that the connection is closed. This is
+ equivalent to the feature some people requires to flush
+ connections when its destination is not available.
+
+expire_quiescent_template - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ When set to a non-zero value, the load balancer will expire
+ persistent templates when the destination server is quiescent.
+ This may be useful, when a user makes a destination server
+ quiescent by setting its weight to 0 and it is desired that
+ subsequent otherwise persistent connections are sent to a
+ different destination server. By default new persistent
+ connections are allowed to quiescent destination servers.
+
+ If this feature is enabled, the load balancer will expire the
+ persistence template if it is to be used to schedule a new
+ connection and the destination server is quiescent.
+
+nat_icmp_send - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ It controls sending icmp error messages (ICMP_DEST_UNREACH)
+ for VS/NAT when the load balancer receives packets from real
+ servers but the connection entries don't exist.
+
+secure_tcp - INTEGER
+ 0 - disabled (default)
+
+ The secure_tcp defense is to use a more complicated state
+ transition table and some possible short timeouts of each
+ state. In the VS/NAT, it delays the entering the ESTABLISHED
+ until the real server starts to send data and ACK packet
+ (after 3-way handshake).
+
+ The value definition is the same as that of drop_entry or
+ drop_packet.
+
+sync_threshold - INTEGER
+ default 3
+
+ It sets synchronization threshold, which is the minimum number
+ of incoming packets that a connection needs to receive before
+ the connection will be synchronized. A connection will be
+ synchronized, every time the number of its incoming packets
+ modulus 50 equals the threshold. The range of the threshold is
+ from 0 to 49.
diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt
index d56dc71..3cc953c 100644
--- a/Documentation/nfsroot.txt
+++ b/Documentation/nfsroot.txt
@@ -4,15 +4,16 @@ Mounting the root filesystem via NFS (nfsroot)
Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
+Updated 2006 by Horms <horms@verge.net.au>
-If you want to use a diskless system, as an X-terminal or printer
-server for example, you have to put your root filesystem onto a
-non-disk device. This can either be a ramdisk (see initrd.txt in
-this directory for further information) or a filesystem mounted
-via NFS. The following text describes on how to use NFS for the
-root filesystem. For the rest of this text 'client' means the
+In order to use a diskless system, such as an X-terminal or printer server
+for example, it is necessary for the root filesystem to be present on a
+non-disk device. This may be an initramfs (see Documentation/filesystems/
+ramfs-rootfs-initramfs.txt), a ramdisk (see Documenation/initrd.txt) or a
+filesystem mounted via NFS. The following text describes on how to use NFS
+for the root filesystem. For the rest of this text 'client' means the
diskless system, and 'server' means the NFS server.
@@ -21,11 +22,13 @@ diskless system, and 'server' means the NFS server.
1.) Enabling nfsroot capabilities
-----------------------------
-In order to use nfsroot you have to select support for NFS during
-kernel configuration. Note that NFS cannot be loaded as a module
-in this case. The configuration script will then ask you whether
-you want to use nfsroot, and if yes what kind of auto configuration
-system you want to use. Selecting both BOOTP and RARP is safe.
+In order to use nfsroot, NFS client support needs to be selected as
+built-in during configuration. Once this has been selected, the nfsroot
+option will become available, which should also be selected.
+
+In the networking options, kernel level autoconfiguration can be selected,
+along with the types of autoconfiguration to support. Selecting all of
+DHCP, BOOTP and RARP is safe.
@@ -33,11 +36,10 @@ system you want to use. Selecting both BOOTP and RARP is safe.
2.) Kernel command line
-------------------
-When the kernel has been loaded by a boot loader (either by loadlin,
-LILO or a network boot program) it has to be told what root fs device
-to use, and where to find the server and the name of the directory
-on the server to mount as root. This can be established by a couple
-of kernel command line parameters:
+When the kernel has been loaded by a boot loader (see below) it needs to be
+told what root fs device to use. And in the case of nfsroot, where to find
+both the server and the name of the directory on the server to mount as root.
+This can be established using the following kernel command line parameters:
root=/dev/nfs
@@ -49,23 +51,21 @@ root=/dev/nfs
nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
- If the `nfsroot' parameter is NOT given on the command line, the default
- "/tftpboot/%s" will be used.
+ If the `nfsroot' parameter is NOT given on the command line,
+ the default "/tftpboot/%s" will be used.
- <server-ip> Specifies the IP address of the NFS server. If this field
- is not given, the default address as determined by the
- `ip' variable (see below) is used. One use of this
- parameter is for example to allow using different servers
- for RARP and NFS. Usually you can leave this blank.
+ <server-ip> Specifies the IP address of the NFS server.
+ The default address is determined by the `ip' parameter
+ (see below). This parameter allows the use of different
+ servers for IP autoconfiguration and NFS.
- <root-dir> Name of the directory on the server to mount as root. If
- there is a "%s" token in the string, the token will be
- replaced by the ASCII-representation of the client's IP
- address.
+ <root-dir> Name of the directory on the server to mount as root.
+ If there is a "%s" token in the string, it will be
+ replaced by the ASCII-representation of the client's
+ IP address.
<nfs-options> Standard NFS options. All options are separated by commas.
- If the options field is not given, the following defaults
- will be used:
+ The following defaults are used:
port = as given by server portmap daemon
rsize = 1024
wsize = 1024
@@ -81,129 +81,174 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
This parameter tells the kernel how to configure IP addresses of devices
- and also how to set up the IP routing table. It was originally called `nfsaddrs',
- but now the boot-time IP configuration works independently of NFS, so it
- was renamed to `ip' and the old name remained as an alias for compatibility
- reasons.
+ and also how to set up the IP routing table. It was originally called
+ `nfsaddrs', but now the boot-time IP configuration works independently of
+ NFS, so it was renamed to `ip' and the old name remained as an alias for
+ compatibility reasons.
If this parameter is missing from the kernel command line, all fields are
assumed to be empty, and the defaults mentioned below apply. In general
- this means that the kernel tries to configure everything using both
- RARP and BOOTP (depending on what has been enabled during kernel confi-
- guration, and if both what protocol answer got in first).
+ this means that the kernel tries to configure everything using
+ autoconfiguration.
+
+ The <autoconf> parameter can appear alone as the value to the `ip'
+ parameter (without all the ':' characters before) in which case auto-
+ configuration is used.
+
+ <client-ip> IP address of the client.
- <client-ip> IP address of the client. If empty, the address will either
- be determined by RARP or BOOTP. What protocol is used de-
- pends on what has been enabled during kernel configuration
- and on the <autoconf> parameter. If this parameter is not
- empty, neither RARP nor BOOTP will be used.
+ Default: Determined using autoconfiguration.
<server-ip> IP address of the NFS server. If RARP is used to determine
the client address and this parameter is NOT empty only
- replies from the specified server are accepted. To use
- different RARP and NFS server, specify your RARP server
- here (or leave it blank), and specify your NFS server in
- the `nfsroot' parameter (see above). If this entry is blank
- the address of the server is used which answered the RARP
- or BOOTP request.
-
- <gw-ip> IP address of a gateway if the server is on a different
- subnet. If this entry is empty no gateway is used and the
- server is assumed to be on the local network, unless a
- value has been received by BOOTP.
-
- <netmask> Netmask for local network interface. If this is empty,
+ replies from the specified server are accepted.
+
+ Only required for for NFS root. That is autoconfiguration
+ will not be triggered if it is missing and NFS root is not
+ in operation.
+
+ Default: Determined using autoconfiguration.
+ The address of the autoconfiguration server is used.
+
+ <gw-ip> IP address of a gateway if the server is on a different subnet.
+
+ Default: Determined using autoconfiguration.
+
+ <netmask> Netmask for local network interface. If unspecified
the netmask is derived from the client IP address assuming
- classful addressing, unless overridden in BOOTP reply.
+ classful addressing.
- <hostname> Name of the client. If empty, the client IP address is
- used in ASCII notation, or the value received by BOOTP.
+ Default: Determined using autoconfiguration.
- <device> Name of network device to use. If this is empty, all
- devices are used for RARP and BOOTP requests, and the
- first one we receive a reply on is configured. If you have
- only one device, you can safely leave this blank.
+ <hostname> Name of the client. May be supplied by autoconfiguration,
+ but its absence will not trigger autoconfiguration.
- <autoconf> Method to use for autoconfiguration. If this is either
- 'rarp' or 'bootp', the specified protocol is used.
- If the value is 'both' or empty, both protocols are used
- so far as they have been enabled during kernel configura-
- tion. 'off' means no autoconfiguration.
+ Default: Client IP address is used in ASCII notation.
- The <autoconf> parameter can appear alone as the value to the `ip'
- parameter (without all the ':' characters before) in which case auto-
- configuration is used.
+ <device> Name of network device to use.
+
+ Default: If the host only has one device, it is used.
+ Otherwise the device is determined using
+ autoconfiguration. This is done by sending
+ autoconfiguration requests out of all devices,
+ and using the device that received the first reply.
+ <autoconf> Method to use for autoconfiguration. In the case of options
+ which specify multiple autoconfiguration protocols,
+ requests are sent using all protocols, and the first one
+ to reply is used.
+ Only autoconfiguration protocols that have been compiled
+ into the kernel will be used, regardless of the value of
+ this option.
+ off or none: don't use autoconfiguration (default)
+ on or any: use any protocol available in the kernel
+ dhcp: use DHCP
+ bootp: use BOOTP
+ rarp: use RARP
+ both: use both BOOTP and RARP but not DHCP
+ (old option kept for backwards compatibility)
-3.) Kernel loader
- -------------
+ Default: any
-To get the kernel into memory different approaches can be used. They
-depend on what facilities are available:
-3.1) Writing the kernel onto a floppy using dd:
- As always you can just write the kernel onto a floppy using dd,
- but then it's not possible to use kernel command lines at all.
- To substitute the 'root=' parameter, create a dummy device on any
- linux system with major number 0 and minor number 255 using mknod:
- mknod /dev/boot255 c 0 255
+3.) Boot Loader
+ ----------
- Then copy the kernel zImage file onto a floppy using dd:
+To get the kernel into memory different approaches can be used.
+They depend on various facilities being available:
- dd if=/usr/src/linux/arch/i386/boot/zImage of=/dev/fd0
- And finally use rdev to set the root device:
+3.1) Booting from a floppy using syslinux
- rdev /dev/fd0 /dev/boot255
+ When building kernels, an easy way to create a boot floppy that uses
+ syslinux is to use the zdisk or bzdisk make targets which use
+ and bzimage images respectively. Both targets accept the
+ FDARGS parameter which can be used to set the kernel command line.
- You can then remove the dummy device /dev/boot255 again. There
- is no real device available for it.
- The other two kernel command line parameters cannot be substi-
- tuted with rdev. Therefore, using this method the kernel will
- by default use RARP and/or BOOTP, and if it gets an answer via
- RARP will mount the directory /tftpboot/<client-ip>/ as its
- root. If it got a BOOTP answer the directory name in that answer
- is used.
+ e.g.
+ make bzdisk FDARGS="root=/dev/nfs"
+
+ Note that the user running this command will need to have
+ access to the floppy drive device, /dev/fd0
+
+ For more information on syslinux, including how to create bootdisks
+ for prebuilt kernels, see http://syslinux.zytor.com/
+
+ N.B: Previously it was possible to write a kernel directly to
+ a floppy using dd, configure the boot device using rdev, and
+ boot using the resulting floppy. Linux no longer supports this
+ method of booting.
+
+3.2) Booting from a cdrom using isolinux
+
+ When building kernels, an easy way to create a bootable cdrom that
+ uses isolinux is to use the isoimage target which uses a bzimage
+ image. Like zdisk and bzdisk, this target accepts the FDARGS
+ parameter which can be used to set the kernel command line.
+
+ e.g.
+ make isoimage FDARGS="root=/dev/nfs"
+
+ The resulting iso image will be arch/<ARCH>/boot/image.iso
+ This can be written to a cdrom using a variety of tools including
+ cdrecord.
+
+ e.g.
+ cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
+
+ For more information on isolinux, including how to create bootdisks
+ for prebuilt kernels, see http://syslinux.zytor.com/
3.2) Using LILO
- When using LILO you can specify all necessary command line
- parameters with the 'append=' command in the LILO configuration
- file. However, to use the 'root=' command you also need to
- set up a dummy device as described in 3.1 above. For how to use
- LILO and its 'append=' command please refer to the LILO
- documentation.
+ When using LILO all the necessary command line parameters may be
+ specified using the 'append=' directive in the LILO configuration
+ file.
+
+ However, to use the 'root=' directive you also need to create
+ a dummy root device, which may be removed after LILO is run.
+
+ mknod /dev/boot255 c 0 255
+
+ For information on configuring LILO, please refer to its documentation.
3.3) Using GRUB
- When you use GRUB, you simply append the parameters after the kernel
- specification: "kernel <kernel> <parameters>" (without the quotes).
+ When using GRUB, kernel parameter are simply appended after the kernel
+ specification: kernel <kernel> <parameters>
3.4) Using loadlin
- When you want to boot Linux from a DOS command prompt without
- having a local hard disk to mount as root, you can use loadlin.
- I was told that it works, but haven't used it myself yet. In
- general you should be able to create a kernel command line simi-
- lar to how LILO is doing it. Please refer to the loadlin docu-
- mentation for further information.
+ loadlin may be used to boot Linux from a DOS command prompt without
+ requiring a local hard disk to mount as root. This has not been
+ thoroughly tested by the authors of this document, but in general
+ it should be possible configure the kernel command line similarly
+ to the configuration of LILO.
+
+ Please refer to the loadlin documentation for further information.
3.5) Using a boot ROM
- This is probably the most elegant way of booting a diskless
- client. With a boot ROM the kernel gets loaded using the TFTP
- protocol. As far as I know, no commercial boot ROMs yet
- support booting Linux over the network, but there are two
- free implementations of a boot ROM available on sunsite.unc.edu
- and its mirrors. They are called 'netboot-nfs' and 'etherboot'.
- Both contain everything you need to boot a diskless Linux client.
+ This is probably the most elegant way of booting a diskless client.
+ With a boot ROM the kernel is loaded using the TFTP protocol. The
+ authors of this document are not aware of any no commercial boot
+ ROMs that support booting Linux over the network. However, there
+ are two free implementations of a boot ROM, netboot-nfs and
+ etherboot, both of which are available on sunsite.unc.edu, and both
+ of which contain everything you need to boot a diskless Linux client.
3.6) Using pxelinux
- Using pxelinux you specify the kernel you built with
+ Pxelinux may be used to boot linux using the PXE boot loader
+ which is present on many modern network cards.
+
+ When using pxelinux, the kernel image is specified using
"kernel <relative-path-below /tftpboot>". The nfsroot parameters
are passed to the kernel by adding them to the "append" line.
- You may perhaps also want to fine tune the console output,
- see Documentation/serial-console.txt for serial console help.
+ It is common to use serial console in conjunction with pxeliunx,
+ see Documentation/serial-console.txt for more information.
+
+ For more information on isolinux, including how to create bootdisks
+ for prebuilt kernels, see http://syslinux.zytor.com/
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 217e517..5c0ba23 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1136,10 +1136,10 @@ Sense and level information should be encoded as follows:
Devices connected to openPIC-compatible controllers should encode
sense and polarity as follows:
- 0 = high to low edge sensitive type enabled
+ 0 = low to high edge sensitive type enabled
1 = active low level sensitive type enabled
- 2 = low to high edge sensitive type enabled
- 3 = active high level sensitive type enabled
+ 2 = active high level sensitive type enabled
+ 3 = high to low edge sensitive type enabled
ISA PIC interrupt controllers should adhere to the ISA PIC
encodings listed below:
@@ -1196,7 +1196,7 @@ platforms are moved over to use the flattened-device-tree model.
- model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
- compatible : Should be "gianfar"
- reg : Offset and length of the register set for the device
- - address : List of bytes representing the ethernet address of
+ - mac-address : List of bytes representing the ethernet address of
this controller
- interrupts : <a b> where a is the interrupt number and b is a
field that represents an encoding of the sense and level
@@ -1216,7 +1216,7 @@ platforms are moved over to use the flattened-device-tree model.
model = "TSEC";
compatible = "gianfar";
reg = <24000 1000>;
- address = [ 00 E0 0C 00 73 00 ];
+ mac-address = [ 00 E0 0C 00 73 00 ];
interrupts = <d 3 e 3 12 3>;
interrupt-parent = <40000>;
phy-handle = <2452000>
@@ -1436,9 +1436,9 @@ platforms are moved over to use the flattened-device-tree model.
interrupts = <1d 3>;
interrupt-parent = <40000>;
num-channels = <4>;
- channel-fifo-len = <24>;
+ channel-fifo-len = <18>;
exec-units-mask = <000000fe>;
- descriptor-types-mask = <073f1127>;
+ descriptor-types-mask = <012b0ebf>;
};
@@ -1498,7 +1498,7 @@ not necessary as they are usually the same as the root node.
model = "TSEC";
compatible = "gianfar";
reg = <24000 1000>;
- address = [ 00 E0 0C 00 73 00 ];
+ mac-address = [ 00 E0 0C 00 73 00 ];
interrupts = <d 3 e 3 12 3>;
interrupt-parent = <40000>;
phy-handle = <2452000>;
@@ -1511,7 +1511,7 @@ not necessary as they are usually the same as the root node.
model = "TSEC";
compatible = "gianfar";
reg = <25000 1000>;
- address = [ 00 E0 0C 00 73 01 ];
+ mac-address = [ 00 E0 0C 00 73 01 ];
interrupts = <13 3 14 3 18 3>;
interrupt-parent = <40000>;
phy-handle = <2452001>;
@@ -1524,7 +1524,7 @@ not necessary as they are usually the same as the root node.
model = "FEC";
compatible = "gianfar";
reg = <26000 1000>;
- address = [ 00 E0 0C 00 73 02 ];
+ mac-address = [ 00 E0 0C 00 73 02 ];
interrupts = <19 3>;
interrupt-parent = <40000>;
phy-handle = <2452002>;
diff --git a/Documentation/ramdisk.txt b/Documentation/ramdisk.txt
index 7c25584..52f75b7 100644
--- a/Documentation/ramdisk.txt
+++ b/Documentation/ramdisk.txt
@@ -6,7 +6,7 @@ Contents:
1) Overview
2) Kernel Command Line Parameters
3) Using "rdev -r"
- 4) An Example of Creating a Compressed RAM Disk
+ 4) An Example of Creating a Compressed RAM Disk
1) Overview
@@ -34,7 +34,7 @@ make it clearer. The original "ramdisk=<ram_size>" has been kept around for
compatibility reasons, but it may be removed in the future.
The new RAM disk also has the ability to load compressed RAM disk images,
-allowing one to squeeze more programs onto an average installation or
+allowing one to squeeze more programs onto an average installation or
rescue floppy disk.
@@ -51,7 +51,7 @@ default is 4096 (4 MB) (8192 (8 MB) on S390).
===================
This parameter tells the RAM disk driver how many bytes to use per block. The
-default is 512.
+default is 1024 (BLOCK_SIZE).
3) Using "rdev -r"
@@ -70,7 +70,7 @@ These numbers are no magical secrets, as seen below:
./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
-Consider a typical two floppy disk setup, where you will have the
+Consider a typical two floppy disk setup, where you will have the
kernel on disk one, and have already put a RAM disk image onto disk #2.
Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
@@ -97,12 +97,12 @@ Since the default start = 0 and the default prompt = 1, you could use:
append = "load_ramdisk=1"
-4) An Example of Creating a Compressed RAM Disk
+4) An Example of Creating a Compressed RAM Disk
----------------------------------------------
To create a RAM disk image, you will need a spare block device to
construct it on. This can be the RAM disk device itself, or an
-unused disk partition (such as an unmounted swap partition). For this
+unused disk partition (such as an unmounted swap partition). For this
example, we will use the RAM disk device, "/dev/ram0".
Note: This technique should not be done on a machine with less than 8 MB
diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid
index c173806..a056bbe 100644
--- a/Documentation/scsi/ChangeLog.megaraid
+++ b/Documentation/scsi/ChangeLog.megaraid
@@ -1,3 +1,126 @@
+Release Date : Fri May 19 09:31:45 EST 2006 - Seokmann Ju <sju@lsil.com>
+Current Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module)
+Older Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module)
+
+1. Fixed a bug in megaraid_init_mbox().
+ Customer reported "garbage in file on x86_64 platform".
+ Root Cause: the driver registered controllers as 64-bit DMA capable
+ for those which are not support it.
+ Fix: Made change in the function inserting identification machanism
+ identifying 64-bit DMA capable controllers.
+
+ > -----Original Message-----
+ > From: Vasily Averin [mailto:vvs@sw.ru]
+ > Sent: Thursday, May 04, 2006 2:49 PM
+ > To: linux-scsi@vger.kernel.org; Kolli, Neela; Mukker, Atul;
+ > Ju, Seokmann; Bagalkote, Sreenivas;
+ > James.Bottomley@SteelEye.com; devel@openvz.org
+ > Subject: megaraid_mbox: garbage in file
+ >
+ > Hello all,
+ >
+ > I've investigated customers claim on the unstable work of
+ > their node and found a
+ > strange effect: reading from some files leads to the
+ > "attempt to access beyond end of device" messages.
+ >
+ > I've checked filesystem, memory on the node, motherboard BIOS
+ > version, but it
+ > does not help and issue still has been reproduced by simple
+ > file reading.
+ >
+ > Reproducer is simple:
+ >
+ > echo 0xffffffff >/proc/sys/dev/scsi/logging_level ;
+ > cat /vz/private/101/root/etc/ld.so.cache >/tmp/ttt ;
+ > echo 0 >/proc/sys/dev/scsi/logging
+ >
+ > It leads to the following messages in dmesg
+ >
+ > sd_init_command: disk=sda, block=871769260, count=26
+ > sda : block=871769260
+ > sda : reading 26/26 512 byte blocks.
+ > scsi_add_timer: scmd: f79ed980, time: 7500, (c02b1420)
+ > sd 0:1:0:0: send 0xf79ed980 sd 0:1:0:0:
+ > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
+ > buffer = 0xf7cfb540, bufflen = 13312, done = 0xc0366b40,
+ > queuecommand 0xc0344010
+ > leaving scsi_dispatch_cmnd()
+ > scsi_delete_timer: scmd: f79ed980, rtn: 1
+ > sd 0:1:0:0: done 0xf79ed980 SUCCESS 0 sd 0:1:0:0:
+ > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
+ > scsi host busy 1 failed 0
+ > sd 0:1:0:0: Notifying upper driver of completion (result 0)
+ > sd_rw_intr: sda: res=0x0
+ > 26 sectors total, 13312 bytes done.
+ > use_sg is 4
+ > attempt to access beyond end of device
+ > sda6: rw=0, want=1044134458, limit=951401367
+ > Buffer I/O error on device sda6, logical block 522067228
+ > attempt to access beyond end of device
+
+2. When INQUIRY with EVPD bit set issued to the MegaRAID controller,
+ system memory gets corrupted.
+ Root Cause: MegaRAID F/W handle the INQUIRY with EVPD bit set
+ incorrectly.
+ Fix: MegaRAID F/W has fixed the problem and being process of release,
+ soon. Meanwhile, driver will filter out the request.
+
+3. One of member in the data structure of the driver leads unaligne
+ issue on 64-bit platform.
+ Customer reporeted "kernel unaligned access addrss" issue when
+ application communicates with MegaRAID HBA driver.
+ Root Cause: in uioc_t structure, one of member had misaligned and it
+ led system to display the error message.
+ Fix: A patch submitted to community from following folk.
+
+ > -----Original Message-----
+ > From: linux-scsi-owner@vger.kernel.org
+ > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Sakurai Hiroomi
+ > Sent: Wednesday, July 12, 2006 4:20 AM
+ > To: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
+ > Subject: Re: Help: strange messages from kernel on IA64 platform
+ >
+ > Hi,
+ >
+ > I saw same message.
+ >
+ > When GAM(Global Array Manager) is started, The following
+ > message output.
+ > kernel: kernel unaligned access to 0xe0000001fe1080d4,
+ > ip=0xa000000200053371
+ >
+ > The uioc structure used by ioctl is defined by packed,
+ > the allignment of each member are disturbed.
+ > In a 64 bit structure, the allignment of member doesn't fit 64 bit
+ > boundary. this causes this messages.
+ > In a 32 bit structure, we don't see the message because the allinment
+ > of member fit 32 bit boundary even if packed is specified.
+ >
+ > patch
+ > I Add 32 bit dummy member to fit 64 bit boundary. I tested.
+ > We confirmed this patch fix the problem by IA64 server.
+ >
+ > **************************************************************
+ > ****************
+ > --- linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h.orig
+ > 2006-04-03 17:13:03.000000000 +0900
+ > +++ linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h
+ > 2006-04-03 17:14:09.000000000 +0900
+ > @@ -132,6 +132,10 @@
+ > /* Driver Data: */
+ > void __user * user_data;
+ > uint32_t user_data_len;
+ > +
+ > + /* 64bit alignment */
+ > + uint32_t pad_0xBC;
+ > +
+ > mraid_passthru_t __user *user_pthru;
+ >
+ > mraid_passthru_t *pthru32;
+ > **************************************************************
+ > ****************
+
Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com>
Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module)
Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module)
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index 0a85a7e..d9e5960 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,4 +1,20 @@
+1 Release Date : Sun May 14 22:49:52 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com>
+2 Current Version : 00.00.03.01
+3 Older Version : 00.00.02.04
+
+i. Added support for ZCR controller.
+
+ New device id 0x413 added.
+
+ii. Bug fix : Disable controller interrupt before firing INIT cmd to FW.
+
+ Interrupt is enabled after required initialization is over.
+ This is done to ensure that driver is ready to handle interrupts when
+ it is generated by the controller.
+
+ -Sumant Patro <Sumant.Patro@lsil.com>
+
1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
2 Current Version : 00.00.02.04
3 Older Version : 00.00.02.04
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
index 69866d5..b8dc51c 100644
--- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
+++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
@@ -1172,7 +1172,7 @@
}
/* PCI IDs */
- static struct pci_device_id snd_mychip_ids[] __devinitdata = {
+ static struct pci_device_id snd_mychip_ids[] = {
{ PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR,
PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, },
....
@@ -1565,7 +1565,7 @@
<informalexample>
<programlisting>
<![CDATA[
- static struct pci_device_id snd_mychip_ids[] __devinitdata = {
+ static struct pci_device_id snd_mychip_ids[] = {
{ PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR,
PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, },
....
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 0b62c62..5c3a519 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -25,6 +25,7 @@ Currently, these files are in /proc/sys/fs:
- inode-state
- overflowuid
- overflowgid
+- suid_dumpable
- super-max
- super-nr
@@ -131,6 +132,25 @@ The default is 65534.
==============================================================
+suid_dumpable:
+
+This value can be used to query and set the core dump mode for setuid
+or otherwise protected/tainted binaries. The modes are
+
+0 - (default) - traditional behaviour. Any process which has changed
+ privilege levels or is execute only will not be dumped
+1 - (debug) - all processes dump core when possible. The core dump is
+ owned by the current user and no security is applied. This is
+ intended for system debugging situations only. Ptrace is unchecked.
+2 - (suidsafe) - any binary which normally would not be dumped is dumped
+ readable by root only. This allows the end user to remove
+ such a dump but not access it directly. For security reasons
+ core dumps in this mode will not overwrite one another or
+ other files. This mode is appropriate when adminstrators are
+ attempting to debug problems in a normal environment.
+
+==============================================================
+
super-max & super-nr:
These numbers control the maximum number of superblocks, and
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index b0c7ab9..89bf8c2 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -50,7 +50,6 @@ show up in /proc/sys/kernel:
- shmmax [ sysv ipc ]
- shmmni
- stop-a [ SPARC only ]
-- suid_dumpable
- sysrq ==> Documentation/sysrq.txt
- tainted
- threads-max
@@ -211,9 +210,8 @@ Controls the kernel's behaviour when an oops or BUG is encountered.
0: try to continue operation
-1: delay a few seconds (to give klogd time to record the oops output) and
- then panic. If the `panic' sysctl is also non-zero then the machine will
- be rebooted.
+1: panic immediatly. If the `panic' sysctl is also non-zero then the
+ machine will be rebooted.
==============================================================
@@ -311,25 +309,6 @@ kernel. This value defaults to SHMMAX.
==============================================================
-suid_dumpable:
-
-This value can be used to query and set the core dump mode for setuid
-or otherwise protected/tainted binaries. The modes are
-
-0 - (default) - traditional behaviour. Any process which has changed
- privilege levels or is execute only will not be dumped
-1 - (debug) - all processes dump core when possible. The core dump is
- owned by the current user and no security is applied. This is
- intended for system debugging situations only. Ptrace is unchecked.
-2 - (suidsafe) - any binary which normally would not be dumped is dumped
- readable by root only. This allows the end user to remove
- such a dump but not access it directly. For security reasons
- core dumps in this mode will not overwrite one another or
- other files. This mode is appropriate when adminstrators are
- attempting to debug problems in a normal environment.
-
-==============================================================
-
tainted:
Non-zero if the kernel has been tainted. Numeric values, which
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 86754eb..7cee902 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
- block_dump
- drop-caches
- zone_reclaim_mode
+- min_unmapped_ratio
- panic_on_oom
==============================================================
@@ -168,6 +169,19 @@ in all nodes of the system.
=============================================================
+min_unmapped_ratio:
+
+This is available only on NUMA kernels.
+
+A percentage of the file backed pages in each zone. Zone reclaim will only
+occur if more than this percentage of pages are file backed and unmapped.
+This is to insure that a minimal amount of local pages is still available for
+file I/O even if the node is overallocated.
+
+The default is 1 percent.
+
+=============================================================
+
panic_on_oom
This enables or disables panic on out-of-memory feature. If this is set to 1,
diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt
index f86550f..22c5331 100644
--- a/Documentation/usb/proc_usb_info.txt
+++ b/Documentation/usb/proc_usb_info.txt
@@ -59,7 +59,7 @@ bind to an interface (or perhaps several) using an ioctl call. You
would issue more ioctls to the device to communicate to it using
control, bulk, or other kinds of USB transfers. The IOCTLs are
listed in the <linux/usbdevice_fs.h> file, and at this writing the
-source code (linux/drivers/usb/devio.c) is the primary reference
+source code (linux/drivers/usb/core/devio.c) is the primary reference
for how to access devices through those files.
Note that since by default these BBB/DDD files are writable only by
diff --git a/Documentation/usb/usb-help.txt b/Documentation/usb/usb-help.txt
index b7c3249..a740859 100644
--- a/Documentation/usb/usb-help.txt
+++ b/Documentation/usb/usb-help.txt
@@ -5,8 +5,7 @@ For USB help other than the readme files that are located in
Documentation/usb/*, see the following:
Linux-USB project: http://www.linux-usb.org
- mirrors at http://www.suse.cz/development/linux-usb/
- and http://usb.in.tum.de/linux-usb/
+ mirrors at http://usb.in.tum.de/linux-usb/
and http://it.linux-usb.org
Linux USB Guide: http://linux-usb.sourceforge.net
Linux-USB device overview (working devices and drivers):
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt
index f001cd9..02b0f7b 100644
--- a/Documentation/usb/usb-serial.txt
+++ b/Documentation/usb/usb-serial.txt
@@ -399,10 +399,10 @@ REINER SCT cyberJack pinpad/e-com USB chipcard reader
Prolific PL2303 Driver
- This driver support any device that has the PL2303 chip from Prolific
+ This driver supports any device that has the PL2303 chip from Prolific
in it. This includes a number of single port USB to serial
converters and USB GPS devices. Devices from Aten (the UC-232) and
- IO-Data work with this driver.
+ IO-Data work with this driver, as does the DCU-11 mobile-phone cable.
For any questions or problems with this driver, please contact Greg
Kroah-Hartman at greg@kroah.com
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 6887d44..6da24e7 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -238,6 +238,13 @@ Debugging
pagefaulttrace Dump all page faults. Only useful for extreme debugging
and will create a lot of output.
+ call_trace=[old|both|newfallback|new]
+ old: use old inexact backtracer
+ new: use new exact dwarf2 unwinder
+ both: print entries from both
+ newfallback: use new unwinder but fall back to old if it gets
+ stuck (default)
+
Misc
noreplacement Don't replace instructions with more appropriate ones