Update net triage docs.

In particular, rework triage instructions, and clean up net-internals doc a
little.

NOTRY=true
BUG=none

Review URL: https://codereview.chromium.org/1774733002

Cr-Commit-Position: refs/heads/master@{#380421}
author: mmenke <mmenke@chromium.org> 2016-03-10 08:51:33 -0800
committer: Commit bot <commit-bot@chromium.org> 2016-03-10 16:53:01 +0000
commit: 212fe432b87a4ee2aaaff40a55239b4740eb09e5 (patch)
tree: 8e43c3d5d7758b6a28fa33794454a3c115913af1 /net
parent: 381b049a926507ef30b493deb57e12528d0ef3fd (diff)
3 files changed, 155 insertions, 154 deletions
diff --git a/net/docs/bug-triage-suggested-workflow.md b/net/docs/bug-triage-suggested-workflow.md
index 2d49cb3..a54d008 100644
--- a/net/docs/bug-triage-suggested-workflow.md
+++ b/net/docs/bug-triage-suggested-workflow.md
@@ -2,59 +2,6 @@
 
 [TOC]
 
-## Looking for new crashers
-
-1. Go to [go/chromecrash](https://goto.google.com/chromecrash).
-
-2. For each platform, look through the releases for which releases to
-   investigate.  As per bug-triage.txt, this should be the most recent canary,
-   the previous canary (if the most recent is less than a day old), and any of
-   dev/beta/stable that were released in the last couple of days.
-
-3. For each release, in the "Process Type" frame, click on "browser".
-
-4. At the bottom of the "Magic Signature" frame,  click "limit 1000".  Reported
-   crashers are sorted in decreasing order of the number of reports for that
-   crash signature.
-
-5. Search the page for *"net::"*.
-
-6. For each found signature:
-    * If there is a bug already filed, make sure it is correctly describing the
-      current bug (e.g. not closed, or not describing a long-past issue), and
-      make sure that if it is a *net* bug, that it is labeled as such.
-    * Ignore signatures that only occur once, as memory corruption can easily
-      cause one-off failures when the sample size is large enough.
-    * Ignore signatures that only come from a single client ID, as individual
-      machine malware and breakage can also easily cause one-off failures.
-    * Click on the number of reports field to see details of crash. Ignore it
-      if it doesn't appear to be a network bug.
-    * Otherwise, file a new bug directly from chromecrash.  Note that this may
-      result in filing bugs for low- and very-low- frequency crashes.  That's
-      ok; the bug tracker is a better tool to figure out whether or not we put
-      resources into those crashes than a snap judgement when filing bugs.
-    * For each bug you file, include the following information:
-        * The backtrace.  Note that the backtrace should not be added to the
-          bug if Restrict-View-Google isn't set on the bug as it may contain
-          PII.  Filing the bug from the crash reporter should do this
-          automatically, but check.
-        * The channel in which the bug is seen (canary/dev/beta/stable), its
-          frequency in that channel, and its rank among crashers in the
-          channel.
-        * The frequency of this signature in recent releases.  This information
-          is available by:
-            1. Clicking on the signature in the "Magic Signature" list
-            2. Clicking "Edit" on the dremel query at the top of the page
-            3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
-               "Update".
-            4. Clicking "Limit 1000" in the Product Version list in the
-               resulting page (without this, the listing will be restricted to
-               the releases in which the signature is most common, which will
-               often not include the canary/dev release being investigated).
-            5. Choose some subset of that list, or all of it, to include in the
-               bug.  Make sure to indicate if there is a defined point in the
-               past before which the signature is not present.
-
 ## Identifying unlabeled network bugs on the tracker
 
 * Look at new unconfirmed bugs since noon PST on the last triager's rotation.
@@ -72,8 +19,7 @@
   related.  Be sure to check if other bug reports have that stack trace, and
   mark as a dupe if so.  Even if the bug isn't network related, paste the stack
   trace in the bug, so no one else has to look up the crash stack from the ID.
-    * If there's no other information than the crash ID, ask for more details
-      and add the Needs-Feedback label.
+    * If there's just a blank form and a crash ID, just ignore the bug.
 
 * If network causes are possible, ask for a net-internals log (If it's not a
   browser crash) and attach the most specific internals-network label that's
@@ -96,11 +42,10 @@
 
 * Look through unconfirmed and untriaged component=Internals>Network bugs,
   prioritizing those updated within the last week. [Use this issue tracker
-  query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+-status%3AAssigned+-status%3AStarted+-status%3AAvailable&sort=-modified&colspec=ID+Pri+M+Stars+ReleaseBlock+Component+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=ids).
+  query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified).
 
 * If more information is needed from the reporter, ask for it and add the
-  Needs-Feedback label.  If the reporter has answered an earlier request for
-  information, remove that label.
+  Needs-Feedback label.
 
 * While investigating a new issue, change the status to Untriaged.
 
@@ -112,7 +57,8 @@
   mark it with component Privacy.
 
 * For bugs that already have a more specific network component, go ahead and
-  remove the Internals>Network component and move on.
+  remove the Internals>Network component to get them off the next triager's
+  radar and move on.
 
 * Try to figure out if it's really a network bug.  See common non-network
   components section for description of common components for issues incorrectly
@@ -161,14 +107,7 @@
   subcomponent applies, or only the Internals>Network>HTTP subcomponent
   applies, and there's no clear owner), try to figure out the exact cause.
 
-## Monitoring UMA histograms and Chirp/Gasper alerts
-
-Sign up to chrome-network-debugging@google.com mailing list to receive automated
-e-mails about UMA alerts.  Chirp is the new alert system, sending automated
-e-mails with sender address finch-chirp@google.com.  Gasper is the old alert
-system, sending automated e-mails with sender address gasper-alerts@google.com.
-While Chirp is of higher priority, Gasper is not deprecated yet, so both alerts
-should be monitored for the time being.
+## Investigate UMA notifications
 
 For each alert that fires, determine if it's a real alert and file a bug if so.
 
@@ -186,8 +125,56 @@ For each alert that fires, determine if it's a real alert and file a bug if so.
     * SimpleCache on Windows
     * DiskCache on Android.
 
-For each alert, respond to chrome-network-debugging@google.com with a summary of
-the action you've taken and why, including issue link if an issue was filed.
+## Looking for new crashers
+
+1. Go to [go/chromecrash](https://goto.google.com/chromecrash).
+
+2. For each platform, look through the releases for which releases to
+   investigate.  As per bug-triage.txt, this should be the most recent canary,
+   the previous canary (if the most recent is less than a day old), and any of
+   dev/beta/stable that were released in the last couple of days.
+
+3. For each release, in the "Process Type" frame, click on "browser".
+
+4. At the bottom of the "Magic Signature" frame,  click "limit 1000" (Or reduce
+   the limit to 100 first, as that's all the triager needs to look at).
+   Reported crashers are sorted in decreasing order of the number of reports for
+   that crash signature.
+
+5. Search the page for *"net::"*.
+
+6. For each found signature:
+    * Ignore signatures that only occur once or twice, as memory corruption can
+      easily cause one-off failures when the sample size is large enough.  Also
+      ignore crashers that are not in the top 100 for that platform / release.
+    * If there is a bug already filed, make sure it is correctly describing the
+      current bug (e.g. not closed, or not describing a long-past issue), and
+      make sure that if it is a *net* bug, that it is labeled as such.
+    * Ignore signatures that only come from one or two client IDs, as individual
+      machine malware and breakage can cause one-off failures.
+    * Click on the number of reports field to see details of crash. Ignore it
+      if it doesn't appear to be a network bug.
+    * Otherwise, file a new bug directly from chromecrash.
+    * For each bug you file, include the following information:
+        * The backtrace.  Note that the backtrace should not be added to the
+          bug if Restrict-View-Google isn't set on the bug as it may contain
+          PII.  Filing the bug from the crash reporter should do this
+          automatically, but check.
+        * The channel in which the bug is seen (canary/dev/beta/stable), and its
+          rank among crashers in the channel.
+        * The frequency of this signature in recent releases.  This information
+          is available by:
+            1. Clicking on the signature in the "Magic Signature" list
+            2. Clicking "Edit" on the dremel query at the top of the page
+            3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
+               "Update".
+            4. Clicking "Limit 1000" in the Product Version list in the
+               resulting page (without this, the listing will be restricted to
+               the releases in which the signature is most common, which will
+               often not include the canary/dev release being investigated).
+            5. Choose some subset of that list, or all of it, to include in the
+               bug.  Make sure to indicate if there is a defined point in the
+               past before which the signature is not present.
 
 ## Investigating crashers
 
@@ -221,26 +208,3 @@ the action you've taken and why, including issue link if an issue was filed.
 
 * Load crash dumps, try to figure out a cause.  See
   http://www.chromium.org/developers/crash-reports for more information
-
-## Dealing with old bugs
-
-* For all network issues (Even those with owners, or a more specific component):
-
-    * If the issue has had the Needs-Feedback label for over a month, verify it
-      is waiting on feedback from the user.  If not, remove the label.
-      Otherwise, go ahead and mark the issue WontFix due to lack of response
-      and suggest the user file a new bug if the issue is still present. [Use
-      this issue tracker query for old Needs-Feedback
-      issues](https://code.google.com/p/chromium/issues/list?can=2&q=component%3AInternals>Network%20Needs=Feedback+modified-before%3Atoday-30&sort=-modified).
-
-    * If a bug is over 2 months old, and the underlying problem was never
-      reproduced or really understood:
-        * If it's over a year old, go ahead and mark the issue as Archived.
-        * Otherwise, ask reporters if the issue is still present, and attach
-          the Needs-Feedback label.
-
-* Old unconfirmed or untriaged Internals>Network issues can be investigated
-  just like newer ones.  Crashers should generally be given higher priority,
-  since we can verify if they still occur, and then newer issues, as they're
-  more likely to still be present, and more likely to have a still responsive
-  bug reporter.
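
The signature filters in the reworked "Looking for new crashers" section can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the doc (top 100, more than two reports, more than two client IDs, `net::` in the signature). The record layout with `name`, `reports` and `client_ids` keys is hypothetical and does not reflect any real chromecrash export, and the manual step of opening a crash to check whether it is really a network bug is not modeled.

```python
def signatures_to_investigate(signatures, top_n=100, min_reports=3, min_clients=3):
    """Return (rank, name) pairs for signatures that pass the triage filters.

    `signatures` is assumed to be sorted by report count, highest first,
    for a single platform / release.  Each entry is a dict with the
    hypothetical keys "name", "reports" and "client_ids".
    """
    selected = []
    # Only the top `top_n` crashers for the release are considered at all.
    for rank, sig in enumerate(signatures[:top_n], start=1):
        if "net::" not in sig["name"]:
            continue  # not obviously network related
        if sig["reports"] < min_reports:
            continue  # occurred only once or twice
        if sig["client_ids"] < min_clients:
            continue  # came from only one or two client IDs
        selected.append((rank, sig["name"]))
    return selected


# Made-up example data, for illustration only.
example = [
    {"name": "base::MessageLoop::Run", "reports": 900, "client_ids": 700},
    {"name": "net::URLRequest::Start", "reports": 40, "client_ids": 31},
    {"name": "net::HostResolverImpl::Job::Run", "reports": 12, "client_ids": 2},
    {"name": "net::SpdySession::DoRead", "reports": 2, "client_ids": 2},
]
candidates = signatures_to_investigate(example)
```

Keeping the rank alongside the name is convenient because the doc asks for the crasher's rank in the channel to be included in the filed bug.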
diff --git a/net/docs/bug-triage.md b/net/docs/bug-triage.md
index 8a03d85..ac32403 100644
--- a/net/docs/bug-triage.md
+++ b/net/docs/bug-triage.md
@@ -6,16 +6,16 @@ label seems suitable.
 
 ## Responsibilities
 
-### Required:
-* Identify new crashers
-* Identify new network issues.
-* Request data about recent Internals>Network issue.
-* Investigate each recent Internals>Network issue.
-* Monitor UMA histograms and Chirp/Gasper alerts.
+### Required, in rough order of priority:
+* Identify new network bugs on the tracker.
+* Investigate UMA notifications.
+* Investigate recent Internals>Network issues with no subcomponent.
+* Follow up on Needs-Feedback issues for all network components.
+* Identify and file bugs for significant new crashers.
 
 ### Best effort:
-* Investigate unowned and owned-but-forgotten net/ crashers
-* Investigate old bugs
+* Investigate unowned and owned-but-forgotten net/ crashers.
+* Investigate old bugs.
 * Close obsolete bugs.
 
 All of the above is to be done on each rotation.  These responsibilities should
@@ -30,67 +30,104 @@ uniform, predictable two day commitment for all triagers.
 
 ### Required:
 
-* Identify new crashers that are potentially network related.  You should check
-  the most recent canary, the previous canary (if the most recent less than a
-  day old), and any of dev/beta/stable that were released in the last couple of
-  days, for each platform.  File Internals>Network bugs on the tracker when
-  new crashers are found.
-
-* Identify new network bugs, both on the bug tracker and on the crash server.
-  All Unconfirmed issues filed during your triage rotation should be scanned,
-  and, for suspected network bugs, a network component assigned.  A triager is
-  responsible for looking at bugs reported from noon PST / 3:00 pm EST of the
-  last day of the previous triager's rotation until the same time on the last
-  day of their rotation.
-
-* Investigate each recent (new comment within the past week or so)
-  Internals>Network issue, driving getting information from reporters as
-  needed, until you can do one of the following:
+* Identify new network bugs on the bug tracker.  All Unconfirmed issues filed
+  during your triage rotation should be scanned, and, for suspected network
+  bugs, a network component assigned and an about:net-internals log requested.
+  A triager is responsible for looking at bugs reported from noon PST / 3:00 pm
+  EST of the last day of the previous triager's rotation until the same time on
+  the last day of their rotation.  Once you've changed labels on a bug, mark it
+  Untriaged, so other triagers sorting through Unconfirmed bugs won't see it.
+  
+    * For desktop bugs, ask for a net-internals log and give the user a link to
+      https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details
+      (A link there appears on about:net-internals, for easy reference) for
+      instructions.  On mobile, point them to about:net-export.  In either case,
+      attach the Needs-Feedback label.
+
+* Investigate UMA notifications.
+
+    * UMA notifications ("chirps") are alerts based on UMA histograms that are
+      sent to chrome-network-debugging@google.com.  Triagers should subscribe
+      to this list.  When an alert fires, the triager should determine if the
+      alert looks to be real and file a bug with the appropriate label if so.
+      Note that if no label more specific than Internals>Network is appropriate,
+      the responsibility remains with the triager to continue investigating the
+      bug, as above.
+      
+    * The triager is responsible for looking at any notification previous
+      triagers did not, so when an issue is investigated, the person who did
+      so should respond to chrome-network-debugging@google.com with a short
+      email, describing their conclusions.  Future triagers can then use the
+      fact an alert was responded to as an indicator of which of them need
+      to be followed up on.
+
+* Investigate [Unconfirmed / Untriaged Internals>Network issues that don't
+  belong to a more specific network component](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified),
+  prioritizing the most recent issues, ones with the most responsive reporters,
+  and major crashers.  This will generally take up the majority of your time as
+  triager. Continue digging until you can do one of the following:
 
     * Mark it as *WontFix* (working as intended, obsolete issue) or a
       duplicate.
 
     * Mark it as a feature request.
 
-    * Remove the Internals>Network component, replacing it with at least one
-      more specific network component or non-network component.  Promptly adding
-      non-network components when appropriate is important to get new bugs in front
-      of someone familiar with the relevant code, and to remove them from the
-      next triager's radar.  Because of the way the bug report wizard works, a
-      lot of bugs incorrectly end up with the network component.
-
-    * The issue is assigned to an appropriate owner.
-
-    * If there is no more specific component for a bug, it should be investigated
-      until we have a good understanding of the cause of the problem, and some
-      idea how it should be fixed, at which point its status should be set to
-      Available.  Future triagers should ignore bugs with this status, unless
-      investigating stale bugs.
+    * Mark it as Needs-Feedback.
 
-* Monitor UMA histograms and Chirp/Gasper alerts.
-
-    * For each Chirp and Gasper alert that fires, the triager should determine
-      if the alert is real (not due to noise), and file a bug with the
-      appropriate component if so.  Note that if no component more specific than
-      Internals>Network is appropriate, the responsibility remains with the
-      triager to continue investigating the bug, as above.
+    * Remove the Internals>Network component, replacing it with at least one
+      more specific network component or non-network component. Replacing the
+      Internals>Network component gets it off the next triager's radar, and
+      in front of someone more familiar with the relevant code.  Note that
+      due to the way the bug report wizard works, a lot of bugs incorrectly end
+      up with the network component.
+
+    * Assign the issue to an appropriate owner, and make sure to mark it
+      as "assigned" so the next triager doesn't run into it.
+
+    * If there is no more specific component for a bug, it should be
+      investigated by the triager until we have a good understanding of the
+      cause of the problem, and some idea how it should be fixed, at which point
+      its status should be set to Available.  Future triagers should ignore bugs
+      with this status, unless investigating stale bugs.
+
+* Follow up on [Needs-Feedback issues for all components owned by the network
+  stack team](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3AInternals%3ENetwork%2CUI>Browser>Downloads+-component%3AInternals%3ENetwork%3EDataProxy+-component%3AInternals%3ENetwork%3EDataUse+-component%3AInternals%3ENetwork%3EVPN+Needs%3DFeedback).
+
+    * Remove label once feedback is provided.  Continue to investigate, if
+      the previous section applies.
+
+    * If the Needs-Feedback label has been present for one week, ping the
+      reporter.
+
+    * Archive after two weeks with no feedback, telling users to file a new
+      bug if they still have the issue, with the requested information, unless
+      the reporter indicates they'll provide data when they can.  In that case,
+      use your own judgment for further pings or archiving.
+
+* Identify significant new browser process
+  [crashers](https://goto.google.com/chromecrash) that are potentially network
+  related.  You should look at crashes for the most recent canary that has at
+  least a day of data, and if there's been a dev or beta release from the start
+  of the last triager's shift to the start of yours, you should also look at
+  that once it has at least a day of data.  Recent releases available
+  [here](https://omahaproxy.appspot.com/).  If both dev and beta have been
+  released in that period, just look at beta.  File Internals>Network bugs on
+  the tracker when new crashers are found.  Bugs should only be filed for
+  crashes that are both in the top 100 for each release and occurred for more
+  than two users.
 
 ### Best Effort (As you have time):
 
+* Investigate old bugs, and bugs associated with Internals>Network
+  subcomponents.
+
 * Investigate unowned and owned but forgotten net/ crashers that are still
   occurring (As indicated by
   [go/chromecrash](https://goto.google.com/chromecrash)), prioritizing frequent
   and long standing crashers.
 
-* Investigate old bugs, prioritizing the most recent.
-
 * Close obsolete bugs.
 
-If you've investigated an issue (in code you don't normally work on) to an
-extent that you know how to fix it, and the fix is simple, feel free to take
-ownership of the issue and create a patch while on triage duty, but other tasks
-should take priority.
-
 See [bug-triage-suggested-workflow.md](bug-triage-suggested-workflow.md) for
 suggested workflows.
 
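
The Needs-Feedback follow-up rules added to bug-triage.md (remove the label once feedback arrives, ping after one week, archive after two weeks unless the reporter has promised data) amount to a small decision table. A minimal sketch follows, assuming the triager knows the date the label was added. The function name and the returned action strings are hypothetical, not part of any tracker API.

```python
from datetime import date, timedelta

PING_AFTER = timedelta(weeks=1)     # "present for one week, ping the reporter"
ARCHIVE_AFTER = timedelta(weeks=2)  # "archive after two weeks with no feedback"


def needs_feedback_action(label_added, today,
                          feedback_received=False,
                          reporter_promised_data=False):
    """Suggest the next step for an issue carrying the Needs-Feedback label."""
    if feedback_received:
        # The reporter answered: drop the label and keep investigating.
        return "remove-label"
    age = today - label_added
    if age >= ARCHIVE_AFTER:
        # The doc leaves this case to the triager's judgment when the
        # reporter has said they will provide data when they can.
        return "use-judgment" if reporter_promised_data else "archive"
    if age >= PING_AFTER:
        return "ping"
    return "wait"


added = date(2016, 3, 1)
```

For example, `needs_feedback_action(added, date(2016, 3, 9))` suggests a ping, while the same call a week later suggests archiving. Whether the boundaries are inclusive is an assumption here; the doc does not say.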
diff --git a/net/docs/crash-course-in-net-internals.md b/net/docs/crash-course-in-net-internals.md
index 5dcfb24..fa2094c 100644
--- a/net/docs/crash-course-in-net-internals.md
+++ b/net/docs/crash-course-in-net-internals.md
@@ -5,8 +5,8 @@ with about:net-internals, with some commonly useful tips and tricks.  This
 document is aimed more at how to get started using some of its features to
 investigate bug reports, rather than as a feature overview.
 
-It would probably be useful to read [life-of-a-url-request.md](
-life-of-a-url-request.md) before this document.
+It would probably be useful to read
+[life-of-a-url-request.md](life-of-a-url-request.md) before this document.
 
 # What Data Net-Internals Contains
 
@@ -34,11 +34,12 @@ covers), but it's good to be aware of this distinction.
 
 The Event View shows events logged by the NetLog.  The NetLog model is that
 long-lived network stack objects, called sources, emit events over their
-lifetime.  Some events have a beginning and end point (during which other
-subevents may occur), and some only occur at a single point in time.  Generally
-only one event can be occuring for a source at a time.  If there can be multiple
-events doing completely independent thing, the code often uses new sources to
-represent the parallelism.
+lifetime.  When looking at the code, a "BoundNetLog" object contains a source
+ID, and a pointer to the NetLog the source emits events to.  Some events have a
+beginning and end point (during which other subevents may occur), and some only
+occur at a single point in time.  Generally only one event can be occurring for a
+source at a time.  If there can be multiple events doing completely independent
+things, the code often uses new sources to represent the parallelism.
 
 "Sources" correspond to certain net objects, however, multiple layers of net/
 will often log to a single source.  Here are the main source types and what they
@@ -99,9 +100,8 @@ in a lot of cases:
 an error of some sort (red background).  Cache errors are often non-fatal, so
 you should generally ignore those, and look for a more interesting one.
 
-* "type:URL_REQUEST sort:duration" will show the lonest-lived requests (as of
-when about:net-internals was opened) first.  This is often useful in finding
-hung or slow requests.
+* "type:URL_REQUEST sort:duration" will show the longest-lived requests first.
+This is often useful in finding hung or slow requests.
 
 For a list of other filter commands, you can mouse over the question mark on
 about:net-internals.
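
To make the source/event model and the "type:URL_REQUEST sort:duration" filter more concrete, here is a rough Python sketch of the same idea applied to a simplified event list. The `(source_id, source_type, time_ms)` tuples are a hypothetical stand-in, not the actual NetLog export format, and duration is approximated as the span between a source's first and last logged event, which may differ from how about:net-internals computes it.

```python
def longest_lived(events, source_type="URL_REQUEST"):
    """Return source IDs of the given type, longest-lived first.

    `events` is an iterable of hypothetical (source_id, source_type, time_ms)
    tuples; every event emitted by one source shares that source's ID.
    """
    first_seen = {}
    last_seen = {}
    for source_id, stype, time_ms in events:
        if stype != source_type:
            continue  # equivalent in spirit to the "type:" filter
        first_seen[source_id] = min(first_seen.get(source_id, time_ms), time_ms)
        last_seen[source_id] = max(last_seen.get(source_id, time_ms), time_ms)
    durations = {sid: last_seen[sid] - first_seen[sid] for sid in first_seen}
    # Equivalent in spirit to "sort:duration": longest-lived sources first.
    return sorted(durations, key=lambda sid: durations[sid], reverse=True)


# Made-up events, for illustration only.
sample_events = [
    (1, "URL_REQUEST", 0),
    (1, "URL_REQUEST", 50),
    (2, "URL_REQUEST", 10),
    (2, "URL_REQUEST", 500),
    (3, "SOCKET", 0),
    (3, "SOCKET", 9000),
]
ordered = longest_lived(sample_events)
```

In this made-up data, source 2 comes first because its events span 490 ms against 50 ms for source 1, and the long-lived socket source is excluded by the type filter.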