summaryrefslogtreecommitdiffstats
path: root/net/docs/bug-triage-suggested-workflow.md
blob: 4b6ee7b36fc2d5fa497c558d1cdf0359b5bf97e1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
# Chrome Network Bug Triage : Suggested Workflow

[TOC]

## Looking for new crashers

1. Go to [go/chromecrash](https://goto.google.com/chromecrash).

2. For each platform, look through the releases for which releases to
   investigate.  As per bug-triage.txt, this should be the most recent canary,
   the previous canary (if the most recent is less than a day old), and any of
   dev/beta/stable that were released in the last couple of days.

3. For each release, in the "Process Type" frame, click on "browser".

4. At the bottom of the "Magic Signature" frame,  click "limit 1000".  Reported
   crashers are sorted in decreasing order of the number of reports for that
   crash signature.

5. Search the page for *"net::"*.

6. For each found signature:
    * If there is a bug already filed, make sure it is correctly describing the
      current bug (e.g. not closed, or not describing a long-past issue), and
      make sure that if it is a *net* bug, that it is labeled as such.
    * Ignore signatures that only occur once, as memory corruption can easily
      cause one-off failures when the sample size is large enough.
    * Ignore signatures that only come from a single client ID, as individual
      machine malware and breakage can also easily cause one-off failures.
    * Click on the number of reports field to see details of crash. Ignore it
      if it doesn't appear to be a network bug.
    * Otherwise, file a new bug directly from chromecrash.  Note that this may
      result in filing bugs for low- and very-low- frequency crashes.  That's
      ok; the bug tracker is a better tool to figure out whether or not we put
      resources into those crashes than a snap judgement when filing bugs.
    * For each bug you file, include the following information:
        * The backtrace.  Note that the backtrace should not be added to the
          bug if Restrict-View-Google isn't set on the bug as it may contain
          PII.  Filing the bug from the crash reporter should do this
          automatically, but check.
        * The channel in which the bug is seen (canary/dev/beta/stable), its
          frequency in that channel, and its rank among crashers in the
          channel.
        * The frequency of this signature in recent releases.  This information
          is available by:
            1. Clicking on the signature in the "Magic Signature" list
            2. Clicking "Edit" on the dremel query at the top of the page
            3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
               "Update".
            4. Clicking "Limit 1000" in the Product Version list in the
               resulting page (without this, the listing will be restricted to
               the releases in which the signature is most common, which will
               often not include the canary/dev release being investigated).
            5. Choose some subset of that list, or all of it, to include in the
               bug.  Make sure to indicate if there is a defined point in the
               past before which the signature is not present.

## Identifying unlabeled network bugs on the tracker

* Look at new uncomfirmed bugs since noon PST on the last triager's rotation.
  [Use this issue tracker
  query](https://code.google.com/p/chromium/issues/list?can=2&q=status%3Aunconfirmed&sort=-id&num=1000).

* Press **h** to bring up a preview of the bug text.

* Use **j** and **k** to advance through bugs.

* If a bug looks like it might be network/download/safe-browsing related,
  middle click (or command-click on OSX) to open in new tab.

* If a user provides a crash ID for a crasher for a bug that could be
  net-related, look at the crash stack at
  [go/crash](https://goto.google.com/crash), and see if it looks to be network
  related.  Be sure to check if other bug reports have that stack trace, and
  mark as a dupe if so.  Even if the bug isn't network related, paste the stack
  trace in the bug, so no one else has to look up the crash stack from the ID.
    * If there's no other information than the crash ID, ask for more details
      and add the Needs-Feedback label.

* If network causes are possible, ask for a net-internals log (If it's not a
  browser crash) and attach the most specific internals-network label that's
  applicable.  If there isn't an applicable narrower label, a clear owner for
  the issue, or there are multiple possibilities, attach the internals-network
  label and proceed with further investigation.

* If non-network causes also seem possible, attach those labels as well.

## Investigating Cr-Internals-Network bugs

* It's recommended that while on triage duty, you subscribe to the
  Cr-Internals-Network label.  To do this, go to
  https://code.google.com/p/chromium/issues/ and click on "Subscriptions".
  Enter "Cr-Internals-Network" and click submit.

* Look through uncomfirmed and untriaged Cr-Internals-Network bugs,
  prioritizing those updated within the last week. [Use this issue tracker
  query](https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3DInternals-Network+-status%3AAssigned+-status%3AStarted+-status%3AAvailable+&sort=-modified).

* If more information is needed from the reporter, ask for it and add the
  Needs-Feedback label.  If the reporter has answered an earlier request for
  information, remove that label.

* While investigating a new issue, change the status to Untriaged.

* If a bug is a potential security issue (Allows for code execution from remote
  site, allows crossing security boundaries, unchecked array bounds, etc) mark
  it Type-Bug-Security.  If it has privacy implication (History, cookies
  discoverable by an entity that shouldn't be able to do so, incognito state
  being saved in memory or on disk beyond the lifetime of incognito tabs, etc),
  mark it Cr-Privacy.

* For bugs that already have a more specific network label, go ahead and remove
  the Cr-Internals-Network label and move on.

* Try to figure out if it's really a network bug.  See common non-network
  labels section for description of common labels needed for issues incorrectly
  tagged as Cr-Internals-Network.

* If it's not, attach appropriate labels and go no further.

* If it may be a network bug, attach additional possibly relevant labels if
  any, and continue investigating.  Once you either determine it's a
  non-network bug, or figure out accurate more specific network labels, your
  job is done, though you should still ask for a net-internals dump if it seems
  likely to be useful.

* Note that ChromeOS-specific network-related code (Captive portal detection,
  connectivity detection, login, etc) may not all have appropriate more
  specific labels, but are not in areas handled by the network stack team.
  Just make sure those have the OS-Chrome label, and any more specific labels
  if applicable, and then move on.

* Gather data and investigate.
    * Remember to add the Needs-Feedback label whenever waiting for the user to
      respond with more information, and remove it when not waiting on the
      user.
    * Try to reproduce locally.  If you can, and it's a regression, use
      src/tools/bisect-builds.py to figure out when it regressed.
    * Ask more data from the user as needed (net-internals dumps, repro case,
      crash ID from about:crashes, run tests, etc).
    * If asking for an about:net-internals dump, provide this link:
      https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
      Can just grab the link from about:net-internals, as needed.

* Try to figure out what's going on, and which more specific network label is
  most appropriate.

* If it's a regression, browse through the git history of relevant files to try
  and figure out when it regressed.  CC authors / primary reviewers of any
  strongly suspect CLs.

* If you are having trouble with an issue, particularly for help understanding
  net-internals logs, email the public net-dev@chromium.org list for help
  debugging.  If it's a crasher, or for some other reason discussion needs to
  be done in private, use chrome-network-debugging@google.com.  TODO(mmenke):
  Write up a net-internals tips and tricks docs.

* If it appears to be a bug in the unowned core of the network stack (i.e. no
  sublabel applies, or only the Cr-Internals-Network-HTTP sublabel applies, and
  there's no clear owner), try to figure out the exact cause.

## Monitoring UMA histograms and gasper alerts

For each Gasper alert that fires, determine if it's a real alert and file a bug
if so.

* Don't file if the alert is coincident with a major volume change.  The volume
  at a particular date can be determined by hovering the mouse over the
  appropriate location on the alert line.

* Don't file if the alert is on a graph with very low volume (< ~200 data
  points); it's probably noise, and we probably don't care even if it isn't.

* Don't file if the graph is really noisy (but eyeball it to decide if there is
  an underlying important shift under the noise).

* Don't file if the alert is in the "Known Ignorable" list:
    * SimpleCache on Windows
    * DiskCache on Android.

For each Gasper alert, respond to chrome-network-debugging@google.com with a
summary of the action you've taken and why, including issue link if an issue
was filed.

## Investigating crashers

* Only investigate crashers that are still occurring, as identified by above
  section.  If a search on go/crash indicates a crasher is no longer occurring,
  mark it as WontFix.

* On Windows, you may want to look for weird dlls associated with the crashes.
    This generally needs crashes from a fair number of different users to reach
    any conclusions.
    * To get a list of loaded modules in related crash dumps, select
      modules->3rd party in the left pane.  It can be difficult to distinguish
      between safe dlls and those likely to cause problems, but even if you're
      not that familiar with windows, some may stick out.  Anti-virus programs,
      download managers, and more gray hat badware often have meaningful dll
      names or dll paths (Generally product names or company names).  If you
      see one of these in a significant number of the crash dumps, it may well
      be the cause.
    * You can also try selecting the "has malware" option, though that's much
      less reliable than looking manually.

* See if the same users are repeatedly running into the same issue.  This can
  be accomplished by search for (Or clicking on) the client ID associated with
  a crash report, and seeing if there are multiple reports for the same crash.
  If this is the case, it may be also be malware, or an issue with an unusual
  system/chrome/network config.

* Dig through crash reports to figure out when the crash first appeared, and
  dig through revision history in related files to try and locate a suspect CL.
  TODO(mmenke):  Add more detail here.

* Load crash dumps, try to figure out a cause.  See
  http://www.chromium.org/developers/crash-reports for more information

## Dealing with old bugs

* For all network issues (Even those with owners, or a more specific labels):

    * If the issue has had the Needs-Feedback label for over a month, verify it
      is waiting on feedback from the user.  If not, remove the label.
      Otherwise, go ahead and mark the issue WontFix due to lack of response
      and suggest the user file a new bug if the issue is still present. [Use
      this issue tracker query for old Needs-Feedback
      issues](https://code.google.com/p/chromium/issues/list?can=2&q=Cr%3AInternals-Network%20Needs=Feedback+modified-before%3Atoday-30&sort=-modified).

    * If a bug is over 2 months old, and the underlying problem was never
      reproduced or really understood:
        * If it's over a year old, go ahead and mark the issue as Archived.
        * Otherwise, ask reporters if the issue is still present, and attach
          the Needs-Feedback label.

* Old unconfirmed or untriaged Cr-Internals-Network issues can be investigated
  just like newer ones.  Crashers should generally be given higher priority,
  since we can verify if they still occur, and then newer issues, as they're
  more likely to still be present, and more likely to have a still responsive
  bug reporter.