# SDCH 

"SDCH" stands for "Shared Dictionary Compression over HTTP".  It is a
protocol for compressing HTTP responses that can be used when the server
and the client share a dictionary that both can refer to for
compression/encoding and decompression/decoding.  The details of the
SDCH protocol are specified in
[the spec](https://docs.google.com/a/chromium.org/document/d/1REMkwjXY5yFOkJwtJPjCMwZ4Shx3D9vfdAytV_KQCUo/edit?pli=1)
(soon to be moved to GitHub), but in brief:

1. If the client supports SDCH decoding, it advertises "sdch" in the
   "Accept-Encoding" header.
2. If the server could have encoded a response with a dictionary (but
   didn't, because the client didn't have the dictionary), it includes
   an advisory "Get-Dictionary: <url>" header in its response.
3. If the client has a dictionary that the server has previously
   advertised as being usable for encoding a particular request, it
   advertises that dictionary as being available via an
   "Avail-Dictionary: <hash>" header in the request.
4. If the server chooses to encode a response with a dictionary, it
   includes "sdch" in a "Content-Encoding" header, in which case the
   body will reference the dictionary to be used for decoding (which
   must be one the client advertised in the original request).
   Encodings may be chained; often responses are SDCH encoded, and then
   gzip encoded.
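
The four steps above can be sketched from the client's side as follows.
This is a minimal illustration in Python, not Chromium's implementation.
The function names are hypothetical. Two details are assumptions that
should be checked against the spec: that the `<hash>` is the first 48 bits
of the dictionary's SHA-256 digest in URL-safe base64, and that multiple
hashes are joined with commas.

```python
import base64
import hashlib
from typing import Dict, List, Optional


def client_hash(dictionary: bytes) -> str:
    """Hypothetical helper: derive the identifier sent in Avail-Dictionary.

    Assumption: the identifier is the first 6 bytes (48 bits) of the SHA-256
    digest of the dictionary, URL-safe base64 encoded.
    """
    digest = hashlib.sha256(dictionary).digest()
    return base64.urlsafe_b64encode(digest[:6]).decode("ascii")


def build_request_headers(available_dictionaries: List[bytes]) -> Dict[str, str]:
    """Steps 1 and 3: advertise SDCH support and any usable dictionaries."""
    headers = {"Accept-Encoding": "gzip, sdch"}
    if available_dictionaries:
        # Assumption: several hashes are sent as a comma-separated list.
        headers["Avail-Dictionary"] = ",".join(
            client_hash(d) for d in available_dictionaries
        )
    return headers


def dictionary_to_fetch(response_headers: Dict[str, str]) -> Optional[str]:
    """Step 2: the advisory URL of a dictionary the client could fetch, if any."""
    return response_headers.get("Get-Dictionary")


def decoding_order(response_headers: Dict[str, str]) -> List[str]:
    """Step 4: Content-Encoding lists encodings in the order they were
    applied, so they must be undone in reverse order."""
    value = response_headers.get("Content-Encoding", "")
    encodings = [e.strip().lower() for e in value.split(",") if e.strip()]
    return list(reversed(encodings))
```

For a response carrying `Content-Encoding: sdch, gzip`, `decoding_order`
returns `["gzip", "sdch"]`: the gzip layer is removed first and the SDCH
layer second.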

## SDCH in Chromium: Overview

The SDCH implementation in Chromium is spread across several classes
in several different directories:

* SdchManager (in net/base): This class contains all
  dictionaries currently known to Chromium.  Each URLRequestContext
  points to an SdchManager; at the chrome/ level, there is one
  SdchManager per profile.  URLRequestHttpJob consults the SdchManager
  for what dictionaries should be advertised with a URLRequest, and
  notifies the SdchManager whenever it sees a "Get-Dictionary"
  header.  The SdchManager does *not* mediate fetching of
  dictionaries; it is conceptually layered underneath URLRequest and
  has no knowledge of URLRequests.  There are several nested classes of
  SdchManager (Dictionary, DictionarySet) used in the SDCH
  implementation; see sdch_manager.h for details.
* SdchObserver (in net/base): This is an abstract base
  class which other classes may implement if they wish to
  receive notifications about SDCH events.  Such classes should also
  register as observers with the SdchManager.
* SdchFilter (in net/filter): This class is derived from net::Filter
  and is used for decoding SDCH responses; it cooperates with
  SdchManager and the URLRequestJob to decode SDCH encoded responses.
* SdchDictionaryFetcher (in net/url_request):
  This class implements the nuts and bolts of fetching an SDCH
  dictionary.
* SdchOwner (in net/sdch): This class is an SdchObserver.
  It contains policy for the SDCH implementation, including mediation
  of fetching dictionaries, prioritization and eviction of
  dictionaries in response to new fetches, and constraints on the
  amount of memory that is usable by SDCH dictionaries.  It initiates
  dictionary fetches as appropriate when it receives notification of
  a "Get-Dictionary" header from the SdchManager.

A net/ embedder should instantiate an SdchManager and an SdchOwner,
and must guarantee that the SdchManager outlives the SdchOwner.

Note the layering of the above classes:

1. The SdchManager and SdchOwner classes have no knowledge of
   URLRequests.  URLRequest is dependent on those classes, not the
   reverse.
2. SdchDictionaryFetcher is dependent on URLRequest, but is still a
   utility class exported by the net/ library for use by higher levels.
3. SdchOwner manages the entire system on behalf of the embedder.  The
   intent is that the embedder can change policies through methods on
   SdchOwner, while letting the SdchOwner class take care of policy
   implementation. 
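
The layering can be pictured with a small observer-pattern sketch. It is
written in Python for brevity and mirrors only the roles described above.
The method names (`add_observer`, `notify_get_dictionary`,
`on_get_dictionary`), the synchronous `fetch` callable standing in for
SdchDictionaryFetcher, the byte budget, and the oldest-first eviction rule
are all hypothetical and are not taken from the Chromium sources.

```python
from collections import OrderedDict
from typing import Callable, List


class SdchObserver:
    """Interface for classes that want SDCH event notifications."""

    def on_get_dictionary(self, dictionary_url: str) -> None:
        raise NotImplementedError


class SdchManager:
    """Holds known dictionaries; knows nothing about fetching or requests."""

    def __init__(self) -> None:
        self.dictionaries: "OrderedDict[str, bytes]" = OrderedDict()
        self._observers: List[SdchObserver] = []

    def add_observer(self, observer: SdchObserver) -> None:
        self._observers.append(observer)

    def notify_get_dictionary(self, dictionary_url: str) -> None:
        # Called by the request layer when it sees a "Get-Dictionary" header.
        for observer in self._observers:
            observer.on_get_dictionary(dictionary_url)


class SdchOwner(SdchObserver):
    """Policy layer: decides what to fetch and what to evict."""

    def __init__(self, manager: SdchManager,
                 fetch: Callable[[str], bytes], max_bytes: int) -> None:
        self._manager = manager
        self._fetch = fetch          # Stand-in for the dictionary fetcher.
        self._max_bytes = max_bytes  # Hypothetical memory budget.
        manager.add_observer(self)

    def on_get_dictionary(self, dictionary_url: str) -> None:
        stored = self._manager.dictionaries
        if dictionary_url in stored:
            return  # Already known; nothing to fetch.
        data = self._fetch(dictionary_url)
        if len(data) > self._max_bytes:
            return  # Can never fit within the budget.
        # Evict oldest entries until the new dictionary fits.
        while sum(len(d) for d in stored.values()) + len(data) > self._max_bytes:
            stored.popitem(last=False)
        stored[dictionary_url] = data
```

Note that SdchManager never calls the fetcher itself: it only reports the
"Get-Dictionary" event, and the policy decision stays in SdchOwner.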

## SDCH in Chromium: Debugging

Data that is useful in debugging SDCH problems:

* The SDCH UMA prefix is "Sdch3", and histograms that have been found
  useful for debugging include:
    * ProblemCodes_* (though interpreting it requires trawling the source
      for the meaning of each bucket).
    * ResponseCorruptionDetection.{Cached,Uncached}: An attempt to make
      sense of the twisted mess in SdchFilter::ReadFilteredData described
      under "Gotchas and corner cases" below.
    * BlacklistReason: Why requests avoid using SDCH when they could use
      it.
* about:net-internals has an SDCH tab, showing loaded dictionaries and
  other information.  It can also be useful to search net-internals for
  "Get-Dictionary", then for the URLRequest that actually fetches that
  dictionary, and then for the hash of that dictionary (often used as
  the file name).

## SDCH in Chromium: Gotchas and corner cases

There are several known issues in SDCH in Chromium that developers
in this space should be aware of:

* As noted in the spec above, there have historically been problems
  with middleboxes stripping or corrupting SDCH encoded responses.
  For this reason, the protocol requires that if a server is not using
  SDCH encoding when it has previously advertised that it can do so,
  it includes an "X-SDCH-Encode: 0" header in the
  response.  Servers don't always do this (especially multi-servers),
  and that can result in failed decodings and requests being dropped
  on the floor.  The code to handle this is a twisted mess (see
  SdchFilter::ReadFilteredData()) and has often been a source of
  problems.
* If the decoding logic trips over a problem, it will often blacklist
  the server in question, temporarily (if it can recover that request)
  or permanently (if it can't).  This can lead to a mysterious lack of
  SDCH encoding when it's expected to be present.
* The network cache currently stores the response precisely as received from
  the network.  This means that requests that don't advertise SDCH
  may get a cached value that is SDCH encoded, and requests that do
  advertise SDCH may get a cached value that is not SDCH encoded.
  The second case is handled transparently, but the first case may
  lead to request failure.
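
The first and third gotchas both come down to a mismatch between what the
request advertised and what the response contains. The Python sketch below
classifies a response on that basis. It is an illustration only, not the
logic in SdchFilter::ReadFilteredData(): the function name and the outcome
labels are hypothetical, and real handling would also need to inspect the
body.

```python
from typing import Dict


def classify_response(request_headers: Dict[str, str],
                      response_headers: Dict[str, str]) -> str:
    """Return a hypothetical label describing how to treat the response body."""
    advertised = "Avail-Dictionary" in request_headers
    encodings = [
        e.strip().lower()
        for e in response_headers.get("Content-Encoding", "").split(",")
        if e.strip()
    ]
    sdch_encoded = "sdch" in encodings
    explicit_opt_out = response_headers.get("X-SDCH-Encode", "").strip() == "0"

    if sdch_encoded and advertised:
        # Normal case: the body references a dictionary the client offered.
        return "decode"
    if sdch_encoded:
        # For example, a cached SDCH body served to a request that never
        # advertised a dictionary.
        return "cannot-decode"
    if advertised and not explicit_opt_out:
        # No "sdch" in Content-Encoding and no "X-SDCH-Encode: 0": a
        # middlebox may have stripped the header, so the body is suspect.
        return "suspect"
    # The server declined SDCH explicitly, or SDCH was never in play.
    return "pass-through"
```

The "suspect" outcome is the one that the missing "X-SDCH-Encode: 0"
header makes ambiguous: the header alone cannot distinguish a server that
simply chose not to encode from a response whose encoding header was
stripped in transit.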