<p id="classSummary">
Use the <code>chrome.ttsEngine</code> module to
implement a text-to-speech (TTS) engine using an extension. If your
extension registers using this API, it will receive events containing
an utterance to be spoken and other parameters when any extension or packaged
app uses the
<a href="tts.html">tts</a>
module to generate speech. Your extension can then use any available
web technology to synthesize and output the speech, and send events back
to the calling function to report the status.
</p>


<h2 id="overview">Overview</h2>

<p>An extension can register itself as a speech engine. By doing so, it
can intercept some or all calls to functions such as
<a href="tts.html#method-speak"><code>speak()</code></a> and
<a href="tts.html#method-stop"><code>stop()</code></a>
and provide an alternate implementation.
Extensions are free to use any available web technology
to provide speech, including streaming audio from a server, HTML5 audio,
Native Client, or Flash. An extension could even do something different
with the utterances, like display closed captions in a pop-up window or
send them as log messages to a remote server.</p>

<h2 id="manifest">Manifest</h2>

<p>To implement a TTS engine, an extension must
declare the "ttsEngine" permission and then declare all voices
it provides in the extension manifest, like this:</p>

<pre>{
  "name": "My TTS Engine",
  "version": "1.0",
  <b>"permissions": ["ttsEngine"],
  "tts_engine": {
    "voices": [
      {
        "voice_name": "Alice",
        "lang": "en-US",
        "gender": "female",
        "event_types": ["start", "marker", "end"]
      },
      {
        "voice_name": "Pat",
        "lang": "en-US",
        "event_types": ["end"]
      }
    ]
  },</b>
  "background_page": "background.html"
}</pre>

<p>An extension can specify any number of voices.</p>

<p>The <code>voice_name</code> parameter is required. The name should be
descriptive enough that it identifies the name of the voice and the
engine used. In the unlikely event that two extensions register voices
with the same name, a client can specify the ID of the extension that
should do the synthesis.</p>

<p>The <code>gender</code> parameter is optional. If your voice corresponds
to a male or female voice, you can use this parameter to help clients
choose the most appropriate voice for their application.</p>

<p>The <code>lang</code> parameter is optional, but highly recommended.
Almost always, a voice can synthesize speech in just a single language.
When an engine supports more than one language, it can easily register a
separate voice for each language. Under rare circumstances where a single
voice can handle more than one language, it's easiest to just list two
separate voices and handle them using the same logic internally. However,
if you want to create a voice that will handle utterances in any language,
leave out the <code>lang</code> parameter from your extension's manifest.</p>

<p>Finally, the <code>event_types</code> parameter is required if the engine can
send events to update the client on the progress of speech synthesis.
At a minimum, supporting the <code>'end'</code> event type to indicate
when speech is finished is highly recommended; otherwise, Chrome cannot
schedule queued utterances.</p>
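<p>The manifest rules above can be checked mechanically before an extension
is packaged. The following is a minimal sketch, not part of any Chrome API:
<code>checkVoices</code> and <code>KNOWN_EVENT_TYPES</code> are hypothetical
names, and the function only inspects an already-parsed manifest object.</p>

```javascript
// Hypothetical helper (not a Chrome API): reviews the "tts_engine" section
// of a parsed manifest against the rules described on this page.
const KNOWN_EVENT_TYPES = ['start', 'word', 'sentence', 'marker', 'end', 'error'];

function checkVoices(manifest) {
  const notes = [];
  const voices = (manifest.tts_engine || {}).voices || [];
  voices.forEach((voice, i) => {
    const label = voice.voice_name || 'voice #' + i;
    // voice_name is the only required parameter.
    if (typeof voice.voice_name !== 'string' || voice.voice_name === '') {
      notes.push(label + ': voice_name is required');
    }
    // Leaving out lang is allowed, but it means "handles any language".
    if (voice.lang === undefined) {
      notes.push(label + ': no lang, so the voice is offered for any language');
    }
    const types = voice.event_types || [];
    types
      .filter((type) => !KNOWN_EVENT_TYPES.includes(type))
      .forEach((type) => notes.push(label + ': unknown event type ' + type));
    // Without 'end', Chrome cannot schedule queued utterances.
    if (!types.includes('end')) {
      notes.push(label + ": 'end' is not declared, so Chrome cannot queue utterances");
    }
  });
  return notes;
}
```

<p>Calling <code>checkVoices</code> on the example manifest above should
return an empty array, because both voices have a name, a language, and the
<code>'end'</code> event type.</p>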

<p class="note">
<strong>Note:</strong> If your TTS engine does not support
the <code>'end'</code> event type, Chrome cannot queue utterances
because it has no way of knowing when your utterance has finished. To
help mitigate this, Chrome passes an additional boolean <code>enqueue</code>
option to your engine's <code>onSpeak</code> handler, giving you the option of
implementing your own queueing. This is discouraged because clients are then
unable to queue utterances that should be spoken by different
speech engines.</p>
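<p>If an engine does implement its own queueing, the logic might look like
the sketch below. It assumes two hypothetical engine functions that are not
part of the API: <code>synthesize(utterance, options, done)</code>, which
calls <code>done</code> when the audio has finished, and
<code>cancel()</code>, which silences the current utterance. Only the
<code>enqueue</code> option comes from this page.</p>

```javascript
// Hypothetical queue for an engine that honors options.enqueue itself.
// synthesize and cancel are supplied by the engine; they are assumptions.
function createUtteranceQueue(synthesize, cancel) {
  let pending = [];
  let current = null;  // token identifying the utterance being spoken

  function next() {
    if (pending.length === 0) {
      current = null;
      return;
    }
    const job = pending.shift();
    const token = {};
    current = token;
    synthesize(job.utterance, job.options, () => {
      // Ignore completions that arrive from an utterance that was cancelled.
      if (current === token) next();
    });
  }

  function clear() {
    pending = [];
    if (current !== null) {
      cancel();
      current = null;
    }
  }

  return {
    speak(utterance, options) {
      // Without enqueue, the new utterance replaces everything else.
      if (!options.enqueue) clear();
      pending.push({utterance: utterance, options: options});
      if (current === null) next();
    },
    stop: clear,
  };
}
```

<p>An <code>onSpeak</code> listener would then forward each request to
<code>speak()</code>, and an <code>onStop</code> listener would call
<code>stop()</code>.</p>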

<p>The possible event types that you can send correspond to the event types
that the <code>speak()</code> method receives:</p>

<ul>
  <li><code>'start'</code>: The engine has started speaking the utterance.
  <li><code>'word'</code>: A word boundary was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'sentence'</code>: A sentence boundary was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'marker'</code>: An SSML marker was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'end'</code>: The engine has finished speaking the utterance.
  <li><code>'error'</code>: An engine-specific error occurred and
          this utterance cannot be spoken.
          Pass more information in <code>event.errorMessage</code>.
</ul>
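<p>To illustrate how <code>charIndex</code> relates to the utterance text,
the sketch below builds the sequence of events an engine might send for one
utterance. The helper names are hypothetical, words are split on whitespace
only, and the field name <code>type</code> is an assumption taken from the
event object of the <a href="tts.html">tts</a> module. A real engine would
send each event when the audio actually reaches that position rather than
all at once.</p>

```javascript
// Returns the character index at which each whitespace-separated word starts.
function wordStarts(utterance) {
  const starts = [];
  const pattern = /\S+/g;
  let match;
  while ((match = pattern.exec(utterance)) !== null) {
    starts.push(match.index);
  }
  return starts;
}

// Builds the ordered list of events for one utterance (hypothetical helper).
function buildEvents(utterance) {
  const events = [{type: 'start', charIndex: 0}];
  for (const index of wordStarts(utterance)) {
    events.push({type: 'word', charIndex: index});
  }
  events.push({type: 'end', charIndex: utterance.length});
  return events;
}
```

<p>For the utterance <code>'Hello brave world'</code>, the word events carry
the indices 0, 6, and 12, and the <code>'end'</code> event carries 17, the
length of the text.</p>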

<p>The <code>'interrupted'</code> and <code>'cancelled'</code> events are
not sent by the speech engine; they are generated automatically by Chrome.</p>

<p>Text-to-speech clients can get the voice information from your
extension's manifest by calling
<a href="tts.html#method-getVoices">getVoices()</a>,
assuming you've registered speech event listeners as described below.</p>
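<p>From the client's side, picking a voice from a particular engine might
look like the sketch below. <code>ENGINE_EXTENSION_ID</code> is a placeholder
value and <code>voicesFromExtension</code> is a hypothetical helper; the
sketch assumes the client has access to the
<a href="tts.html">tts</a> module and that each voice reports the
<code>extensionId</code> of the extension providing it.</p>

```javascript
// Placeholder ID of the extension that provides the engine (hypothetical value).
const ENGINE_EXTENSION_ID = 'abcdefghijklmnopabcdefghijklmnop';

// Keeps only the voices provided by one extension.
function voicesFromExtension(voices, extensionId) {
  return voices.filter((voice) => voice.extensionId === extensionId);
}

// Only runs where the chrome.tts module is available.
if (typeof chrome !== 'undefined' && chrome.tts) {
  chrome.tts.getVoices((voices) => {
    const engineVoices = voicesFromExtension(voices, ENGINE_EXTENSION_ID);
    if (engineVoices.length === 0) return;
    // Naming the extension avoids ambiguity if two engines share a voice name.
    chrome.tts.speak('Hello, world.', {
      voiceName: engineVoices[0].voiceName,
      extensionId: ENGINE_EXTENSION_ID,
    });
  });
}
```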

<h2 id="handling_speech_events">Handling speech events</h2>

<p>To generate speech at the request of clients, your extension must
register listeners for both <code>onSpeak</code> and <code>onStop</code>,
like this:</p>

<pre>var speakListener = function(utterance, options, sendTtsEvent) {
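  // Tell the client that speech has begun.
  sendTtsEvent({'type': 'start', 'charIndex': 0});

  // (start speaking)

  // Report completion; charIndex marks the end of the utterance text.
  sendTtsEvent({'type': 'end', 'charIndex': utterance.length});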
};

var stopListener = function() {
  // (stop all speech)
};

chrome.ttsEngine.onSpeak.addListener(speakListener);
chrome.ttsEngine.onStop.addListener(stopListener);</pre>

<p class="warning">
<b>Important:</b>
If your extension does not register listeners for both
<code>onSpeak</code> and <code>onStop</code>, it will not intercept any
speech calls, regardless of what is in the manifest.</p>

<p>The decision of whether or not to send a given speech request to an
extension is based solely on whether the extension supports the given voice
parameters in its manifest and has registered listeners
for <code>onSpeak</code> and <code>onStop</code>. In other words,
there's no way for an extension to receive a speech request and
dynamically decide whether to handle it.</p>
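<p>Because an engine cannot decline a request, one way to cope with an
utterance it cannot speak is to report an <code>'error'</code> event. The
sketch below assumes a hypothetical synchronous <code>synthesize</code>
function that throws on failure, and it assumes the event object uses the
field name <code>type</code>, as the event object of the
<a href="tts.html">tts</a> module does.</p>

```javascript
// Wraps a hypothetical synthesize(utterance, options) function, which is
// assumed to throw when it cannot speak, in an onSpeak-style listener.
function makeSpeakListener(synthesize) {
  return function(utterance, options, sendTtsEvent) {
    sendTtsEvent({type: 'start', charIndex: 0});
    try {
      synthesize(utterance, options);
      sendTtsEvent({type: 'end', charIndex: utterance.length});
    } catch (e) {
      // The request cannot be handed back, so tell the client what went wrong.
      sendTtsEvent({type: 'error', errorMessage: String(e.message || e)});
    }
  };
}
```

<p>The resulting function can be passed to
<code>chrome.ttsEngine.onSpeak.addListener()</code> in place of a
hand-written listener.</p>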