
<p id="classSummary">
Use the <code>chrome.experimental.tts</code> module to play synthesized
text-to-speech (TTS) from your extension or packaged app.
See also the related
<a href="experimental.ttsEngine.html">experimental.ttsEngine</a>
module which allows an extension to implement a speech engine.
</p>

<p class="note"><b>Give us feedback:</b> If you have suggestions,
especially changes that should be made before stabilizing the first
version of this API, please send your ideas to the
<a href="http://groups.google.com/a/chromium.org/group/chromium-extensions">chromium-extensions</a>
group.</p>

<h2 id="overview">Overview</h2>

<p>To enable this experimental API, visit
<b>chrome://flags</b> and enable <b>Experimental Extension APIs</b>.</p>

<p>Chrome provides native support for speech on Windows (using SAPI
5), Mac OS X, and Chrome OS, using speech synthesis capabilities
provided by the operating system. On all platforms, the user can
install extensions that register themselves as alternative speech
engines.</p>

<h2 id="generating_speech">Generating speech</h2>

<p>Call <code>speak()</code> from your extension or
packaged app to speak a string of text. For example:</p>

<pre>chrome.experimental.tts.speak('Hello, world.');</pre>

<p>To stop speaking immediately, call <code>stop()</code>:</p>

<pre>chrome.experimental.tts.stop();</pre>

<p>You can provide options that control various properties of the speech,
such as its rate, pitch, and more. For example:</p>

<pre>chrome.experimental.tts.speak('Hello, world.', {'rate': 2.0});</pre>

<p>It's also a good idea to specify the language so that a synthesizer
supporting that language (and regional dialect, if applicable) is chosen.</p>

<pre>chrome.experimental.tts.speak(
    'Hello, world.', {'lang': 'en-US', 'rate': 2.0});</pre>

<p>By default, each call to <code>speak()</code> interrupts any
ongoing speech and starts speaking immediately. To find out whether a call
would interrupt anything, call <code>isSpeaking()</code> first.
Alternatively, set the <code>enqueue</code> option so that the utterance is
added to a queue and spoken after the current utterance has finished.</p>

<pre>chrome.experimental.tts.speak(
    'Speak this first.');
chrome.experimental.tts.speak(
    'Speak this next, when the first sentence is done.', {'enqueue': true});
</pre>
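<p>The queueing behavior described above can be wrapped in a small helper.
The following is a minimal sketch, not part of the API:
<code>speakInOrder</code> is a hypothetical name, and the <code>tts</code>
argument is assumed to be <code>chrome.experimental.tts</code> or any object
with a compatible <code>speak()</code> method.</p>

```javascript
// Hypothetical helper: speak each string in `utterances` in order.
// `tts` is assumed to expose speak(utterance, options), for example
// chrome.experimental.tts inside an extension.
function speakInOrder(tts, utterances, baseOptions) {
  utterances.forEach(function(utterance, index) {
    // Copy the caller's options so each call gets its own object.
    var options = {};
    for (var key in (baseOptions || {})) {
      options[key] = baseOptions[key];
    }
    // The first call interrupts any ongoing speech; later calls are queued.
    options.enqueue = index > 0;
    tts.speak(utterance, options);
  });
}

// In an extension:
// speakInOrder(chrome.experimental.tts,
//              ['Speak this first.', 'Speak this next.'],
//              {'lang': 'en-US'});
```

<p>Passing the <code>tts</code> object in explicitly is a design choice that
lets the helper be exercised outside of Chrome with a stand-in object.</p>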

<p>A complete description of all options can be found in the
<a href="#method-speak">speak() method documentation</a> below.
Not all speech engines will support all options.</p>

<p>To catch errors and make sure you're calling <code>speak()</code>
correctly, pass a callback function that takes no arguments. Inside
the callback, check
<a href="extension.html#property-lastError">chrome.extension.lastError</a>
to see if there were any errors.</p>

<pre>chrome.experimental.tts.speak(
    utterance,
    options,
    function() {
      if (chrome.extension.lastError) {
        console.log('Error: ' + chrome.extension.lastError.message);
      }
    });</pre>

<p>The callback is invoked right away, before the speech engine has started
generating speech. Its purpose is to alert you to syntax
errors in your use of the TTS API, not to report every error that might occur
while synthesizing and outputting speech. To catch those errors
too, use an event listener, as described in
<a href="#events">Listening to events</a>.</p>
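<p>If you call <code>speak()</code> from several places, the
<code>lastError</code> check can be factored out. This is only a sketch:
<code>speakChecked</code> is a hypothetical helper, and
<code>chromeApi</code> is assumed to be the global <code>chrome</code>
object.</p>

```javascript
// Hypothetical helper, not part of the TTS API.
// `chromeApi` is the global `chrome` object in an extension; it is passed in
// explicitly so the helper can also be run against a stand-in object.
function speakChecked(chromeApi, utterance, options, onError) {
  chromeApi.experimental.tts.speak(utterance, options, function() {
    var error = chromeApi.extension.lastError;
    if (error) {
      // Reports problems with the call itself, not synthesis failures.
      onError(error.message);
    }
  });
}

// In an extension:
// speakChecked(chrome, 'Hello, world.', {'rate': 2.0}, function(message) {
//   console.log('Error: ' + message);
// });
```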

<h2 id="events">Listening to events</h2>

<p>To get more real-time information about the status of synthesized speech,
pass an event listener in the options to <code>speak()</code>, like this:</p>

<pre>chrome.experimental.tts.speak(
    utterance,
    {
      'onevent': function(event) {
        console.log('Event ' + event.type + ' at position ' + event.charIndex);
        if (event.type == 'error') {
          console.log('Error: ' + event.errorMessage);
        }
      }
    },
    callback);</pre>

<p>Each event includes an event type, the character index of the current
speech relative to the utterance, and for error events, an optional
error message. The event types are:</p>

<ul>
  <li><code>'start'</code>: the engine has started speaking the utterance.
  <li><code>'word'</code>: a word boundary was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'sentence'</code>: a sentence boundary was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'marker'</code>: an SSML marker was reached. Use
          <code>event.charIndex</code> to determine the current speech
          position.
  <li><code>'end'</code>: the engine has finished speaking the utterance.
  <li><code>'interrupted'</code>: this utterance was interrupted by another
          call to <code>speak()</code> or <code>stop()</code> and did not
          finish.
  <li><code>'cancelled'</code>: this utterance was queued, but then
          cancelled by another call to <code>speak()</code> or
          <code>stop()</code> and never began to speak at all.
  <li><code>'error'</code>: an engine-specific error occurred and
          this utterance cannot be spoken.
          Check <code>event.errorMessage</code> for details.
</ul>

<p>Four of the event types, <code>'end'</code>, <code>'interrupted'</code>,
<code>'cancelled'</code>, and <code>'error'</code>, are <i>final</i>. After
one of those events is received, this utterance will no longer speak and
no new events from this utterance will be received.</p>
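<p>Because exactly one final event ends each utterance, a listener can use
the final event types to detect completion. The sketch below assumes the
<code>onevent</code> option shown earlier; <code>isFinalEvent</code> and
<code>speakAndNotify</code> are hypothetical helper names, and
<code>tts</code> is assumed to be <code>chrome.experimental.tts</code>.</p>

```javascript
// The four final event types listed above.
var FINAL_EVENT_TYPES = ['end', 'interrupted', 'cancelled', 'error'];

// Hypothetical helper: true if no further events will follow this one.
function isFinalEvent(event) {
  return FINAL_EVENT_TYPES.indexOf(event.type) !== -1;
}

// Hypothetical helper: speak `utterance` and call
// onFinished(type, errorMessage) when the utterance's final event arrives.
function speakAndNotify(tts, utterance, onFinished) {
  tts.speak(utterance, {
    'onevent': function(event) {
      if (isFinalEvent(event)) {
        // errorMessage is only expected to be set for 'error' events.
        onFinished(event.type, event.errorMessage);
      }
    }
  });
}

// In an extension:
// speakAndNotify(chrome.experimental.tts, 'Hello, world.', function(type) {
//   console.log('Utterance finished with: ' + type);
// });
```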

<p>Some TTS engines may not support all event types, and some may not
support any events at all. To require that the speech engine sends
the events you're interested in, pass a list of event types in
the <code>requiredEventTypes</code> member of the options object, or use
<code>getVoices()</code> to choose a voice that supports the events you need.
Both are documented below.</p>
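<p>For instance, code that highlights each word as it is spoken depends on
<code>'word'</code> events. A minimal sketch, assuming
<code>requiredEventTypes</code> takes an array of event type strings;
<code>speakWithWordEvents</code> is a hypothetical helper and
<code>tts</code> is assumed to be <code>chrome.experimental.tts</code>:</p>

```javascript
// Hypothetical helper: speak `utterance` with an engine that reports word
// boundaries, calling onWord(charIndex) at each one.
function speakWithWordEvents(tts, utterance, onWord) {
  tts.speak(utterance, {
    // Assumed shape: an array of the event type names described above.
    'requiredEventTypes': ['word'],
    'onevent': function(event) {
      if (event.type == 'word') {
        onWord(event.charIndex);
      }
    }
  });
}

// In an extension:
// speakWithWordEvents(chrome.experimental.tts, 'Hello, world.',
//     function(charIndex) {
//       console.log('Word starts at character ' + charIndex);
//     });
```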

<h2 id="ssml">SSML markup</h2>

<p>Utterances used in this API may include markup using the
<a href="http://www.w3.org/TR/speech-synthesis">Speech Synthesis Markup
Language (SSML)</a>. If you use SSML, the first argument to
<code>speak()</code> should be a complete SSML document with an XML
header and a top-level <code>&lt;speak&gt;</code> tag, not a document
fragment.</p>

<p>For example:</p>

<pre>chrome.experimental.tts.speak(
    '&lt;?xml version="1.0"?&gt;' +
    '&lt;speak&gt;' +
    '  The &lt;emphasis&gt;second&lt;/emphasis&gt; ' +
    '  word of this sentence was emphasized.' +
    '&lt;/speak&gt;');</pre>

<p>Not all speech engines will support all SSML tags, and some may not support
SSML at all, but all engines are required to ignore any SSML they don't
support and still speak the underlying text.</p>
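<p>When the text to speak comes from a web page or from user input, it may
contain characters that are special in XML. The following sketch wraps
plain text in a complete SSML document and escapes those characters first.
<code>escapeXml</code> and <code>buildSsml</code> are hypothetical helpers,
not part of the API.</p>

```javascript
// Hypothetical helper: escape the characters that are special in XML text.
function escapeXml(text) {
  return String(text)
      .replace(/&/g, '&amp;')  // Must run first so later entities survive.
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap plain text in a complete SSML document with an
// XML header and a top-level <speak> tag.
function buildSsml(text) {
  return '<?xml version="1.0"?>' +
      '<speak>' + escapeXml(text) + '</speak>';
}

// In an extension:
// chrome.experimental.tts.speak(buildSsml('Tom & Jerry'));
```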

<h2 id="choosing_voice">Choosing a voice</h2>

<p>By default, Chrome will choose the most appropriate voice for each
utterance you want to speak, based on the language and gender. On most
Windows, Mac OS X, and Chrome OS systems, speech synthesis provided by
the operating system should be able to speak any text in at least one
language. Some users may have a variety of voices available, though,
from their operating system and from speech engines implemented by other
Chrome extensions. In those cases, you can implement custom code to choose
the appropriate voice, or present the user with a list of choices.</p>

<p>To get a list of all voices, call <code>getVoices()</code> and pass it
a function that receives an array of <code>TtsVoice</code> objects as its
argument:</p>

<pre>chrome.experimental.tts.getVoices(
    function(voices) {
      for (var i = 0; i &lt; voices.length; i++) {
        console.log('Voice ' + i + ':');
        console.log('  name: ' + voices[i].voiceName);
        console.log('  lang: ' + voices[i].lang);
        console.log('  gender: ' + voices[i].gender);
        console.log('  extension id: ' + voices[i].extensionId);
        console.log('  event types: ' + voices[i].eventTypes);
      }
    });</pre>
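<p>Custom voice selection could then be built on top of that array. The
sketch below prefers an exact language match and falls back to a voice that
shares the primary language. It assumes that <code>lang</code> is a string
such as <code>'en-US'</code> and that <code>eventTypes</code> is an array of
event type strings; <code>pickVoice</code> is a hypothetical helper.</p>

```javascript
// Hypothetical helper: choose a voice for `lang` that supports every event
// type in `requiredEventTypes`. Returns null if no voice qualifies.
function pickVoice(voices, lang, requiredEventTypes) {
  var required = requiredEventTypes || [];
  var wanted = lang.toLowerCase();
  var primary = wanted.split('-')[0];
  var fallback = null;
  for (var i = 0; i < voices.length; i++) {
    var voice = voices[i];
    var eventTypes = voice.eventTypes || [];
    var hasEvents = required.every(function(type) {
      return eventTypes.indexOf(type) !== -1;
    });
    if (!hasEvents || !voice.lang) {
      continue;
    }
    var voiceLang = voice.lang.toLowerCase();
    if (voiceLang === wanted) {
      return voice;  // Exact language and region match.
    }
    if (!fallback && voiceLang.split('-')[0] === primary) {
      fallback = voice;  // Same language, different region.
    }
  }
  return fallback;
}

// In an extension:
// chrome.experimental.tts.getVoices(function(voices) {
//   var voice = pickVoice(voices, 'en-US', ['word']);
//   console.log(voice ? 'Using ' + voice.voiceName : 'No suitable voice');
// });
```

<p>How the chosen voice is then passed to <code>speak()</code> depends on
the options listed in the
<a href="#method-speak">speak() method documentation</a>.</p>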