
JavaScript Speech Recognition

By David Walsh on September 27, 2016

Speech recognition software is becoming more and more important; it started (for me) with Siri on iOS, then Amazon's Echo, then my new Apple TV, and so on. Speech recognition is useful not just for us tech superstars but for people who either want to work "hands free" or just want the convenience of shouting orders at a moment's notice. Browser tech sometimes lags behind native technology, but not for speech recognition: the technology is in the browser today and it's time to use it via the SpeechRecognition API.

Basic Video Demo

`SpeechRecognition`

For as advanced a concept as speech recognition is, the API to use it is fairly simple:

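// Use the unprefixed constructor where available, falling back to vendor prefixes
var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition || window.mozSpeechRecognition || window.msSpeechRecognition)();
recognition.lang = 'en-US';           // language to recognize
recognition.interimResults = false;   // only report final results
recognition.maxAlternatives = 5;      // up to five candidate transcripts per result

// Attach the handler before starting so it is in place when results arrive
recognition.onresult = function(event) {
    console.log('You said: ', event.results[0][0].transcript);
};

recognition.start();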

The first match is available at event.results[0][0].transcript; when what you're listening for could be ambiguous, you can raise maxAlternatives to receive additional candidate transcripts.
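To make use of those alternatives, you can walk the first result and inspect each candidate. The following is a minimal sketch rather than anything the API prescribes: `listAlternatives` and `pickKnownPhrase` are hypothetical helper names, and the code assumes the event has the usual array-like `results` shape with `transcript` and `confidence` on each alternative.

```javascript
// Collect every alternative the recognizer returned for the first result.
// `event.results` is array-like, and each result is an array-like list of
// alternatives carrying `transcript` and `confidence`.
function listAlternatives(event) {
    var result = event.results[0];
    var alternatives = [];
    for (var i = 0; i < result.length; i++) {
        alternatives.push({
            transcript: result[i].transcript,
            confidence: result[i].confidence
        });
    }
    return alternatives;
}

// Return the first alternative that matches one of the phrases we care about,
// or null when none of them do. (Hypothetical helper for illustration.)
function pickKnownPhrase(event, knownPhrases) {
    var alternatives = listAlternatives(event);
    for (var i = 0; i < alternatives.length; i++) {
        var heard = alternatives[i].transcript.trim().toLowerCase();
        if (knownPhrases.indexOf(heard) !== -1) {
            return heard;
        }
    }
    return null;
}
```

Inside an `onresult` handler you could then call `pickKnownPhrase(event, ['play video', 'pause video'])` and ignore anything that comes back as `null`.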

You can even add your own terms using the SpeechGrammarList object:

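// JSGF grammar: the public rule <color> lists the terms to listen for
var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ... ;';
var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
var speechRecognitionList = new (window.SpeechGrammarList || window.webkitSpeechGrammarList)();
// The second argument is this grammar's weight relative to others in the list
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;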
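If your terms live in an array, the grammar string can be assembled rather than typed by hand. This is only a sketch: `buildJsgfGrammar` is a hypothetical helper, and it assumes plain words that need no escaping inside a JSGF rule.

```javascript
// Build a single-rule JSGF grammar string from a list of words.
// Assumes the words are simple tokens that need no JSGF escaping.
function buildJsgfGrammar(grammarName, ruleName, words) {
    if (!words.length) {
        throw new Error('At least one word is required');
    }
    return '#JSGF V1.0; grammar ' + grammarName +
        '; public <' + ruleName + '> = ' + words.join(' | ') + ' ;';
}

var colorGrammar = buildJsgfGrammar('colors', 'color', ['aqua', 'azure', 'beige']);
// colorGrammar is now:
// '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ;'
```

The resulting string can be passed to `addFromString` in place of the hand-written grammar. How much weight a given browser gives to grammars is worth testing in the browsers you target.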

There are several events emitted during the speech recognition process, so you can use the following snippet to follow the event timeline:

[
 'onaudiostart',
 'onaudioend',
 'onend',
 'onerror',
 'onnomatch',
 'onresult',
 'onsoundstart',
 'onsoundend',
 'onspeechend',
 'onstart'
].forEach(function(eventName) {
    recognition[eventName] = function(e) {
        console.log(eventName, e);
    };
});

A few caveats about using speech recognition:

Chrome ends the listener after a given amount of time, so you'll need to hook into the end event to restart the speech listener.
If you have multiple tabs using the speech listener API, you may experience the listener ending quickly.
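One way to handle the first caveat is to restart the listener from the end event. The sketch below is an assumption about how you might structure that, not the only approach: `keepListening` is a hypothetical helper, and it stops restarting when the error code is `not-allowed` or `service-not-allowed` so that a denied microphone permission does not cause an endless restart loop.

```javascript
// Start recognition and restart it whenever the browser ends the session.
// Returns a function that stops listening for good.
function keepListening(recognition) {
    var active = true;

    recognition.onerror = function(e) {
        // Do not keep retrying if the user or browser refused access
        if (e.error === 'not-allowed' || e.error === 'service-not-allowed') {
            active = false;
        }
    };

    recognition.onend = function() {
        if (active) {
            recognition.start();
        }
    };

    recognition.start();

    return function stopListening() {
        active = false;
        recognition.stop();
    };
}
```

Call `var stop = keepListening(recognition);` instead of `recognition.start()`, and call `stop()` when you no longer want the listener running. Note that this helper overwrites any existing `onend` and `onerror` handlers on the object.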

annyang!

The excellent annyang library provides a neat API for listening for desired commands, all in an awesome 2KB package. The following is a sample usage of annyang:

// Let's define our first command. First the text we expect, and then the function it should call
var commands = {
    'play video': function() {
        document.querySelector('video').play();
    },
    'pause video': function() {
        document.querySelector('video').pause();
    },
    '*word video': function(word) {
        if(word === 'play') {
            document.querySelector('video').play();
        }
        else if(word === 'pause' || word === 'stop') {
            document.querySelector('video').pause();
        }
    }
};

// Add our commands to annyang
annyang.addCommands(commands);

// Start listening. You can call this here, or attach this call to an event, button, etc.
annyang.start();

Note that not only can you provide an exact phrase to listen for, but you can also provide a wildcard string; the wildcard string is useful in cases where you want to prefix your commands, much like saying "Siri: (instructions)" or "Echo: (instructions)".
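A prefixed command in that style might be sketched as follows. `buildPrefixedCommands` and the `computer` prefix are hypothetical names for illustration; the only annyang feature assumed is the named splat (`*instruction`), which captures the rest of the spoken phrase and passes it to the callback.

```javascript
// Build an annyang-style commands object with one prefixed wildcard command.
// `actions` maps an instruction phrase to the function that should run.
function buildPrefixedCommands(prefix, actions) {
    var commands = {};
    // "*instruction" is a named splat capturing everything after the prefix
    commands[prefix + ' *instruction'] = function(instruction) {
        var key = instruction.trim().toLowerCase();
        if (Object.prototype.hasOwnProperty.call(actions, key)) {
            actions[key]();
            return true;
        }
        return false; // heard the prefix, but not a known instruction
    };
    return commands;
}
```

With this in place, `annyang.addCommands(buildPrefixedCommands('computer', { 'play video': playFn, 'pause video': pauseFn }))` would respond to "computer play video" while ignoring the same words spoken without the prefix (`playFn` and `pauseFn` stand in for your own functions).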


It's so cool that speech recognition is available within the browser today. If you want to see an awesome application of this feature, check out the amazing demo by Mozilla VR's Kevin Ngo: Speech Recognition + A-Frame VR + Spotify. You could even use this API to listen for "wtf" when someone reviews your code! Take some time to play with this API and create something innovative!

Discussion

Adam Snodgrass
The Speech Recognition API has been disabled in Chromium and Chromium derivatives like Electron but it looks like if you want the feature you can use their Cloud Speech API.

Raymond Camden
Any idea why it was disabled?

Kostas

Probably, a better initializer:

var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition || window.mozSpeechRecognition || window.msSpeechRecognition)();

Paul
Another speech recognition interface is https://www.textfromtospeech.com/uk/voice-to-text/
Eric
Has anyone else had problems using this and other speech recognition on Android Nougat? Is Google pushing everyone to use the Cloud Speech API?
Paterson
recognition.lang = 'en-US'; how can I set multiple languages? Please help me, thanks.
Sun
Hello David,

I am really not finding a way to add speech recognition to my web app, that would run on either chrome or safari. both support the webspeech synthesis, but not recognition. I want to somehow add a simple command recognition, based on which my app would do different things. How do I go about…I have tried everything but cannot seem to get this to work on iphone.
Siva
In a Cordova lite app on Android/iOS, this isn't recognized:
```
var SpeechRecognition = window.webkitSpeechRecognition;
```
I am unable to find the solutions, please guide me
Luc Volders
Could someone please help me out a bit?
The demo page works in Google Chrome on my PC and on my Android Phone. But when I use the code as described myself it does work on my PC but not on Android.
Can someone please give a working example of the pure code for Android.
Asrar
Can we assign a specific corpus with limited words instead of recognizing everything we say ?
