JavaScript Speech Recognition


Speech recognition software is becoming more and more important; it started (for me) with Siri on iOS, then Amazon's Echo, then my new Apple TV, and so on. Speech recognition is so useful not just for us tech superstars but also for people who either want to work "hands free" or just want the convenience of shouting orders at a moment's notice. Browser tech sometimes lags behind native technology, but not for speech recognition: the technology is in the browser today in the form of the SpeechRecognition API, and it's time to use it.

SpeechRecognition

For as advanced a concept as speech recognition is, the API to use it is fairly simple:

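// Use the standard constructor, falling back to vendor-prefixed versions
var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition || window.mozSpeechRecognition || window.msSpeechRecognition)();
recognition.lang = 'en-US';          // language to recognize
recognition.interimResults = false;  // only deliver final results
recognition.maxAlternatives = 5;     // up to 5 candidate transcripts per result

// Attach the result handler before starting recognition
recognition.onresult = function(event) {
    // results[0] is the first result; [0] within it is the top alternative
    console.log('You said: ', event.results[0][0].transcript);
};

recognition.start();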

The first match is found at event.results[0][0].transcript. You can also use maxAlternatives to request up to that many alternative transcripts per result, which helps when what you're listening for could be ambiguous.
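With maxAlternatives greater than 1, each result can hold several candidate transcripts, each carrying a confidence score between 0 and 1. The browser may return fewer alternatives than you ask for, since maxAlternatives is only a maximum. The following is a minimal sketch, assuming final results (interimResults = false as above); listAlternatives is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: collect every alternative for the first result,
// ordered from most to least confident.
function listAlternatives(results) {
  var alternatives = [];
  var first = results[0];
  // A SpeechRecognitionResult is array-like, so index it with a plain loop
  for (var i = 0; i < first.length; i++) {
    alternatives.push({
      transcript: first[i].transcript,
      confidence: first[i].confidence
    });
  }
  // Sort by confidence so the most likely transcript comes first
  alternatives.sort(function(a, b) { return b.confidence - a.confidence; });
  return alternatives;
}

// Browser usage, assuming `recognition` was created as above:
// recognition.onresult = function(event) {
//   listAlternatives(event.results).forEach(function(alt) {
//     console.log(alt.transcript, alt.confidence.toFixed(2));
//   });
// };
```

Taking the results list as a plain argument keeps the helper independent of the browser, so it can be exercised with a hand-built array.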

You can even add your own terms using the SpeechGrammarList object:

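// JSGF grammar with one public rule, <color>, listing the accepted words
var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ... ;';
// Chrome has historically exposed these constructors only with the webkit prefix
var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
var speechRecognitionList = new (window.SpeechGrammarList || window.webkitSpeechGrammarList)();
speechRecognitionList.addFromString(grammar, 1); // weight from 0 to 1
recognition.grammars = speechRecognitionList;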
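Writing the JSGF string by hand gets tedious for long word lists. Here is a minimal sketch that builds a single-rule grammar from an array; buildGrammar is a hypothetical helper, and it does no escaping, so it assumes the words are plain terms without JSGF special characters.

```javascript
// Hypothetical helper: build a one-rule JSGF grammar string from a word list.
// Assumes the words contain no JSGF special characters (no escaping is done).
function buildGrammar(grammarName, ruleName, words) {
  return '#JSGF V1.0; grammar ' + grammarName + '; ' +
    'public <' + ruleName + '> = ' + words.join(' | ') + ' ;';
}

var grammar = buildGrammar('colors', 'color', ['aqua', 'azure', 'beige']);
// grammar is now:
// '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige ;'
```

The resulting string can be passed to addFromString in the same way as the hand-written grammar.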

Several events are emitted during the speech recognition process; you can use the following snippet to follow the event timeline:

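// Log every recognition event as it fires. Note: assigning these
// properties replaces any existing handlers, including onresult.
[
 'onaudiostart',
 'onaudioend',
 'onend',
 'onerror',
 'onnomatch',
 'onresult',
 'onsoundstart',
 'onsoundend',
 'onspeechstart',
 'onspeechend',
 'onstart'
].forEach(function(eventName) {
    recognition[eventName] = function(e) {
        console.log(eventName, e);
    };
});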
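SpeechRecognition is also an EventTarget, so the same timeline can be captured with addEventListener, which leaves the on* properties free for your own handlers. The sketch below records each event's name and the milliseconds elapsed since recording began; recordTimeline and SPEECH_EVENTS are hypothetical names, and the event names are the on* properties without the "on" prefix.

```javascript
// Hypothetical helper: record each event's type and the time (ms) since
// recording began, using addEventListener instead of the on* properties.
function recordTimeline(target, eventNames) {
  var startedAt = Date.now();
  var timeline = [];
  eventNames.forEach(function(name) {
    target.addEventListener(name, function(e) {
      timeline.push({ type: e.type, ms: Date.now() - startedAt });
    });
  });
  return timeline; // filled in as events arrive
}

var SPEECH_EVENTS = [
  'audiostart', 'audioend', 'end', 'error', 'nomatch', 'result',
  'soundstart', 'soundend', 'speechstart', 'speechend', 'start'
];

// Browser usage, assuming `recognition` exists:
// var timeline = recordTimeline(recognition, SPEECH_EVENTS);
// recognition.addEventListener('end', function() { console.table(timeline); });
```

Because listeners run in the order they were added, the "end" entry is already in the timeline by the time the second "end" listener prints it.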

A few caveats about using speech recognition:

  • Chrome ends the listener after a given amount of time, so you'll need to hook into the end event to restart the speech listener
  • If you have multiple tabs using the speech listener API, you may experience the listener ending quickly
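For the first caveat, one common approach is to call start() again from the end handler unless you stopped listening on purpose. The following is a minimal sketch; keepListening and the stop function it returns are hypothetical names. It assigns onend and onerror, so it should not be combined with the logging snippet above, and it stops restarting after a permission error ('not-allowed' or 'service-not-allowed') so that a blocked microphone does not cause an endless restart loop.

```javascript
// Hypothetical helper: restart recognition whenever the browser ends it,
// until the returned stop() function is called.
function keepListening(recognition) {
  var stopped = false;

  recognition.onerror = function(event) {
    // Permission errors will not fix themselves, so stop restarting
    if (event.error === 'not-allowed' || event.error === 'service-not-allowed') {
      stopped = true;
    }
  };

  recognition.onend = function() {
    if (!stopped) {
      recognition.start(); // begin a new session right away
    }
  };

  recognition.start();

  return function stop() {
    stopped = true; // set first so onend does not restart
    recognition.stop();
  };
}

// Browser usage, assuming `recognition` exists:
// var stopListening = keepListening(recognition);
// ...later: stopListening();
```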

annyang!

The excellent annyang library provides a neat API for listening for desired commands, all in an awesome 2KB package. The following is a sample usage of annyang:

// Let's define our first command. First the text we expect, and then the function it should call
var commands = {
    'play video': function() {
        document.querySelector('video').play();
    },
    'pause video': function() {
        document.querySelector('video').pause();
    },
    // '*action' is a splat: whatever is said before "video" is passed as `word`
    '*action video': function(word) {
        if(word === 'play') {
            document.querySelector('video').play();
        }
        else if(word === 'pause' || word === 'stop') {
            document.querySelector('video').pause();
        }
    }
};

// annyang is null when the browser lacks speech recognition support
if (annyang) {
    // Add our commands to annyang
    annyang.addCommands(commands);

    // Start listening. You can call this here, or attach this call to an event, button, etc.
    annyang.start();
}

Note that not only can you provide an exact phrase to listen for, but you can also provide a wildcard string; the wildcard string is useful in cases where you want to prefix your commands, much like saying "Siri: (instructions)" or "Echo: (instructions)".
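To show roughly how a wildcard command can match a spoken phrase, here is a simplified sketch that turns a command containing a named splat such as *action into a regular expression. This is a hypothetical illustration, not annyang's actual implementation; commandToRegExp and matchCommand are made-up names, and only *name splats are handled. A prefix command such as 'computer *instruction' works the same way.

```javascript
// Hypothetical, simplified matcher (not annyang's real code).
// '*name' in a command captures any run of words.
function commandToRegExp(command) {
  var pattern = command
    .replace(/[-[\]{}()+?.,\\^$|#]/g, '\\$&') // escape regex metacharacters (but not *)
    .replace(/\*\w+/g, '(.*?)');              // each *name becomes a lazy capture group
  return new RegExp('^' + pattern + '$', 'i'); // whole phrase, case-insensitive
}

// Return the first matching command and its captured words, or null
function matchCommand(commands, phrase) {
  for (var i = 0; i < commands.length; i++) {
    var match = commandToRegExp(commands[i]).exec(phrase.trim());
    if (match) {
      return { command: commands[i], args: match.slice(1) };
    }
  }
  return null;
}

// matchCommand(['play video', '*action video'], 'Stop video')
// returns { command: '*action video', args: ['Stop'] }
```

Checking commands in order means an exact phrase listed first wins over a wildcard that would also match it.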

It's so cool that speech recognition is available within the browser today. If you want to see an awesome application of this feature, check out the amazing demo by Kevin Ngo of Mozilla VR: Speech Recognition + A-Frame VR + Spotify. You could even use this API to listen for "wtf" when someone reviews your code! Take some time to play with this API and create something innovative!

Discussion

  1. Adam Snodgrass

    The Speech Recognition API has been disabled in Chromium and Chromium derivatives like Electron but it looks like if you want the feature you can use their Cloud Speech API.

  2. Kostas

    Probably, a better initializer:

    var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition || window.mozSpeechRecognition || window.msSpeechRecognition)();
    
  3. Paul

    Another speech recognition interface is https://www.textfromtospeech.com/uk/voice-to-text/

  4. Eric

    Has anyone else had problems using this and other speech recognition on Android Nougat? Is Google pushing everyone to use the Cloud Speech API?
