How to Build a Text-to-Voice Application With JavaScript


This tutorial will show you how to convert text into speech with JavaScript using the Web Speech API. It will feature a simple interface where the user adds the text to be spoken, then clicks a button to generate the corresponding speech.

Our Text-to-Speech Demo

Here’s what we’re going to build. Type anything you want in the textarea, select the language you’ve written it in, and click the button to hear the result!

A Quick Introduction to the Web Speech API

Before we build anything, let’s quickly get acquainted with the API we’ll be using.

The Web Speech API is a JavaScript API that allows developers to add voice functionality to web applications. It includes the SpeechSynthesis interface, which enables text-to-voice conversion. SpeechSynthesis allows you to pass text content and other optional parameters such as language, pitch, rate, and voice.

It also has several methods for starting, pausing, resuming, and canceling speech. To generate voice from text, the first step is to create an instance of the SpeechSynthesisUtterance interface like this:

const utterance = new SpeechSynthesisUtterance();

SpeechSynthesisUtterance is the object that represents the text you want the system to say. The next step is to specify the text content using the text property like this:

utterance.text = "Hello world";

Here we want the computer to say the words "Hello world".

Finally, call the speak() method, which will speak the given utterance (the SpeechSynthesisUtterance object) defined above.

speechSynthesis.speak(utterance);
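The pausing, resuming, and canceling methods mentioned earlier live on the same speechSynthesis object as speak(). Here is a minimal sketch of how they might be wired up. The names togglePlayback and stopSpeaking are hypothetical helpers, not part of the API, and the synth parameter is assumed to be window.speechSynthesis in a browser.

```javascript
// Hypothetical helpers (not part of the Web Speech API itself).
// `synth` is assumed to be window.speechSynthesis when run in a browser.

// Pause speech that is playing, or resume speech that is paused.
function togglePlayback(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken
}

// Stop speaking and drop any utterances still waiting in the queue.
function stopSpeaking(synth) {
  synth.cancel();
}
```

The paused check comes first because a paused synthesizer may still report that it is speaking.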

We can also set a language, for example, US English:

utterance.lang = 'en-US';

If you don’t set a language on the SpeechSynthesisUtterance, the default language configured by the browser will be applied.
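The other optional parameters mentioned earlier (pitch, rate, and volume) are plain properties on the utterance, just like lang. The sketch below shows one way to set them safely. The helper names clamp and applyVoiceSettings are hypothetical, and the ranges in the comments are the ones the Web Speech API defines for each property.

```javascript
// Hypothetical helper: keep a number inside the range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy optional settings onto an utterance,
// clamping each value to the range the Web Speech API allows.
function applyVoiceSettings(utterance, settings = {}) {
  const { pitch = 1, rate = 1, volume = 1 } = settings;
  utterance.pitch = clamp(pitch, 0, 2);   // 0 to 2, default 1
  utterance.rate = clamp(rate, 0.1, 10);  // 0.1 to 10, default 1
  utterance.volume = clamp(volume, 0, 1); // 0 to 1, default 1
  return utterance;
}

// In a browser this might be used as:
// const utterance = applyVoiceSettings(
//   new SpeechSynthesisUtterance("Hello world"),
//   { pitch: 1.2, rate: 0.9 }
// );
// speechSynthesis.speak(utterance);
```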

The SpeechSynthesis controller also provides a getVoices() method, which returns a list of system-supported voices for the current device, allowing users to choose a custom voice.
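As a rough illustration of choosing a custom voice, the sketch below picks a voice by language from a list shaped like the one getVoices() returns. The function name findVoiceForLang is hypothetical, and it assumes each voice has a lang property formatted like "en-GB".

```javascript
// Hypothetical helper: pick the first voice whose language matches `lang`.
// `voices` is assumed to be the array returned by speechSynthesis.getVoices().
function findVoiceForLang(voices, lang) {
  // Prefer an exact match such as "en-GB" ...
  const exact = voices.find((voice) => voice.lang === lang);
  if (exact) return exact;
  // ... otherwise fall back to any voice sharing the base language ("en").
  const base = lang.split("-")[0];
  return voices.find((voice) => voice.lang.split("-")[0] === base) || null;
}

// In a browser this might be used as:
// utterance.voice = findVoiceForLang(speechSynthesis.getVoices(), "en-GB");
```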

HTML Structure

Okay, let’s start building. The HTML structure will consist of the following elements:

  • A <textarea> for the text to be converted.
  • A <select> element. Inside the select element, we will populate language options.
  • A generate <button> which, when clicked, will speak the text content provided.

To keep us focused on functionality, we’ll use Bootstrap to build the interface. Ensure you add the Bootstrap CDN link in your header like this:

<link
  href="[email protected]/dist/css/bootstrap.min.css"
  rel="stylesheet"
  integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH"
  crossorigin="anonymous"
/>

Add the HTML structure.

<div class="container">
  <div class="message alert alert-warning" role="alert"></div>
  <h1>Text to Voice Converter</h1>
  <form>
    <div class="form-group">
      <label for="text">Enter your text:</label>
      <textarea id="text" name="text" class="content form-control form-control-lg" rows="6"></textarea>
    </div>
    <div class="form-group">
      <label for="voices">Choose your language:</label>
      <select id="voices" class="select-voices form-control form-control-lg" name="voices"></select>
    </div>
    <button type="button" class="convert btn btn-primary">🔊 Convert Text to Voice</button>
  </form>
</div>

Additional Styling with CSS

Bootstrap handles pretty much all the styling for us. But let’s add some custom CSS properties to our design. These will give us a custom font, a container, some extra spacing for the elements in the form, and a rule to hide our alert message.

@import url(",wght@0,300;0,400;0,500;1,300;1,400;1,500&display=swap");

body {
  font-family: "DM Mono", monospace;
}
.container {
  width: 100%;
  max-width: 600px;
  padding: 2rem 0;
}
.form-group {
  margin: 2rem 0;
}
label {
  margin-bottom: 1rem;
}
.message {
  display: none;
}

We have set display: none on the alert element so that it will only appear if there are error messages to display.

JavaScript Functionality

As I explained in the introduction, we can get voices using the speechSynthesis.getVoices() method. Let’s start by storing a set of voices in an array like this.

const voices = [
  { name: "Google Deutsch", lang: "de-DE" },
  { name: "Google US English", lang: "en-US" },
  { name: "Google UK English Female", lang: "en-GB" },
  { name: "Google UK English Male", lang: "en-GB" },
  { name: "Google español", lang: "es-ES" },
  { name: "Google español de Estados Unidos", lang: "es-US" },
  { name: "Google français", lang: "fr-FR" },
  { name: "Google हिन्दी", lang: "hi-IN" },
  { name: "Google Bahasa Indonesia", lang: "id-ID" },
  { name: "Google italiano", lang: "it-IT" },
  { name: "Google 日本語", lang: "ja-JP" },
  { name: "Google 한국의", lang: "ko-KR" },
  { name: "Google Nederlands", lang: "nl-NL" },
  { name: "Google polski", lang: "pl-PL" },
  { name: "Google português do Brasil", lang: "pt-BR" },
  { name: "Google русский", lang: "ru-RU" },
  { name: "Google 普通话(中国大陆)", lang: "zh-CN" },
  { name: "Google 粤語(香港)", lang: "zh-HK" },
  { name: "Google 國語(臺灣)", lang: "zh-TW" }
];
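A list like the one above could also be derived from the voices a device actually reports rather than typed by hand. The sketch below is one possible approach; toVoiceList is a hypothetical helper, and it assumes each reported voice exposes name and lang properties.

```javascript
// Hypothetical helper: reduce live voice objects to the same
// { name, lang } shape used in the voices array above, sorted by language.
function toVoiceList(rawVoices) {
  return rawVoices
    .map((voice) => ({ name: voice.name, lang: voice.lang }))
    .sort((a, b) => a.lang.localeCompare(b.lang));
}

// In a browser this might be called as:
// const voices = toVoiceList(speechSynthesis.getVoices());
```

Copying only name and lang keeps the rest of the tutorial code unchanged, since it reads nothing else from each voice.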

Identify the Required Elements

Next, use the Document Object Model (DOM) to get the alert, select, and button elements.

const optionsContainer = document.querySelector(".select-voices");
const convertBtn = document.querySelector(".convert");
const messageContainer = document.querySelector(".message");

Create Voices Selection

The optionsContainer represents the <select> element for the drop-down list of voices from which the user will select an option.

We want to populate it with the voices from the voices array. Create a function called addVoices().

function addVoices() {
  // populate options with the voices from the array
}

Inside the function, use the forEach() method to loop through the voices array, and for each voice object, set option.value = voice.lang and option.textContent = voice.name, then append the option to the select element.

function addVoices() {
  console.log(voices);
  voices.forEach((voice) => {
    const option = document.createElement("option");
    option.value = voice.lang;
    option.textContent = voice.name;
    optionsContainer.appendChild(option);

    // Preselect US English as the default language
    if (voice.lang === "en-US") {
      option.selected = true;
    }
  });
}
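If addVoices() runs more than once, it will append the same options again. One way to guard against that is to separate the data from the DOM work, as in the sketch below. Both buildOptionData and renderOptions are hypothetical names, and the default language argument is an assumption that mirrors the "en-US" check above.

```javascript
// Hypothetical helper: describe the <option> elements without touching the DOM.
function buildOptionData(voices, defaultLang = "en-US") {
  let defaultChosen = false;
  return voices.map((voice) => {
    // Mark only the first voice in the default language as selected.
    const selected = !defaultChosen && voice.lang === defaultLang;
    if (selected) defaultChosen = true;
    return { value: voice.lang, text: voice.name, selected };
  });
}

// Hypothetical helper: render the descriptions into a <select>, replacing
// any previous options so repeated calls do not create duplicates.
function renderOptions(selectElement, optionData) {
  selectElement.replaceChildren(
    ...optionData.map(({ value, text, selected }) => {
      const option = document.createElement("option");
      option.value = value;
      option.textContent = text;
      option.selected = selected;
      return option;
    })
  );
}

// In a browser this might be used as:
// renderOptions(optionsContainer, buildOptionData(voices));
```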

We need to invoke the addVoices() function to apply the functionality. However, in the Chrome browser, we need to listen for the voiceschanged event and then call the addVoices() function. So we’ll add a condition:

if (navigator.userAgent.indexOf("Chrome") !== -1) {
  speechSynthesis.addEventListener("voiceschanged", addVoices);
} else {
  addVoices();
}

The voiceschanged event is a JavaScript event fired when the list of available speech synthesis voices changes. The event happens when the list of available voices is ready to use.
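An alternative to checking the browser name is to ask for the voices first and only wait for voiceschanged when the list comes back empty. The following is a sketch of that idea, not the approach used in this tutorial. The helper whenVoicesReady is hypothetical, and it assumes the synth object supports addEventListener and removeEventListener.

```javascript
// Hypothetical helper: run `callback` once voices are available, without
// checking the browser name. `synth` is assumed to be window.speechSynthesis.
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    // Voices are already loaded, so there is nothing to wait for.
    callback(voices);
    return;
  }
  // Otherwise wait for the browser to announce that the list is ready.
  const onChange = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // still empty, keep waiting
    synth.removeEventListener("voiceschanged", onChange);
    callback(loaded);
  };
  synth.addEventListener("voiceschanged", onChange);
}

// In a browser this might be used as:
// whenVoicesReady(speechSynthesis, (voices) => console.log(voices.length));
```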

Button Event Listener

Add a click event listener to the generate button.

convertBtn.addEventListener("click", function () {
  // show an alert message if content is empty
  // pass the arguments to convertToSpeech()
});

Inside the event listener function, we want to show an alert if the content is not provided, get the text from the textarea, get the selected language, and pass the values to the convertToSpeech() function.

Update the event listener as follows.

convertBtn.addEventListener("click", function () {
  const convertText = document.querySelector(".content").value;

  if (convertText === "") {
    // Show the alert, then hide it again after two seconds
    messageContainer.textContent = "Please provide some text";
    messageContainer.style.display = "block";

    setTimeout(() => {
      messageContainer.textContent = "";
      messageContainer.style.display = "none";
    }, 2000);

    return;
  }

  const selectedLang =
    optionsContainer.options[optionsContainer.selectedIndex].value;

  convertToSpeech(convertText, selectedLang);
});
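The empty-string check above lets text made only of spaces or line breaks through to the synthesizer. A small validation step could tighten that, as sketched below. The validateText helper is hypothetical, and the message it returns simply reuses the wording from the listener.

```javascript
// Hypothetical helper: decide whether the textarea content is worth speaking.
function validateText(rawText) {
  const text = rawText.trim(); // ignore leading and trailing whitespace
  if (text === "") {
    return { ok: false, message: "Please provide some text" };
  }
  return { ok: true, text };
}

// Inside the click listener this might be used as:
// const result = validateText(document.querySelector(".content").value);
// if (!result.ok) { /* show result.message in the alert and return */ }
```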

Create the convertToSpeech() function and add the code below.

function convertToSpeech(text, lang) {
  // Bail out with a visible message if speech synthesis is unavailable
  if (!("speechSynthesis" in window)) {
    messageContainer.textContent =
      "Your browser is not supported, try another browser";
    messageContainer.style.display = "block";
    return;
  }
  const utterance = new SpeechSynthesisUtterance();
  utterance.lang = lang;
  utterance.text = text;

  speechSynthesis.speak(utterance);
}

The convertToSpeech() function takes two parameters, i.e., the text to be converted and the language the text should be spoken in.

Let’s break it down:

  • First, we check if the browser supports speech synthesis; if it doesn't, we display the message “Your browser is not supported, try another browser”.
  • If speech synthesis is supported, we create a new SpeechSynthesisUtterance instance and assign it to the variable utterance.
  • Then we apply the text to the speech request with utterance.text and the language with utterance.lang.
  • Finally, the browser speaks the text using speechSynthesis.speak(utterance).
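The steps above do not report when speech finishes or fails. The utterance object exposes onend and onerror handlers for that, and the sketch below shows one way they might be used. The function speakWithCallbacks is a hypothetical variant of convertToSpeech(), and its first argument is an assumption made so the speech objects can be swapped out; in a page, synth would be window.speechSynthesis and Utterance would be SpeechSynthesisUtterance.

```javascript
// Hypothetical variant of convertToSpeech(). The speech objects are passed in
// so the function can also be exercised outside a browser.
function speakWithCallbacks({ synth, Utterance }, text, lang, onDone, onError) {
  const utterance = new Utterance(text);
  utterance.lang = lang;
  // "end" fires when the utterance has finished being spoken.
  utterance.onend = () => onDone();
  // "error" fires if speech fails; event.error holds an error code string.
  utterance.onerror = (event) => onError(event.error);
  // Drop anything still queued so repeated clicks do not pile up utterances.
  synth.cancel();
  synth.speak(utterance);
  return utterance;
}

// In a browser this might be used as:
// speakWithCallbacks(
//   { synth: speechSynthesis, Utterance: SpeechSynthesisUtterance },
//   "Hello world", "en-US",
//   () => console.log("done"),
//   (code) => console.log("speech failed:", code)
// );
```

Calling cancel() first is a design choice: it interrupts any speech already in progress, so leave it out if utterances should queue up instead.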


I hope you enjoyed this tutorial and learned something useful! We covered everything you need to create text-to-voice apps by leveraging the capabilities of the Web Speech API. Incorporating text-to-voice functionality in your application will cater to diverse user needs and will improve its overall accessibility.

Let’s remind ourselves what we created:

Source Tuts Plus