Speech to text transcription with the Cloud Speech-to-Text API


The Cloud Speech API transcribes speech from audio files to text in over 100 languages.

In this lab, we will send a pre-recorded audio file to the Cloud Speech API for transcription.


1.Setup and requirements

Self-paced environment setup
If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project.

Remember the project ID, a name that is unique across all Google Cloud projects, so a name someone else has already taken will not work for you. It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could cost more if you decide to use more resources or if you leave them running (see the "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.


2.Enable the Cloud Speech API

  • Click on the menu icon in the top left of the screen.
  • Select the APIs and Services dashboard from the drop down.
  • Click on Enable APIs and Services.
  • Then, search for "speech" in the search box. Click on Google Cloud Speech API.
  • Click Enable to enable the Cloud Speech API.
  • Wait a few seconds for the API to finish enabling.
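
If you prefer the command line, the same API can also be enabled with gcloud from Cloud Shell (activated in the next step). The snippet below is a minimal sketch, not the only way to do it: the function name enable_speech_api is hypothetical, and it assumes gcloud is installed and authenticated, as it is in Cloud Shell.

```shell
# Hypothetical helper: enable the Speech API for a given project.
# Assumes gcloud is installed and authenticated (true in Cloud Shell).
enable_speech_api() {
  gcloud services enable speech.googleapis.com --project "$1"
}

# Usage (replace PROJECT_ID with your own project ID):
#   enable_speech_api PROJECT_ID
```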


3.Activate Cloud Shell

Google Cloud Shell is a command line environment running in the Cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git and others) and offers a persistent 5GB home directory. We'll use Cloud Shell to create our request to the Speech API.

To get started with Cloud Shell, click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar.
A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.


4.Create an API key

Since we'll be using curl to send a request to the Speech API, we'll need to generate an API key to pass in our request URL. To create an API key, navigate to the APIs & Services > Credentials section of your project dashboard:

Then click Create credentials:
In the drop down menu, select API key:
Next, copy the key you just generated and select Close (don't restrict the key).

Now that you have an API key, save it to an environment variable to avoid having to insert the value of your API key in each request. You can do this in Cloud Shell. Be sure to replace <YOUR_API_KEY> with the key you just copied.


export API_KEY=<YOUR_API_KEY>
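
Because every later request depends on this variable, it can be handy to check that it is really set before calling the API. The following is a small sketch of such a check; the function name require_api_key is hypothetical and not part of any Google tooling.

```shell
# Hypothetical helper: fail early if API_KEY is unset or empty.
require_api_key() {
  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is not set; run: export API_KEY=<YOUR_API_KEY>" >&2
    return 1
  fi
}

# Usage:
#   require_api_key && echo "API key is set"
```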


5.Create your Speech API request

You can build your request to the Speech API in a request.json file. To create and edit this file, you can use your preferred command line editor (nano, vim, emacs) or the built-in web editor in Cloud Shell.

Create the file in your home directory so that it is easy to reference, and add the following to it:

request.json
{
  "config": {
      "encoding":"FLAC",
      "languageCode": "en-US"
  },
  "audio": {
      "uri":"gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

The request body has a config object and an audio object. In config, we tell the Speech API how to process the request. The encoding parameter tells the API which type of audio encoding you're using for the audio file you're sending to the API. FLAC is the encoding type for .flac files (see the documentation for encoding type for more details). There are other parameters you can add to your config object. In the v1 API, languageCode is the required one; encoding may be omitted for FLAC and WAV files, whose headers identify the format, but stating it explicitly does no harm.

In the audio object, you can pass the API either the uri of your audio file in Cloud Storage or the base64-encoded audio as a string. Here we're using a Cloud Storage URI. The next step is calling the Speech API!
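
If you would rather generate the request body from the shell than type it in an editor, something along these lines could work. It is only a sketch: the function names write_request and write_inline_request are hypothetical, and the second one assumes you have a local audio file to embed in the content field, which is where the API expects base64-encoded audio.

```shell
# Hypothetical helper: write a request body that points at a Cloud Storage URI.
# Arguments: encoding, language code, gs:// URI, output file (default: request.json).
write_request() {
  cat > "${4:-request.json}" <<EOF
{
  "config": {
    "encoding": "$1",
    "languageCode": "$2"
  },
  "audio": {
    "uri": "$3"
  }
}
EOF
}

# Hypothetical helper: embed a local audio file as base64 in the "content" field.
# Arguments: encoding, language code, local file path, output file (default: request.json).
write_inline_request() {
  # Strip newlines so the base64 text fits in a single JSON string.
  audio_b64="$(base64 < "$3" | tr -d '\n')"
  cat > "${4:-request.json}" <<EOF
{
  "config": {
    "encoding": "$1",
    "languageCode": "$2"
  },
  "audio": {
    "content": "$audio_b64"
  }
}
EOF
}

# Usage:
#   write_request FLAC en-US gs://cloud-samples-tests/speech/brooklyn.flac
```

Pointing at Cloud Storage keeps the request small; inline content is convenient for short local clips but makes the request body grow with the audio size.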


6.Call the Speech API

You can now pass your request body, along with the API key environment variable you saved earlier, to the Speech API with the following curl command (all in one single command line):

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"

The response returned by this curl command should look something like the following:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}


The transcript value contains the Speech API's text transcription of your audio file, and the confidence value indicates how sure the API is that it has accurately transcribed your audio.
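
When you only care about the text, you can pipe the response through a small filter. The sketch below uses plain grep and sed so that it has no extra dependencies; the function name extract_transcripts is hypothetical, and because it matches text rather than parsing JSON, a JSON-aware tool would be more robust for anything beyond quick experiments.

```shell
# Hypothetical helper: print each "transcript" value found in a response on stdin.
# Text-based matching only; it does not handle escaped quotes inside a transcript.
extract_transcripts() {
  grep -o '"transcript": *"[^"]*"' | sed 's/^"transcript": *"//; s/"$//'
}

# Usage:
#   curl -s -X POST -H "Content-Type: application/json" \
#     --data-binary @request.json \
#     "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}" \
#     | extract_transcripts
```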

You'll notice that we called the recognize method in our request above. The Speech API supports synchronous, asynchronous, and streaming speech to text transcription. In this example we sent it a complete audio file and waited for the result. For longer audio, you can use the longrunningrecognize method, which processes the file asynchronously, and the streaming method (StreamingRecognize, available over gRPC rather than REST) performs transcription while the user is still speaking.
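
As a rough sketch of the asynchronous path, longrunningrecognize takes the same request body and returns an operation whose name you then poll until it reports that it is done. The function names below are hypothetical, and the sketch assumes API_KEY is exported as in the earlier step.

```shell
# Hypothetical helpers for asynchronous recognition of longer audio.
# Assumes API_KEY is exported and the request body is in the given file.

# Start an asynchronous job; the JSON response contains an operation "name".
# Argument: request file (default: request.json).
start_long_running() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data-binary @"${1:-request.json}" \
    "https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${API_KEY}"
}

# Fetch the operation status; the transcription is included once "done" is true.
# Argument: the operation name returned by start_long_running.
get_operation() {
  curl -s "https://speech.googleapis.com/v1/operations/$1?key=${API_KEY}"
}

# Usage:
#   start_long_running request.json
#   get_operation <OPERATION_NAME>
```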


7.Speech to text transcription in different languages

Are you multilingual? The Speech API supports speech to text transcription in over 100 languages! You can change the languageCode parameter in request.json. You can find a list of supported languages here.

Let's try a French audio file (listen to it here if you'd like a preview). Change your request.json to the following:

request.json
{
  "config": {
      "encoding":"FLAC",
      "languageCode": "fr"
  },
  "audio": {
      "uri":"gs://speech-language-samples/fr-sample.flac"
  }
}

Run the same curl command as in the previous step. You should see the following response:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "maître corbeau sur un arbre perché tenait en son bec un fromage",
          "confidence": 0.9710122
        }
      ]
    }
  ]
}


This is a sentence from a popular French children's tale. If you've got audio files in another language, you can try adding them to Cloud Storage and changing the languageCode parameter in your request.
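
To switch languages without opening an editor, you could rewrite the languageCode value in place. This is a minimal sketch: the function name set_language is hypothetical, and it assumes the request file has a languageCode entry like the ones shown above.

```shell
# Hypothetical helper: rewrite the languageCode value in a request file.
# Arguments: language code, request file (default: request.json).
set_language() {
  file="${2:-request.json}"
  sed "s/\"languageCode\": *\"[^\"]*\"/\"languageCode\": \"$1\"/" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}

# Usage:
#   set_language fr request.json
```

Remember that the audio uri in the request still has to point at a recording in the language you select.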



