
giulianopz/go-gstt


gstt

A Go client to call the Google Speech API for free.

The Google Speech API (full-duplex version) is meant to provide speech recognition for the Web Speech API in the Google Chrome browser. It is different from the Google Cloud Speech-to-Text API.

Disclaimer: the Google Speech API is an internal, entirely unsupported API that may change or disappear at any moment.

Usage

Import it as a package:

package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/giulianopz/go-gstt/pkg/client"
	"github.com/giulianopz/go-gstt/pkg/opts" // assumed import path of the opts package
	"github.com/giulianopz/go-gstt/pkg/transcription"
)

func main() {
	var (
		httpC   = client.New()
		in      io.Reader                            // audio input, e.g. an opened FLAC file or os.Stdin
		options *opts.Options                        // configure transcription parameters
		out     = make(chan *transcription.Response) // receive results from this channel
	)

	// Transcribe runs in its own goroutine and streams responses on out.
	go httpC.Transcribe(in, out, options)

	for resp := range out {
		for _, result := range resp.Result {
			for _, alt := range result.Alternative {
				fmt.Printf("confidence=%f, transcript=%s\n", alt.Confidence, strings.TrimSpace(alt.Transcript))
			}
		}
	}
}

Use it as a command:

$ git clone https://github.com/giulianopz/go-gstt
$ cd go-gstt
$ go build -o gstt .
$ mv gstt /usr/local/bin
# or just `go install github.com/giulianopz/go-gstt@latest`, if you don't mind the binary being named go-gstt
$ gstt -h
Usage:
    gstt [OPTION]... --interim --continuous [--file FILE]

Options:
        --verbose
        --file, path of the audio file to transcribe
        --key, API key used to authenticate requests (default is the one built into any Chrome installation)
        --language, language of the recording to transcribe; use the standard IETF language tag, e.g. 'en-US' for English-US, 'ru' for Russian, etc. (see https://en.wikipedia.org/wiki/IETF_language_tag)
        --continuous, keep the stream open and keep transcribing as long as there is no silence
        --interim, send back results before transcription is finished, so you get a live stream of possible transcriptions as the audio is processed
        --max-alts, maximum number of alternative transcriptions to return
        --pfilter, profanity filter ('0'=off, '1'=medium, '2'=strict)
        --user-agent, User-Agent to send (for spoofing)
        --sample-rate, audio sampling rate
# transcribe audio from a single FLAC file
$ gstt --interim --continuous --file $FILE
# transcribe audio from microphone input (recorded with sox, removing silence)
$ rec -c 1 --encoding signed-integer --bits 16 --rate 16000 -t flac - silence 1 0.1 1% -1 0.5 1% | gstt --interim --continuous

Note: the Google Speech API seems to accept only input audio with a 16 kHz sample rate and a single channel (mono). If you need to mix a stereo stream (2 channels) down to mono (1 channel), see the ffmpeg docs.

Demo

Live-caption speech by redirecting the speakers' output to the microphone input with PulseAudio Volume Control (pavucontrol):

[Demo GIF: livecapdemo (how-to-gif)]

Credits

As far as I know, this API has been around for a long time.

Mike Pultz was possibly the first one to discover it in 2011. Subsequently, Travis Payton published a detailed report on the subject.

I wrote about it on my blog.