A Go client to call the Google Speech API for free.
The Google Speech API (full duplex version) is meant to offer a speech recognition service via the Web Speech API in the Google Chrome browser. It is different from the Google Cloud Speech-to-Text API.
Disclaimer: The Google Speech API is an internal API and totally unsupported; it may change or disappear at any moment.
Import it as a package:
package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/giulianopz/go-gstt/pkg/client"
	"github.com/giulianopz/go-gstt/pkg/opts" // assumed import path for the opts package used below
	"github.com/giulianopz/go-gstt/pkg/transcription"
)

func main() {
	var (
		httpC   = client.New()
		in      io.Reader                            // audio input
		options *opts.Options                        // configure transcription parameters
		out     = make(chan *transcription.Response) // receive results from channel
	)

	// Transcribe runs in its own goroutine and sends responses on out.
	go httpC.Transcribe(in, out, options)

	// Print every alternative of every result as it arrives.
	for resp := range out {
		for _, result := range resp.Result {
			for _, alt := range result.Alternative {
				fmt.Printf("confidence=%f, transcript=%s\n", alt.Confidence, strings.TrimSpace(alt.Transcript))
			}
		}
	}
}
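The nested loop above prints every alternative. When only the most likely transcript is wanted, the same traversal can keep the alternative with the highest confidence instead. The following is a minimal sketch: `Alternative` and `Result` are hypothetical stand-ins that mirror only the fields used in the loop above (they are not the library's actual types), and `bestAlternative` is an illustrative helper, not part of go-gstt.

```go
package main

import (
	"fmt"
	"strings"
)

// Alternative and Result are hypothetical stand-ins mirroring only the
// fields used in the README loop; they are NOT the go-gstt types.
type Alternative struct {
	Transcript string
	Confidence float64
}

type Result struct {
	Alternative []Alternative
}

// bestAlternative returns the highest-confidence alternative across all
// results, and false when there are no alternatives at all.
func bestAlternative(results []Result) (Alternative, bool) {
	var best Alternative
	found := false
	for _, r := range results {
		for _, alt := range r.Alternative {
			if !found || alt.Confidence > best.Confidence {
				best, found = alt, true
			}
		}
	}
	return best, found
}

func main() {
	// Made-up sample data for illustration only.
	results := []Result{{Alternative: []Alternative{
		{Transcript: " hello word ", Confidence: 0.61},
		{Transcript: " hello world ", Confidence: 0.93},
	}}}
	if alt, ok := bestAlternative(results); ok {
		fmt.Printf("confidence=%.2f, transcript=%s\n", alt.Confidence, strings.TrimSpace(alt.Transcript))
	}
}
```

With real responses, the same function body would range over `resp.Result` inside the `for resp := range out` loop.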
Use it as a command:
$ git clone https://github.com/giulianopz/go-gstt
$ cd go-gstt
$ go build -o gstt .
$ mv gstt /usr/local/bin
# or just `go install github.com/giulianopz/go-gstt@latest`, if you don't want to rename the binary
$ gstt -h
Usage:
gstt [OPTION]... --interim --continuous [--file FILE]
Options:
--verbose
--file, path of audio file to transcribe
--key, API key to authenticate requests (default is the one built into any Chrome installation)
--language, language of the recording transcription, use the standard IETF language tag for your language, e.g. 'en-US' for English-US, 'ru' for Russian, etc.; see https://en.wikipedia.org/wiki/IETF_language_tag
--continuous, to keep the stream open and transcribing as long as there is no silence
--interim, to send back results before it's finished, so you get a live stream of possible transcriptions as it processes the audio
--max-alts, how many possible transcriptions do you want
--pfilter, profanity filter ('0'=off, '1'=medium, '2'=strict)
--user-agent, user-agent for spoofing
--sample-rate, audio sampling rate
# transcribe audio from a single FLAC file
$ gstt --interim --continuous --file $FILE
# transcribe audio from microphone input (recorded with sox, removing silence)
$ rec -c 1 --encoding signed-integer --bits 16 --rate 16000 -t flac - silence 1 0.1 1% -1 0.5 1% | gstt --interim --continuous
Note: the Google Speech API seems to accept only input audio with a 16 kHz sample rate and a single channel. If you need to mix a stereo stream (2 channels) down to a mono stream (1 channel), please read the ffmpeg docs.
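To illustrate what a stereo-to-mono downmix does, here is a minimal Go sketch that averages the left and right samples of interleaved 16-bit PCM. It assumes raw, already-decoded samples and covers neither resampling to 16 kHz nor FLAC encoding, so it does not replace ffmpeg or sox. `downmixStereo` is a hypothetical helper, not part of go-gstt.

```go
package main

import "fmt"

// downmixStereo averages interleaved 16-bit stereo samples (L, R, L, R, ...)
// into a mono stream. A trailing unpaired sample is dropped.
// Hypothetical helper for illustration, not part of go-gstt.
func downmixStereo(interleaved []int16) []int16 {
	mono := make([]int16, 0, len(interleaved)/2)
	for i := 0; i+1 < len(interleaved); i += 2 {
		// Sum in int32 so that two large samples cannot overflow int16.
		sum := int32(interleaved[i]) + int32(interleaved[i+1])
		mono = append(mono, int16(sum/2))
	}
	return mono
}

func main() {
	// Three stereo frames of made-up sample values.
	stereo := []int16{1000, 3000, -2000, -4000, 32767, 32767}
	fmt.Println(downmixStereo(stereo)) // prints [2000 -3000 32767]
}
```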
You can also live-caption speech by redirecting the speaker output to the microphone input with PulseAudio Volume Control (pavucontrol).
As far as I know, this API has been around for a long time.
Mike Pultz was possibly the first one to discover it in 2011. Subsequently, Travis Payton published a detailed report on the subject.
I wrote about it on my blog.