Parsing Error with audio files longer than 60 sec #57

Timo-Ko · 2019-04-04T09:32:30Z

Hi all,
I'm using the googleLanguageR package version 0.2.0.9 to transcribe German phone calls to text with the Google Speech-to-text API (speaker diarization is turned on, two speakers).
However, whenever I transcribe a file longer than 60 seconds (which I store in a Cloud Storage bucket and then access via its URI), R gives me a warning message.

Here is my code:

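    library(googleLanguageR)

    # Diarization settings, converted to JSON and sent as the API's RecognitionConfig
    my_config <- list(encoding = "LINEAR16",
                      enableSpeakerDiarization = TRUE,
                      diarizationSpeakerCount = 2)

    # Files over 60 seconds are stored in a Cloud Storage bucket and read via their URI
    testcall <- "gs://[bucket]/testcall.wav"

    # asynch = TRUE starts a long-running operation instead of a synchronous call
    apicall <- gl_speech(testcall, sampleRateHertz = 8000, languageCode = "de-DE",
                         asynch = TRUE, customConfig = my_config)

    # Fetch the finished transcript from the operation
    testcall_transcript <- gl_speech_op(apicall)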

The transcription succeeds, but R gives me this warning message:

Warning message:
In value[[3L]](cond) : Could not parse object with names:

As a result, the structure of the two returned data frames seems to be a little mixed up.
When I call str(testcall_transcript), I get the following output:

List of 2
 $ transcript:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	2 obs. of  2 variables:
  ..$ transcript: chr [1:2] "ja hallo ja und vergebe Zusatzdaten und zwar hat er was mache ich als nicht anzumerken ist einfach machen 815" "und wie heißt die Variable die drin da diese Datei ein Kratzer nennst Zusatzdaten Zusatzdaten vorgangs-id nicht"| __truncated__
  ..$ confidence: chr [1:2] "0.8421874" "0.8393924"
 $ timings   :List of 2
  ..$ :'data.frame':	1 obs. of  3 variables:
  .. ..$ transcript: chr "ja hallo ja und vergebe Zusatzdaten und zwar hat er was mache ich als nicht anzumerken ist einfach machen 815"
  .. ..$ confidence: num 0.842
  .. ..$ words     :List of 1
  .. .. ..$ :'data.frame':	20 obs. of  3 variables:
  .. .. .. ..$ startTime: chr [1:20] "0s" "17.600s" "18.100s" "18.200s" ...
  .. .. .. ..$ endTime  : chr [1:20] "17.600s" "18.100s" "18.200s" "19.100s" ...
  .. .. .. ..$ word     : chr [1:20] "ja" "hallo" "ja" "und" ...
  ..$ :'data.frame':	1 obs. of  3 variables:
  .. ..$ transcript: chr "und wie heißt die Variable die drin da diese Datei ein Kratzer nennst Zusatzdaten Zusatzdaten vorgangs-id nicht"| __truncated__
  .. ..$ confidence: num 0.839
  .. ..$ words     :List of 1
  .. .. ..$ :'data.frame':	43 obs. of  4 variables:
  .. .. .. ..$ startTime : chr [1:43] "0s" "17.600s" "18.100s" "18.200s" ...
  .. .. .. ..$ endTime   : chr [1:43] "17.600s" "18.100s" "18.200s" "19.100s" ...
  .. .. .. ..$ word      : chr [1:43] "ja" "hallo" "ja" "und" ...
  .. .. .. ..$ speakerTag: int [1:43] 1 1 1 1 1 1 1 1 1 1 ...

It all looks fine, BUT...
when I try to access the $timings data frame, I have trouble getting at the $speakerTag variable. I need speakerTag together with the respective start and end times in order to determine the timestamps at which a speaker turn happens.
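A minimal sketch of that step, assuming a word-level data frame with the startTime, endTime, word and speakerTag columns shown in the str() output, and times given as strings such as "17.600s". The helpers parse_seconds() and speaker_turns() are hypothetical (not part of googleLanguageR), and the speaker tags in the mock data are made up for illustration:

```r
# Hypothetical helper: convert duration strings such as "17.600s" to numeric seconds
parse_seconds <- function(x) {
  as.numeric(sub("s$", "", x))
}

# Hypothetical helper: collapse word-level rows into speaker turns.
# Assumes columns startTime, endTime and speakerTag, as in the str() output.
speaker_turns <- function(words) {
  start <- parse_seconds(words$startTime)
  end   <- parse_seconds(words$endTime)
  runs  <- rle(words$speakerTag)     # runs of consecutive words by the same speaker
  last  <- cumsum(runs$lengths)      # index of the last word in each run
  first <- last - runs$lengths + 1   # index of the first word in each run
  data.frame(
    speakerTag = runs$values,
    start      = start[first],
    end        = end[last],
    n_words    = runs$lengths
  )
}

# Mock input (invented speaker tags), shaped like the $timings data frames
words <- data.frame(
  startTime  = c("0s", "17.600s", "18.100s", "18.200s"),
  endTime    = c("17.600s", "18.100s", "18.200s", "19.100s"),
  word       = c("ja", "hallo", "ja", "und"),
  speakerTag = c(2L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

turns <- speaker_turns(words)
turns
```

Using rle() groups consecutive words that share a speakerTag, so each row of the result is one speaker turn, with its start time taken from its first word and its end time from its last word.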

For a short file (less than 60 seconds), R gives me this output (working perfectly):

> transcript_short$timings$speakerTag
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [68] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1

For the long file, R gives me this output:

> testcall_transcript$timings$speakerTag
NULL
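One likely reason for the NULL, illustrated with a simplified mock (the data frames below are invented, not real API output): for the long file, $timings is an unnamed list of data frames rather than what appears to be a single data frame for the short file. Using $ on such a list looks for a list element named speakerTag, finds none, and returns NULL instead of reaching into the data frames:

```r
# Simplified mock of the long-file shape: an unnamed list of data frames
timings <- list(
  data.frame(word = c("ja", "hallo"), stringsAsFactors = FALSE),
  data.frame(word = c("ja", "hallo"), speakerTag = c(1L, 1L),
             stringsAsFactors = FALSE)
)

timings$speakerTag       # NULL: the list itself has no element called speakerTag
timings[[2]]$speakerTag  # index into the list first, then take the column: c(1L, 1L)
```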

Any ideas on how this can be fixed? Extracting the speakerTags is crucial for my further data processing.
Thanks! :)


MarkEdmondson1234 · 2019-04-04T11:05:54Z

Thanks for the report; it sounds like a parsing error in the support for this new feature. Do you have an example file I can use for debugging (if it isn't confidential, obviously)?

MarkEdmondson1234 · 2019-04-04T11:06:38Z

Could you also rerun the buggy API call with options(googleAuthR.verbose = 0) set?

Timo-Ko · 2019-04-04T11:49:09Z

Hi Mark, thanks for your fast reply!
I have sent you a sample file via email and also reran the buggy API call with your options, but it still throws the same warning.

MarkEdmondson1234 · 2019-04-05T08:52:26Z

I tested your file and, with no code changes, got a different response format than in your example, so perhaps you just need to update to the latest version? The speakerTags are in the second data.frame; for some reason the API puts them into a second alternative.

    library(googleLanguageR)

    my_config <- list(encoding = "LINEAR16",
                      enableSpeakerDiarization = TRUE,
                      diarizationSpeakerCount = 2)

    testcall <- "gs://mark-edmondson-public-read/testcall.wav"

    apicall <- gl_speech(testcall,
                         sampleRateHertz = 8000,
                         languageCode = "de-DE",
                         asynch = TRUE,
                         customConfig = my_config)

    testcall_transcript <- gl_speech_op(apicall)

str(testcall_transcript)
List of 2
 $ transcript:'data.frame':	2 obs. of  4 variables:
  ..$ transcript  : chr [1:2] "ja hallo ja und vergebe Zusatzdaten und zwar hat er was mache ich als nicht anzumerken ist einfach machen 815" "und wie heißt die Variable die drin da diese Datei ein Kratzer nennst Zusatzdaten Zusatzdaten vorgangs-id nicht"| __truncated__
  ..$ confidence  : chr [1:2] "0.84218776" "0.8393922"
  ..$ languageCode: chr [1:2] "de-de" "de-de"
  ..$ channelTag  : logi [1:2] NA NA
 $ timings   :List of 2
  ..$ :'data.frame':	20 obs. of  3 variables:
  .. ..$ startTime: chr [1:20] "0s" "17.600s" "18.100s" "18.200s" ...
  .. ..$ endTime  : chr [1:20] "17.600s" "18.100s" "18.200s" "19.100s" ...
  .. ..$ word     : chr [1:20] "ja" "hallo" "ja" "und" ...
  ..$ :'data.frame':	43 obs. of  4 variables:
  .. ..$ startTime : chr [1:43] "0s" "17.600s" "18.100s" "18.200s" ...
  .. ..$ endTime   : chr [1:43] "17.600s" "18.100s" "18.200s" "19.100s" ...
  .. ..$ word      : chr [1:43] "ja" "hallo" "ja" "und" ...
  .. ..$ speakerTag: int [1:43] 1 1 1 1 1 1 1 1 1 1 ...

testcall_transcript$timings[[2]]$speakerTag
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

It does a worse job than your example, though, tagging every entry as speakerTag = 1, but I don't see how that is related to any parsing problem: it's in the raw API response. I tried switching to the enhanced "phone_call" model to improve the results, but it's unsupported for German.

Timo-Ko · 2019-04-05T14:58:46Z

Hi Mark, it seems that I just had to update R from 3.5.1 to 3.5.3 and now the transcription runs smoothly again.
Thanks a lot for your help! :)

Timo-Ko · 2019-04-18T13:56:18Z

@MarkEdmondson1234 I tried some more (and longer) files and noticed that the API call produces multiple data frames in the $timings list (the second element of the output). It seems that the longer the audio file, the more data frames it produces. The last data frame in that list always contains the complete transcription, while the previous ones cover only parts of it. Also, only the last data frame in that list has a $speakerTag column. Maybe that is something worth looking into?
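Based on that observation, a workaround sketch: instead of hard-coding timings[[2]], pick the last data frame in $timings that actually has a speakerTag column. The helper diarized_words() is hypothetical, and it assumes the response shape from the str() output in the reply above, where each element of $timings is a word-level data frame; the mock data is invented:

```r
# Hypothetical helper: return the last data frame in `timings` that has a
# speakerTag column (per the observation above, the one covering the full transcript)
diarized_words <- function(timings) {
  has_tag <- vapply(timings, function(df) "speakerTag" %in% names(df), logical(1))
  if (!any(has_tag)) stop("no data frame in timings has a speakerTag column")
  timings[[max(which(has_tag))]]
}

# Mock $timings list: a partial data frame without tags, then the full one with tags
timings <- list(
  data.frame(startTime = c("0s", "17.600s"),
             endTime   = c("17.600s", "18.100s"),
             word      = c("ja", "hallo"),
             stringsAsFactors = FALSE),
  data.frame(startTime  = c("0s", "17.600s", "18.100s"),
             endTime    = c("17.600s", "18.100s", "18.200s"),
             word       = c("ja", "hallo", "ja"),
             speakerTag = c(1L, 1L, 2L),
             stringsAsFactors = FALSE)
)

words <- diarized_words(timings)
words$speakerTag
```

Selecting by column presence rather than by position keeps working if longer files add more partial data frames before the final one.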

MarkEdmondson1234 self-assigned this Apr 4, 2019
