Parsing Error with audio files longer than 60 sec #57
Comments
Thanks for the report. It sounds like a parsing error in the support for this new feature. Would you have an example file I can use for debugging purposes (if it is not confidential, obviously)?
Could you also rerun the buggy API call with |
Hi Mark, thanks for your fast reply!
I tested your file and, with no code changes, got a different response format than in your example, so perhaps you just need to update to the latest version? The speakerTags are in the second data.frame; for some reason it puts them into a second alternative.
library(googleLanguageR)
# Custom config: LINEAR16 encoding with diarization for two speakers
my_config <- list(encoding = "LINEAR16",
                  enableSpeakerDiarization = TRUE,
                  diarizationSpeakerCount = 2)
testcall <- "gs://mark-edmondson-public-read/testcall.wav"
# asynch = TRUE requests asynchronous recognition for the bucket-hosted file
apicall <- gl_speech(testcall,
                     sampleRateHertz = 8000,
                     languageCode = "de-DE",
                     asynch = TRUE,
                     customConfig = my_config)
# Fetch the result of the asynchronous operation
testcall_transcript <- gl_speech_op(apicall)
str(testcall_transcript)
List of 2
$ transcript:'data.frame': 2 obs. of 4 variables:
..$ transcript : chr [1:2] "ja hallo ja und vergebe Zusatzdaten und zwar hat er was mache ich als nicht anzumerken ist einfach machen 815" "und wie heißt die Variable die drin da diese Datei ein Kratzer nennst Zusatzdaten Zusatzdaten vorgangs-id nicht"| __truncated__
..$ confidence : chr [1:2] "0.84218776" "0.8393922"
..$ languageCode: chr [1:2] "de-de" "de-de"
..$ channelTag : logi [1:2] NA NA
$ timings :List of 2
..$ :'data.frame': 20 obs. of 3 variables:
.. ..$ startTime: chr [1:20] "0s" "17.600s" "18.100s" "18.200s" ...
.. ..$ endTime : chr [1:20] "17.600s" "18.100s" "18.200s" "19.100s" ...
.. ..$ word : chr [1:20] "ja" "hallo" "ja" "und" ...
..$ :'data.frame': 43 obs. of 4 variables:
.. ..$ startTime : chr [1:43] "0s" "17.600s" "18.100s" "18.200s" ...
.. ..$ endTime : chr [1:43] "17.600s" "18.100s" "18.200s" "19.100s" ...
.. ..$ word : chr [1:43] "ja" "hallo" "ja" "und" ...
.. ..$ speakerTag: int [1:43] 1 1 1 1 1 1 1 1 1 1 ...
testcall_transcript$timings[[2]]$speakerTag
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
It does a worse job than your example, though, putting every entry as
Hi Mark, it seems that I just had to update R from 3.5.1 to 3.5.3, and now the transcription runs smoothly again.
@MarkEdmondson1234 I tried some more (and longer) files and noticed that the API call produces multiple data frames in the second element of the output, the $timings list. The longer the audio file, the more data frames it seems to produce. The last data frame in that list is always the complete transcription, and the previous ones cover only parts of it. Also, only the last data frame in that list has a $speakerTag column. Maybe that is something worth looking into?
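Until the parsing is changed, one possible workaround is to pick the diarized data frame out of the $timings list by checking for the speakerTag column instead of relying on a fixed position. The following is only a sketch: the `timings` list is made-up stand-in data in the shape shown above, and `has_tag` and `diarized` are hypothetical names, not part of googleLanguageR.

```r
# Made-up stand-in for testcall_transcript$timings (assumption: same shape
# as the str() output in this thread; real data comes from gl_speech_op()).
timings <- list(
  data.frame(startTime = c("0s", "17.600s"),
             endTime   = c("17.600s", "18.100s"),
             word      = c("ja", "hallo"),
             stringsAsFactors = FALSE),
  data.frame(startTime  = c("0s", "17.600s", "18.100s"),
             endTime    = c("17.600s", "18.100s", "18.200s"),
             word       = c("ja", "hallo", "ja"),
             speakerTag = c(1L, 1L, 2L),
             stringsAsFactors = FALSE)
)

# Flag the data frames that carry a speakerTag column
has_tag <- vapply(timings, function(df) "speakerTag" %in% names(df), logical(1))

# Take the last flagged data frame, or NULL if none has speaker tags
diarized <- if (any(has_tag)) timings[[max(which(has_tag))]] else NULL
```

With the real result, `timings` would be replaced by `testcall_transcript$timings`; checking for the column by name should keep working whether the list holds two data frames or more.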
Hi all,
I'm using the googleLanguageR package version 0.2.0.9 to transcribe German phone calls to text with the Google Speech-to-text API (speaker diarization is turned on, two speakers).
However, whenever I want to transcribe a file that is longer than 60 seconds (i.e., I store it in a Cloud Storage bucket and then access it via the URI), it gives me a warning message.
Here is my code:
The transcription is successful but R gives me this warning message.
The effect of this warning seems to be that the structure of the two returned data frames is a little mixed up.
When I call str(testcall_transcript) it gives me the following output:
Looks all fine BUT...
when I try to access the $timings data frame, I have trouble accessing the $speakerTag variable. I need the speakerTag and the respective start and end times in order to determine the timestamps at which a speaker turn happens.
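To make the goal concrete, here is a rough sketch of how speaker turns could be derived once a data frame with startTime, endTime and speakerTag is available. The `words` data frame is invented example data in the format shown in this thread, and `to_seconds` and `turns` are hypothetical helper names.

```r
# Invented word-level timings (assumption: times are strings like "17.600s")
words <- data.frame(
  startTime  = c("0s", "17.600s", "18.100s", "18.200s", "19.100s"),
  endTime    = c("17.600s", "18.100s", "18.200s", "19.100s", "19.500s"),
  word       = c("ja", "hallo", "ja", "und", "wie"),
  speakerTag = c(1L, 1L, 2L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Strip the trailing "s" and convert to numeric seconds
to_seconds <- function(x) as.numeric(sub("s$", "", x))
words$start <- to_seconds(words$startTime)
words$end   <- to_seconds(words$endTime)

# Run-length encode the speaker tags: each run is one speaker turn
runs      <- rle(words$speakerTag)
last_idx  <- cumsum(runs$lengths)
first_idx <- last_idx - runs$lengths + 1

# One row per turn: who spoke, from when, until when
turns <- data.frame(
  speaker = runs$values,
  start   = words$start[first_idx],
  end     = words$end[last_idx]
)
```

Each row of `turns` then describes one uninterrupted stretch of a single speaker, and the `start` value of every row after the first marks a speaker change.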
For a short file (less than 60sec) R gives me this output (perfectly working):
For the long file R gives me this output:
Any ideas on how this can be fixed? Extracting the speakerTags is crucial for my further data processing.
Thanks! :)