Attempt to get it to work with Voco #42

Closed
flatsiedatsie opened this issue Feb 22, 2021 · 81 comments
Labels
enhancement New feature or request

Comments

@flatsiedatsie

This is a continuation of discussion here.

I've managed to get Snips to recognise the wake-word, but only right after booting the Atom Echo.

Snips does recognise that there is audio input.

While in idle mode, Snips Watch indicates that audio is being heard.

[14:55:18] [VoiceActivity] Down on site atomecho
[14:55:21] [VoiceActivity] Up on site atomecho
[14:55:22] [VoiceActivity] Down on site atomecho
[14:55:35] [VoiceActivity] Up on site atomecho
[14:55:53] [VoiceActivity] Down on site atomecho
[14:55:55] [VoiceActivity] Up on site atomecho
[14:55:56] [VoiceActivity] Down on site atomecho
[14:56:00] [VoiceActivity] Up on site atomecho

If I press the button to start a session, a session is created, and the dialogue manager listens to the stream from the Atom Echo. But the voice input is not recognised as a voice command:

[14:56:10] [Dialogue] was asked to start a session on site atomecho
[14:56:10] [Asr] was asked to stop listening on site atomecho
[14:56:10] [Hotword] was asked to toggle itself 'off' on site atomecho
[14:56:10] [Dialogue] session with id 'fe685174-5053-4651-8756-8cb3b066003e' was started on site atomecho
[14:56:10] [Asr] was asked to listen on site atomecho
[14:56:11] [VoiceActivity] Down on site atomecho
[14:56:12] [VoiceActivity] Up on site atomecho
[14:56:15] [VoiceActivity] Down on site atomecho
[14:56:16] [VoiceActivity] Up on site atomecho
[14:56:18] [VoiceActivity] Down on site atomecho
[14:56:19] [VoiceActivity] Up on site atomecho
[14:56:26] [Dialogue] session with id 'fe685174-5053-4651-8756-8cb3b066003e' was ended on site atomecho. The session was ended because one of the component didn't respond in a timely manner
[14:56:26] [Asr] was asked to stop listening on site atomecho
[14:56:26] [Hotword] was asked to toggle itself 'on' on site atomecho

If I don't speak into the Raspberry Pi version, then things look a bit different.

[15:07:45] [VoiceActivity] Up on site azrxidia
[15:07:46] [Hotword] detected on site azrxidia, for model hey_snips
[15:07:46] [Asr] was asked to stop listening on site azrxidia
[15:07:46] [Hotword] was asked to toggle itself 'off' on site azrxidia
[15:07:46] [Dialogue] session with id '16427483-191a-4c39-9f0b-199dd4cb0e7e' was started on site azrxidia
[15:07:46] [Asr] was asked to listen on site azrxidia
[15:07:46] [VoiceActivity] Up on site atomecho
[15:07:47] [VoiceActivity] Down on site azrxidia
[15:07:50] [Asr] captured text "" in 4.0s
[15:07:50] [Asr] was asked to stop listening on site azrxidia
[15:07:50] [Dialogue] session with id '16427483-191a-4c39-9f0b-199dd4cb0e7e' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[15:07:50] [Asr] was asked to stop listening on site azrxidia

So that would support the idea that some MQTT message is missing.

As an aside, I also noticed that the wave header is slightly different:

ESP32

RIFF,WAVEfmt ?>}data[binary PCM data omitted]

USB microphone on Raspberry Pi:

RIFF4WAVEfmt ?>}tim??wdata[binary PCM data omitted]

I also checked the output of the various commands in Mosquitto to find out exactly which message was missing.

---SNIPS----


hermes/hotword/azrxidia/detected {"siteId":"azrxidia","modelId":"hey_snips","modelVersion":"workflow-hey_snips_subww_feedback_10seeds-2018_12_04T12_13_05_evaluated_model_0002","modelType":"universal","currentSensitivity":0.5,"detectionSignalMs":1614003432719,"endSignalMs":1614003432719}

hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/hotword/toggleOff {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}

hermes/dialogueManager/sessionStarted {"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","customData":null,"siteId":"azrxidia","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","startSignalMs":1614003432719}

hermes/audioServer/azrxidia/replayRequest {"requestId":"azrxidia-1614003432719","startAtMs":1614003432719,"siteId":"azrxidia"}
hermes/audioServer/azrxidia/replayResponse RIFF^WAVEfmt ?>}tim??wrpidazrxidia-1614003432719rprf
hermes/audioServer/azrxidia/replayResponse
hermes/audioServer/azrxidia/replayResponse
hermes/audioServer/azrxidia/replayResponse

hermes/asr/textCaptured {"text":"what time is it","likelihood":1.0,"tokens":[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}],"seconds":2.0,"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/nlu/query {"input":"what time is it","asrTokens":[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}],"intentFilter":["createcandle:get_time","createcandle:set_value","createcandle:stop_timer","createcandle:set_timer","createcandle:get_value","createcandle:set_state","createcandle:get_boolean","createcandle:list_timers","createcandle:get_timer_count"],"id":"1eee4474-c274-4a09-a7a5-7a65229839fa","sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e"}
hermes/nlu/intentParsed {"id":"1eee4474-c274-4a09-a7a5-7a65229839fa","input":"what time is it","intent":{"intentName":"createcandle:get_time","confidenceScore":1.0},"slots":[],"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","alternatives":[{"intentName":"createcandle:get_value","confidenceScore":0.06613055,"slots":[]},{"intentName":"createcandle:list_timers","confidenceScore":0.048560027,"slots":[]}]}
hermes/intent/createcandle:get_time {"sessionId":"f0b42455-fc95-4873-84ba-c6136b1dec3e","customData":null,"siteId":"azrxidia","input":"what time is it","asrTokens":[[{"value":"what","confidence":1.0,"rangeStart":0,"rangeEnd":4,"time":{"start":0.0,"end":1.05}},{"value":"time","confidence":1.0,"rangeStart":5,"rangeEnd":9,"time":{"start":1.05,"end":1.17}},{"value":"is","confidence":1.0,"rangeStart":10,"rangeEnd":12,"time":{"start":1.17,"end":1.3199999}},{"value":"it","confidence":1.0,"rangeStart":13,"rangeEnd":15,"time":{"start":1.3199999,"end":2.1}}]],"asrConfidence":1.0,"intent":{"intentName":"createcandle:get_time","confidenceScore":1.0},"slots":[],"alternatives":[{"intentName":"createcandle:get_value","confidenceScore":0.06613055,"slots":[]},{"intentName":"createcandle:list_timers","confidenceScore":0.048560027,"slots":[]}]}

also:
hermes/voiceActivity/azrxidia/vadDown {"siteId":"azrxidia","signalMs":1614003434199}
hermes/voiceActivity/azrxidia/vadUp {"siteId":"azrxidia","signalMs":1614003432077}

----Atom Echo button----

hermes/dialogueManager/startSession {"init":{"type":"action","canBeEnqueued": false},"siteId":"atomecho"}

hermes/asr/stopListening {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d"}
hermes/hotword/toggleOff {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d"}

hermes/dialogueManager/sessionStarted {"sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d","customData":null,"siteId":"atomecho","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"atomecho","sessionId":"cd6118bf-a971-4921-a6b5-59aeb7967a3d","startSignalMs":null}

hermes/voco/atomecho/mute {"mute": true}
...

hermes/voiceActivity/atomecho/vadDown {"siteId":"atomecho","signalMs":-386}

----Atom Echo hotword detected----

hermes/hotword/azrxidia/detected {"siteId":"atomecho","modelId":"hey_snips","modelVersion":"workflow-hey_snips_subww_feedback_10seeds-2018_12_04T12_13_05_evaluated_model_0002","modelType":"universal","currentSensitivity":0.5,"detectionSignalMs":-70,"endSignalMs":-70}

hermes/voco/atomecho/play {"sound_file": "start_of_input"}

hermes/audioServer/atomecho/audioFrame RIFF,WAVEfmt ?>}data[binary PCM data omitted]
hermes/asr/stopListening {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868"}
hermes/audioServer/atomecho/audioFrame RIFF,WAVEfmt ?>}data[binary PCM data omitted]
hermes/hotword/toggleOff {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868"}

hermes/dialogueManager/sessionStarted {"sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868","customData":null,"siteId":"atomecho","reactivatedFromSessionId":null}

hermes/asr/startListening {"siteId":"atomecho","sessionId":"a9e2c85d-cdfd-40ce-9ad6-ad30c9ed2868","startSignalMs":-70}

hermes/voco/atomecho/mute {"mute": true}

hermes/voiceActivity/atomecho/vadDown {"siteId":"atomecho","signalMs":-386}


@Romkabouter
Owner

Romkabouter commented Feb 22, 2021

What is the sample rate of Voco?
This software outputs 16000 Hz, 16-bit mono audio. Each MQTT message contains 512 bytes of wave data and a 44-byte header, a total of 556 bytes per message.
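For reference, the 556-byte figure can be reproduced with a short sketch. It assumes the canonical 44-byte PCM header from the WAV format reference linked later in this thread (only the RIFF, `fmt ` and `data` chunks); it is not the streamer's actual header code, and `wav_header` and `CHUNK_BYTES` are illustrative names.

```python
import struct

def wav_header(data_len: int, sample_rate: int = 16000,
               bits_per_sample: int = 16, channels: int = 1) -> bytes:
    """Canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",            # ChunkSize = file length - 8
        b"fmt ", 16, 1, channels,                   # 16-byte fmt chunk, format 1 = PCM
        sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_len,
    )

CHUNK_BYTES = 512                                     # audio bytes per MQTT message
frame = wav_header(CHUNK_BYTES) + bytes(CHUNK_BYTES)  # silence as placeholder audio
print(len(wav_header(CHUNK_BYTES)), len(frame))       # 44 556
```

At 16 kHz and 2 bytes per sample, 512 data bytes are 256 samples, so each message carries 16 ms of audio.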

Also, this software does not support Snips anymore. Which version of Snips does Voco use?

I can see if I can get it to work with Voco, but I won't put that in master because Snips is not supported.
I will create a new branch for it.

@flatsiedatsie
Author

flatsiedatsie commented Feb 22, 2021

That would rock! I was looking into the header to see if I had to change something there.

As far as I know, Voco uses the latest version of Snips.

Does Rhasspy use different audio headers from that last version of Snips?

The mystery is: why does it sometimes actually work?

@Romkabouter
Owner

Where can I find these "VoiceActivity" logs and such?

The audio produced is just a series of small wave files. Rhasspy does not differ from Snips in that regard.
But the chunk size from Voco is different from what the streamer sends.

This is the wave format: http://soundfile.sapp.org/doc/WaveFormat/
From Voco: RIFF4WAVEfmt
From streamer: RIFF,WAVEfmt

The 4 and , are not a literal 4 and comma; they are the printable part of the 4-byte ChunkSize field.
So if the chunk size is not the same, then, depending on how Voco is implemented, it might handle the audio unexpectedly.
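One way to check this on captured payloads is to decode the field directly. This is a sketch assuming each MQTT payload is one complete RIFF message; per the WAV format reference, ChunkSize should equal the total message length minus 8, stored little-endian, so the character that prints right after "RIFF" is its low byte. The function names are hypothetical.

```python
import struct

def riff_chunk_size(message: bytes) -> int:
    """Return the little-endian ChunkSize stored in bytes 4..7 of a RIFF message."""
    if message[:4] != b"RIFF" or message[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    return struct.unpack_from("<I", message, 4)[0]

def check_chunk_size(message: bytes) -> str:
    """Compare the stored ChunkSize with what the spec expects for this length."""
    size = riff_chunk_size(message)
    expected = len(message) - 8                      # everything after "RIFF" + size
    low = message[4]
    shown = chr(low) if 32 <= low < 127 else "?"     # what a terminal would print
    status = "OK" if size == expected else f"MISMATCH (expected {expected})"
    return f"ChunkSize={size} (0x{size:08x}, low byte prints as {shown!r}) {status}"
```

For a 572-byte message the spec value is 564 = 0x234, whose low byte 0x34 prints as `4`, which matches the Voco dump. For a 556-byte message the spec value would be 548 = 0x224, which prints as `$`; a printed `,` (0x2C) means a different value is stored (556 = 0x22C is one candidate). Running `check_chunk_size` on a real streamer payload would settle which it is.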


@flatsiedatsie
Author

The 4 and , are not a 4 and a , but represent the 4 bytes of the ChunkSize.

Ah, interesting. I figured it might matter. I also wonder if the tim part was some kind of time index.

To debug, I use these commands:

Snips Watch: ~/.webthings/addons/voco/snips/snips-watch -vvvv

Looking at ALL MQTT traffic: mosquitto_sub -v -h 192.168.2.167 -t 'hermes/#'
(mosquitto_sub isn't installed by default; sudo apt-get install mosquitto-clients will fix that)

And if you need to quickly restart the gateway for some reason: sudo systemctl restart webthings-gateway.service
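Because audioFrame payloads are raw WAV bytes, printing all of hermes/# to the terminal produces garbled dumps like the ones earlier in this thread. A minimal sketch of a formatter that keeps the output readable; it is not part of Voco or the streamer, and wiring it to a live subscription (for example from a paho-mqtt on_message callback, using msg.topic and msg.payload) is left out.

```python
import json

def summarize(topic: str, payload: bytes, max_len: int = 120) -> str:
    """Return one printable line for an MQTT message."""
    if payload.startswith(b"RIFF"):
        # audioFrame / replayResponse: binary WAV, so show only its size
        return f"{topic} <WAV, {len(payload)} bytes>"
    try:
        # JSON payloads: re-serialise compactly (json.loads accepts bytes)
        text = json.dumps(json.loads(payload), separators=(",", ":"))
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        text = repr(payload)
    return f"{topic} {text[:max_len]}"
```

Printing the size of every audioFrame this way also makes the 556 vs 572 byte difference visible at a glance.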

@Romkabouter
Owner

Ah, interesting. I figured it might matter. I also wonder if the tim part was some kind of time index.

Yes, when I see this:
hermes/audioServer/azrxidia/replayResponse RIFF^WAVEfmt ?>}tim??wrpidazrxidia-1614003432719rprf
that does not really look like a good wave header to me; it should contain the plain letters "data" somewhere as well.
The link I posted shows what a wave format header should look like. I do not know how the replayResponse is generated.

In the streamer I use a fixed header, because every WAV chunk sent has the same length and format :)
I will ignore the replayResponse for now and dive into Voco a bit to find out what Voco expects.
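A reader that does not rely on a fixed layout can walk the RIFF chunks instead of jumping straight to byte 44. This sketch assumes only the generic RIFF rules (4-byte ID, 4-byte little-endian size, payload padded to an even length). It would skip a non-standard chunk such as the `tim` one visible in the Voco dump, whose exact ID and contents are unknown here; `list_chunks` and `pcm_data` are hypothetical helpers.

```python
import struct

def list_chunks(message: bytes):
    """Return (chunk_id, size, payload_offset) for each sub-chunk of a RIFF/WAVE message."""
    if message[:4] != b"RIFF" or message[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    chunks, pos = [], 12                         # sub-chunks start after "RIFF", size, "WAVE"
    while pos + 8 <= len(message):
        cid, size = struct.unpack_from("<4sI", message, pos)
        chunks.append((cid.decode("latin-1"), size, pos + 8))
        pos += 8 + size + (size & 1)             # skip payload plus pad byte if odd
    return chunks

def pcm_data(message: bytes) -> bytes:
    """Return the samples from the 'data' chunk, wherever it sits."""
    for cid, size, offset in list_chunks(message):
        if cid == "data":
            return message[offset:offset + size]
    raise ValueError("no 'data' chunk found")
```

With a plain 44-byte header, list_chunks reports `fmt ` at offset 20 and `data` at offset 44; any extra chunk before `data` shifts where the samples actually start, which a fixed-offset reader would miss.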

@flatsiedatsie
Author

Yes, I've only seen that replayRequest command once. It's not documented either; all I could find were two references to it in Snips code on GitHub. I'd just ignore it.

that does not really look like a good waveheader to me

0_0 :-)

@Romkabouter
Owner

I have tried a bare Snips install with the demo assistant and a mic attached to the Pi, and I get timeouts:

[screenshot]

I think Snips itself is broken and this is actually causing the timeouts. I will try this setup with Voco as well.

@flatsiedatsie
Author

flatsiedatsie commented Mar 1, 2021

AI, that doesn't sound good... Is that with a different wav encoding? Are you using the 6.0 version?

@Romkabouter
Owner

Romkabouter commented Mar 1, 2021

No, this is Snips installed on a Pi with an attached mic. Nothing to do with this code.
I wanted to see which messages go back and forth to Snips, to see what is needed for a correct dialogue session.

That is why I installed the demo assistant.
I followed https://docs.snips.ai/articles/raspberrypi/manual-setup, which is old but still the latest version :)

As you can see, the hotword is detected OK, and I also tested the mic outside Snips. Everything works.
But I still get a timeout; it should detect the weather intent.

Edit: OK, no ASR is running ;)

@flatsiedatsie
Author

Keeping my fingers crossed over here :-)

@Romkabouter
Owner

I followed this step and now Snips is working :)

snipsco/snips-issues#161

@Romkabouter
Owner

progress :)

[screenshot]

@flatsiedatsie
Author

Hey wow!! Great work!

What's still left to do? It looks to me like a 100% success?

@Romkabouter
Owner

This was a test with a Pi with a built-in mic,
to verify that Snips still works.

The next step for me is to check and adjust the code for the streamer to get Snips going (again).
When that works, I can move on to Voco :)
First with a built-in mic and, if that works, with the streamer. So there is a road ahead :)

@flatsiedatsie
Author

Ah I see. Are you sure the effort is worth it? We could just wait for Voco to move to Rhasspy. That has to happen at some point anyway.

@Romkabouter
Owner

Well, it would be nice to find what the issue is, but that is just me :D

I see RIFF4 in the messages, as you also already found. The number of bytes sent per message is also 572 instead of the 556 that the streamer sends.
I believe this is the root cause of the bad performance, the wave audio does not match.

@flatsiedatsie
Author

That's what I figured as well.

@Romkabouter Romkabouter added the enhancement New feature or request label Mar 21, 2021
@Romkabouter
Owner

Sorry about my lack of commitment; my SD card died, so I had to start over, which I did not have time for.
This is still on my to-do list, however :)

@flatsiedatsie
Author

No worries. Pretty busy over here as well :-)

@Romkabouter
Owner

I have the code working with Snips; however, there are some header changes which I can't get right at the moment.
When "hey snips" is spoken, it works if you say it a bit more slowly, so it is most likely a wave format issue.
I have recorded an audio file which triggers the hotword 100% of the time with the code. I will attach it to the branch :)

@flatsiedatsie
Author

That's great news! I'm going to check this out asap!

@Romkabouter
Owner

This is the result of the audio file
[screenshot]

After the hotword it picks up the audio as expected
[screenshot]

The timeout occurs because I have no intent handler.

@flatsiedatsie
Author

Where can I find the code? I had a look at the voco branch, but that doesn't seem to be it?

@Romkabouter
Owner

Just pushed it. I have also included a record.py which records a stream for a couple of seconds.
I was experimenting with the header, but, strangely enough, found that I did not have to make a change.

The branch I just pushed works with my Atom Echo, but the recording level seems to be very low. I assume this will also be the case with the master branch.
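This is not the record.py from the branch, but a sketch of the same idea under stated assumptions: each frame is one MQTT message with the fixed 44-byte header described earlier, carrying 16 kHz, 16-bit mono PCM. It joins the frames into one WAV file and reports the peak level in dBFS, which puts a number on "the recording level seems very low". `frames_to_wav` and `peak_dbfs` are illustrative names.

```python
import array
import io
import math
import sys
import wave

HEADER_BYTES = 44  # assumed fixed per-message header, as described for the streamer

def frames_to_wav(frames, sample_rate: int = 16000) -> bytes:
    """Drop each message's header and write the PCM as one continuous mono WAV."""
    pcm = b"".join(frame[HEADER_BYTES:] for frame in frames)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

def peak_dbfs(pcm: bytes) -> float:
    """Peak level of 16-bit little-endian PCM in dB relative to full scale."""
    samples = array.array("h", pcm)
    if sys.byteorder == "big":
        samples.byteswap()           # WAV samples are little-endian
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / 32768) if peak else float("-inf")
```

Comparing peak_dbfs for the same spoken phrase on the Atom Echo and on a Pi microphone would show how far apart the two input levels really are.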

@flatsiedatsie
Author

Cool, I will try it now!

@flatsiedatsie
Author

flatsiedatsie commented May 27, 2021

It took a bit of work to be able to upload it via the Arduino IDE again. I had to strip out the LED parts.

Now that it uploads, I get this error. Nothing to worry about, I just have to look into it.

22:17:17.029 -> Creating I2Stask
22:17:17.029 -> Enter WifiDisconnected
22:17:17.029 -> Total heap: 293476
22:17:17.029 -> Free heap: 231364
22:17:19.401 -> Enter WifiConnected
22:17:19.401 -> Connected to Wifi with IP: 192.168.2.137, SSID: mywifiname, BSSID: B0:95:75:9F:FE:CF, RSSI: -49
22:17:19.438 -> Enter MQTTDisconnected
22:17:19.438 -> Connecting MQTT: 192.168.2.166, 1883
22:17:19.475 -> end of setupEnter MQTTConnected
22:17:19.586 -> Connected as atomecho
22:17:19.586 -> Enter Idle
22:17:19.586 -> Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
22:17:19.586 -> Core 0 register dump:
22:17:19.586 -> PC      : 0x4015618e  PS      : 0x00060b30  A0      : 0x800d11ec  A1      : 0x3ffde570  
22:17:19.586 -> A2      : 0x00000000  A3      : 0x3ffde5e0  A4      : 0x00000200  A5      : 0x3ffde5ac  
22:17:19.623 -> A6      : 0x00000064  A7      : 0x00000000  A8      : 0x00000001  A9      : 0x0000000b  
22:17:19.623 -> A10     : 0x3ffb96ac  A11     : 0x3ffde58f  A12     : 0x00000000  A13     : 0x00000008  
22:17:19.623 -> A14     : 0x00060b23  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x0000001c  
22:17:19.623 -> EXCVADDR: 0x00000014  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  
22:17:19.660 -> 
22:17:19.660 -> ELF file SHA256: 0000000000000000
22:17:19.660 -> 
22:17:19.660 -> Backtrace: 0x4015618e:0x3ffde570 0x400d11e9:0x3ffde5a0 0x400d2174:0x3ffde5d0 0x40089bce:0x3ffdea50
22:17:19.660 -> 
22:17:19.660 -> Rebooting...

@flatsiedatsie
Author

Got a bit further.

Should I change some of the settings to get it to stream audio continuously? I'm assuming I should not use hotword detection.

The "Connected as null" is a bit strange. I tried adding siteid to the config file, but that didn't seem to solve it.

22:41:39.027 -> Rebooting...
22:41:39.027 -> ets Jun  8 2016 00:22:57
22:41:39.027 -> 
22:41:39.027 -> rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
22:41:39.027 -> configsip: 188777542, SPIWP:0xee
22:41:39.027 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
22:41:39.027 -> mode:DIO, clock div:1
22:41:39.027 -> load:0x3fff0018,len:4
22:41:39.061 -> load:0x3fff001c,len:1044
22:41:39.061 -> load:0x40078000,len:10124
22:41:39.061 -> load:0x40080400,len:5856
22:41:39.061 -> entry 0x400806a8
22:41:39.386 -> Boo⸮M5Atom initializing...Loading configuration
22:41:39.574 -> {
22:41:39.574 ->   "mqtt_host": "192.168.2.166",
22:41:39.574 ->   "mqtt_port": 1883,
22:41:39.574 ->   "mqtt_user": "",
22:41:39.574 ->   "mqtt_pass": "",
22:41:39.574 ->   "sideid": "atomecho",
22:41:39.574 ->   "mqtt_valid": true,
22:41:39.574 ->   "mute_input": false,
22:41:39.574 ->   "mute_output": false,
22:41:39.574 ->   "amp_output": 0,
22:41:39.574 ->   "brightness": 30,
22:41:39.574 ->   "hotword_brightness": 100,
22:41:39.574 ->   "hotword_detection": 0,
22:41:39.574 ->   "volume": 100,
22:41:39.611 ->   "gain": 5
22:41:39.611 -> }
22:41:39.611 -> Creating I2Stask
22:41:39.611 -> Enter WifiDisconnected
22:41:39.611 -> Total heap: 293084
22:41:39.611 -> Free heap: 230916
22:41:43.910 -> Enter WifiConnected
22:41:43.910 -> Connected to Wifi with IP: 192.168.2.137, SSID: sterrenkijker_nomap, BSSID: B0:95:75:9F:FE:CF, RSSI: -50
22:41:43.945 -> Enter MQTTDisconnected
22:41:43.945 -> Connecting MQTT: 192.168.2.166, 1883
22:41:43.945 -> end of setupConnect failed, retry
22:41:48.932 -> Audio connected: 1, Async connected: 0
22:41:48.932 -> Enter MQTTDisconnected
22:41:48.932 -> Connecting MQTT: 192.168.2.166, 1883
22:42:20.105 -> Connect failed, retry
22:42:20.105 -> Audio connected: 0, Async connected: 0
22:42:20.105 -> Enter MQTTDisconnected
22:42:20.105 -> Connecting MQTT: 192.168.2.166, 1883
22:42:38.203 -> Connect failed, retry
22:42:38.203 -> Audio connected: 0, Async connected: 0
22:42:38.203 -> Enter MQTTDisconnected
22:42:38.240 -> Connecting MQTT: 192.168.2.166, 1883
22:42:56.724 -> Connect failed, retry
22:42:56.724 -> Audio connected: 0, Async connected: 0
22:42:56.724 -> Enter MQTTDisconnected
22:42:56.724 -> Connecting MQTT: 192.168.2.166, 1883
22:43:15.229 -> Connect failed, retry
22:43:15.229 -> Audio connected: 0, Async connected: 0
22:43:15.229 -> Enter MQTTDisconnected
22:43:15.229 -> Connecting MQTT: 192.168.2.166, 1883
22:43:33.736 -> Connect failed, retry
22:43:33.736 -> Audio connected: 0, Async connected: 0
22:43:33.772 -> Enter MQTTDisconnected
22:43:33.772 -> Connecting MQTT: 192.168.2.166, 1883
22:43:52.241 -> Connect failed, retry
22:43:52.279 -> Audio connected: 0, Async connected: 0
22:43:52.279 -> Enter MQTTDisconnected
22:43:52.279 -> Connecting MQTT: 192.168.2.166, 1883
22:44:02.380 -> Connect failed, retry
22:44:02.380 -> Audio connected: 1, Async connected: 0
22:44:02.380 -> Enter MQTTDisconnected
22:44:02.380 -> Connecting MQTT: 192.168.2.166, 1883
22:44:02.492 -> Enter MQTTConnected
22:44:02.492 -> Connected as null
22:44:02.492 -> going from mqtt connected to idle
22:44:02.492 -> Enter Idle
22:44:02.492 -> still in idle
22:44:02.492 -> end of idle
22:44:02.492 -> Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
22:44:02.492 -> Core 0 register dump:
22:44:02.492 -> PC      : 0x40157776  PS      : 0x00060d30  A0      : 0x800d161c  A1      : 0x3ffde570  
22:44:02.492 -> A2      : 0x00000000  A3      : 0x3ffde5e0  A4      : 0x00000200  A5      : 0x3ffde5ac  
22:44:02.525 -> A6      : 0x00000064  A7      : 0x00000000  A8      : 0x0000000b  A9      : 0x00000068  
22:44:02.525 -> A10     : 0x3ffb20d0  A11     : 0x3ffde58f  A12     : 0x00000000  A13     : 0x00000008  
22:44:02.525 -> A14     : 0x00060d23  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x0000001c  
22:44:02.525 -> EXCVADDR: 0x00000014  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  
22:44:02.525 -> 
22:44:02.560 -> ELF file SHA256: 0000000000000000
22:44:02.560 -> 

@Romkabouter
Owner

Gain is actually only used in the Matrix Voice, I think. Expect unexpected results!

@flatsiedatsie
Author

Good news: I managed to get it to detect a hotword by shouting very loudly.

I'm looking closer at how the back-and-forth with Snips is going. After it detects the hotword, the ASR doesn't receive audio (timeout).

{"sessionId":"42b02e1c-331e-4aaf-abb1-5a548abedeec","customData":null,"siteId":"ATOMECHO","reactivatedFromSessionId":null}
{"sessionId":"42b02e1c-331e-4aaf-abb1-5a548abedeec","customData":null,"termination":{"reason":"timeout","component":"asr"},"siteId":"ATOMECHO"}

@flatsiedatsie
Author

There is a doubling going on again it seems.

14:10:31.055 -> end of idle. Stream was set to true.
14:10:31.055 -> Total heap: 293320
14:10:31.055 -> Free heap: 147664
14:10:31.055 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:11:10.843 -> Incoming MQTT message. Topic: hermes/hotword/azrxidia/detected
14:11:11.058 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:11:11.058 -> Incoming MQTT message. Topic: hermes/hotword/toggleOff
14:11:11.093 -> toggleOff message was for us
14:11:11.093 -> SessionId in toggleOff:59a6374e-89c4-49b7-a654-d77f77a7384c
14:11:11.093 -> Hotword detected event
14:11:11.093 -> Enter HotwordDetected
14:11:11.093 -> -Semaphone something
14:11:11.093 -> -Re-stream
14:11:25.930 -> Incoming MQTT message. Topic: hermes/hotword/toggleOn
14:11:25.930 -> toggleOn message was for us. Going to idle mode.
14:11:25.930 -> hw-detected-go-back-to-idle
14:11:25.930 -> Enter Idle
14:11:25.968 -> still in idle
14:11:25.968 -> end of idle. Stream was set to true.
14:11:25.968 -> Total heap: 293412
14:11:25.968 -> Free heap: 150684
14:11:26.472 -> Incoming MQTT message. Topic: hermes/voco/ATOMECHO/play
14:12:32.626 -> One of them failed: Enter MQTTDisconnected
14:12:32.626 -> Audio connected: 0, Async connected: 0
14:12:32.626 -> Enter MQTTDisconnected
14:12:32.626 -> Connecting MQTT: 192.168.2.165, 1883
14:12:32.626 -> Connecting MQTT: 192.168.2.165, 1883
14:12:32.626 -> asyncclient connect was called
14:12:32.626 -> asyncclient connect was called
14:12:32.626 -> also reconnecting to audio
14:12:32.626 -> also reconnecting to audio
14:12:47.011 -> 
14:12:47.011 -> ELF file SHA256: 0000000000000000
14:12:47.046 -> 
14:12:47.046 -> Backtrace: 0x40088938:0x3ffbf9d0 0x40088bb5:0x3ffbf9f0 0x40140d30:0x3ffbfa10 0x400870c9:0x3ffbfa30 0x4000cff5:0x3ffde0d0 0x400db815:0x3ffde0f0 0x400db88e:0x3ffde130 0x400d156d:0x3ffde160 0x400d15a7:0x3ffde1f0 0x400d16c7:0x3ffde210 0x400d1fa2:0x3ffde230 0x40089c06:0x3ffde6b0
14:12:47.046 -> 
14:12:47.046 -> Rebooting...

@flatsiedatsie
Author

flatsiedatsie commented Jun 10, 2021

These are some messages going to the ASR:

hermes/asr/stopListening {"siteId":"azrxidia","sessionId":"569ac21f-cde0-4004-be21-f6112640cfdf"}
hermes/asr/startListening {"siteId":"azrxidia","sessionId":"569ac21f-cde0-4004-be21-f6112640cfdf","startSignalMs":1623327749156}
hermes/asr/stopListening {"siteId":"azrxidia","sessionId":null}

hermes/asr/stopListening {"siteId":"ATOMECHO","sessionId":"bf788e48-fe11-4206-9469-5ac4ec3fd8bd"}
hermes/asr/startListening {"siteId":"ATOMECHO","sessionId":"bf788e48-fe11-4206-9469-5ac4ec3fd8bd","startSignalMs":-20}
hermes/asr/stopListening {"siteId":"ATOMECHO","sessionId":null}

The startSignalMs seems to have a strange value: -20. Maybe that's because the time data isn't in the audio stream?
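The azrxidia values (for example 1623327749156) look like Unix epoch milliseconds; that is an inference from the numbers in this thread, not something taken from Hermes documentation. Under that assumption, a quick check shows which values can be wall-clock times. `signal_ms_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

def signal_ms_to_utc(signal_ms: Optional[int]) -> Optional[datetime]:
    """Interpret a *SignalMs value as Unix epoch milliseconds (an assumption).
    Returns None for null or negative values, which cannot be wall-clock times."""
    if signal_ms is None or signal_ms < 0:
        return None
    return datetime.fromtimestamp(signal_ms / 1000, tz=timezone.utc)

for value in (1623327749156, -20, None):
    print(value, signal_ms_to_utc(value))
```

Under this reading, 1623327749156 falls on 2021-06-10 at 12:22:29 UTC, the day of this comment, while -20 (like the -70 and -386 seen earlier from atomecho) cannot be an absolute timestamp at all.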

@Romkabouter
Owner

As I do not know what your code looks like, I do not know where the doubling occurs.

Is your ASR listening? That depends on the snips.toml file, I believe.

@flatsiedatsie
Author

The ASR does work for other satellites in the house, which are based on Voco/Snips. Perhaps they are sending an extra message.

The latest Arduino code can be found here: https://github.com/flatsiedatsie/voco_mini_sat

@Romkabouter
Owner

Romkabouter commented Jun 15, 2021

Ok, checking your code.

  1. Why do you publish to asr/stopListening on line 44? This actually stops the ASR from listening in the HotwordDetected state.
  2. Is that the exact code? Is Serial.println("Creating I2Stask"); also printed twice? If so, maybe the check if (i2sHandle == NULL) does not work as expected.

@flatsiedatsie
Author

flatsiedatsie commented Jun 16, 2021

Sharp eyes :-) I was trying to stop and then restart the ASR, hoping that would fix the issue. But then I tried skipping the HotwordDetected state altogether, so currently that code is never called. All the HotwordDetected state did was stop the stream and restart it, which I suspected wasn't needed since no on-board hotword detection was being done.

I've only removed the wifi password :-)

Just in case you'd like to try uploading via the arduino IDE yourself:

  1. You'll need to add ESP32 support. In the menu, go to settings and add these two lines under additional board manager URLs:
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
https://dl.espressif.com/dl/package_esp32_index.json

(maybe restart the IDE)

Then under tools -> boards -> boards manager select M5Atom-stack as the device.

  2. Make sure the USB port is selected as the serial port (under tools as well).

  3. Make sure the serial monitor is closed in case it's already open. Then click on ESP32 Sketch data upload under the tools menu. This will upload the settings file to the SPIFFS storage.

  4. Upload the code (arrow button in the top-left).

  5. Open the serial monitor (under tools), and you can see the serial output.

I believe I've managed to remove the double call of the MQTTDisconnected state. The run method was apparently calling it before it had switched to the new state, causing the state to be initiated twice. It no longer crashes after the first recognition of "hey snips".
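The ordering fix described above can be illustrated independently of the firmware. This Python sketch is not the firmware's state machine (which is Arduino C++ and not shown in this thread); `StateMachine`, `transition` and `on_enter` are hypothetical names. The idea is to record the new state before running its entry action, so a re-entrant request for the same state is ignored instead of entering it twice.

```python
from typing import Callable, Optional

class StateMachine:
    """Update the current state *before* running its entry action,
    and treat a request for the current state as a no-op."""

    def __init__(self, initial: str, on_enter: Callable[[str], None]):
        self.state: Optional[str] = None
        self.on_enter = on_enter
        self.transition(initial)

    def transition(self, new_state: str) -> bool:
        if new_state == self.state:
            return False              # already in this state: do not enter it again
        self.state = new_state        # switch first, so re-entrant calls see it
        self.on_enter(new_state)      # then run the entry action exactly once
        return True
```

If the entry action for MQTTDisconnected itself triggers another request for MQTTDisconnected (as a run loop might), the second request returns False and the entry code runs only once.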

The strange thing is that the ASR stops responding for the entire system if I use the AtomEcho. The ASR also stops responding to the main microphone, although it still does hotword detection fine. A session is also created just fine.

Sometimes the ASR stops working altogether, and sometimes it works 50% of the time, intermittently: after a successful run it will not respond the next time until it times out, and then it starts responding again after that, and so forth. This only seems to happen if the AtomEcho is on the network.

The AtomEcho also seems to go into reboot loops. I'm not sure how that's even possible. It's as if it remembers that the previous time it booted up, it failed, and will continue to do so until I unplug it, and then plug it in again.

@flatsiedatsie
Author

Just saw another strange situation where I disconnected the AtomEcho, and then the ASR started only listening for 1 second on the main microphone.

[11:15:07] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:07] [Asr] was asked to stop listening on site azrxidia
[11:15:07] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:07] [Dialogue] session with id '1819212f-7fe3-40c9-83f7-26021d46f671' was started on site azrxidia
[11:15:07] [Asr] was asked to listen on site azrxidia
[11:15:09] [Asr] captured text "unknownword" in 1.0s
[11:15:09] [Asr] was asked to stop listening on site azrxidia
[11:15:09] [Nlu] was asked to parse input "unknownword"
[11:15:09] [Nlu] intent not recognized for "*"
[11:15:09] [Dialogue] session with id '1819212f-7fe3-40c9-83f7-26021d46f671' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[11:15:09] [Asr] was asked to stop listening on site azrxidia
[11:15:09] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:15:13] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:13] [Asr] was asked to stop listening on site azrxidia
[11:15:13] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:13] [Dialogue] session with id '5efe57ea-f984-4ad1-8342-ba4a9d6a1e47' was started on site azrxidia
[11:15:13] [Asr] was asked to listen on site azrxidia
[11:15:28] [Dialogue] session with id '5efe57ea-f984-4ad1-8342-ba4a9d6a1e47' was ended on site azrxidia. The session was ended because one of the component didn't respond in a timely manner
[11:15:28] [Asr] was asked to stop listening on site azrxidia
[11:15:28] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:15:38] [Hotword] detected on site azrxidia, for model hey_snips
[11:15:38] [Asr] was asked to stop listening on site azrxidia
[11:15:38] [Hotword] was asked to toggle itself 'off' on site azrxidia
[11:15:38] [Dialogue] session with id '13bfee2f-4b4a-4ae0-8896-916fb8e6d27b' was started on site azrxidia
[11:15:38] [Asr] was asked to listen on site azrxidia
[11:15:40] [Asr] captured text "unknownword" in 1.0s
[11:15:40] [Asr] was asked to stop listening on site azrxidia
[11:15:40] [Nlu] was asked to parse input "unknownword"
[11:15:40] [Nlu] intent not recognized for "*"
[11:15:40] [Dialogue] session with id '13bfee2f-4b4a-4ae0-8896-916fb8e6d27b' was ended on site azrxidia. The session was ended because the platform didn't understand the user
[11:15:40] [Asr] was asked to stop listening on site azrxidia
[11:15:40] [Hotword] was asked to toggle itself 'on' on site azrxidia
[11:16:37] [Hotword] detected on site azrxidia, for model hey_snips
[11:16:37] [Asr] was asked to stop listening on site azrxidia

After that it reverted to the intermittent "ASR listens, ASR is deaf" situation.

@flatsiedatsie
Author

I've tried to manually run the ASR and check its output. Here's what happens with a "normal" call from Voco:

pi@thuis:~/.webthings/addons/voco/snips $ LD_LIBRARY_PATH=. /home/pi/.webthings/addons/voco/snips/snips-asr -u /home/pi/.webthings/data/work -a /home/pi/.webthings/addons/voco/snips/assistant -c /home/pi/.webthings/addons/voco/snips/snips.toml
[11:27:30.198765] INFO :snips_asr_hermes::handler: Using model from "/home/pi/.webthings/data/work/injections/20210209T163004178929730/inj_20210616T092026773150365/asr"
[11:27:30.332529] INFO :snips_kaldi::decode::model: Loading model v2
[11:27:31.958167] INFO :snips_asr_hermes::handler : Preparing decoder
[11:27:31.958415] INFO :snips_asr_hermes::handler : Preparing decoder
[11:28:11.557659] INFO :snips_asr_hermes::handler : Listening at site id azrxidia
[11:28:11.557826] INFO :snips_asr_hermes::handler : Listening
[11:28:11.704154] INFO :snips_asr_lib::asr        : T0       entered AsrRunner::run
[11:28:11.704224] INFO :snips_asr_lib::asr        : T0+0.000 capture started
[11:28:13.883099] INFO :snips_asr_lib::asr        : T0+2.179 endpoint detected (rule:4) frame:155 samples:39680 signal_time:2.48 rtf:0.327
[11:28:13.883973] INFO :snips_asr_lib::asr        : Source thread stop on push: "SendError(..)"
[11:28:13.884145] INFO :snips_asr_lib::asr        : T0+2.180 capture ended
[11:28:13.885827] INFO :snips_asr_lib::asr        : T0+2.182 decoder finalized
[11:28:13.894667] INFO :snips_asr_lib::asr        : T0+2.191 lookup and post-processing done
[11:28:13.894747] INFO :snips_asr_lib::asr        : decoded: [Recognition { decoded_string: "what time is it", likelihood: 1.0, tokens: Some([Token { value: "what", confidence: 1.0, time: (0.0, 1.38), range: 0..4 }, Token { value: "time", confidence: 1.0, time: (1.38, 1.4399999), range: 5..9 }, Token { value: "is", confidence: 1.0, time: (1.4399999, 1.62), range: 10..12 }, Token { value: "it", confidence: 1.0, time: (1.62, 2.31), range: 13..15 }]) }]
[11:28:13.895411] INFO :snips_asr_hermes::handler : Publishing the recognition

And this is all that happens with the AtomEcho:

[11:28:25.235052] INFO :snips_asr_hermes::handler : Preparing decoder
[11:29:24.793911] INFO :snips_asr_hermes::handler : Listening at site id ATOMECHO
[11:29:24.793989] INFO :snips_asr_hermes::handler : Listening
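As a sanity check on the successful run, its endpoint line reports frame:155 samples:39680 signal_time:2.48. A few lines of arithmetic show these numbers are consistent with 16 kHz audio split into 256-sample frames. The 16 kHz rate is an assumption here; the log itself does not state it.

```python
# Assumption: the ASR consumes 16 kHz mono audio (the log does not state the rate).
SAMPLE_RATE_HZ = 16_000

# Values copied from the "endpoint detected" line of the successful run.
frames = 155
samples = 39_680

signal_time_s = samples / SAMPLE_RATE_HZ               # seconds of audio consumed
samples_per_frame, leftover = divmod(samples, frames)  # frame size, and any remainder
frame_ms = samples_per_frame * 1000 / SAMPLE_RATE_HZ   # duration of one frame in ms

print(f"signal_time = {signal_time_s:.2f} s")
print(f"{samples_per_frame} samples per frame (remainder {leftover}), {frame_ms:.0f} ms each")
```

In the AtomEcho run the ASR never gets as far as an endpoint line, so there are no equivalent numbers to compare against.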

@Romkabouter
Owner

Romkabouter commented Jun 16, 2021

All the HotwordDetected state did was stop the stream and restart it, which I suspected wasn't needed if there was no on-board hotword detection being done.

It also initializes the WAV header and updates the LED status. I recommend not fiddling with the status too much.
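For reference, the WAV header is a standard RIFF/WAVE header placed in front of the PCM samples. The firmware's own header code is not shown in this thread, so the following is only a Python sketch of a canonical 44-byte PCM header, assuming 16 kHz, 16-bit, mono audio; wav_header is a hypothetical helper name. The result is read back with the standard-library wave module to confirm it parses.

```python
import io
import struct
import wave

def wav_header(num_samples: int, sample_rate: int = 16000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    data_size = num_samples * block_align           # size of the PCM payload
    return struct.pack(
        "<4sI4s" "4sIHHIIHH" "4sI",
        b"RIFF", 36 + data_size, b"WAVE",            # RIFF preamble
        b"fmt ", 16, 1, channels, sample_rate,       # fmt chunk, format 1 = PCM
        byte_rate, block_align, bits_per_sample,
        b"data", data_size,                          # data chunk header
    )

# One 256-sample chunk of silence, parsed back with the stdlib wave module.
samples = 256
frame = wav_header(samples) + b"\x00\x00" * samples
with wave.open(io.BytesIO(frame), "rb") as w:
    print(w.getframerate(), w.getnchannels(), w.getsampwidth(), w.getnframes())
```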

Just in case you'd like to try uploading via the Arduino IDE yourself:
I do not use the Arduino IDE ;)

What is this azrxidia I see in all your messages? Can you try to stop that stream?
And can you post the contents of your snips.toml?
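In the Hermes protocol used by Snips, each site streams audio frames on its own MQTT topic, hermes/audioServer/<siteId>/audioFrame, which is how a stream such as azrxidia shows up alongside the AtomEcho. A minimal sketch for tallying which siteIds are streaming, given topic strings collected by any MQTT client (the collection step is left out, and the function names are hypothetical):

```python
from collections import Counter
from typing import Iterable, Optional

PREFIX = "hermes/audioServer/"
SUFFIX = "/audioFrame"

def site_id_from_topic(topic: str) -> Optional[str]:
    """Return <siteId> from 'hermes/audioServer/<siteId>/audioFrame', else None."""
    if topic.startswith(PREFIX) and topic.endswith(SUFFIX):
        site = topic[len(PREFIX):-len(SUFFIX)]
        if site and "/" not in site:
            return site
    return None

def streaming_sites(topics: Iterable[str]) -> Counter:
    """Count audio frames per siteId; topics that are not audio frames are ignored."""
    return Counter(site for site in map(site_id_from_topic, topics) if site is not None)

observed = [
    "hermes/audioServer/azrxidia/audioFrame",
    "hermes/audioServer/atomecho/audioFrame",
    "hermes/audioServer/azrxidia/audioFrame",
    "hermes/hotword/toggleOff",
]
print(streaming_sites(observed))
```

This sketch keeps siteIds exactly as published, so atomecho and ATOMECHO (both spellings appear in the logs in this thread) would be counted as two different sites.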

@flatsiedatsie
Author

I'd be happy to. Here's the snips.toml:
https://github.com/createcandle/voco/blob/master/snips/snips.toml

I've also stripped out the LED parts (there was an error I couldn't fix, so I just stripped it out completely). I've also removed the OTA updates, since they won't be needed either and I figured removing them might free up some memory.

I've re-enabled the HotwordDetected state, but the result is the same. I'll update the code on GitHub.

@Romkabouter
Owner

I've also stripped out the LED parts (there was an error I couldn't fix, so I just stripped it out completely).

If you remove the methods updateColors(int colors) and updateBrightness(int brightness) in your device code, then nothing will be done :)

I think you need to set this for the AudioServer:

[snips-audio-server]
bind = "+@mqtt"

That way the audio server actually listens to all audio streams.
This matches the [snips-hotword] setting, which might explain why the hotword detector is listening and the rest is not.
I am not 100% sure, though, but setting it to + is not a bad idea in general.

@flatsiedatsie
Author

I'll give it a go.

I could also add it to common? Perhaps that would help the ASR detect the stream?

I've also added a feature to Voco so that it can provide the current time through an MQTT request. I wanted to experiment with sending the timestamp in the WAV header.

@flatsiedatsie
Author

Something else I'm curious about: would it be possible to have the AtomEcho connect based on hostname instead of IP address? I see some hints in the settings that this might be possible. If so, the main controller could embed that hostname into the AtomEcho at the moment of uploading the code.

@Romkabouter
Owner

I could also add it to common? Perhaps that would help the ASR detect the stream?

Might be a good idea; then you should have it set for all sections.

@Romkabouter
Owner

Something else I'm curious about: would it be possible to have the AtomEcho connect based on hostname instead of IP address? I see some hints in the settings that this might be possible. If so, the main controller could embed that hostname into the AtomEcho at the moment of uploading the code.

It already does if you put a hostname instead of an IP address.
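Before baking a hostname into the firmware, the controller can at least check that the name resolves on its side. A minimal sketch using the standard socket.getaddrinfo; resolve_broker is a hypothetical helper and 1883 is assumed to be the MQTT port. Whether the ESP32 itself can resolve the same name (for example a .local mDNS name) depends on the firmware and the network, which this does not test.

```python
import socket

def resolve_broker(host: str, port: int = 1883) -> list[str]:
    """Return the sorted, de-duplicated addresses that `host` resolves to.

    An IP literal comes back unchanged without a DNS lookup; an unknown
    hostname raises socket.gaierror.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

print(resolve_broker("192.168.1.10"))  # an IP literal resolves to itself
```

Calling it with the intended hostname before flashing (and catching socket.gaierror) gives an early warning if the name is not resolvable on the local network at all.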

@Romkabouter
Owner

Hi @flatsiedatsie,

The project has come a long way since the last activity here.
Did you make any progress on the subject?
Maybe you can check out my new master branch; I have just released version 7.8.

If you require some help from me, please give me a shout. Otherwise I will close this issue at some point in the future.
I have tried to get Voco running, but ran into some issues (I cannot remember which) and stopped.

@Romkabouter
Owner

Romkabouter commented Oct 29, 2021

@flatsiedatsie it seems Voco is no longer available as an add-on; is that correct?
I see a Voice Control add-on, but that is different. Voco is still in the list found here:
https://github.com/WebThingsIO/addon-list/tree/master/addons

I just cannot find it among the add-ons in WebThings.
Note: I am using the Docker image.

@flatsiedatsie
Author

Voco is only available on the Raspberry Pi.

I spent considerable time on it last time, but unfortunately couldn't get the audio to be coherent enough. In the end I couldn't justify spending that much time on a 'nice to have' anymore :-(

@Romkabouter
Owner

Ah OK, that is probably the issue, then. I have a Raspberry Pi available now; do you still want me to put some effort into it?
I still have the branch.

@Romkabouter
Owner

I still find this interesting, so I have installed WebThings and could now indeed install Voco.
Let's see if I can run it with a USB mic and a speaker and go from there :)

@flatsiedatsie
Author

Sure, that would be wonderful! If you live in Amsterdam I can supply you with a good USB mic if you want :-)

@flatsiedatsie
Author

I've uploaded the latest version of the code I was working on here:
https://github.com/createcandle/voco-mini-satellite

It would be great if you could try this Arduino workflow (Arduino IDE), because if that works, then it will be possible to flash the code to user devices via the Candle Manager addon for the WebThings Gateway.

@Romkabouter
Owner

Sure, that would be wonderful! If you live in Amsterdam I can supply you with a good USB mic if you want :-)

Hehe, nope. A good 200 km drive north. But I have one :)

@Romkabouter
Owner

I have installed WebThings and Voco on a Pi. When I type "tell me the time", I expected audio output. The correct text appears. Is my expectation incorrect? I have set the output to headphones, and speaker-test works.

@Romkabouter
Owner

OK, apparently my expectation was incorrect. I got Voco running on a Pi now and it is working :)
Now to see if I can get this running.

@Romkabouter
Owner

I thought the issue might be caused by the limited power of the M5, so I tried my Matrix Voice.

I get this error:

2021-12-04 09:27:00.626 INFO   : voco: INFO:snips_hotword_lib::audio    : Audio thread for matrixvoice started
2021-12-04 09:27:00.627 INFO   : voco: INFO:snips_hotword_lib::audio    : Net and VAD thread for site matrixvoice started (vad inhibitor: true, vad messages: false
2021-12-04 09:27:00.632 INFO   : voco: ERROR:snips_hotword_lib::audio    : Error in network and VAD thread for site matrixvoice: no more audio in source

So I think it boils down to the audio again. Snips has some extra headers, and those might be causing this. I'll see if I can fix it.
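To compare what the satellite sends with what Snips expects, it helps to list the chunks in a single audioFrame payload. The sketch below walks the RIFF chunk list of one WAV payload. It makes no assumption about which extra headers Snips adds, and the LIST chunk in the example is only there to show that non-audio chunks are reported too; riff_chunks and chunk are hypothetical helper names.

```python
import struct

def chunk(chunk_id: bytes, body: bytes) -> bytes:
    """Encode one RIFF chunk: 4-byte ID, little-endian size, body, pad byte if odd."""
    pad = b"\x00" if len(body) % 2 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad

def riff_chunks(payload: bytes) -> list[tuple[str, int]]:
    """Return (chunk_id, declared_size) for each chunk after the RIFF/WAVE preamble."""
    if len(payload) < 12 or payload[:4] != b"RIFF" or payload[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE payload")
    found = []
    pos = 12
    while pos + 8 <= len(payload):
        chunk_id, size = struct.unpack_from("<4sI", payload, pos)
        found.append((chunk_id.decode("latin-1"), size))
        pos += 8 + size + (size & 1)  # skip chunk header, body and any pad byte
    return found

# Example payload: fmt chunk (PCM, mono, 16 kHz, 16-bit), an extra LIST chunk, 4 silent samples.
fmt_body = struct.pack("<HHIIHH", 1, 1, 16000, 32000, 2, 16)
body = b"WAVE" + chunk(b"fmt ", fmt_body) + chunk(b"LIST", b"INFO") + chunk(b"data", b"\x00\x00" * 4)
payload = b"RIFF" + struct.pack("<I", len(body)) + body
print(riff_chunks(payload))
```

A payload captured from the AtomEcho and one from a working site could be dumped the same way and compared side by side.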

@flatsiedatsie
Author

Yeah those headers, those indeed seem to be the issue.

Glad Voco is working :-) Text commands only give text output (designed for quiet operation when the kids are asleep). Voice commands give voice output.

2 participants