
Suggestion: API/commands for fetching audio #225

Closed
Miffyli opened this issue Jul 20, 2017 · 19 comments · Fixed by #486

Comments

@Miffyli
Collaborator

Miffyli commented Jul 20, 2017

E.g. by setting an "enable_audio" option before initializing the game, and then receiving an additional object in the State object which holds the audio samples played during that time frame.

I know it is ViZDoom, but this could possibly allow bots to "home in" on high-action areas and/or hear nearby enemies behind them.

@mwydmuch
Member

Hi @Miffyli,
this idea has been on our minds for some time now and I'd love to add it. It would be easy to just pass the OpenAL buffer as-is to the state, but I have no idea whether that would be convenient to work with (unfortunately, I've never done any serious sound processing). So we need some help deciding what we should take care of and what should be configurable (format? stereo/some 3D sound? channels? frequency? sample rate/size?).

If anyone has any ideas about these things I'll be happy to hear them :)

@Miffyli
Collaborator Author

Miffyli commented Jul 21, 2017

@mwydmuch
I have a background in speech processing, but I'm still stuck deciphering the structure of the ViZDoom source ^^'.

Anywho, I do not think we need anything fancy, especially considering Doom was originally intended to run on old machines. I think these would be enough, at least for a start:

  • 8 kHz sampling rate (Doom has a very low-frequency soundscape, in my experience)
  • 16-bit sample size
  • Two channels (stereo)
  • Simple PCM, without modifications. I guess OpenAL exposes something like this.

And as for what the API would give the user: a 2xN matrix where N is the number of samples played in that state's timeframe. I think "timeframe" = "since the last call of get_state", for simplicity. Users can then build a longer buffer in Python for analyzing longer pieces of audio.

I can create example scenarios/scripts, and generally test the implementation if this is added.
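The proposed per-state API could be sketched as follows. This is only an illustration of the idea: the attribute name `state.audio_buffer`, the (2, N) int16 layout, and the 8 kHz rate are assumptions taken from this thread, not an existing ViZDoom API.

```python
import numpy as np

SAMPLE_RATE = 8000      # proposed sampling rate
CHANNELS = 2            # stereo
TICS_PER_SECOND = 35    # Doom's native tic rate

# Each state would carry roughly 8000 / 35 ≈ 228 samples per channel.
samples_per_tic = SAMPLE_RATE // TICS_PER_SECOND

# In a real loop this would be:
#   state = game.get_state()
#   chunks.append(state.audio_buffer)
# Here we fake three states' worth of silent stereo audio:
chunks = [np.zeros((CHANNELS, samples_per_tic), dtype=np.int16)
          for _ in range(3)]

# The user can then concatenate per-state chunks into a longer buffer
# for analyzing longer stretches of audio:
episode_audio = np.concatenate(chunks, axis=1)
print(episode_audio.shape)  # (2, 684)
```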

@mwydmuch
Member

Alright, thank you @Miffyli for the tips! For now I'm pretty busy, but I think I will be able to add this by the end of August, and then I will ask you for some small tests and a review :)

@mwydmuch mwydmuch added this to the 1.2.0 milestone Jul 22, 2017
@piquirez

piquirez commented Dec 3, 2019

Hi @mwydmuch :
Any news on a way to get the sound buffer as part of Doom.get_state()? The research that adding audio would enable is very interesting. If anyone knows of a method to obtain the sound as part of the inputs, it would be much appreciated.

@Miffyli
Collaborator Author

Miffyli commented Dec 3, 2019

@piquirez

I did some further digging on this subject earlier, and I think it hits a roadblock: ZDoom uses the OpenAL library to create the sound samples from sound sources/listeners and their locations. You'd have to start messing around with OpenAL (and its drivers) to hijack these samples somewhere along the way before they are fed into a common buffer.

A hacky way to do this would be to create a sound device per ViZDoom instance and capture the audio there, but syncing this up with frames would be difficult, if not impossible.

@piquirez

piquirez commented Dec 4, 2019

@Miffyli Thanks for your answer.
As you describe, it does seem quite complicated. I did notice that the sound plays at the same speed no matter what speed the screen updates run at, which makes syncing them very hard if you wanted to capture the audio, since the speed during training will differ from inference.
However, this gave me an idea: I presume each sound is triggered by a Doom instruction in a particular frame, and we know that in real time Doom should run at 35 FPS. In that case it should be possible to save the audio triggers on each frame, and then divide each sound into small chunks at 35 FPS. That way we would obtain one audio sample per frame, which is what we are after. Is this feasible? Even just getting the audio triggers per frame would be very helpful, and I could sort out the audio-splitting part.

@Miffyli
Collaborator Author

Miffyli commented Dec 4, 2019

@piquirez

Theoretically that could work. However, since it would skip the audio library completely, it would not include any of the positional-audio processing (e.g. how strongly a sound plays on the left/right, how faint it is). Now that you mention it, the "sped up" game also makes things harder: if you go through the audio library, it (probably) plays sounds at natural speed, and thus far too slow for ZDoom running at lightspeed (thousands of FPS).

@piquirez

piquirez commented Dec 5, 2019

@Miffyli
I believe the stereo information for a sound should be part of the command that executes it. So if all sounds are stored in mono, the stereo version would simply be a mathematical relation based on the player's position. If we have the information of the frame that plays a sound, which should include where it has to play spatially (the stereo information), we could create, as you mentioned earlier, a "2xN matrix where N is the number of samples played in that state's timeframe". In this case N would be 1/35th of a second of the audio file. This way, no matter at which speed Doom runs, the agent will always get the same sound synchronized to 35 FPS, which will allow it to learn.
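The mono-to-stereo relation suggested above could be approximated with a simple constant-power pan. This is a deliberate simplification (the function name and pan convention are illustrative assumptions, not anything from ZDoom or OpenAL): it ignores distance attenuation, occlusion, and HRTF effects that a real audio library applies.

```python
import numpy as np

def pan_mono(chunk, pan):
    """Constant-power pan of a mono chunk into a (2, N) stereo array.

    `pan` is in [-1, 1]: -1 = fully left, 0 = center, +1 = fully right.
    """
    theta = (pan + 1.0) * np.pi / 4.0       # map [-1, 1] -> [0, pi/2]
    left = np.cos(theta) * chunk
    right = np.sin(theta) * chunk
    return np.stack([left, right])

# One frame's worth (8000 / 35 ≈ 228 samples) of a 440 Hz tone:
tone = np.sin(2 * np.pi * 440 * np.arange(228) / 8000)
stereo = pan_mono(tone, pan=-1.0)  # fully left: right channel is silent
```

The pan value itself would come from the angle between the player's facing direction and the sound source, which the triggering frame should know.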

@Miffyli
Collaborator Author

Miffyli commented Dec 5, 2019

@piquirez

Hmm you are right, this could work. I am not sure how easy all the "positional audio processing" would be, but the part of providing samples of sounds-being-played should be possible. It is not perfect but it would be a start.

As for implementing something like this: I am not intimately familiar with that side of ZDoom and do not have time to work on this for at least a couple of months, sadly :(

@piquirez

piquirez commented Dec 6, 2019

@Miffyli
A couple of months doesn't sound bad. I don't have much time myself either, but I will research how ZDoom handles sounds whenever I get the chance, and hopefully I'll be able to help you if you're interested.

@hegde95

hegde95 commented Jun 10, 2020

Hey, was anyone able to get this working?

@Miffyli
Collaborator Author

Miffyli commented Jun 10, 2020

I have not worked on this since my last posts; my attention shifted to other projects, sadly :(. The above issues are still complex to handle, as playing audio (or sound, as it were) is so tightly tied to our "natural passage" of time.

@hegde95

hegde95 commented Jun 10, 2020

Would it be possible to get audio in "real time" by using the fix in #40?

@mwydmuch
Member

Hi @hegde95, as described in #40, the audio can be enabled, so for sure it's possible to obtain it from the OS somehow. On Linux, you can probably access the PulseAudio sink using some Python library. But I guess that's all we know about the topic right now.

@mwydmuch
Member

This approach will require using ViZDoom's async mode to have the audio played correctly.

@hegde95

hegde95 commented Jun 11, 2020

So I'm guessing that if we have multiple games running in parallel, it won't be possible to isolate the sound produced by each game this way?

@Miffyli
Collaborator Author

Miffyli commented Jun 11, 2020

You can create virtual outputs in PulseAudio, and then with some commands direct a program's audio to the sink you want (I cannot find those commands right now). It is doable, but a bit of a mess.
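One way to realize the per-instance routing mentioned above is a PulseAudio null sink per game, with the game's output redirected via the standard `PULSE_SINK` environment variable and recorded from the sink's monitor source with `parec`. The sink naming scheme here is arbitrary, and this sketch only builds the command lines rather than running them:

```python
def sink_commands(instance_id):
    """Build the pactl/parec command lines to isolate one instance's audio."""
    sink = f"vizdoom_{instance_id}"
    # Create a virtual (null) sink for this instance:
    create = ["pactl", "load-module", "module-null-sink",
              f"sink_name={sink}"]
    # Record raw audio from the sink's monitor source:
    record = ["parec", "-d", f"{sink}.monitor", f"{sink}.raw"]
    # The game itself would be launched with PULSE_SINK={sink} in its
    # environment so its audio goes to this sink, not the default output.
    return create, record

create_cmd, record_cmd = sink_commands(0)
```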

If it does not have to be ViZDoom per se, Unity's ML-Agents can be tuned to include audio in the observations by creating the necessary AudioListeners etc. in the Unity game. We did this in some experiments and it worked quite well.

@hegde95

hegde95 commented Aug 13, 2020

If I had to push the audio buffers collected in async mode into ViZDoomPythonModule.cpp as part of the game state, what changes would I have to make?

@Miffyli
Collaborator Author

Miffyli commented Aug 13, 2020

@mwydmuch Could you provide quick pointers for the above?

@mwydmuch mwydmuch modified the milestones: 1.2.0, 1.1.11 Nov 30, 2022