Suggestion: API/commands for fetching audio #225
Hi @Miffyli, if anyone has any ideas about these things, I'll be happy to hear them :)
@mwydmuch Anywho, I do not think we need anything fancy, especially considering Doom was originally intended to run on old machines. I think these would be enough, at least for a start:
As for what the API would give to the user: a 2xN matrix, where N is the number of samples played in that state's timeframe. I think "timeframe" = "since the last call of get_state", for simplicity. Users can then build a longer buffer in Python for analyzing longer pieces of audio. I can create example scenarios/scripts and generally test the implementation if this is added.
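To make the "build a longer buffer in Python" part concrete, here is a minimal sketch. It assumes the 2xN chunk arrives as two equal-length sequences (left, right); no such field exists in ViZDoom at this point, so `RollingAudioBuffer` and `samples_per_state` are hypothetical names and the 44100 Hz sample rate is just an example value. The only fixed number is Doom's game logic rate of 35 tics per second.

```python
from collections import deque

DOOM_TICS_PER_SECOND = 35  # Doom's native game-logic rate


def samples_per_state(sample_rate_hz, tics_per_state=1):
    """Samples per channel that cover one state's timeframe (hypothetical helper)."""
    return sample_rate_hz * tics_per_state // DOOM_TICS_PER_SECOND


class RollingAudioBuffer:
    """Keeps the most recent max_samples samples for each of two channels."""

    def __init__(self, max_samples):
        # deque(maxlen=...) silently drops the oldest samples when full.
        self.left = deque(maxlen=max_samples)
        self.right = deque(maxlen=max_samples)

    def push(self, chunk):
        """Append one 2xN chunk: chunk[0] is the left channel, chunk[1] the right."""
        left, right = chunk
        if len(left) != len(right):
            raise ValueError("both channels must have the same number of samples")
        self.left.extend(left)
        self.right.extend(right)

    def as_matrix(self):
        """Return the buffered audio as a 2xM list of lists."""
        return [list(self.left), list(self.right)]
```

With an assumed 44100 Hz rate, `samples_per_state(44100)` gives 1260 samples per tic, so a buffer of `35 * 1260` samples would hold roughly the last second of game-time audio.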
Alright, thank you @Miffyli for the tips! For now I'm pretty busy, but I think I will be able to add this by the end of August, and then I will ask you for a small test and review :)
Hi @mwydmuch:
I did some further digging on this subject earlier, and I think it hits a roadblock: ZDoom uses the OpenAL library to create the sound samples from sound sources/listeners and their locations. You'd have to start messing around with OpenAL (and its drivers) to be able to hijack these samples somewhere along the way before they are fed into a common buffer. A hacky way to do this would be to create one sound device per ViZDoom instance and capture the audio there, but syncing this up with frames would be difficult, if not impossible.
@Miffyli Thanks for your answer.
Theoretically that could work. However, since it would skip the audio library completely, it would not have any of the processing done by the positional audio (e.g. how strongly audio plays on the left/right, how faint it is). Now that you mention it, the "sped up" game also makes things harder: if you do things through the audio library, it (probably) plays sounds at their natural speed, which is far too slow for ZDoom running at lightspeed (thousands of FPS).
@Miffyli |
Hmm, you are right, this could work. I am not sure how easy all the "positional audio processing" would be, but the part of providing samples of sounds-being-played should be possible. It is not perfect, but it would be a start. As for implementing something like this: I am not intimately familiar with ZDoom on that side and do not have time to work on this for at least a couple of months, sadly :(
@Miffyli |
Hey, was anyone able to get this up? |
I have not worked on this since the last posts; my attention shifted to other projects, sadly :(. The above issues are still complex to handle, as playing audio (or sound, as it were) is so tightly tied to our "natural passage" of time.
Would it be possible to get audio in "real time" by using the fix in #40?
This approach will require using ViZDoom's async mode for the audio to play correctly.
So I'm guessing if we have multiple games running in parallel, it won't be possible to isolate the sound produced by each game this way? |
You can create virtual outputs in PulseAudio and then, with some commands, direct a program's audio to the sink you want (I cannot find those commands right now). It is doable, but a bit of a mess. If it does not have to be ViZDoom per se, Unity's ML-Agents can be tuned to include audio in the observations by creating the necessary AudioListeners etc. in the Unity game. We did this in some experiments and it worked quite well.
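For the PulseAudio route, a rough sketch of what the per-instance setup could look like is below. It only builds the command lines and environment and does not run anything. The helper names and the sink name are made up for illustration; the sketch also assumes each game process actually plays through PulseAudio (which depends on how OpenAL is configured on the machine), and it does nothing about syncing audio with frames.

```python
import os


def null_sink_command(sink_name):
    """argv asking PulseAudio to create a virtual (null) sink with this name."""
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink_name}"]


def env_for_sink(sink_name, base_env=None):
    """Environment that makes a PulseAudio client default to the given sink."""
    env = dict(os.environ if base_env is None else base_env)
    env["PULSE_SINK"] = sink_name  # read by PulseAudio client libraries
    return env


def record_command(sink_name, out_path):
    """argv recording the sink's monitor source into a WAV file."""
    return ["parec", "-d", f"{sink_name}.monitor", "--file-format=wav", out_path]
```

The idea would be one sink per game instance: run the `pactl` command once, start each game process with the matching environment (for example via `subprocess.Popen(..., env=env_for_sink("vizdoom_0"))`), and record from `vizdoom_0.monitor`. For a process that is already running, `pactl move-sink-input` can move its stream to another sink instead.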
If I had to push the audio buffers collected in async mode to ViZDoomPythonModule.cpp as part of the game state, what changes would I have to make?
@mwydmuch Could you provide quick pointers on the above?
E.g. by calling "enable_audio" before initializing the game, and then receiving an additional object in the State object that holds the audio samples played inside that time frame.
I know it is ViZDoom, but this could possibly allow bots to "home in" on high-action areas and/or hear nearby enemies behind them.