Feature request: Jack Audio support in murmur #3530

Open
SM0VXI opened this issue Oct 1, 2018 · 24 comments
Labels
feature-request This issue or PR deals with a new feature

Comments

@SM0VXI

SM0VXI commented Oct 1, 2018

When a channel is created on a murmur server, I would like it to appear in Jack Audio. This is to be able to stream the real-time audio of the murmur server channels to and from third-party (intercom) software, or to the I/O of a multichannel sound card hosted on the murmur server.
This could of course be possible to achieve by running one mumble client per channel on the server, but it quickly adds up to a lot of simultaneous mumble clients, lots of additional users and less flexibility when quickly wanting to add and remove channels.

Kissaki added the feature-request label Oct 3, 2018
@Kissaki
Member

Kissaki commented Oct 3, 2018

I’m curious how Jack Audio would present this in its UI. Does Jack Audio present those channels grouped/categorized under the application? I would imagine that with a large number of channels it would otherwise get very confusing and hard to find a channel in a long list. Would the jack channels be named using a scheme built from a sanitized mumble server hostname/process id and the channel name?

@SM0VXI
Author

SM0VXI commented Oct 3, 2018

To make connections in jack I have only used some of the numerous GUI drag-and-drop control applications, but it does appear to be grouped and categorised.
http://jackaudio.org/applications/

I'm not a programmer, so I have no idea what kind of effort it would take to implement any of this, but ideally the channel names (parent channel/channel name) in murmur would be inherited and show up in jack. As far as I have been able to understand, both jack audio and murmur support D-Bus, but whether that makes implementation any easier is way beyond my horizon. Any ideas?

@Yamakaky

Alternatively, jack channels could be managed via rpc (Ice?)

@sebastiannielsen

I support this. It would make it possible to connect a mumble server with a SIP server, for example Asterisk, running on the same machine, and thus to connect phone systems with mumble.

@Kissaki
Member

Kissaki commented Sep 1, 2019

The use cases you describe raise two questions though:

  • Privacy/User expectation: If the server can intercept and record audio, how do we present this to the user?
    • Do we show this with a channel icon/state (like recording status)?
  • Visualisation of state: If the server links to another communication end, how do we present this in the client?
    • A linked status state icon?
    • A “talking” indicator?
    • Should it be a pseudo-user/link-user per linked communication system? Kind of like a bot sitting in the channel?

@sebastiannielsen

sebastiannielsen commented Sep 1, 2019

@Kissaki I would suggest the linked state icon. It makes it pretty clear to the user that the audio heard may come from another channel (which might even be invisible to the user, hence no "talking" indicator). The expectation stays the same even if the audio comes from an external system.

This also solves the privacy issue, as a linked channel's audio could be sent anywhere or heard by anybody.

This also makes it backwards compatible with older mumble clients that do not support newer status indicators.

This also means the standard channel linking feature could be removed. If you want to link channels together, you do some jack plumbing instead. This would also allow different channel links on the same level without interference.

The admin interface for this (only available in the newer clients, of course) could simply be that, in channel settings, you specify a "jack sink" and a "jack source", and you can specify multiple sinks by separating them with a comma. If any of these is specified, the link state icon appears.

If one channel's sink is the same as another channel's source, channels are linked in one direction only.
If one channel's sink is the same as another channel's source, and this channel's source is the same as that channel's sink, the channels are linked in both directions.

@Kissaki
Member

Kissaki commented Sep 1, 2019

Unfortunately it also makes implementing it a lot more effort than simply adding the server functionality though. 🙂

@sebastiannielsen

sebastiannielsen commented Sep 1, 2019

I don't think it's that much effort.
You just add the server functionality and add two text fields to the admin interface (for each channel) for specifying jack audio sources and sinks, on the same page where you specify the max number of users.

Showing the link state icon could be done regardless of whether those sources and sinks are in use or not. Basically, if the fields "JACK source" or "JACK sink" have any text in them, the link icon shows up.

If JACK is not installed on the server, these fields could be greyed out; the same goes if the admin in question doesn't have rights to "link channel".

JACK audio then takes care of the rest when it comes to the plumbing. Nothing you as a developer need to take care of.

If you want, the older linking functionality could be kept for those that don't want to install JACK just to be able to link channels, but eventually the older linking method could be phased out, since it's something that is handled completely server-side anyway.

@wojtek14a

This would be HUGE. I'm just experimenting with multiple instances of Mumble running, and it's already WAY better on Linux with built-in JACK support (earlier I was on Windows with Virtual Audio Cards), but the ability to route audio directly from murmur would be HUGE for integrating intercom systems!

@Krzmbrzl
Member

Krzmbrzl commented Sep 8, 2021

Something that speaks against this is that it would prevent the use of end-to-end encryption, which is something I think should find its way into Mumble sooner or later (though the discussion about this belongs in #1813).

@sebastiannielsen

@Krzmbrzl How would end2end encryption work with linked channels?

It could be managed either by not implementing end2end crypto in channels, only for private communications (i.e. 1-to-1 calls), or by encrypting talk in channels in a way that the server can still decode it.

However, for additional security, channel chats could be encrypted with some sort of rolling channel key. The server, if configured to do so, could then still forward audio to and from intercom systems or phone systems, but if the server was not configured to do so at the time of the chat, there would be no possibility to decrypt previously sniffed traffic, since the key is rolling and only kept in RAM.

@Krzmbrzl
Member

Krzmbrzl commented Sep 9, 2021

As I said the discussion about E2E should go into #1813.

As far as this issue here is concerned though I think it's more or less either-or. Either Mumble aims to use E2E (in which case it should be the default and used as much as possible) or it sticks with using client-server-encryption only which would then allow for a feature like this one.

@ranomier

Maybe consider PipeWire directly. It has more features and more control mechanisms.

@IsaMorphic

IsaMorphic commented Jan 22, 2023

I am considering implementing this feature into Murmur myself and submitting it as a PR. First, though, I'm going to make a summary of the main points I've observed in this thread, in order of importance:

  1. E2E encryption would completely defeat any hopes of implementing this feature server-side.
  2. Users should know on the client-side that the server has JACK routing enabled.
  3. There should be 1 JACK source per channel (room).

I also want to provide my own interpretation of the feature so others understand my use-case and implementation scenario:

  • I want to use a locally hosted Murmur instance with my friends during livestreams, and be able to send each user's microphone audio as a separate mono stream to a DAW (I use Ardour for my mixing & routing needs)
  • Many others have previously expressed the need to record multiple rooms at once without launching several clients.
  • Regarding point no. 2 above, I believe that users should be able to easily enable/disable whether their audio gets routed through JACK. The way I picture it, it could literally be a checkbox on the client-side that tells the server: "hey, emit User A's microphone as part of Room A's interface, but not User B's, because they don't consent to it."
  • Regarding point no. 3, there are a few ways that this can be handled.
    1. Murmur exposes each room to JACK as a singular audio channel, where all the voices are already mixed. This goes against what I've observed of the design philosophy for Murmur, however, where a major goal is that it doesn't do any type of mixing or processing on its own.
    2. Murmur exposes each room to JACK as a multichannel source, where each user is mapped to channel N, N being a given user's ID number for that room. Non-consenting users have their channels muted from JACK's perspective, as in their audio is filled with silence en route to JACK. The only potential problem I see with this type of mapping is software limitations for DAWs, and JACK itself internally. I'm sure there's some kind of upper limit to how many channels a single stream can expose.
    3. (extra) This one I regard as entirely impractical: have one source in JACK per consenting user, per room. It could quickly become extremely unmanageable usability-wise.

If anyone has any thoughts that they'd like to add to this, my mind is open. I should also quickly mention that this implementation would provide some type of support for the feature discussed in #4112, just on the server-side. I will try and open a PR soon as I am busy with college work at the moment.

Edit: have done some reading on JACK's API and realized my terminology might be confusing. Whenever I refer to a JACK "source" I mean a logical client. Whenever I refer to a "channel" pertaining to a JACK source, I mean a logical port exposed on that client. Also, I realized that JACK ports can be named using a string, so user-ID numbers wouldn't be the only way to distinguish each user's audio.

Cheers~

@Krzmbrzl
Member

This all sounds reasonable to me. We might want to consider how this feature (or others like it) should be handled, in case we want to implement E2E after all. I guess one possibility would be to say: if that's the case, such features will be removed.
I guess a feature like this (that depends on the server tapping into the audio sources) should always be implemented with the possibility in mind that it might have to be dropped in the future... 🤷

@IsaMorphic

Thanks Krzmbrzl! E2E would definitely kill a feature like this, and in the event that it becomes an official development goal for Mumble I think that should take priority (it is well within the FOSS "own-your-computing" mentality).

With that said, I think it's still worthwhile to implement the feature in the meantime. It turns out I have some free time today so I will make a fork and get started. I'll open a draft PR within a couple of days so that people can comment on the feature's progress, make suggestions, etc. as it's being implemented.

This will be fun!

@IsaMorphic

IsaMorphic commented Jan 23, 2023

An update on my thinking for this specific issue. I talked with a friend about how I'm planning to implement this feature, and they were entirely on-board with it and agreed with the design until I mentioned the privacy concern; after that conversation I'm pretty convinced that this feature should be rejected, at least in the form that it takes in this specific thread.

The big "yikes!" moment for me was when this friend of mine asked me: "if Mumble told you upfront with some kind of message box that if you joined X server, your voice and that of others could be recorded without your knowledge and/or consent; would you still join the server?"

I think it's a really good point, especially considering that once the feature is implemented, the code is written and out there. It's not a big stretch of imagination to think that some malicious actor out there would modify the feature (it could quite literally be as easy as removing an if statement) to completely ignore the consent information coming in from the clients, and run it on a public server that potentially sees hundreds of conversations per day.

Quite frankly, it'd almost be evil to make such a feature available at all as described by OP. So I strongly suggest to @Krzmbrzl or another maintainer to close this issue and remove it from the feature board.

With all this said, I do still want the "send separated user audio to JACK in realtime" aspect of this feature. In that respect, I think it would work great to build it in as a couple of extra options in the already existing and popular "recording" feature. To me this looks like a couple of extra radio options just below "mix-down" and "multi-channel", those being "multi-channel + JACK transport" and "JACK transport (standalone)". That way users can decide to use the new JACK feature either alongside a simultaneous recording by Mumble (maybe as a backup, in case they screwed up something in their own setup), or to just do the JACK transport by itself (no files are emitted by Mumble itself).

This client-based solution takes advantage of an existing, accepted feature that already implements a way for users to know when they are being recorded and by whom. An added bonus is that then we can have all the E2E encryption we want + this feature, too :)

Everything else I have expressed regarding my actual commitment to implementing the feature still stands.

@SM0VXI
Author

SM0VXI commented Jan 23, 2023

Many thanks to everyone bringing this subject to the fore again!

I would like to highlight one thing from the original post that I feel has been lost in recent posts; the possibility to also feed audio from Jack into each channel. Meaning that it should be possible to make the connection between Jack and each murmur server channel bi-directional. While this may not be strictly necessary for gaming purposes it is crucial for other use cases, such as broadcast intercom applications.

E2E encryption has been mentioned. E2E encryption can never be implemented between two users talking to each other within a channel. The reason for this is that audio between users is mixed together on the murmur server. Hence, for channels, the encryption must end at the server, where the audio is mixed, re-encrypted and sent out to the other channel participants. E2E encryption could however be achieved when two users are communicating directly outside a channel. But the suggestion at hand only involves channels and Jack, so it will never interfere with any possible future implementation of E2E between users.

Also, a comment on the privacy concerns that have been raised. Remember that with current functionality any user can still record the events within a channel that they are subscribing to. So there is really no such thing as strict privacy even in the current implementation.

With that said, I suggest that Jack would appear as a user in the channel user list to indicate that channel audio is sent to and from third party equipment. This could be combined with a Tally Light in the clients, (or in other words an "OnAir" indication), that further underlines this state to the users.

@IsaMorphic

IsaMorphic commented Jan 23, 2023

Great to hear back from you! I think it would indeed be fantastic to enable the feature as some type of bot like you describe. But I think at that point, it would make more sense to implement such bots as actual clients (just with minimal or no GUI). That way the feature can still be used even if E2E encryption is implemented in the future. It also restricts the bots to one channel each; a desirable feature in both the privacy context and the context of a wider VOIP network.

Also as a rebuttal to a couple of your points:

  • Murmur doesn't do any mixing lol. Each client does that independently. That's why E2E is under consideration for the project as a whole.
  • Therefore, if E2E is implemented in the future, the server can't see the audio -> neither can Jack.
  • Strict privacy is not really the goal in the first place, communication is. This includes users knowing when and by whom they are being recorded, which is different from telling them that their audio is ephemerally (or possibly permanently; users can't know) being sent through Jack to another service. The problem then becomes how users can decide to trust whether the server administrator is honoring their privacy.
  • To that end, Mumble right now tells everyone in a channel that a given user has started a recording, and the same when it stops. It is then their choice to leave if they don't want to be recorded. On the other hand, it could create a false sense of trust and security to have the UI indicate that a bot is "only" sending its audio to JACK. Once it's in JACK, the server administrator can do practically anything with it, even if contrary to what the UI or server application says or believes.

@SM0VXI
Author

SM0VXI commented Jan 24, 2023

Are you sure about the client doing the mixing? That would mean that for a channel that has 100 participants, there would be 100 audio streams to each client. That would also mean that bandwidth usage to each client will vary depending on how many participants there are in a channel. It is not my understanding that this is the case. But I may of course be wrong about this.

Regarding recording, what you describe is only true if the recording is made with a recorder within mumble. Any user can record the audio with an external recorder from the sound card of their computer, and mumble will never know about it or give any indication about it to the channel participants. This is exactly how you describe the situation should there be a Jack implementation.

@IsaMorphic

IsaMorphic commented Jan 24, 2023

I am 100% certain, here's the link to the code that does the mixing on the client (click link to see the full method):

bool AudioOutput::mix(void *outbuff, unsigned int frameCount) {

Regarding your point about out-of-band recording, this is certainly true and a very good counterpoint. The decision could really go either way and I'm sure the maintainers would accept a PR implementing the feature on the server side if one were submitted. But that person is not going to be me :)

I will reference this discussion, as well as the other one I mentioned earlier, in my PR for posterity and consistency.

Cheers!

@SM0VXI
Author

SM0VXI commented Jan 25, 2023

Thank you for clarifying this!
Then, since everything runs on its own discrete streams, it would be possible to maintain E2E encryption "client <-> client" as well as "client <-> murmur-Jack-interface" should E2E encryption ever be implemented.
And a client-side record ban could be implemented by allowing each client to refuse having its stream sent to the Jack interface of the murmur server, should the user not want to participate in recordings or other external activities.

@IsaMorphic

No problem! Peep that PR 😄👀

@Krzmbrzl
Member

Krzmbrzl commented Feb 3, 2023

I can confirm everything IsaMorphic said about how the server deals with audio streams, and yes: bandwidth requirements increase with the number of clients inside a given channel.


8 participants