Skip to content
This repository has been archived by the owner on Sep 16, 2021. It is now read-only.

dleyna-server-service spins mainloop while running #150

Open
heftig opened this issue Jul 19, 2015 · 24 comments
Open

dleyna-server-service spins mainloop while running #150

heftig opened this issue Jul 19, 2015 · 24 comments

Comments

@heftig
Copy link

heftig commented Jul 19, 2015

While idling, dleyna-server-service saturates a CPU. This has been seen by me on Arch Linux, but there also exist reports for Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1240411) and Debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779731).

The main GMainContext is constantly dispatching. Maybe there is a g_idle_add source that is never removed. From my cursory examination it seems prv_process_task might be involved.

glib2 2.44.1
dleyna-connector-dbus 0.2.0
dleyna-* 0.5.0

@rsbells
Copy link
Contributor

rsbells commented Jul 29, 2015

Jan - Thanks for filing this issue and attaching the issues reported in Fedora and Debian. We will try to reproduce it.

@heftig
Copy link
Author

heftig commented Oct 23, 2015

The source that's constantly getting dispatched is an idle source with the callback emit_notifications in libgupnp. A bit of tracing revealed that for the broken GUPnPServiceProxy, subscribe and server_handler were being called, but not subscribe_got_response. Yet, Wireshark shows that the SUBSCRIBE request was quickly answered with a 200.

@heftig
Copy link
Author

heftig commented Oct 23, 2015

Anyway, I shut down the media server that triggered the bug (a FRITZ!Box 7272, OS 06.30).

@wolfv
Copy link

wolfv commented Nov 6, 2015

I have this problem as well, and frankly it's been there since quite a while (Fedora 23 currently). It's also not very obvious what dleyna does, and how to stop it.
Having the laptop fan going like crazy when I am only listening to music on rhythmbox or gnome music is a bit stressing.

@wolfv
Copy link

wolfv commented Nov 6, 2015

Unfortunately it's also not possible to delete dleyna server from fedora without also removing rhythmbox and gnome-music as they are dependend on dleyna for whatever reason.

@phako
Copy link

phako commented Mar 13, 2016

That could possibly also be an GUPnP bug - I would also be interested in reproduction

@conrad-heimbold
Copy link

conrad-heimbold commented May 4, 2017

I am currently facing this issue - dleyna-server-service uses 100% of my CPU.
The stacktrace seems to be the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1240411 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779731 .
Output from gstack $(pidof dleyna-server-service):

Thread 4 (Thread 0x7fe374c3c700 (LWP 26929)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007fe374c43fad in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7fe37761f700 (LWP 26926)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b4f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fe37beffa76 in gdbus_shared_thread_func () at /lib64/libgio-2.0.so.0
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7fe377e20700 (LWP 26925)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b27c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007fe37b91b2c1 in glib_worker_main () at /lib64/libglib-2.0.so.0
#4  0x00007fe37b942b93 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5  0x00007fe37b3526ca in start_thread () at /lib64/libpthread.so.0
#6  0x00007fe37b08cf7f in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7fe37d572240 (LWP 26923)):
#0  0x00007fe37b08101d in poll () at /lib64/libc.so.6
#1  0x00007fe37b91b166 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fe37b91b4f2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fe37cf5c8fc in dleyna_main_loop_start () at /lib64/libdleyna-core-1.0.so.4
#4  0x0000560572bd6e14 in main () 

@phako
Copy link

phako commented May 4, 2017

That stack trace shows everything idling... what UPnP devices do you have in your network?

@conrad-heimbold
Copy link

conrad-heimbold commented May 4, 2017

Interestingly, the issue seems to be gone now. I am/was directly attached by cable to the university network of the Leibniz-University Hannover (130.75.0.0/16). I can't tell if and how many UPnP devices there are in the university network; but I guess there are some. It might be gone because the university kicks out devices that spam others with UPnP. I have heard such stories from other universities.

@phako
Copy link

phako commented May 4, 2017

Very odd.

@rtiemann
Copy link

I ran into this issue as well. It happens when there is a "FRITZ!Box Fon WLAN 7240" on the network. After detaching the FRITZ!Box from the network (or simply blocking incoming traffic from that device via iptables) and restarting dleyna-server-service everything starts working again as expected.

A quick look with sysprof quite clearly showed that the main loop must be pretty busy, and a closer look with gdb confirmed this. I cannot, however, find out the exact root cause for this. Any suggestions what I could try to track this down?

Reproducible with dLeyna v0.5 and v0.6 (didn't test other versions) and libgupnp-1.0.

@phako
Copy link

phako commented May 21, 2019

I suspect the FritzBox has its media server enabled? Which FritzOS is it running?

@rtiemann
Copy link

Yes, media server was enabled. Switching it off, however, didn't change anything. FritzOS is 06.05, 1&1-branded (latest version downloadable from AVM seems to be 06.06).

I just saw that gupnp-universal-cp on Ubuntu 18.04.2 also hogs the CPU while the box is on the network. Among others, gupnp-universal-cp links against libgupnp, libgssdp, libsoup, and a whole bunch of GNOME libraries, but it doesn't use dLeyna. In addition, this program also uses 100% CPU time if a D-Link DIR-635 router is present. Blocking the FRITZ!Box and the router made gupnp-universal-cp behave well again.

So, this bug is probably not dLeyna's.

I'd still like to see this bug fixed and I have the time to help out, but not knowing any of the libraries involved, I don't really know where to start. Any hints?

@phako
Copy link

phako commented May 21, 2019

Yeah, that's what I suspected in #150 (comment) - that it's actually a GUPnP bug, presumably in the service introspection.

Would you mind trying to get a backtrace while it is hogging CPU? I have filed https://gitlab.gnome.org/GNOME/gupnp/issues/8 for GUPnP

@rtiemann
Copy link

Here is one:

Thread 5 (Thread 0x7ff6c33ec700 (LWP 6070)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x5654882536c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=1, fds=0x5654882536c0, timeout=, context=0x5654882537d0) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=context@entry=0x5654882537d0, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a376dc in g_main_context_iteration (context=0x5654882537d0, may_block=1) at ../../../../glib/gmain.c:3963
#4  0x00007ff6c33f436d in  () at /usr/lib/x86_64-linux-gnu/gio/modules/libdconfsettings.so
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x565488247320) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c33ec700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7ff6c3fff700 (LWP 6069)):
#0  0x00007ff6d02d4839 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ff6d0a7d89a in g_cond_wait_until (cond=cond@entry=0x5654881f1208, mutex=mutex@entry=0x5654881f1200, end_time=end_time@entry=1666124582434) at ../../../../glib/gthread-posix.c:1449
#2  0x00007ff6d0a0a571 in g_async_queue_pop_intern_unlocked (queue=queue@entry=0x5654881f1200, wait=wait@entry=1, end_time=end_time@entry=1666124582434) at ../../../../glib/gasyncqueue.c:422
#3  0x00007ff6d0a0ab2c in g_async_queue_timeout_pop (queue=0x5654881f1200, timeout=timeout@entry=15000000) at ../../../../glib/gasyncqueue.c:543
#4  0x00007ff6d0a5fc3e in g_thread_pool_wait_for_new_pool () at ../../../../glib/gthreadpool.c:167
#5  0x00007ff6d0a5fc3e in g_thread_pool_thread_proxy (data=) at ../../../../glib/gthreadpool.c:364
#6  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f5720) at ../../../../glib/gthread.c:784
#7  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c3fff700) at pthread_create.c:463
#8  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7ff6c8b24700 (LWP 6068)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x565488209770, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=2, fds=0x565488209770, timeout=, context=0x5654882076a0) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=0x5654882076a0, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a37962 in g_main_loop_run (loop=0x565488208010) at ../../../../glib/gmain.c:4098
#4  0x00007ff6cfa580e6 in gdbus_shared_thread_func (user_data=0x565488207670) at ../../../../gio/gdbusprivate.c:275
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f54a0) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c8b24700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ff6c9325700 (LWP 6067)):
#0  0x00007ff6d02cdbf9 in __GI___poll (fds=0x5654881f64b0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ff6d0a375c9 in g_main_context_poll (priority=, n_fds=1, fds=0x5654881f64b0, timeout=, context=0x5654881f5190) at ../../../../glib/gmain.c:4203
#2  0x00007ff6d0a375c9 in g_main_context_iterate (context=context@entry=0x5654881f5190, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3897
#3  0x00007ff6d0a376dc in g_main_context_iteration (context=0x5654881f5190, may_block=may_block@entry=1) at ../../../../glib/gmain.c:3963
#4  0x00007ff6d0a37721 in glib_worker_main (data=) at ../../../../glib/gmain.c:5772
#5  0x00007ff6d0a5f195 in g_thread_proxy (data=0x5654881f5400) at ../../../../glib/gthread.c:784
#6  0x00007ff6cfd2f6db in start_thread (arg=0x7ff6c9325700) at pthread_create.c:463
#7  0x00007ff6d02da88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ff6d0ed8540 (LWP 6066)):
#0  0x00007ff6d0a344a0 in g_source_iter_next (iter=iter@entry=0x7fff9ff6d2a0, source=source@entry=0x7fff9ff6d298) at ../../../../glib/gmain.c:950
#1  0x00007ff6d0a36a63 in g_main_context_prepare (context=context@entry=0x5654881e5c20, priority=priority@entry=0x7fff9ff6d320) at ../../../../glib/gmain.c:3452
#2  0x00007ff6d0a374fb in g_main_context_iterate (context=0x5654881e5c20, block=block@entry=1, dispatch=dispatch@entry=1, self=) at ../../../../glib/gmain.c:3882
#3  0x00007ff6d0a37962 in g_main_loop_run (loop=0x56548822bf60) at ../../../../glib/gmain.c:4098
#4  0x00007ff6d07e523e in dleyna_main_loop_start (server=0x565486a5df54 "dleyna-server-service", control_point=0x7ff6d07dfae0 , user_data=0x0) at libdleyna/core/main-loop.c:154
#5  0x0000565486a5de83 in main (argc=1, argv=0x7fff9ff6d598) at daemon.c:93

@rtiemann
Copy link

Oh, frame 4 in thread 5 is dconf_gdbus_worker_thread().

@phako
Copy link

phako commented May 22, 2019

hm. Possibly a rouge glib idle function... I tried to reproduce it with my FritzBox, unfortunately I could not.

@rtiemann
Copy link

I have written a minimalistic UPnP client based on libgupnp to reproduce the problem. Indeed, I was able to reproduce it, so from my perspective, dLeyna is out.

My test program simply searches for "upnp:rootdevice" targets, much like dLeyna does. My callback for GUPnPControlPoint's "device-proxy-available" signal does nothing unless the reported device is the problematic FRITZ!Box 7240. For that device, I get a GUPnPServiceProxy for service "urn:schemas-upnp-org:service:ContentDirectory" by calling gupnp_device_info_get_service(). If this function returns a non-NULL pointer, I subscribe to the service via gupnp_service_proxy_set_subscribed(), because this is what dLeyna does. And this is when the CPU fan starts spinning. :)

Inverting the filter so that all devices except the FRITZ!Box are subscribed to (7 devices) works as expected. That's all I know so far, I'll dig a bit deeper tomorrow.

@phako
Copy link

phako commented May 22, 2019

Thanks :) Can you please add further investigations to the GUPnP gitlab bug? https://gitlab.gnome.org/GNOME/gupnp/issues/8

@rtiemann
Copy link

Yes, as it stands, this seems to be the right place.

@rtiemann
Copy link

Seems like dLeyna isn't off the hook yet... It is definitely GUPnP hogging the CPU, but it stops doing it after a while (see discussion on GitLab).

My test program showed that there is some kind of timeout after which the CPU load goes down after a minute---so back to dLeyna. The timeout also applies here, but gupnp_service_proxy_set_subscribed() gets called again when a previous call of gupnp_service_proxy_set_subscribed() fails.

Trace to first subscription: prv_process_task()dleyna_service_task_process_cb()prv_subscribe()dls_device_subscribe_to_service_changes()gupnp_service_proxy_set_subscribed().

Successive subscriptions are attempted in prv_cd_subscription_lost_cb(), which for the FRITZ!Box is called after one minute. Since the subscription to the FRITZ!Box service never seems to succeed (where "never" means that I have waited for ~30 minutes), we have an infinite loop. (The FRITZ!Box ContentDirectory can be browsed, by the way.)

I don't know how this should be fixed. Counting retries? Maybe by taking the GError passed to prv_cd_subscription_lost_cb() into account?

>>> print *reason
$1 = {
  domain = 560,
  code = 0,
  message = 0x555555b8eb00 "Connection terminated unexpectedly"
}

Please let me know if you need more information or if you want me to try something.

@phako
Copy link

phako commented May 28, 2019

The fix for GUPnP is released as 1.0.4 for the oldstable and will be released as 1.2.2 for the current stable.

@rtiemann
Copy link

Nice, thank you for fixing this issue. :)

I have just checked dleyna-server-service against GUPnP 1.0.4, and sure enough it doesn't use much CPU anymore. The "Connection terminated unexpectedly" error, however, still occurs every minute as described in my previous comment: function gupnp_service_proxy_set_subscribed() gets called from prv_cd_subscription_lost_cb().

I can live with that very slow infinite loop now, but it doesn't look right.

@phako
Copy link

phako commented Apr 8, 2021

It does not handle the GError properly, as in: It's ignoring it completely.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants