Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gstdec: Avoid leaking memory when reading audio data #84

Merged
merged 1 commit into from
Jan 20, 2019

Conversation

ssssam
Copy link
Contributor

@ssssam ssssam commented Jan 12, 2019

We were reading audio data with the Gst.Buffer.extract_dup() method.
This allocates new memory using g_malloc() and returns it to the caller.
The memory needs to be freed with g_free(), however the PyGObject
bindings do not do this.

We can avoid problem by reading the audio data directory from the
underlying Gst.Memory object. In this case the Python interpreter is
responsible for copying the data and so it is able to correctly free
the memory after it's no longer needed.

I tested this by calling pyacoustid.fingerprint() on 34 .MP3 files in
sequence, and I saw the following difference:

  • memory usage without the patch: 557052 KB
  • memory usage with the patch: 52752 KB

The generated acoustid fingerprints were identical with and without the
patch.

@ssssam
Copy link
Contributor Author

ssssam commented Jan 12, 2019

This should fix #62, and possibly #40

@ssssam
Copy link
Contributor Author

ssssam commented Jan 12, 2019

In theory we should be able to use Gst.Buffer.extract() to avoid duplicating memory that we can't free. In practice this leads to a crash. I think that there is a bug in the GStreamer gobject-introspection annotations causing that. I don't have enough information to open a bug against GStreamer right now though.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice digging! This does indeed seem like it might be the memory leak. And it would, of course, be a bad one—we're leaking all the audio data, not just one metadata record per audio file or something tame like that. I have just a couple of suggestions inline.

audioread/gstdec.py Show resolved Hide resolved
audioread/gstdec.py Show resolved Hide resolved
We were reading audio data with the Gst.Buffer.extract_dup() method.
This allocates new memory using g_malloc() and returns it to the caller.
The memory needs to be freed with g_free(), however the PyGObject
bindings do not do this.

We can avoid problem by reading the audio data directory from the
underlying Gst.Memory object. In this case the Python interpreter is
responsible for copying the data and so it is able to correctly free
the memory after it's no longer needed.

I tested this by calling pyacoustid.fingerprint() on 34 .MP3 files in
sequence, and I saw the following difference:

  - memory usage without the patch: 557052 KB
  - memory usage with the patch: 52752 KB

The generated acoustid fingerprints were identical with and without the
patch.
@ssssam ssssam force-pushed the sam/gstreamer-memory-leak-fix branch from 527ed09 to 2cd2456 Compare January 13, 2019 18:39
@sampsyo sampsyo merged commit 2cd2456 into beetbox:master Jan 20, 2019
sampsyo added a commit that referenced this pull request Jan 20, 2019
gstdec: Avoid leaking memory when reading audio data
sampsyo added a commit that referenced this pull request Jan 20, 2019
@ssssam ssssam deleted the sam/gstreamer-memory-leak-fix branch January 20, 2019 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants