Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

web: Crash when serializing objects with bytes fields #2532

Closed
ocelotsloth opened this issue Apr 30, 2017 · 10 comments
Closed

web: Crash when serializing objects with bytes fields #2532

ocelotsloth opened this issue Apr 30, 2017 · 10 comments
Labels
bug bugs that are confirmed and actionable

Comments

@ocelotsloth
Copy link
Contributor

ocelotsloth commented Apr 30, 2017

Problem

For some reason the web interface is not working correctly. I can't actually get it to search for anything. When you type something into the search box nothing happens except for the following text being printed to the terminal.

Running this command in verbose (-vv) mode:

$ beet -vv web

Then searching for something in the web interface gives this problem:

~
➔ beet -vv web
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [30/Apr/2017 11:05:15] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [30/Apr/2017 11:05:22] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 209, in run_wsgi
    execute(self.server.app)
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 199, in execute
    for data in application_iter:
  File "/usr/lib/python3.6/site-packages/werkzeug/wsgi.py", line 704, in __next__
    return self._next()
  File "/usr/lib/python3.6/site-packages/werkzeug/wrappers.py", line 81, in _iter_encoded
    for item in iterable:
  File "/usr/lib/python3.6/site-packages/beetsplug/web/__init__.py", line 74, in json_generator
    yield json.dumps(_rep(item, expand=expand))
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

Setup

  • OS: Arch Linux
  • Python version: Python 3.6
  • beets version: 1.4.3
  • Turning off plugins made problem go away (yes/no): no, the problem is with the web plugin

My configuration (output of beet config) is:

directory: ~/Music
library: ~/Music/musicLibrary.db

import:
    move: yes

plugins: chroma web
acoustid:
    apikey: REDACTED
web:
    host: 127.0.0.1
    port: 8337
    cors: ''
chroma:
    auto: yes
@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label May 1, 2017
@sampsyo
Copy link
Member

sampsyo commented May 1, 2017

Hmm; thanks for reporting! I think we may have fixed some similar issues in the web plugin recently. Can I ask you the favor of trying out the latest source? (There are instructions for running the latest source in our FAQ.)

Thanks again!

@ocelotsloth
Copy link
Contributor Author

Sure! Let me pull that down

@ocelotsloth
Copy link
Contributor Author

ocelotsloth commented May 1, 2017

So I tried it with the beets-git AUR package and got the same:

~/projects
➔ beet version
beets version 1.4.4
Python version 3.6.0
plugins: chroma, web

~/projects
➔ beet web    
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [01/May/2017 18:29:47] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/beets.css HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/jquery.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/underscore.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/beets.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:47] "GET /static/backbone.js HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:29:48] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [01/May/2017 18:29:52] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 209, in run_wsgi
    execute(self.server.app)
  File "/usr/lib/python3.6/site-packages/werkzeug/serving.py", line 199, in execute
    for data in application_iter:
  File "/usr/lib/python3.6/site-packages/werkzeug/wsgi.py", line 704, in __next__
    return self._next()
  File "/usr/lib/python3.6/site-packages/werkzeug/wrappers.py", line 81, in _iter_encoded
    for item in iterable:
  File "/usr/lib/python3.6/site-packages/beetsplug/web/__init__.py", line 77, in json_generator
    yield json.dumps(_rep(item, expand=expand))
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

I'm going to check and make sure that just pulling from master doesn't give different results.

@ocelotsloth
Copy link
Contributor Author

Yep, more of the same:

~/projects/beets on master
➔ ./beet web
 * Running on http://127.0.0.1:8337/ (Press CTRL+C to quit)
127.0.0.1 - - [01/May/2017 18:33:32] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [01/May/2017 18:33:35] "GET /item/query/isabel HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):

It's the same so I'll not repeat the rest.

@ocelotsloth
Copy link
Contributor Author

Alright, so I've taken an attempt to try and learn a bit more, but I'm running up against the limits of my understanding of the internals of the library.

Adding a print(item.__repr__()) statement to the item that is trying to be Serialized yields something like this:

Item(id=677, path=b"/home/mstengle/Music/Compilations/Monstercat 016_ Expedition/06 What's Going On.mp3", album_id=49, title="What's Going On", artist='Droptek feat. Isabel Higuero', artist_sort='Droptek feat. Higuero, Isabel', artist_credit='Droptek feat. Isabel Higuero', album='Monstercat 016: Expedition', albumartist='Various Artists', albumartist_sort='Various Artists', albumartist_credit='Various Artists', genre='', composer='', grouping='', year=2014, month=2, day=26, track=6, tracktotal=32, disc=1, disctotal=1, lyrics='', comments='Visit http://music.monstercat.com', bpm=0, comp=1, mb_trackid='ce2d7b40-7997-4a5c-9b98-bf3129c64fb4', mb_albumid='9d4384da-6624-4feb-949a-27d55fa95dd7', mb_artistid='cfd48b9d-acc9-46cb-b275-6f6ac7cb9028', mb_albumartistid='89ad4ac3-39f7-470e-963a-56509c546377', albumtype='compilation', label='Monstercat', acoustid_fingerprint='LONG STUFF', acoustid_id='00b14796-049f-4b42-abe6-1beab37cadd1', mb_releasegroupid='1433d6c9-8263-4c91-876e-7b1ee7a85d43', asin='B00ILEX1YE', catalognum='MC016', script='Latn', language='eng', country='XW', albumstatus='Official', media='Digital Media', albumdisambig='', disctitle='', encoder='', rg_track_gain=None, rg_track_peak=None, rg_album_gain=None, rg_album_peak=None, original_year=2014, original_month=2, original_day=26, initial_key=None, length=205.9557142857143, bitrate=128044, format='MP3', samplerate=44100, bitdepth=0, channels=2, mtime=1493583618.0, added=1493583615.9340143, data_source='MusicBrainz')

I think this is the important item:

path=b"/home/mstengle/Music/Compilations/Monstercat 016_ Expedition/06 What's Going On.mp3"

I went through to the library.py file for beets to see what that data type is, but it just takes me to the PathType and PathQuery classes.

I would try to find a way to add a .decode("utf_8") (not sure if it is correct to assume this encoding), but I can't figure out where that would go.

Is this the right direction to go in to fix it? I don't mind digging a bit further to fix it myself but I'm not sure where the sql call get's turned into an actual object that I can manipulate before it gets out to the JSON serializer.

@sampsyo
Copy link
Member

sampsyo commented May 2, 2017

Thanks for continuing to investigate! Yes, I agree that paths are probably related to the problem, but we actually remove the path field by default (and if the paths are enabled, we decode them to stings):

out['path'] = util.displayable_path(out['path'])

It's strange that I can't reproduce this error… maybe it would be useful to try doing a print(out) just before the return from _rep? That should tell us which bytes values are still remaining.

@ocelotsloth
Copy link
Contributor Author

ocelotsloth commented May 2, 2017

Ah, I see now. Thanks for pointing that out to me. It does look like it is taking the path out:

#### TEST: Post `_rep` `__repr__()`
{'id': 384, 'album_id': 34, 'title': 'Bridge Burning', 'artist': 'Foo Fighters', 'artist_sort': 'Foo Fighters', 'artist_credit': 'Foo Fighters', 'album': 'Wasting Light', 'albumartist': 'Foo Fighters', 'albumartist_sort': 'Foo Fighters', 'albumartist_credit': 'Foo Fighters', 'genre': 'Rock', 'lyricist': '', 'composer': 'Foo Fighters', 'arranger': '', 'grouping': '', 'year': 2011, 'month': 4, 'day': 8, 'track': 1, 'tracktotal': 11, 'disc': 1, 'disctotal': 1, 'lyrics': '', 'comments': '', 'bpm': 0, 'comp': 0, 'mb_trackid': 'd319e25c-d2ae-49dd-9103-c0819cd8877d', 'mb_albumid': 'ba3c72d8-b62e-4dff-ab19-b1bbaabf6924', 'mb_artistid': '67f66c07-6e61-4026-ade5-7e782fad3a5d', 'mb_albumartistid': '67f66c07-6e61-4026-ade5-7e782fad3a5d', 'albumtype': 'album', 'label': 'RCA', 'acoustid_fingerprint': b'SUPER LONG', 'acoustid_id': '03f11f8d-5881-4bf3-9ad0-dfcc89538b98', 'mb_releasegroupid': '91825bcf-e51d-4357-986c-ac2a49b4d205', 'asin': 'B004S4AU9A', 'catalognum': '88697872772', 'script': 'Latn', 'language': 'eng', 'country': 'GB', 'albumstatus': 'Official', 'media': 'CD', 'albumdisambig': 'UK & Europe jewel case release', 'disctitle': '', 'encoder': '', 'rg_track_gain': -10.45, 'rg_track_peak': 0.999969, 'rg_album_gain': 0.999969, 'rg_album_peak': None, 'original_year': 2011, 'original_month': 4, 'original_day': 8, 'initial_key': None, 'length': 286.5342403628118, 'bitrate': 256000, 'format': 'AAC', 'samplerate': 44100, 'bitdepth': 16, 'channels': 2, 'mtime': 1493583207.0, 'added': 1493583207.532419, 'data_source': 'MusicBrainz', 'size': 10384504}
############

And hey! I figured it out! It's the acoustid_fingerprint variable. If you don't have this setup and enabled and have generated fingerprints that would explain why you can't reproduce it.

If you want to deal with this in place with the _rep function, I can write a patch for it.

For now I tested just adding a del command for that and confirmed that it works on my end.

It might be worthwhile for that patch to include some kind of framework additional plugins can hook into that declares dangerous values like that. That would make it so that you don't have to hardcode it in place here for all the addins which may or may not be present on a person's particular configuration. Just a thought, and I'm open to implementing and documenting it as a project.

@ocelotsloth
Copy link
Contributor Author

With a bit more tinkering I found that even with the chroma extension disabled it still picks those values up, so it would need to be a more global configuration schema than just having an extension checking in with the global conf.

I still think it would be a bit heavy handed an approach to manually add any bytes or other non realizable data tags individually by hand. Might be better to scan through and delete any bytes tags by default but allow an extension to override the behaviour if the end user has that extension loaded.

@sampsyo
Copy link
Member

sampsyo commented May 2, 2017

Got it; thanks for nailing it down!

I think the right solution is indeed to fix _rep. An easy way to prevent this from coming up again would be a simple type test: check whether isinstance(value, bytes) to filter out values like this. We have basically two choices:

  • Drop these values altogether.
  • Use one of the pseudo-standard ways to encode binary data for JSON transport—probably encoding the bytes as a string using base64.

@sampsyo sampsyo changed the title beet web: TypeError: Object of type 'bytes' is not JSON serializable web: Crash when serializing objects with bytes fields May 2, 2017
@sampsyo sampsyo added bug bugs that are confirmed and actionable and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels May 2, 2017
ocelotsloth added a commit to ocelotsloth/beets that referenced this issue May 3, 2017
This commit fixes issue beetbox#2532 by filtering out any byte values
added by any other extensions and decoding them to strings for the
JSON serializer.

It does not remove any of the keys, instead opting to just convert
them.

Signed-off-by: Mark Stenglein <[email protected]>
ocelotsloth added a commit to ocelotsloth/beets that referenced this issue May 3, 2017
This commit fixes issue beetbox#2532 by filtering out any byte values
added by any other extensions and decoding them to strings for the
JSON serializer.

It does not remove any of the keys, instead opting to just convert
them.

Signed-off-by: Mark Stenglein <[email protected]>
ocelotsloth added a commit to ocelotsloth/beets that referenced this issue May 3, 2017
This commit fixes issue beetbox#2532 by filtering out any byte values
added by any other extensions and decoding them to strings for the
JSON serializer.

It does not remove any of the keys, instead opting to just convert
them.

Signed-off-by: Mark Stenglein <[email protected]>
ocelotsloth added a commit to ocelotsloth/beets that referenced this issue May 3, 2017
This commit fixes issue beetbox#2532 by filtering out any byte values
added by any other extensions and decoding them to strings for the
JSON serializer.

It does not remove any of the keys, instead opting to just convert
them.

Signed-off-by: Mark Stenglein <[email protected]>
@ocelotsloth
Copy link
Contributor Author

That PR should fix the problem by decoding the byte values to strings. I'm not sure if it's correctly using base64 as you suggested, but I'm also not sure how to do that if it's not already in that form.

sampsyo added a commit that referenced this issue May 4, 2017
@sampsyo sampsyo closed this as completed in 376c31a May 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bugs that are confirmed and actionable
Projects
None yet
Development

No branches or pull requests

2 participants