Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return string rather than bytes from SVG2PDFPreprocessor conversion. #1025

Closed
wants to merge 1 commit into from

Conversation

SpencerPark
Copy link

The conversion should always return valid json but it seems it was slipping by unnoticed for a little while. As suggested in #904, #847 is causing the output extraction preprocessor to try and convert the bytestring to json which is raising an error.

This should fix #904.

@MSeal
Copy link
Contributor

MSeal commented May 14, 2019

I commented in the issue, but I believe this was already addressed in nbconvert 5.5 release. Can you confirm the issue is fix / present in 5.5 before we look at making any change here?

Converting to ascii here is not safe either.

@SpencerPark
Copy link
Author

Woops my bad, thought I was up to date. It is fixed in 5.5.0. #904 can be closed regardless of the rest of our discussion here.

The conversion to ascii shouldn't be a problem here as the string is encoded in base64 so all chars are in ascii ranges.

I still think it is strange to be returning bytes from the pdf conversion though as it adds output to a notebook cell (even if just intermediately). The newest extractoutput preprocessor handles both cases but the comments suggest it is expecting a b64 string (as opposed to b64 bytestring). binascii's a2b_base64 handles it by letting it pass though.

# Binary files are base64-encoded, SVG is already XML
if mime_type in {'image/png', 'image/jpeg', 'application/pdf'}:
    # data is b64-encoded as text (str, unicode),
    # we want the original bytes
    data = a2b_base64(data)

I'll defer to your expertise on whether it is acceptable to have invalid JSON while the notebook json is in memory (an represented as a python object where bytes is fine) potentially for performance reasons? nbformat also seems to handle bytes fine and encodes it during the serialization but displaying a pdf from jupyter would include a base64 string in the output.

@MSeal
Copy link
Contributor

MSeal commented May 14, 2019

Yeah I tried making a similar change and there were some valid use-cases and patterns where the binary is expected for the in-memory representation. I can go dredge up the exact patterns but we probably can't change it in 5.x versions.

Going to close for now as the root issue is solved.

@MSeal MSeal closed this May 14, 2019
@SpencerPark
Copy link
Author

Fair enough, thanks for the review!

@MSeal MSeal added this to the no action milestone Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of SVG generates TypeError: Object of type 'bytes' is not JSON serializable
2 participants