Pyfi fails when handling strings that are too long #3

Open
pswoodworth opened this issue Apr 17, 2018 · 1 comment


pswoodworth commented Apr 17, 2018

The Python JSON parser appears to run into issues with strings that are too long. Needs further investigation.

@pswoodworth pswoodworth changed the title Python appears to fail with strings that are too long inside JSON. Pyfi fails when handling strings that are too long Jun 14, 2018

pswoodworth commented Jun 14, 2018

On further investigation, this seems to be an issue with how the OS buffers stdout pipes. When the stdout buffer overflows, the OS flushes it to the pipe, splitting the data arbitrarily at the exact byte where the overflow occurs. The result is that the receiver breaks either when Python/JS tries to decode the bytes into a string or when parsing JSON from the string, depending on where the split lands (i.e., at a character boundary or in the middle of a multi-byte character).

There doesn't seem to be a straightforward way to increase the buffer size at the OS level, so we may be stuck with solving it in code.

Options there seem to be:

  1. Split long strings before piping them across, and somehow indicate that the string has been split. This has the advantage of being explicit, but the disadvantage that we'd essentially be guessing where the overflow occurs; i.e., if the buffer size changes across systems, or dynamically on a single system, we could still end up hosed.
  2. Implement our own buffering inside JS + Python. This has the advantage of working regardless of system conditions: the OS can flush the buffer whenever it wants, and our code can handle it. The disadvantage is that neither the sender nor the receiver knows for sure where a bytestring was or will be split, so the receiver essentially has to assume that data arrives in the order it was sent. That's a potentially brittle assumption, particularly given that we're multithreading Python operations.
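Option 2 could be made split-agnostic with message framing, so it no longer matters where the OS splits the stream. A minimal sketch of length-prefix framing on the receiving side (the `frame`/`FrameReader` names are hypothetical, not part of Pyfi, and this still assumes chunks from a single pipe arrive in order):

```python
import json
import struct

def frame(message: dict) -> bytes:
    """Encode a message as a 4-byte big-endian length prefix + JSON body."""
    body = json.dumps(message).encode("utf-8")
    return struct.pack(">I", len(body)) + body

class FrameReader:
    """Reassembles complete messages from arbitrarily-split byte chunks."""

    def __init__(self):
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        """Append a chunk; return any messages completed by it."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # body not fully arrived yet; wait for more chunks
            body = self._buf[4:4 + length]
            self._buf = self._buf[4 + length:]
            messages.append(json.loads(body.decode("utf-8")))
        return messages
```

Because the length prefix tells the receiver exactly how many bytes belong to each message, a flush mid-character is harmless: the incomplete tail just sits in the buffer until the next chunk arrives. The in-order assumption holds per pipe, but interleaving writes from multiple Python threads onto one pipe would still need sender-side locking around each `frame()` write.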
