Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Python binding compilation under Windows #31

Closed
wants to merge 15 commits into from

Conversation

anthrotype
Copy link
Member

Hi,
the setup.py needs to be modified to make sure that the building of the Python extension succeeds on a Windows machine.

  • The symlink trick that @khaledhosny devised to make distutils search the parent directory unfortunately does not work on Win. So we need to use absolute paths to the C/C++ source files instead of symbolic links.
  • also, the "bro" command line utility which is used for testing the Brotli library cannot be built under Windows because it requires several POSIX-only headers. Besides, in order to run the included shell scripts one also needs the bash shell and several other UNIX tools, which are not present on Windows by default. To be able to test the Python extension, I made a quick Python script which mimics the command line utility, it's called "bro.py". I also made equivalent test scripts ("roundtrip_test.py" and "compatibility_test.py") which do not require the presence of bash, cat or diff.

I tested the scripts on Windows, Cygwin and OS X. You can use them to test that the Python extension works on non-UNIX systems.

All best,

Cosimo


try:
if options.decompress:
bufsize = options.bufsize or 10*len(data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default here seems a bit fragile. The compression ratio could easily be more that 10x, the easiest example is a huge file with only zeros in it.

If you want to be more robust, the solutions I can think of, in order of increasing complexity:

  1. Do the decompression in a loop, with exponentially increasing buffer sizes, until a limit is reached. The limit can be a command-line argument with a reasonable default, e.g. 1GB
  2. Add a python binding for the BrotliDecompressedSize() and try to use that. Drawback is that it does not work if there are many metablocks, so you would have to fall back to solution 1)
  3. Add a python binding for the decompression function that uses two callbacks, and then you can allocate more output if it is needed.

@anthrotype
Copy link
Member Author

thanks for your comments and suggestions. I'll look into it and try to fix it.
Btw, in the current Python bindings, the default decompression buffer size is even less that 10x.

@anthrotype
Copy link
Member Author

I cherry-picked two commits from #32 since they relate to the Python binding build process.
Basically, the setup.py will pass the -std=c++0x flag only when the compiler is GCC, and not when it is MSVC (which cannot handle it).

Besides, the latter gives a warning if the flag /EHsc is not enabled and C++ style exceptions are used. See https://msdn.microsoft.com/en-us/library/2axwkyt4.aspx

@anthrotype
Copy link
Member Author

@szabadka I added a binding for the BrotliDecompressedSize function (I called it "get_decompressed_size"), as you have suggested.
I also modified the bro.py script so that it first tries to calculate the decompressed size, and if it can't find one, then it does the decompression in a loop with increasing buffer size, until a limit of 1GB is reached (but it can be extended via command line argument).
However, I am not able to test a compressed file that would make the BrotliDecompressedSize function to fail: the size of each text document included is calculated successfully.
But I'm sure you can push it to the limit.
Let me know what you think.
Have a nice weekend,

Cosimo.

@khaledhosny
Copy link
Contributor

I’d rather prefer if the Python module was changed to use BrotliDecompress with callbacks that allocate memory as needed and make Python users need not worry about the compression buffer size. I have a semi-working code for this, but I’ll not be able to finish it for a couple of days or so.

@anthrotype
Copy link
Member Author

Yes, I also believe that'd be a better idea. I will wait for your patch then.
Thanks.

@anthrotype
Copy link
Member Author

I wonder if @khaledhosny had the chance to finish that patch to brotlimodule.cc which he referred to early on (i.e. the one using BrotliDecompress with callbacks to allocate memory as needed)?
thanks a lot,

C.

@khaledhosny
Copy link
Contributor

Sorry for the delay, open #36 for the decompress changes.

@anthrotype
Copy link
Member Author

thank you @khaledhosny!
I rebased this branch to use your new BrotliDecompress() function. However, I'm no longer able to pass all the tests.
It's quite strange because if I run the same command more than once, sometimes it passes, other times it raises BrotliDecompress failed. For example, using this test file:

./bro.py -f -d -i x.compressed -o x.uncompressed

I'll have closer look tomorrow, and see if I can better debug what's happening here.
Thanks again.

C.

@anthrotype
Copy link
Member Author

@khaledhosny
I think I found the culprit of the issue.
at line 80 of brotlimodule.cc, the output_callback function is missing a return statement.
If I add that, then all the python tests run fine.
See 4a927bc.

@anthrotype
Copy link
Member Author

I apologise for the rebase mess, I'm gonna close this pull request and open a new, cleaner one.

@khaledhosny
Copy link
Contributor

@anthrotype Thanks, I fixed the missing return statement in my PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants