fix Python binding compilation under Windows #31

anthrotype · 2015-02-25T19:38:46Z

Hi,
setup.py needs to be modified so that building the Python extension succeeds on a Windows machine.

The symlink trick that @khaledhosny devised to make distutils search the parent directory unfortunately does not work on Windows, so we need to use absolute paths to the C/C++ source files instead of symbolic links.
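A minimal sketch of resolving the sources to absolute paths instead of relying on symlinks. The helper name `absolute_sources` and the `../dec/decode.c` layout used in the check are hypothetical, not necessarily what this branch's setup.py does:

```python
import os


def absolute_sources(setup_dir, relative_sources):
    """Resolve C/C++ source paths relative to the directory holding setup.py.

    Symlinks are not followed on Windows, so each source is turned into a
    normalized absolute path that distutils can open directly.
    """
    base = os.path.abspath(setup_dir)
    return [os.path.normpath(os.path.join(base, rel)) for rel in relative_sources]
```

In a real setup.py, `setup_dir` would typically be `os.path.dirname(os.path.abspath(__file__))`, and the returned list would be passed as the `sources` of the `Extension`.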
Also, the "bro" command-line utility, which is used for testing the Brotli library, cannot be built under Windows because it requires several POSIX-only headers. Besides, running the included shell scripts requires the bash shell and several other UNIX tools, which are not present on Windows by default. To be able to test the Python extension, I wrote a quick Python script called "bro.py" that mimics the command-line utility. I also wrote equivalent test scripts ("roundtrip_test.py" and "compatibility_test.py") that do not require bash, cat or diff.
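A rough sketch of how a round-trip check can avoid `cat` and `diff` by comparing bytes in Python. This is not the contents of roundtrip_test.py; the function names are hypothetical, and the compressor/decompressor are passed in so the check can run with `zlib` as a stand-in when the Brotli extension is not built:

```python
def roundtrip_ok(path, compress, decompress):
    """Compress a file, decompress the result, and compare with the original.

    The byte-for-byte comparison replaces the `diff` step of the shell script.
    """
    with open(path, "rb") as f:
        original = f.read()
    return decompress(compress(original)) == original


def run_roundtrip_tests(paths, compress, decompress):
    """Return the paths whose round trip did not reproduce the input."""
    return [p for p in paths if not roundtrip_ok(p, compress, decompress)]
```

With the extension installed, `brotli.compress` and `brotli.decompress` would be passed instead of the zlib functions.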

I tested the scripts on Windows, Cygwin and OS X. You can use them to test that the Python extension works on non-UNIX systems.

All best,

Cosimo

szabadka · 2015-02-26T09:23:21Z

python/tests/bro.py

+
+    try:
+        if options.decompress:
+            bufsize = options.bufsize or 10*len(data)


The default here seems a bit fragile. The compression ratio could easily be more than 10x; the simplest example is a huge file containing only zeros.

If you want to be more robust, these are the solutions I can think of, in order of increasing complexity:

1) Do the decompression in a loop, with exponentially increasing buffer sizes, until a limit is reached. The limit can be a command-line argument with a reasonable default, e.g. 1GB.
2) Add a Python binding for BrotliDecompressedSize() and try to use that. The drawback is that it does not work if there are many metablocks, so you would have to fall back to solution 1).
3) Add a Python binding for the decompression function that uses two callbacks; then you can allocate more output when it is needed.
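Solution 1) can be sketched as below, assuming a hypothetical "bounded" decompressor that takes an output size and raises when the buffer is too small. Since the exact behaviour of the binding's bufsize argument is not shown here, `zlib_bounded` builds such a decompressor on top of zlib purely as a stand-in (it is not Brotli):

```python
import zlib


class BufferTooSmall(Exception):
    """Raised by a bounded decompressor when the output buffer is too small."""


def decompress_growing(data, bounded_decompress, initial=None, limit=1 << 30):
    """Retry with an exponentially growing output buffer, up to `limit` bytes.

    bounded_decompress(data, bufsize) must return the full output or raise
    BufferTooSmall. The default limit is 1 GiB.
    """
    bufsize = min(initial or max(1, 10 * len(data)), limit)
    while True:
        try:
            return bounded_decompress(data, bufsize)
        except BufferTooSmall:
            if bufsize >= limit:
                raise  # give up: the output would exceed the limit
            bufsize = min(bufsize * 2, limit)


def zlib_bounded(data, bufsize):
    """Stand-in bounded decompressor: produce at most `bufsize` bytes."""
    d = zlib.decompressobj()
    out = d.decompress(data, bufsize)
    if not d.eof:  # stream not finished, so the buffer was too small
        raise BufferTooSmall(bufsize)
    return out
```

A file of zeros, the case mentioned above, compresses far better than 10x, so the initial guess is too small and the loop has to double the buffer several times before it succeeds.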

anthrotype · 2015-02-26T09:42:45Z

thanks for your comments and suggestions. I'll look into it and try to fix it.
Btw, in the current Python bindings, the default decompression buffer size is even less than 10x.

anthrotype · 2015-02-27T15:28:30Z

I cherry-picked two commits from #32 since they relate to the Python binding build process.
Basically, the setup.py will pass the -std=c++0x flag only when the compiler is GCC, and not when it is MSVC (which cannot handle it).

Besides, MSVC gives a warning (C4530) if C++-style exceptions are used without the /EHsc flag. See https://msdn.microsoft.com/en-us/library/2axwkyt4.aspx
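A sketch of choosing the flags per compiler, under the assumption that every non-MSVC compiler is GCC-compatible. The compiler is only known once the `build_ext` command runs, so the choice is made in `build_extensions`, using the `compiler_type` attribute of distutils compiler objects (`"msvc"` for MSVC). `compile_args_for` and `make_build_ext` are hypothetical names, not the code in #32:

```python
def compile_args_for(compiler_type):
    """Extra C++ flags for a distutils compiler_type ('msvc', 'unix', 'mingw32', ...).

    MSVC does not accept -std=c++0x and warns with C4530 unless /EHsc is
    given; every other compiler is assumed to be GCC-compatible here.
    """
    if compiler_type == "msvc":
        return ["/EHsc"]
    return ["-std=c++0x"]


def make_build_ext(base):
    """Wrap a build_ext command class so flags are picked once the compiler exists."""
    class BuildExt(base):
        def build_extensions(self):
            extra = compile_args_for(self.compiler.compiler_type)
            for ext in self.extensions:
                ext.extra_compile_args = list(ext.extra_compile_args or []) + extra
            super().build_extensions()
    return BuildExt
```

In setup.py this would be wired up as `setup(..., cmdclass={"build_ext": make_build_ext(build_ext)})`, with `build_ext` imported from `setuptools.command.build_ext` (or `distutils.command.build_ext` on older setups).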

anthrotype · 2015-02-27T18:55:39Z

@szabadka I added a binding for the BrotliDecompressedSize function (I called it "get_decompressed_size"), as you have suggested.
I also modified the bro.py script so that it first tries to calculate the decompressed size, and if it can't find one, then it does the decompression in a loop with increasing buffer size, until a limit of 1GB is reached (but it can be extended via command line argument).
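A sketch of how bro.py's command line and its first buffer-size choice could fit together. The short flags match the invocation used later in this thread (`-f -d -i ... -o ...`); the long option names, the `--max-bufsize` option and the `initial_bufsize` helper are hypothetical, as is the assumption that an unknown size from get_decompressed_size() is represented as `None`:

```python
import argparse

ONE_GB = 1 << 30  # default ceiling for the decompression buffer


def build_parser():
    """Command-line interface mirroring the short flags -d, -f, -i, -o."""
    parser = argparse.ArgumentParser(prog="bro.py", description="Compress or decompress with Brotli.")
    parser.add_argument("-d", "--decompress", action="store_true",
                        help="decompress the input instead of compressing it")
    parser.add_argument("-f", "--force", action="store_true",
                        help="overwrite the output file if it exists")
    parser.add_argument("-i", "--input", help="input file (default: stdin)")
    parser.add_argument("-o", "--output", help="output file (default: stdout)")
    # Hypothetical option name for raising the 1GB limit.
    parser.add_argument("--max-bufsize", type=int, default=ONE_GB,
                        help="largest decompression buffer to try, in bytes")
    return parser


def initial_bufsize(size_hint, compressed_len, max_bufsize):
    """Use the exact size when known, otherwise a starting guess for the retry loop."""
    if size_hint is not None:
        if size_hint > max_bufsize:
            raise ValueError("decompressed size exceeds --max-bufsize")
        return size_hint
    return min(max(1, 10 * compressed_len), max_bufsize)
```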
However, I am not able to test a compressed file that would make the BrotliDecompressedSize function fail: the size of each included text document is calculated successfully.
But I'm sure you can push it to the limit.
Let me know what you think.
Have a nice weekend,

Cosimo.

khaledhosny · 2015-02-27T21:02:01Z

I’d prefer that the Python module be changed to use BrotliDecompress with callbacks that allocate memory as needed, so that Python users need not worry about the decompression buffer size. I have some semi-working code for this, but I won’t be able to finish it for a couple of days or so.

anthrotype · 2015-02-27T21:12:41Z

Yes, I also believe that'd be a better idea. I will wait for your patch then.
Thanks.

anthrotype · 2015-03-13T15:21:13Z

I wonder if @khaledhosny has had the chance to finish the patch to brotlimodule.cc he mentioned earlier (i.e. the one using BrotliDecompress with callbacks to allocate memory as needed)?
thanks a lot,

C.

khaledhosny · 2015-03-13T22:07:24Z

Sorry for the delay, I opened #36 for the decompress changes.

anthrotype · 2015-03-15T20:40:26Z

thank you @khaledhosny!
I rebased this branch to use your new BrotliDecompress() function. However, I'm no longer able to pass all the tests.
It's quite strange: if I run the same command more than once, sometimes it passes and other times it raises "BrotliDecompress failed". For example, using this test file:

./bro.py -f -d -i x.compressed -o x.uncompressed

I'll have a closer look tomorrow and see if I can better debug what's happening here.
Thanks again.

C.

anthrotype · 2015-03-16T13:51:49Z

@khaledhosny
I think I found the culprit of the issue.
At line 80 of brotlimodule.cc, the output_callback function is missing a return statement.
If I add that, then all the python tests run fine.
See 4a927bc.
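A toy model of why the return value matters, assuming the decoder's contract is that the output callback returns the number of bytes it wrote and that any other value is treated as a failure. `toy_decode` and `make_output_sink` are hypothetical and only mimic the shape of the C++ code. In C++, falling off the end of a non-void function is undefined behavior, so the returned value is garbage that may or may not happen to equal the byte count, which would be consistent with the intermittent failures; in Python the missing return is always `None`, so the failure is deterministic:

```python
class DecodeError(Exception):
    pass


def toy_decode(chunks, output_callback):
    """Hand each decoded chunk to output_callback and check how much it took."""
    for chunk in chunks:
        written = output_callback(chunk)
        if written != len(chunk):
            raise DecodeError("BrotliDecompress failed")


def make_output_sink():
    """Return (callback, buffer); the callback grows the buffer as needed."""
    buf = bytearray()

    def output_callback(data):
        buf.extend(data)
        return len(data)  # the statement that was missing from output_callback

    return output_callback, buf
```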

anthrotype · 2015-03-16T16:25:41Z

I apologise for the rebase mess; I'm going to close this pull request and open a new, cleaner one.

khaledhosny · 2015-03-16T19:16:01Z

@anthrotype Thanks, I fixed the missing return statement in my PR.

szabadka reviewed Feb 26, 2015
anthrotype force-pushed the win_build_6 branch from 1752dd4 to fad6a92 on February 27, 2015 18:47

anthrotype force-pushed the win_build_6 branch from f61ba32 to 0ff8aae on March 15, 2015 20:19

anthrotype force-pushed the win_build_6 branch from 4a927bc to 56fdd1c on March 16, 2015 16:21

szabadka and others added 15 commits March 16, 2015 16:22

Add command-line tool and tests.

fec561a

Improvements to the command-line tool.

9af2e7f

- Don't read the whole input to memory.
- Support reading from stdin and writing to stdout.

Add .gitignore file

a60406e

Remove unneeded malloc.h header.

1cfc9d0

Fix undefined behavior in decoder.

f5addf3

Use memmove() for copying overlapping buffers.

Add more test cases.

03300b3

Add Python bindings

14d90b6

Compile decoder with -Wall flag

60dd20f

This also fixes two "comparison between signed and unsigned" warnings.

add python's build and dist directories to .gitignore

819f151

[python/setup.py] use relative paths instead of symlinks (unsupported on Win)

a2b67d4

add python scripts for testing Brotli on non-UNIX environments

fda3bac

[setup.py] use "-std=c++0x" only with GCC compiler (unsupported on MSVC)

4dcbe1c

[setup.py] enable C++ exception handling on MSVC compiler to fix warning C4530

f9dcf46

[python] Use BrotliDecompress()

ddba612

So that we can use a callback to dynamically allocate the decompression buffer, getting rid of the optional bufsize argument to decompress.decompress().

[brotlimodule.cc] add missing return to output_callback function

8e4da4c

anthrotype force-pushed the win_build_6 branch from 56fdd1c to 8e4da4c on March 16, 2015 16:23

anthrotype closed this Mar 16, 2015

anthrotype mentioned this pull request Mar 16, 2015

[python] setup.py fixes for Windows #37

Merged
