Comparison to 3bz #1

Closed
Shinmera opened this issue Mar 18, 2023 · 3 comments

@Shinmera

Heya. You mention that "to your knowledge" this is the fastest deflate decompression library, but I see no reference to 3bz.

@se-mz
Owner

se-mz commented Mar 18, 2023

Thanks, I wasn't aware of this project. I'd be interested in adding it to the comparison (it seems faster for the zlib test files), but is the version on QL/GH stale? (3bz:decompress-vector (alexandria:read-file-into-byte-vector #p"html_standard.html.gz") :format :gzip) for instance tries to allocate gigabytes upon gigabytes of buffers before eventually OOMing.

Also, what would be the idiomatic way to do file-to-file decompression for 3bz? Doing your own buffering?

@3b

3b commented Mar 22, 2023

Apparently nobody has tried to use the gzip support in 3bz for anything serious (or they didn't report bugs if they did). It should work now.

The most idiomatic way to do file-to-file decompression would probably be mmap if available, otherwise manually buffering input, otherwise stream input. For output, decompress into a buffer of 128k or so and write it out each time it fills. 3bz has to copy 32k every time the output buffer fills up, so a larger buffer helps a bit, though I probably wouldn't bother with anything larger than 2^20 or so.

Possibly something like:
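
To put rough numbers on that buffer-size trade-off, here is a small arithmetic sketch. It assumes the only per-refill overhead is the 32k copy mentioned above; `copy-overhead` is a hypothetical helper for illustration, not part of 3bz.

```lisp
;; Hypothetical helper, not part of 3bz: ratio of octets copied per refill
;; to octets of output written per refill, assuming a fixed 32k (2^15) copy.
(defun copy-overhead (buffer-size &optional (copied (expt 2 15)))
  (/ copied buffer-size))

(copy-overhead (expt 2 17)) ; => 1/4
(copy-overhead (expt 2 20)) ; => 1/32
(copy-overhead (expt 2 24)) ; => 1/512
```

Under that assumption, going from 2^17 to 2^20 cuts the relative copying from a quarter of the output to about 3%, and anything larger has little left to save.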

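(defun decompress-file-to-file (source destination &key (if-exists :error))
  ;; DECOMPRESS drains the input context IN into DESTINATION through one
  ;; reusable 128k (2^17 octet) output buffer.
  (flet ((decompress (in)
           (with-open-file (out destination :if-exists if-exists
                                            :direction :output
                                            :element-type '3bz::octet)
             (let* ((buf (make-array (expt 2 17) :element-type '3bz::octet))
                    (state (3bz:make-gzip-state :output-buffer buf
                                                :output-offset 0)))
               (loop until (3bz:finished state)
                     ;; hand the same buffer back to 3bz for the next pass
                     do (3bz:replace-output-buffer state buf)
                        ;; C is used as the count of valid octets in BUF
                        (let ((c (3bz:decompress in state)))
                          ;; all input is supplied up front, so an underrun
                          ;; here would mean the input was truncated
                          (assert (not (3bz:input-underrun state)))
                          (if (zerop c)
                              (assert (3bz:finished state))
                              (write-sequence buf out :end c))))))))
    ;; stream input on the implementations excluded from the mmap path
    #+(or abcl mezzano)
    (with-open-file (in source :element-type '3bz::octet)
      (decompress (3bz:make-octet-stream-context in)))
    ;; everywhere else, map the file and read it through a pointer context
    #-(or abcl mezzano)
    (mmap:with-mmap (pointer fd size source)
      (3bz:with-octet-pointer (pointer-binding pointer size)
        (decompress (3bz:make-octet-pointer-context pointer-binding))))))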

Though it should probably read the input in chunks for the non-mmap path, since it knows it has a file and therefore knows the size of the input.
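
For reference, a call to the function above might look like the following sketch. It assumes the 3bz and mmap systems are loaded through Quicklisp and that the definition has been evaluated; the input name is the test file mentioned earlier in the thread, and the output path is only a placeholder.

```lisp
;; Assumes Quicklisp is available; both systems are needed by the
;; definition of DECOMPRESS-FILE-TO-FILE above.
(ql:quickload '(:3bz :mmap))

;; The output path is a placeholder. :SUPERSEDE overwrites an existing
;; file instead of signalling an error (the default is :ERROR).
(decompress-file-to-file #p"html_standard.html.gz"
                         #p"html_standard.html"
                         :if-exists :supersede)
```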

Currently 3bz stream input is very slow, since it tries to be careful about not reading past the end of the input data or blocking if data isn't available yet (I think those were the issues; it's been a while since I thought about that stuff). I never thought of a good way to handle that efficiently for the general case, and never got around to optimizing for the cases where the stream is seekable and/or the user doesn't care about blocking.

@se-mz
Owner

se-mz commented Apr 10, 2023

Added.

@se-mz se-mz closed this as completed Apr 10, 2023