RFC: remove compression library dependencies #3168
Replies: 8 comments 2 replies
-
Thanks for asking this question. I'll take some time to reply thoughtfully. |
Beta Was this translation helpful? Give feedback.
-
Also please see related upstream issues https://gitlab.gnome.org/GNOME/libxml2/-/issues/709 and https://gitlab.gnome.org/GNOME/libxml2/-/issues/708 |
Beta Was this translation helpful? Give feedback.
-
Thank you |
Beta Was this translation helpful? Give feedback.
-
@gastonmorixe Thanks for your patience. libxml2's tarballs have been With respect to removing compression code from the library, https://gitlab.gnome.org/GNOME/libxml2/-/issues/709 indicates where upstream is headed. While many distros may continue to ship with compression enabled (and this project can't control that) we can control what options we compile the vendored libxml2/libxslt with, and I'll explore whether we can do that without breaking Nokogiri functionality. Action items for me
|
Beta Was this translation helpful? Give feedback.
-
OK, I've done some exploration of removing the compression libraries from the libxml2 build and adding this test: commit 22dd128b
Author: Mike Dalessio <[email protected]>
Date: 2024-04-07 13:30:12 -0400
add test for how we handle compressed files
this is insufficient coverage, there are other methods that likely
support decompression. this test is only intended to demonstrate the
problem with removing zlib/lzma from the libxml2 build.
diff --git a/test/files/staff.xml.gz b/test/files/staff.xml.gz
new file mode 100644
index 00000000..41404cce
Binary files /dev/null and b/test/files/staff.xml.gz differ
diff --git a/test/xml/sax/test_parser.rb b/test/xml/sax/test_parser.rb
index 8551a346..5d9aa343 100644
--- a/test/xml/sax/test_parser.rb
+++ b/test/xml/sax/test_parser.rb
@@ -274,6 +274,14 @@ def call_parse_io_with_encoding(encoding)
end
end
+ it "parses a compressed file" do
+ filename = XML_FILE + ".gz"
+ parser.parse_file(filename)
+
+ refute_nil(parser.document.start_elements)
+ assert_operator(parser.document.start_elements.count, :>, 30)
+ end
+
it :test_render_parse_nil_param do
assert_raises(TypeError) { parser.parse_memory(nil) }
end This test passes on CRuby This test demonstrates that there are a handful of "file open" methods that will decompress on the fly if the file is compressed and libxml2 supports it. This is undocumented behavior, and we have no test coverage for it. But I'm hesitant to rip it out without some careful thought given how widely used Nokogiri is. Going to think about this for a bit, but I think our options are:
In an ideal world, I'd like to do 3, but that seems like a lot of work and there is more urgent work I'd prefer to do in Nokogiri. Like I said, I'm going to think about this. |
Beta Was this translation helpful? Give feedback.
-
@flavorjones appreciate your prompt response on the matter. My personal philosophy for future proof safety would be to rely as less as possible on any 3rd party lib, specially ones that need or handle binary stuff. That's why we use ruby, not only because it's beautiful but because it's sandboxed. I understand the performance implications of this but I guess most people would be ok with trading some performance for the sake of sandboxing and security by not including or relying on binary stuff we don't control and risk things we don't even know. That being said I am not an expert on nokogiri, just found an issue at an old project that was relying on an outdated version and after removing xz from my mac it was crashing the server because the lack of xz. I upgraded and it worked but I am still confused whether xz is precompiled or not in the gem, I couldn't see anything in the latest version I downloaded but was worth asking. As you are the maintainer I trust your expertise and direction on the matter. It's good we are at least thinking about this, since we got lucky this time. Thank you for your time |
Beta Was this translation helpful? Give feedback.
-
For reference I have analyzed a little more the latest 1.16.3-arm64 precompiled gem, which indeed liblzma seems present but just linked. That being said, are you including zlib binary instead? @flavorjones Thanks! ❯ otool -L /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle
/Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle:
/usr/lib/liblzma.5.dylib (compatibility version 6.0.0, current version 6.3.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.60.1)
~
❯ binwalk /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
1616 0x650 Unix path: /usr/lib/liblzma.5.dylib
2266112 0x229400 CRC32 polynomial table, little endian
2293901 0x23008D Copyright string: "Copyright 1995-2023 Jean-loup Gailly and Mark Adler "
2296331 0x230A0B Copyright string: "Copyright 1995-2023 Mark Adler "
2343769 0x23C359 mcrypt 2.2 encrypted data, algorithm: DES, mode: CBC, keymode: 4bit
2468655 0x25AB2F mcrypt 2.2 encrypted data, algorithm: blowfish-448, mode: CBC, keymode: 8bit
2545596 0x26D7BC Unix path: /home/flavorjones/code/oss/nokogiri/ports/arm64-darwin/libxml2/2.12.6/etc/xml/catalog
2556569 0x270299 Copyright string: "copyright sign, U+00A9 ISOnum"
2568746 0x27322A HTML document header
2571189 0x273BB5 XML document, version: ""1."
3198523 0x30CE3B Unix path: /home/flavorjones/code/oss/nokogiri/tmp/arm64-darwin/nokogiri/3.3.0/../../../../ext/nokogiri/
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle | grep liblzma
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle | grep lzma
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.2/nokogiri.bundle | grep lzma
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.1/nokogiri.bundle | grep lzma
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.0/nokogiri.bundle | grep lzma
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.0/nokogiri.bundle | grep zlib
zlib:1.3,libiconv:1.17,libgumbo:1.0.0-nokogiri
Error flushing zlib buffers. Error code
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle | grep zlib
zlib:1.3,libiconv:1.17,libgumbo:1.0.0-nokogiri
Error flushing zlib buffers. Error code
~
❯ strings /Users/gastonmorixe/Downloads/nokogiri-1.16.3-arm64-darwin/data/lib/nokogiri/3.3/nokogiri.bundle | grep lib
0001-Remove-script-macro-support.patch 0002-Update-entities-to-remove-handling-of-ssi.patch 0003-libxml2.la-is-in-top_builddir.patch 0009-allow-wildcard-namespaces.patch 0010-update-config.guess-and-config.sub-for-libxml2.patch 0011-rip-out-libxml2-s-libc_single_threaded-support.patch
0001-update-config.guess-and-config.sub-for-libxslt.patch
zlib:1.3,libiconv:1.17,libgumbo:1.0.0-nokogiri
libxml2: Failed to allocate globals for thread
libxml2: See xmlCheckThreadLocalStorage
Fatal: program compiled against libxml %d using libxml %d
Warning: program compiled against libxml %d using older %d
Error flushing zlib buffers. Error code
file:///home/flavorjones/code/oss/nokogiri/ports/arm64-darwin/libxml2/2.12.6/etc/xml/catalog
Relax-NG types library '%s' already registered
adding types library
Relax-NG types library failed to register '%s'
Use of unregistered type library '%s'
Internal error with type library '%s': no 'have'
Error type '%s' is not exported by type library '%s'
Type library '%s' does not allow type parameters
fake node libxslt
fake node libxslt
xsltExtensionInstructionResultFinalize is unsupported in this release of libxslt.
libxslt:test element test worked
libxslt (SAXON 6.2 compatible)
libxslt |
Beta Was this translation helpful? Give feedback.
-
Dropping a note here to say that libxml2 v2.13.0 turned off support for zlib, lblzma, and HTTP by default. This seems like the right thing to do, but will be a breaking change for some Nokogiri users. I think what I'd like to do is keep these features on for all future Nokogiri v1.x releases, but turn it off in an upcoming v2.0 release. I'd love feedback on this proposal. |
Beta Was this translation helpful? Give feedback.
-
I see liblzma / xz is referenced in a few places and that the latest versions including "pre-build" dependencies (though I haven't seen them including liblzma/xz if I checked correctly).
Out of extra precaution, is it really necessary? asking not only for xz but - if we learn something from what happened (CVE-2024-3094) - for other dependencies as well to minimize future supply chain risks.
Thank you
https://nokogiri.org/tutorials/installing_nokogiri.html?h=liblzma
#1694 (comment)
nokogiri/ext/nokogiri/extconf.rb
Line 878 in da098df
nokogiri/ext/nokogiri/extconf.rb
Line 878 in da098df
nokogiri/rakelib/extensions.rake
Lines 297 to 302 in da098df
Beta Was this translation helpful? Give feedback.
All reactions