XML: modernize API when available & workaround issues with legacy versions#15899
Merged
straight-shoota merged 6 commits intocrystal-lang:masterfrom Jun 18, 2025
Merged
Conversation
Use the new per-context error handlers when available (libxml 2.13 and newer) that now report errors properly (and enforces NOERROR). Tries to avoid thread safety issues in libxml 2.9 to 2.12 by resetting the default error handlers while the current fiber may swapcontext, for example while doing IO, and setting them back afterwards. This will at least avoid leaking error handlers in the thread locals while we switched to another fiber that may mess with these (oops), and makes sure to restore them properly, even if the fiber is resumed to another thread (no lost references). Deprecates the public API for generic and structured error collect handlers. They're unsafe and musn't be used!
Use the new per-context indentation configuration from libxml 2.14 when availmable, otherwise tries to avoid concurrency and parallelism issues by saving the globals before a potential fiber swapcontext and restoring them afterwards.
ysbaddaden
commented
Jun 13, 2025
ysbaddaden
commented
Jun 13, 2025
| line_number = LibXML.xmlTextReaderLocatorLineNumber(locator) | ||
| raise Error.new(msg_str, line_number) | ||
| end | ||
| LibXML.xmlTextReaderSetStructuredErrorHandler(@reader, ->Error.structured_callback, Box.box(@errors)) |
Collaborator
Author
There was a problem hiding this comment.
I notice this is a breaking change... but I don't understand why "read from string" should raise, while "read from io" should collect errors instead 🤨
ysbaddaden
commented
Jun 13, 2025
ysbaddaden
commented
Jun 13, 2025
| end | ||
|
|
||
| # :nodoc: | ||
| SAVE_MUTEX = ::Mutex.new |
Collaborator
Author
There was a problem hiding this comment.
We shouldn't need the mutex anymore and it would still not restore the xmlIndentTreeOutput global thread local when the fiber is resumed by another thread.
ysbaddaden
commented
Jun 13, 2025
Sija
reviewed
Jun 16, 2025
ysbaddaden
added a commit
to ysbaddaden/crystal
that referenced
this pull request
Jun 17, 2025
) XML::Reader: use xmlTextReaderSetStructuredErrorHandler libxml: add bindings to new bindings in 2.13 and 2.14 (improved thread safety) libxml 2.13+ never report errors when NOERROR option is set Modernize the error handling & improve thread safety Use the new per-context error handlers when available (libxml 2.13 and newer) that now report errors properly (and enforces NOERROR). Tries to avoid thread safety issues in libxml 2.9 to 2.12 by resetting the default error handlers while the current fiber may swapcontext, for example while doing IO, and setting them back afterwards. This will at least avoid leaking error handlers in the thread locals while we switched to another fiber that may mess with these (oops), and makes sure to restore them properly, even if the fiber is resumed to another thread (no lost references). Deprecates the public API for generic and structured error collect handlers. They're unsafe and musn't be used! Modernize XML generation + improve fiber/thread issues Use the new per-context indentation configuration from libxml 2.14 when availmable, otherwise tries to avoid concurrency and parallelism issues by saving the globals before a potential fiber swapcontext and restoring them afterwards. Fix: review suggestions + format
straight-shoota
approved these changes
Jun 17, 2025
straight-shoota
added a commit
to straight-shoota/crystal-book
that referenced
this pull request
Jul 14, 2025
straight-shoota
added a commit
to crystal-lang/crystal-book
that referenced
this pull request
Jul 14, 2025
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Uses the libxml per context error handlers when available.
For example we can always use it for
XML::Readersince it's available since at least libxml 2.9 (released in 2012), that becomes the default expected libxml version.Sadly the other per context error handlers only appeared in libxml 2.13 (released in 2024) so we can't assume they exist, but can still use them when compiling against this version.
This patch adds a libxml version detection using
pkg-config. We can specify theLIBXML_VERSIONenvironment variable to target another release if the runtime version will be different (though it is highly recommended to use libxml 2.13+), or for the Windows MSVC target.On older libxml releases, errors can be raised per context (e.g. xml reader) and through the structured error handler (now deprecated) and the older generic (long deprecated).
For these older releases, this patch still sets the globals (actually thread locals), be they error handlers or some xml save configuration, but saves them when the fiber might
yield, restoring the default handlers; it eventually restores the globals when the fiber is resumed, so the current thread will see the values previously configured for the fiber, whatever if another fiber did something else, or the fiber got resumed by another thread (execution context).NOTE: libxml 2.14 almost always segfaults during GC while runningfixed in #15906.spec/std/xml, while the 2.9 to 2.13 releases are fine.