HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through

Impact

The HTML Parser in lxml does not properly handle context-switching for special HTML tags such as <svg>, <math> and <noscript>. This behavior deviates from how web browsers parse and interpret such tags. Specifically, content in CSS comments is ignored by lxml_html_clean but may be interpreted differently by web browsers, enabling malicious scripts to bypass the cleaning process. This vulnerability could lead to Cross-Site Scripting (XSS) attacks, compromising the security of users relying on lxml_html_clean in default configuration for sanitizing untrusted HTML content.

Patches

Users employing the HTML cleaner in a security-sensitive context should upgrade to lxml 0.4.0, which addresses this issue.

Workarounds

As a temporary mitigation, users can configure lxml_html_clean with the following settings to prevent the exploitation of this vulnerability:

remove_tags: Specify tags to remove - their content is moved to their parents' tags.
kill_tags: Specify tags to be removed completely.
allow_tags: Restrict the set of permissible tags, excluding context-switching tags like <svg>, <math> and <noscript>.

References

frenzymadness published to fedora-python/lxml_html_clean Nov 19, 2024

Published to the GitHub Advisory Database Nov 19, 2024

Reviewed Nov 19, 2024

Published by the National Vulnerability Database Nov 19, 2024

Last updated Jan 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Package

Affected versions

Patched versions

Description

Impact

Patches

Workarounds

References

References

Severity

CVSS overall score

CVSS v3 base metrics

CVSS v3 base metrics

EPSS score

Exploit Prediction Scoring System (EPSS)

Weaknesses

CVE ID

GHSA ID

Source code

Credits