-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set safe defaults for parser settings #177
Conversation
@shado23 perhaps now that scala-xml 1.1.0 is out the door, we could return to this. would you mind rebasing it against current master...? |
@SethTisue Sure thing. It's done. |
@ashawley No, it doesn't need to. That was just out of convenience because they both touch the same code. I've changed it. |
I called for reviewers at https://contributors.scala-lang.org/t/scala-xml-security-settings/1623 and https://gitter.im/scala/contributors?at=5a8eef8253c1dbb743628db2 and on some Slack channels I'm on, too |
Is it possible to add a test or two to validate it is indeed working to reject behaviors that these settings enabled? |
I'm not sure. The ones that cause automatic HTTP calls would probably require setting up an HTTP server and verifying that you don't receive a call, and the ones that can cause denial of service level resource usage might still be too costly to run as a regular part of the test suite. I'm also currently quite busy at work, so I don't really have to time to research this too deeply right now. |
parser.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false) | ||
parser.setXIncludeAware(false) | ||
parser.setNamespaceAware(false) | ||
parser.setNamespaceAware(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch.
The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.
I asked a security expert I know about this PR and he said:
|
|
||
parser.setFeature("http://javax.xml.XMLConstants/feature/secure-processing", true) | ||
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) | ||
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how many people use this library to parse HTML, but I'm sure there are some. Either way having a DOCTYPE in your XML will fail to load:
val html =
"""<!DOCTYPE html>
|<html lang="en">
|</html>
|""".stripMargin
XML.loadString(html)
The result is: org.xml.sax.SAXParseException: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting that a lowercase doctype
, which is very common, currently fails already before changing these settings:
val html =
"""<!doctype html>
|<html lang="en">
|</html>
|""".stripMargin
XML.loadString(html)
That's because it's malformed, org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how many people use this library to parse HTML
I'm comfortable with (not in a 1.1.1 release, but in a 1.2 or a 2.0) requiring those users to explicitly override the default. We might double check if there is suitable documentation in appropriate places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... that is unfortunate, but it's insecure:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#General_Guidance
If we leave that enabled we leave ourselves open to DOS attacks. The cleanest solution would probably be to document this and have people that need it provide their own parser instance. You just have to do XML.withSAXParser(myParser).load(html) instead, so it shouldn't be a huge burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment (though it/s not a Scaladoc comment) says /* Override this to use a different SAXParser. */
, is that not the right recommendation to make...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still unsafe by default
we definitely want to make things safe by default, absolutely.
the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we definitely want to make things safe by default, absolutely.
Ah, sorry. I misunderstood you.
So the idea is some variation of:
def parser(f: SAXParserFactory => SAXParserFactory = identity)
... where the factory has safe defaults already applied to allow for user customization?
That makes a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like that would be awesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a tradeoff of breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin. So this is more nuanced case-by-case tradeoff.
Given that this issue has been around, and if informed users have been disabling external DTD loading features manually, then the rest are casual/not-so-informed users by process of elimination. I am not too sure if they will read the release notes whether it comes out in version 1.1.x or 1.2.x.
A potential way of dealing with this is to use static typing. As in we would deprecate any methods that are current unsafe during 1.1.x and provide safer one as the alternative. In 1.2.x you can remove the current unsafe methods.
See also #17 where this was discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for weighing in on this, Eugene.
I put a general comment on this discussion, yesterday, at the top-level (outside of the patch review), see below #177 (comment)
The "proposed defaults" in this patch?
And |
yes
hmm... I suppose SAXParser is the way it is for historical/backwards-compat type reasons? just speculating |
It seems like this is a low risk change, and one that we can and should accept it, but I have only been able to think of a few scenarios so far. One happy accident in the way scala-xml uses the Here's an example of doing just that: https://github.com/scala/scala-xml/wiki/XML-validation This is what So with this change you can do those kinds of customizations and continue doing them with the next release. But it cuts the other way as well, because these people who did customize will not benefit from the fix. That includes, for example, people who had already been using the hacks mentioned in #17! So if people have monkey patched their parser in any of these ways, upgrading won't help. Along that same line, if we happen to decide to hold off on disabling some settings, the people who want more security will still have to disable all the settings themselves plus the new ones they want. There's no possibility for inheriting or just adding only the new disabled settings. It makes it all or nothing: Either fully-configure your SAXParser, or don't at all. I'd say that at the very least we're helping the latter people, and for the former people they can source the original implementation as a better default. Alternatively, we look in to refactoring |
I'm a little behind on the state of JDK's and what version/fork of Xerces they are using, but something that needs to be considered is what happens if the SAX implementation in the JDK doesn't support the options we are passing? As I understand it, an exception will be thrown. And while everything actually uses Xerces, there are different versions (and possibly forks) of Xerces being used that use different namespaces for the different options. We used to explicitly depend on a version of Xerces published to Maven in Play for this reason. |
Also, perhaps a good time to raise this, this CVE was made public late last year: https://www.cvedetails.com/cve/CVE-2012-0881/ Note that the vulnerability was originally reported in 2012, but it hasn't been made public until now. There is also an equivalent vulnerability for Xerces in C++. The vulnerability hasn't been fixed. Why? I guess no one could be bothered, and no one believes its important enough to fix it. The vulnerability is scant on details, but it says that it exploits hash collisions. I guess that by selecting elements/attributes with names whose java.lang.String.hashCode collides, you can craft a document that requires O(n2) to parse, and then subsequent operations that are usually constant, such as getting an element by tag name, may end up taking linear time. It doesn't seem like a huge problem to me - the additional CPU you could use would be large but probably not crippling, depending on the maximum size of the documents that you accept. Also fixing it would be a pain, you'd need to use a randomised hash function per parse, instead of just relying on HashMap with String's hash code. And the reality is, this isn't limited to XML, the vulnerability affects JSON, application/www-form-urlencoded forms, anything with key values, and in spite of its wide spread surface area, I've never heard of anyone exploiting it in the wild. Anyway, it is something to be aware of. |
It seems work to fix CVE-2012-0881 in Xerces-J just wasn't broadcast widely, see: https://issues.apache.org/jira/projects/XERCESJ/issues/XERCESJ-1685
|
Java 6 was released with Xerces 2.7.1, and each Java release was slowly updated with Xerces fixes, until version 2.11 was merged in Java 9: https://bugs.openjdk.java.net/browse/JDK-8044086 |
is anyone interested in pushing this forward for 2.0? |
Sure, what's missing to get this merged? |
The major thing that's definitely missing is documentation. Currently the PR simply changes the defaults. We'll need good text to put in the release notes, since this is a breaking change. Another thing that may be missing here is support for overriding the defaults. As I wrote:
and as @eed3si9n wrote:
when I say "support" for overriding the defaults, I'm not sure if that means code changes. Perhaps it's already possible in a reasonably convenient way, and how you do it simply needs to be documented? |
Here's what I've been doing with scalaxb since 2010 - eed3si9n/scalaxb@5933fc5 object CustomXML extends XMLLoader[Elem] {
override def parser: SAXParser = {
val factory = javax.xml.parsers.SAXParserFactory.newInstance()
factory.setFeature("http://xml.org/sax/features/validation", false)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.newSAXParser()
}
}
....
val elem = CustomXML.load(reader) You can pass whatever |
anyone want to try to get this across — or at least closer to — the finish line? |
Sorry for the delay I was quite busy, and, to be honest, still confused on what's actually missing apart from a short blurb for release notes. If that's all it is, I've added a draft section to the end of this comment that you're welcome to use or adapt as required. I've though about better ways to facilitate customization and while it may be possible to come up with schemes that might be slightly more convenient in some ways, they would probably require reworking the structure of the library more than would be desirable and adversely affect backwards compatibility. So I think @eed3si9n is right on the money, that this seems to be the intended way to customize the parser (and what I've done myself). The code has an explicit comment to this effect: trait XMLLoader[T <: Node] {
/* Override this to use a different SAXParser. */
def parser: SAXParser = ... Another option might be to use the scala.xml.XML.withSAXParser method, but it's essentially the same thing. So I think it's best to just document this and move on. I'm still not really clear how and where you want this to be documented. Safe parser defaultsInternally Scala XML makes use of the JDK SAXParser library, which by default enables support for a number of standard XML features that can cause security issues when exposed to untrusted input, allowing for denial of service attacks through resource exhaustion, exfiltration and exposure of local files or network resources, among other issues. To be more robust against these attacks out-of-the-box, the default parser settings have now been changed to a more restricted and safer subset, disabling potentially exploitable features such as external/remote schemas, doctype and entities as well as XIncludes and namespaces. Should you require support for any of these features and are confident your application is prevented from processing untrusted XML input, or have another need to customize these settings, you can create and use a custom XMLLoader instance like this: import scala.xml.Elem
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
object CustomXML extends XMLLoader[Elem] {
override def parser: SAXParser = {
val factory = SAXParserFactory.newInstance()
factory.setFeature("http://xml.org/sax/features/validation", false)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.newSAXParser()
}
}
...
val elem = CustomXML.load(reader) or this import scala.xml.{XML, Elem}
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
val customXML: XMLLoader[Elem] = XML.withSAXParser {
val factory = SAXParserFactory.newInstance()
factory.setFeature("http://xml.org/sax/features/validation", false)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.newSAXParser()
}
...
val elem = customXML.load(reader) |
I recall the concerns are minimized now that Java has patched a lot of the underlying security problems. This fix raises a concern of breaking parsing for users who might need those features. However, this route is the simplest fix, so if there are issues it's easy to rollback and make a bug-fix release. |
Since this is a vulnerability, should this have a CVE assigned to it? |
I'm not an expert on CVEs, but offhand I wouldn't think so, since:
|
Benji @benjifin, Leeya @leeyashalti (from Snyk), Can you weigh in here? |
Depends on pull request #176, fixes issues: #135, #17.
While touching this code I couldn't resist fixing the shamefully insecure defaults.
The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.
Initially I had made it a val (because the ThreadLocal is inherently lazy) but Scala.js didn't like that, so I changed it into a lazy val which it can eliminate.