Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set safe defaults for parser settings #177

Merged
merged 2 commits into from
Mar 4, 2021
Merged

Set safe defaults for parser settings #177

merged 2 commits into from
Mar 4, 2021

Conversation

fgoepel
Copy link
Contributor

@fgoepel fgoepel commented Nov 16, 2017

Depends on pull request #176, fixes issues: #135, #17.

While touching this code I couldn't resist fixing the shamefully insecure defaults.
The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.

Initially I had made it a val (because the ThreadLocal is inherently lazy) but Scala.js didn't like that, so I changed it into a lazy val which it can eliminate.

@SethTisue
Copy link
Member

@shado23 perhaps now that scala-xml 1.1.0 is out the door, we could return to this. would you mind rebasing it against current master...?

@fgoepel
Copy link
Contributor Author

fgoepel commented Feb 21, 2018

@SethTisue Sure thing. It's done.

@ashawley
Copy link
Member

ashawley commented Feb 22, 2018

Depends on pull request #176

Does it need to? I believe that breaks binary compatibility. The continuous integration build fails for this reason. Can you make the security fix without it? We can entertain #176 once we can break binary compatibility.

@fgoepel
Copy link
Contributor Author

fgoepel commented Feb 22, 2018

@ashawley No, it doesn't need to. That was just out of convenience because they both touch the same code. I've changed it.

@SethTisue
Copy link
Member

SethTisue commented Feb 22, 2018

@ashawley ashawley added this to the 1.1.1 milestone Feb 22, 2018
@ashawley
Copy link
Member

ashawley commented Feb 22, 2018

Is it possible to add a test or two to validate it is indeed working to reject behaviors that these settings enabled?

@fgoepel
Copy link
Contributor Author

fgoepel commented Feb 22, 2018

Is it possible to add a test or two to validate it is indeed working to reject behaviors that these settings enabled?

I'm not sure. The ones that cause automatic HTTP calls would probably require setting up an HTTP server and verifying that you don't receive a call, and the ones that can cause denial of service level resource usage might still be too costly to run as a regular part of the test suite.

I'm also currently quite busy at work, so I don't really have to time to research this too deeply right now.

parser.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false)
parser.setXIncludeAware(false)
parser.setNamespaceAware(false)
parser.setNamespaceAware(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.
@SethTisue
Copy link
Member

I asked a security expert I know about this PR and he said:

The proposed defaults are secure for XXE and XEE. It makes sense to apply them
Thats what most XML parsers in the JDK default to as well (now)


parser.setFeature("http://javax.xml.XMLConstants/feature/secure-processing", true)
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many people use this library to parse HTML, but I'm sure there are some. Either way having a DOCTYPE in your XML will fail to load:

    val html =
      """<!DOCTYPE html>
        |<html lang="en">
        |</html>
        |""".stripMargin
    XML.loadString(html)

The result is: org.xml.sax.SAXParseException: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that a lowercase doctype, which is very common, currently fails already before changing these settings:

    val html =
      """<!doctype html>
        |<html lang="en">
        |</html>
        |""".stripMargin
    XML.loadString(html)

That's because it's malformed, org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.

Copy link
Member

@SethTisue SethTisue Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many people use this library to parse HTML

I'm comfortable with (not in a 1.1.1 release, but in a 1.2 or a 2.0) requiring those users to explicitly override the default. We might double check if there is suitable documentation in appropriate places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... that is unfortunate, but it's insecure:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#General_Guidance

If we leave that enabled we leave ourselves open to DOS attacks. The cleanest solution would probably be to document this and have people that need it provide their own parser instance. You just have to do XML.withSAXParser(myParser).load(html) instead, so it shouldn't be a huge burden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment (though it/s not a Scaladoc comment) says /* Override this to use a different SAXParser. */, is that not the right recommendation to make...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still unsafe by default

we definitely want to make things safe by default, absolutely.

the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we definitely want to make things safe by default, absolutely.

Ah, sorry. I misunderstood you.

So the idea is some variation of:

def parser(f: SAXParserFactory => SAXParserFactory = identity)

... where the factory has safe defaults already applied to allow for user customization?
That makes a lot of sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like that would be awesome.

Copy link
Member

@eed3si9n eed3si9n Feb 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a tradeoff of breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin. So this is more nuanced case-by-case tradeoff.

Given that this issue has been around, and if informed users have been disabling external DTD loading features manually, then the rest are casual/not-so-informed users by process of elimination. I am not too sure if they will read the release notes whether it comes out in version 1.1.x or 1.2.x.

A potential way of dealing with this is to use static typing. As in we would deprecate any methods that are current unsafe during 1.1.x and provide safer one as the alternative. In 1.2.x you can remove the current unsafe methods.

See also #17 where this was discussed.

Copy link
Member

@ashawley ashawley Feb 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for weighing in on this, Eugene.

I put a general comment on this discussion, yesterday, at the top-level (outside of the patch review), see below #177 (comment)

@ashawley
Copy link
Member

ashawley commented Feb 22, 2018

The proposed defaults are secure for XXE and XEE. It makes sense to apply them

The "proposed defaults" in this patch?

Thats what most XML parsers in the JDK default to as well (now)

And javax.xml.parsers.SAXParser is not in the set of "most XML parsers in the JDK"?

@SethTisue
Copy link
Member

SethTisue commented Feb 22, 2018

The "proposed defaults" in this patch

yes

And javax.xml.parsers.SAXParser is not in the set of "most XML parsers in the JDK"?

hmm... I suppose SAXParser is the way it is for historical/backwards-compat type reasons? just speculating

@ashawley
Copy link
Member

It seems like this is a low risk change, and one that we can and should accept it, but I have only been able to think of a few scenarios so far.

One happy accident in the way scala-xml uses the SAXParser is that: If you wrote code that depended on some specially configured SAXParser, then you had to do it with an override.

Here's an example of doing just that:

https://github.com/scala/scala-xml/wiki/XML-validation

This is what XML.withSAXParser does internally, as well.

So with this change you can do those kinds of customizations and continue doing them with the next release. But it cuts the other way as well, because these people who did customize will not benefit from the fix. That includes, for example, people who had already been using the hacks mentioned in #17! So if people have monkey patched their parser in any of these ways, upgrading won't help.

Along that same line, if we happen to decide to hold off on disabling some settings, the people who want more security will still have to disable all the settings themselves plus the new ones they want. There's no possibility for inheriting or just adding only the new disabled settings. It makes it all or nothing: Either fully-configure your SAXParser, or don't at all.

I'd say that at the very least we're helping the latter people, and for the former people they can source the original implementation as a better default.

Alternatively, we look in to refactoring XML.withSAXParser or XMLLoader as Seth suggested.

@jroper
Copy link

jroper commented Feb 26, 2018

I'm a little behind on the state of JDK's and what version/fork of Xerces they are using, but something that needs to be considered is what happens if the SAX implementation in the JDK doesn't support the options we are passing? As I understand it, an exception will be thrown. And while everything actually uses Xerces, there are different versions (and possibly forks) of Xerces being used that use different namespaces for the different options. We used to explicitly depend on a version of Xerces published to Maven in Play for this reason.

@jroper
Copy link

jroper commented Feb 26, 2018

Also, perhaps a good time to raise this, this CVE was made public late last year:

https://www.cvedetails.com/cve/CVE-2012-0881/

Note that the vulnerability was originally reported in 2012, but it hasn't been made public until now. There is also an equivalent vulnerability for Xerces in C++. The vulnerability hasn't been fixed. Why? I guess no one could be bothered, and no one believes its important enough to fix it.

The vulnerability is scant on details, but it says that it exploits hash collisions. I guess that by selecting elements/attributes with names whose java.lang.String.hashCode collides, you can craft a document that requires O(n2) to parse, and then subsequent operations that are usually constant, such as getting an element by tag name, may end up taking linear time. It doesn't seem like a huge problem to me - the additional CPU you could use would be large but probably not crippling, depending on the maximum size of the documents that you accept. Also fixing it would be a pain, you'd need to use a randomised hash function per parse, instead of just relying on HashMap with String's hash code. And the reality is, this isn't limited to XML, the vulnerability affects JSON, application/www-form-urlencoded forms, anything with key values, and in spite of its wide spread surface area, I've never heard of anyone exploiting it in the wild.

Anyway, it is something to be aware of.

@ashawley
Copy link
Member

ashawley commented Apr 6, 2018

It seems work to fix CVE-2012-0881 in Xerces-J just wasn't broadcast widely, see:

https://issues.apache.org/jira/projects/XERCESJ/issues/XERCESJ-1685

Patches have been available for this issue since 2012. See: http://svn.apache.org/viewvc?view=revision&revision=1357381 for the fix.

@ashawley
Copy link
Member

ashawley commented Apr 6, 2018

Java 6 was released with Xerces 2.7.1, and each Java release was slowly updated with Xerces fixes, until version 2.11 was merged in Java 9: https://bugs.openjdk.java.net/browse/JDK-8044086

@ashawley ashawley mentioned this pull request Jun 26, 2018
@ashawley ashawley modified the milestones: 1.1.1, 1.2.0 Jun 27, 2018
@ashawley ashawley modified the milestones: 1.2.0, 2.0 Apr 4, 2019
@SethTisue
Copy link
Member

is anyone interested in pushing this forward for 2.0?

@fgoepel
Copy link
Contributor Author

fgoepel commented Dec 6, 2020

Sure, what's missing to get this merged?

@SethTisue
Copy link
Member

The major thing that's definitely missing is documentation. Currently the PR simply changes the defaults. We'll need good text to put in the release notes, since this is a breaking change.

Another thing that may be missing here is support for overriding the defaults. As I wrote:

the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them

and as @eed3si9n wrote:

There's a tradeoff of breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin.

when I say "support" for overriding the defaults, I'm not sure if that means code changes. Perhaps it's already possible in a reasonably convenient way, and how you do it simply needs to be documented?

@eed3si9n
Copy link
Member

eed3si9n commented Dec 6, 2020

Here's what I've been doing with scalaxb since 2010 - eed3si9n/scalaxb@5933fc5

  object CustomXML extends XMLLoader[Elem] {
    override def parser: SAXParser = {
      val factory = javax.xml.parsers.SAXParserFactory.newInstance()
      factory.setFeature("http://xml.org/sax/features/validation", false)
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
      factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
      factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
      factory.newSAXParser()
    }
  }
  ....
  val elem = CustomXML.load(reader)

You can pass whatever factory.setFeature(...) like this based on your needs.

@SethTisue
Copy link
Member

anyone want to try to get this across — or at least closer to — the finish line?

@fgoepel
Copy link
Contributor Author

fgoepel commented Feb 24, 2021

Sorry for the delay I was quite busy, and, to be honest, still confused on what's actually missing apart from a short blurb for release notes. If that's all it is, I've added a draft section to the end of this comment that you're welcome to use or adapt as required.

I've though about better ways to facilitate customization and while it may be possible to come up with schemes that might be slightly more convenient in some ways, they would probably require reworking the structure of the library more than would be desirable and adversely affect backwards compatibility.

So I think @eed3si9n is right on the money, that this seems to be the intended way to customize the parser (and what I've done myself).

The code has an explicit comment to this effect:

trait XMLLoader[T <: Node] {

  /* Override this to use a different SAXParser. */
  def parser: SAXParser = ...

Another option might be to use the scala.xml.XML.withSAXParser method, but it's essentially the same thing.

So I think it's best to just document this and move on. I'm still not really clear how and where you want this to be documented.
If it's supposed to be part of the commit please let me know, if a blurb for the release notes is all you need, how about something like the following:

Safe parser defaults

Internally Scala XML makes use of the JDK SAXParser library, which by default enables support for a number of standard XML features that can cause security issues when exposed to untrusted input, allowing for denial of service attacks through resource exhaustion, exfiltration and exposure of local files or network resources, among other issues.

To be more robust against these attacks out-of-the-box, the default parser settings have now been changed to a more restricted and safer subset, disabling potentially exploitable features such as external/remote schemas, doctype and entities as well as XIncludes and namespaces.

Should you require support for any of these features and are confident your application is prevented from processing untrusted XML input, or have another need to customize these settings, you can create and use a custom XMLLoader instance like this:

import scala.xml.Elem
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}

object CustomXML extends XMLLoader[Elem] {
  override def parser: SAXParser = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature("http://xml.org/sax/features/validation", false)
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    factory.newSAXParser()
  }
}
...
val elem = CustomXML.load(reader)

or this

import scala.xml.{XML, Elem}
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}

val customXML: XMLLoader[Elem] = XML.withSAXParser {
  val factory = SAXParserFactory.newInstance()
  factory.setFeature("http://xml.org/sax/features/validation", false)
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  factory.newSAXParser()
}
...
val elem = customXML.load(reader)

@SethTisue SethTisue self-assigned this Mar 3, 2021
@SethTisue
Copy link
Member

SethTisue commented Mar 3, 2021

So I think it's best to just document this and move on

Unless @ashawley disagrees, I'm happy to use the documentation you've provided (thank you!) and merge both PRs in time for 2.0.0-RC1 (#432)

@ashawley
Copy link
Member

ashawley commented Mar 3, 2021

I recall the concerns are minimized now that Java has patched a lot of the underlying security problems. This fix raises a concern of breaking parsing for users who might need those features. However, this route is the simplest fix, so if there are issues it's easy to rollback and make a bug-fix release.

@SethTisue SethTisue merged commit 97fabfb into scala:master Mar 4, 2021
@SethTisue SethTisue mentioned this pull request Mar 4, 2021
@JLLeitschuh
Copy link

Since this is a vulnerability, should this have a CVE assigned to it?

@SethTisue
Copy link
Member

SethTisue commented Mar 4, 2021

I'm not an expert on CVEs, but offhand I wouldn't think so, since:

  • any security issues are coming directly from the underlying SAXParser and not from the code in this repo
  • those security issues have been widely known for years
  • scala-xml can be configured securely, it just wasn't the default until now

@JLLeitschuh
Copy link

Benji @benjifin, Leeya @leeyashalti (from Snyk),

Can you weigh in here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants