Set safe defaults for parser settings #177

fgoepel · 2017-11-16T18:26:00Z

Depends on pull request #176, fixes issues: #135, #17.

While touching this code I couldn't resist fixing the shamefully insecure defaults.
The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.

Initially I had made it a val (because the ThreadLocal is inherently lazy) but Scala.js didn't like that, so I changed it into a lazy val which it can eliminate.

SethTisue · 2018-02-20T19:34:21Z

@shado23 perhaps now that scala-xml 1.1.0 is out the door, we could return to this. would you mind rebasing it against current master...?

fgoepel · 2018-02-21T22:27:16Z

@SethTisue Sure thing. It's done.

ashawley · 2018-02-22T11:26:55Z

> Depends on pull request #176

Does it need to? I believe that breaks binary compatibility. The continuous integration build fails for this reason. Can you make the security fix without it? We can entertain #176 once we can break binary compatibility.

fgoepel · 2018-02-22T15:31:23Z

@ashawley No, it doesn't need to. That was just out of convenience because they both touch the same code. I've changed it.

SethTisue · 2018-02-22T16:29:32Z

I called for reviewers at https://contributors.scala-lang.org/t/scala-xml-security-settings/1623 and https://gitter.im/scala/contributors?at=5a8eef8253c1dbb743628db2

and on some Slack channels I'm on, too

ashawley · 2018-02-22T16:36:44Z

Is it possible to add a test or two to validate it is indeed working to reject behaviors that these settings enabled?

fgoepel · 2018-02-22T16:47:14Z

> Is it possible to add a test or two to validate it is indeed working to reject behaviors that these settings enabled?

I'm not sure. The ones that cause automatic HTTP calls would probably require setting up an HTTP server and verifying that you don't receive a call, and the ones that can cause denial of service level resource usage might still be too costly to run as a regular part of the test suite.

I'm also currently quite busy at work, so I don't really have the time to research this too deeply right now.
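A cheaper check than running an HTTP server might be to assert that a document carrying a DOCTYPE is rejected outright. The following is only a minimal sketch under that assumption: it uses the JDK's `SAXParserFactory` directly rather than scala-xml, and the object and method names (`DoctypeRejectionCheck`, `rejectsDoctype`) are hypothetical, not part of this PR.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

// Hypothetical helper, not part of scala-xml.
object DoctypeRejectionCheck {
  // Factory with only the DOCTYPE restriction from this PR applied.
  def hardenedFactory(): SAXParserFactory = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    factory
  }

  // True if parsing fails with a SAXParseException, false if it succeeds
  // or fails for some other reason.
  def rejectsDoctype(xml: String): Boolean = {
    val parser = hardenedFactory().newSAXParser()
    val result = Try(parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler))
    result.failed.toOption.exists(_.isInstanceOf[SAXParseException])
  }
}
```

Because the parser stops at the DOCTYPE declaration, the external entity is never dereferenced, so no network or file access is needed for the assertion.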

NthPortal · 2018-02-22T16:58:08Z

shared/src/main/scala/scala/xml/factory/XMLLoader.scala

+    parser.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false)
+    parser.setXIncludeAware(false)
+    parser.setNamespaceAware(false)
+    parser.setNamespaceAware(false)


duplicate line?

Yes, good catch.


SethTisue · 2018-02-22T17:35:00Z

I asked a security expert I know about this PR and he said:

> The proposed defaults are secure for XXE and XEE. It makes sense to apply them.
> That's what most XML parsers in the JDK default to as well (now).

ashawley · 2018-02-22T17:42:26Z

shared/src/main/scala/scala/xml/factory/XMLLoader.scala

+
+    parser.setFeature("http://javax.xml.XMLConstants/feature/secure-processing", true)
+    parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
+    parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)


I'm not sure how many people use this library to parse HTML, but I'm sure there are some. Either way having a DOCTYPE in your XML will fail to load:

val html = """<!DOCTYPE html>
             |<html lang="en">
             |</html>
             |""".stripMargin
XML.loadString(html)

The result is: org.xml.sax.SAXParseException: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

It's worth noting that a lowercase doctype, which is very common, currently fails already before changing these settings:

val html = """<!doctype html>
             |<html lang="en">
             |</html>
             |""".stripMargin
XML.loadString(html)

That's because it's malformed XML. The result is: org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.

> I'm not sure how many people use this library to parse HTML

I'm comfortable with (not in a 1.1.1 release, but in a 1.2 or a 2.0) requiring those users to explicitly override the default. We might double check if there is suitable documentation in appropriate places.

Hmm... that is unfortunate, but it's insecure:
https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#General_Guidance

If we leave that enabled we leave ourselves open to DoS attacks. The cleanest solution would probably be to document this and have people who need it provide their own parser instance. You just have to do XML.withSAXParser(myParser).load(html) instead, so it shouldn't be a huge burden.
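As an illustration of what such a user-supplied parser could look like, here is a minimal sketch that accepts a DOCTYPE declaration but keeps external DTD and entity loading switched off. It assumes the JDK's bundled Xerces-based parser recognizes these feature URIs; `LenientDoctypeParser` and `rootElementName` are hypothetical names used only for this example.

```scala
import java.io.StringReader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, not part of scala-xml.
object LenientDoctypeParser {
  // Allows <!DOCTYPE ...> but never fetches external DTDs or entities.
  def newParser(): SAXParser = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
    factory.newSAXParser()
  }

  // Parses the document and returns the qualified name of the first element seen.
  def rootElementName(xml: String): Option[String] = {
    var root: Option[String] = None
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, attrs: Attributes): Unit =
        if (root.isEmpty) root = Some(qName)
    }
    newParser().parse(new InputSource(new StringReader(xml)), handler)
    root
  }
}
```

With scala-xml on the classpath, the same parser would be passed in as `XML.withSAXParser(LenientDoctypeParser.newParser()).loadString(html)`.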

the comment (though it's not a Scaladoc comment) says /* Override this to use a different SAXParser. */, is that not the right recommendation to make...?

> It's still unsafe by default

we definitely want to make things safe by default, absolutely.

the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them.

> we definitely want to make things safe by default, absolutely.

Ah, sorry. I misunderstood you.

So the idea is some variation of:

def parser(f: SAXParserFactory => SAXParserFactory = identity)

... where the factory has safe defaults already applied to allow for user customization?
That makes a lot of sense.

Yeah, something like that would be awesome.
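A minimal sketch of that idea, assuming nothing about how scala-xml would eventually expose it: the safe settings are applied first, and the caller's function then tweaks the factory. `SafeParsers` is a hypothetical name, and only a subset of the features from the patch is applied here.

```scala
import javax.xml.XMLConstants
import javax.xml.parsers.{SAXParser, SAXParserFactory}

// Hypothetical helper, not the API that was merged.
object SafeParsers {
  // Factory with restrictive settings already applied.
  private def safeFactory(): SAXParserFactory = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    factory.setXIncludeAware(false)
    factory.setNamespaceAware(false)
    factory
  }

  // The caller starts from the safe factory and may relax or extend it.
  def parser(customize: SAXParserFactory => SAXParserFactory = identity): SAXParser =
    customize(safeFactory()).newSAXParser()
}
```

A caller who needs DOCTYPE support would then write `SafeParsers.parser { f => f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false); f }` and keep every other restriction.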

eed3si9n · 2018-02-23

There's a tradeoff between breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin. So this is a more nuanced, case-by-case tradeoff.

Given that this issue has been around, and if informed users have been disabling external DTD loading features manually, then the rest are casual/not-so-informed users by process of elimination. I am not too sure if they will read the release notes whether it comes out in version 1.1.x or 1.2.x.

A potential way of dealing with this is to use static typing. That is, we would deprecate any methods that are currently unsafe during 1.1.x and provide safer ones as alternatives. In 1.2.x the current unsafe methods can be removed.

See also #17 where this was discussed.
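The deprecation route could look roughly like the sketch below. All names here (`DeprecationSketch`, `checkWellFormed`, `checkWellFormedSafely`) and the `since` string are hypothetical and chosen only to illustrate the shape: the permissive entry point stays available but warns at compile time, while the restrictive one is the recommended replacement.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Hypothetical API shape, not scala-xml's actual methods.
object DeprecationSketch {
  // Parses the document and discards all events; throws on any parse error.
  private def parse(xml: String, allowDoctype: Boolean): Unit = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", !allowDoctype)
    factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler)
  }

  // Old, permissive behavior: still compiles, but callers get a deprecation warning.
  @deprecated("Accepts DOCTYPE declarations; prefer checkWellFormedSafely", "1.1.x")
  def checkWellFormed(xml: String): Unit = parse(xml, allowDoctype = true)

  // New, restrictive behavior that callers are nudged toward.
  def checkWellFormedSafely(xml: String): Unit = parse(xml, allowDoctype = false)
}
```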

ashawley · 2018-02-23

Thanks for weighing in on this, Eugene.

I put a general comment on this discussion, yesterday, at the top-level (outside of the patch review), see below #177 (comment)

ashawley · 2018-02-22T17:45:39Z

> The proposed defaults are secure for XXE and XEE. It makes sense to apply them

The "proposed defaults" in this patch?

> That's what most XML parsers in the JDK default to as well (now)

And javax.xml.parsers.SAXParser is not in the set of "most XML parsers in the JDK"?

SethTisue · 2018-02-22T17:49:17Z

> The "proposed defaults" in this patch?

yes

> And javax.xml.parsers.SAXParser is not in the set of "most XML parsers in the JDK"?

hmm... I suppose SAXParser is the way it is for historical/backwards-compat type reasons? just speculating

ashawley · 2018-02-22T18:42:39Z

It seems like this is a low-risk change, and one that we can and should accept, but I have only been able to think of a few scenarios so far.

One happy accident in the way scala-xml uses the SAXParser is that: If you wrote code that depended on some specially configured SAXParser, then you had to do it with an override.

Here's an example of doing just that:

https://github.com/scala/scala-xml/wiki/XML-validation

This is what XML.withSAXParser does internally, as well.

So with this change you can do those kinds of customizations and continue doing them with the next release. But it cuts the other way as well, because these people who did customize will not benefit from the fix. That includes, for example, people who had already been using the hacks mentioned in #17! So if people have monkey patched their parser in any of these ways, upgrading won't help.

Along that same line, if we happen to decide to hold off on disabling some settings, the people who want more security will still have to disable all the settings themselves plus the new ones they want. There's no possibility for inheriting or just adding only the new disabled settings. It makes it all or nothing: Either fully-configure your SAXParser, or don't at all.

I'd say that at the very least we're helping the latter people, and for the former people they can source the original implementation as a better default.

Alternatively, we could look into refactoring XML.withSAXParser or XMLLoader as Seth suggested.

jroper · 2018-02-26T02:11:25Z

I'm a little behind on the state of JDK's and what version/fork of Xerces they are using, but something that needs to be considered is what happens if the SAX implementation in the JDK doesn't support the options we are passing? As I understand it, an exception will be thrown. And while everything actually uses Xerces, there are different versions (and possibly forks) of Xerces being used that use different namespaces for the different options. We used to explicitly depend on a version of Xerces published to Maven in Play for this reason.
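One way to cope with implementations that reject an option is to apply each feature individually and collect the ones that fail, instead of letting the first exception abort parser creation. This is a sketch only; `BestEffortFeatures` and `applyFeatures` are hypothetical names, and whether to continue or fail closed when a security feature is unsupported is a policy decision this example leaves to the caller.

```scala
import javax.xml.parsers.{ParserConfigurationException, SAXParserFactory}
import org.xml.sax.{SAXNotRecognizedException, SAXNotSupportedException}

// Hypothetical helper, not part of scala-xml.
object BestEffortFeatures {
  // Tries to set every feature and returns the names the implementation refused.
  def applyFeatures(factory: SAXParserFactory, features: Seq[(String, Boolean)]): Seq[String] =
    features.flatMap { case (name, value) =>
      try {
        factory.setFeature(name, value)
        None
      } catch {
        case _: SAXNotRecognizedException | _: SAXNotSupportedException |
            _: ParserConfigurationException =>
          Some(name)
      }
    }
}
```

A caller could log the returned names, or refuse to parse untrusted input at all if the list is not empty.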

jroper · 2018-02-26T02:20:34Z

Also, perhaps a good time to raise this, this CVE was made public late last year:

https://www.cvedetails.com/cve/CVE-2012-0881/

Note that the vulnerability was originally reported in 2012, but it hasn't been made public until now. There is also an equivalent vulnerability for Xerces in C++. The vulnerability hasn't been fixed. Why? I guess no one could be bothered, and no one believes it's important enough to fix it.

The vulnerability report is scant on details, but it says that it exploits hash collisions. I guess that by selecting elements/attributes with names whose java.lang.String.hashCode collides, you can craft a document that requires O(n^2) time to parse, and then subsequent operations that are usually constant, such as getting an element by tag name, may end up taking linear time. It doesn't seem like a huge problem to me: the additional CPU you could use would be large but probably not crippling, depending on the maximum size of the documents that you accept. Also, fixing it would be a pain: you'd need to use a randomised hash function per parse, instead of just relying on HashMap with String's hash code. And the reality is, this isn't limited to XML. The vulnerability affects JSON, application/www-form-urlencoded forms, anything with key values, and in spite of its widespread surface area, I've never heard of anyone exploiting it in the wild.

Anyway, it is something to be aware of.
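To make the collision idea concrete: "Aa" and "BB" have the same `String.hashCode`, and concatenating equal-length blocks with equal hashes yields strings that collide as well. The sketch below generates such names; `HashCollisions` is a hypothetical helper, and it says nothing about how Xerces itself stores names internally.

```scala
// Hypothetical helper illustrating String.hashCode collisions.
object HashCollisions {
  // Builds 2^blocks distinct strings from the blocks "Aa" and "BB";
  // all of them share one hash code.
  def colliding(blocks: Int): Seq[String] =
    (1 to blocks).foldLeft(Seq("")) { (acc, _) =>
      for (prefix <- acc; block <- Seq("Aa", "BB")) yield prefix + block
    }
}
```

For example, `HashCollisions.colliding(10)` produces 1024 distinct strings that would all land in the same bucket of a table keyed on `String.hashCode`.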

ashawley · 2018-04-06T18:27:54Z

It seems work to fix CVE-2012-0881 in Xerces-J just wasn't broadcast widely, see:

https://issues.apache.org/jira/projects/XERCESJ/issues/XERCESJ-1685

Patches have been available for this issue since 2012. See: http://svn.apache.org/viewvc?view=revision&revision=1357381 for the fix.

ashawley · 2018-04-06T18:46:41Z

Java 6 was released with Xerces 2.7.1, and each Java release was slowly updated with Xerces fixes, until version 2.11 was merged in Java 9: https://bugs.openjdk.java.net/browse/JDK-8044086

SethTisue · 2020-12-05T19:55:16Z

is anyone interested in pushing this forward for 2.0?

fgoepel · 2020-12-06T15:27:41Z

Sure, what's missing to get this merged?

SethTisue · 2020-12-06T16:28:18Z

The major thing that's definitely missing is documentation. Currently the PR simply changes the defaults. We'll need good text to put in the release notes, since this is a breaking change.

Another thing that may be missing here is support for overriding the defaults. As I wrote:

> the question is rather how best to support people who want to use the safe settings as a starting point but then tweak them

and as @eed3si9n wrote:

> There's a tradeoff of breaking runtime behavior and securing by default here. It is a security issue, but we should also be mindful that not everyone who deals with XML deals with XML documents of untrusted origin.

when I say "support" for overriding the defaults, I'm not sure if that means code changes. Perhaps it's already possible in a reasonably convenient way, and how you do it simply needs to be documented?

eed3si9n · 2020-12-06T18:55:07Z

Here's what I've been doing with scalaxb since 2010 - eed3si9n/scalaxb@5933fc5

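  import javax.xml.parsers.SAXParser
  import scala.xml.Elem
  import scala.xml.factory.XMLLoader

  // Loader whose parser is configured explicitly instead of relying on the defaults.
  object CustomXML extends XMLLoader[Elem] {
    override def parser: SAXParser = {
      val factory = javax.xml.parsers.SAXParserFactory.newInstance()
      // No validation; DOCTYPE declarations are still accepted.
      factory.setFeature("http://xml.org/sax/features/validation", false)
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
      // Never load DTD grammars or external DTDs.
      factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
      factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
      factory.newSAXParser()
    }
  }
  ....
  val elem = CustomXML.load(reader)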

You can pass whatever factory.setFeature(...) like this based on your needs.

SethTisue · 2021-01-13T00:05:56Z

anyone want to try to get this across — or at least closer to — the finish line?

fgoepel · 2021-02-24T14:56:54Z

Sorry for the delay; I was quite busy and, to be honest, am still confused about what's actually missing apart from a short blurb for release notes. If that's all it is, I've added a draft section to the end of this comment that you're welcome to use or adapt as required.

I've thought about better ways to facilitate customization, and while it may be possible to come up with schemes that are slightly more convenient in some ways, they would probably require reworking the structure of the library more than would be desirable and adversely affect backwards compatibility.

So I think @eed3si9n is right on the money, that this seems to be the intended way to customize the parser (and what I've done myself).

The code has an explicit comment to this effect:

trait XMLLoader[T <: Node] {

  /* Override this to use a different SAXParser. */
  def parser: SAXParser = ...

Another option might be to use the scala.xml.XML.withSAXParser method, but it's essentially the same thing.

So I think it's best to just document this and move on. I'm still not really clear how and where you want this to be documented.
If it's supposed to be part of the commit, please let me know. If a blurb for the release notes is all you need, how about something like the following:

Safe parser defaults

Internally Scala XML makes use of the JDK SAXParser library, which by default enables support for a number of standard XML features that can cause security issues when exposed to untrusted input, allowing for denial of service attacks through resource exhaustion, exfiltration and exposure of local files or network resources, among other issues.

To be more robust against these attacks out-of-the-box, the default parser settings have now been changed to a more restricted and safer subset, disabling potentially exploitable features such as external/remote schemas, doctype and entities as well as XIncludes and namespaces.

Should you require support for any of these features and are confident your application is prevented from processing untrusted XML input, or have another need to customize these settings, you can create and use a custom XMLLoader instance like this:

import scala.xml.Elem
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}

object CustomXML extends XMLLoader[Elem] {
  override def parser: SAXParser = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature("http://xml.org/sax/features/validation", false)
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    factory.newSAXParser()
  }
}
...
val elem = CustomXML.load(reader)

or this

import scala.xml.{XML, Elem}
import scala.xml.factory.XMLLoader
import javax.xml.parsers.{SAXParser, SAXParserFactory}

val customXML: XMLLoader[Elem] = XML.withSAXParser {
  val factory = SAXParserFactory.newInstance()
  factory.setFeature("http://xml.org/sax/features/validation", false)
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  factory.newSAXParser()
}
...
val elem = customXML.load(reader)

SethTisue · 2021-03-03T20:39:08Z

> So I think it's best to just document this and move on

Unless @ashawley disagrees, I'm happy to use the documentation you've provided (thank you!) and merge both PRs in time for 2.0.0-RC1 (#432)

ashawley · 2021-03-03T20:58:06Z

I recall the concerns are minimized now that Java has patched a lot of the underlying security problems. This fix raises a concern of breaking parsing for users who might need those features. However, this route is the simplest fix, so if there are issues it's easy to roll back and make a bug-fix release.

JLLeitschuh · 2021-03-04T14:37:00Z

Since this is a vulnerability, should this have a CVE assigned to it?

SethTisue · 2021-03-04T14:43:53Z

I'm not an expert on CVEs, but offhand I wouldn't think so, since:

any security issues are coming directly from the underlying SAXParser and not from the code in this repo
those security issues have been widely known for years
scala-xml can be configured securely, it just wasn't the default until now

JLLeitschuh · 2021-03-04T14:47:40Z

Benji @benjifin, Leeya @leeyashalti (from Snyk),

Can you weigh in here?

SethTisue mentioned this pull request Feb 20, 2018

Don't load remote DTDs by default (was SI-7726) #135

Closed

ashawley added this to the 1.1.1 milestone Feb 22, 2018

NthPortal reviewed Feb 22, 2018

Set safe defaults for parser settings

93caa1a

The library should be safe by default and potentially unsafe features should be explicitly enabled by the user if needed.

ashawley reviewed Feb 22, 2018

ashawley mentioned this pull request Apr 6, 2018

Check secure feature processing enabled #204

Merged

ashawley mentioned this pull request Jun 26, 2018

Releasing 1.1.1 #236

Closed

ashawley modified the milestones: 1.1.1, 1.2.0 Jun 27, 2018

ashawley modified the milestones: 1.2.0, 2.0 Apr 4, 2019

fgoepel mentioned this pull request Dec 6, 2020

Use a ThreadLocal to allow reusing parser instances #176

Merged

SethTisue self-assigned this Mar 3, 2021

SethTisue mentioned this pull request Mar 3, 2021

Release 2.0.0 (RC, then final) #432

Closed

ashawley approved these changes Mar 3, 2021

Merge branch 'master' into safe-defaults

0ba347b

SethTisue merged commit 97fabfb into scala:master Mar 4, 2021

SethTisue mentioned this pull request Mar 4, 2021

More secure parsing #17

Closed

rossabaker mentioned this pull request Mar 14, 2021

Configure more secure defaults for SAXParserFactory http4s/http4s#4619

Closed

ckipp01 mentioned this pull request May 20, 2021

Update scala-xml to 2.0.0 scoverage/scalac-scoverage-plugin#345

Closed

SethTisue mentioned this pull request Aug 10, 2022

Consider bumping scala-compiler's scala-xml dependency to 2.x in Scala 2.12 scala/bug#12632

Closed

lrytz mentioned this pull request Nov 12, 2024

Call reset on reusable SAXParser instance #742

Open
