[MNG-5862] Support XML entities and XInclude #1205
I think this PR only supports external entities, not XInclude?
There might be some security implications here. There certainly will be people who claim there are.
This likely needs doc changes as well.
maven-model-transform/src/main/java/org/apache/maven/model/transform/stax/XmlUtils.java (outdated, resolved)
Both, but the code for XInclude support is in a third-party library for now: https://github.com/gnodet/stax-xinclude
These new options are only enabled when reading a local file in strict mode (which means not when loading a POM from a dependency, i.e. only for projects being built). If needed, we could use a custom resolver and restrict loading of entities / xinclude to the file system, or even to the project tree, but I'd like to avoid any restriction unless it is needed. As the files are loaded from locally built POM files, the user should be in control of those files.
Yes, I'll try to find where...
Force-pushed from 2950c9f to 6d99605.
I've modified the xinclude support so that it reuses the XMLResolver. This allows a single implementation to load all external content. I've thus modified it to reject any non-relative URI, which means the code can only access files under the user's control, so that should be fine from a security perspective.
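A resolver of the kind described above can be sketched with the JDK's standard `javax.xml.stream.XMLResolver` interface. This is a minimal sketch, not the PR's code: the class name `RelativeOnlyResolver` is hypothetical, it assumes references are resolved against a fixed base directory (for example the project directory) rather than the parser-supplied `baseURI`, and the `..` containment check is extra hardening that the comment does not mention (a relative URI such as `../../x.xml` is still relative but can leave the project tree).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

/** Sketch: a StAX resolver that only loads relative references from a base directory. */
final class RelativeOnlyResolver implements XMLResolver {
    private final Path baseDir;

    RelativeOnlyResolver(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    @Override
    public Object resolveEntity(String publicID, String systemID, String baseURI, String namespace)
            throws XMLStreamException {
        Path target = check(systemID);
        try {
            // An InputStream is one of the return types StAX accepts from a resolver.
            return Files.newInputStream(target);
        } catch (IOException e) {
            throw new XMLStreamException("Cannot read " + systemID, e);
        }
    }

    /** Rejects absolute URIs, rooted paths and (extra hardening) paths escaping baseDir. */
    Path check(String systemID) throws XMLStreamException {
        if (systemID == null) {
            throw new XMLStreamException("Missing system identifier");
        }
        URI uri;
        try {
            uri = new URI(systemID);
        } catch (URISyntaxException e) {
            throw new XMLStreamException("Invalid URI: " + systemID, e);
        }
        // isAbsolute() means "has a scheme" (http:, file:, ...); an authority or a
        // leading '/' would also point outside the intended tree.
        if (uri.isAbsolute() || uri.getAuthority() != null || uri.getPath() == null
                || uri.getPath().startsWith("/")) {
            throw new XMLStreamException("Only relative URIs are allowed: " + systemID);
        }
        Path target = baseDir.resolve(uri.getPath()).normalize();
        if (!target.startsWith(baseDir)) {
            throw new XMLStreamException("Reference escapes the base directory: " + systemID);
        }
        return target;
    }
}
```

It would be installed with `XMLInputFactory.setXMLResolver(new RelativeOnlyResolver(projectDir))`, where `projectDir` is a hypothetical `Path` to the project being built.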
Is this wise?
maven-model-builder/src/main/java/org/apache/maven/model/io/DefaultModelReader.java (2 outdated threads, resolved)
I tried to mitigate the risks by rejecting any absolute URI. Also, no file with such entities / xinclude imports should end up in Maven Central: those are translated when installed / uploaded. So this should only happen at build time, for the pom.xml files that are part of the build. The security problems are then mostly in the hands of the developer, I would think. That said, I'd like to find a consensus, and if it's seen as too risky, we could go with mixins only. That's the whole point of the discussion I started on dev.
Would you consider adding a CLI option that a user can specify to Maven to explicitly tell it not to support remote entities / xinclude stuff (and possibly even default the option to having remote entities / xinclude off)?
Force-pushed from 2b731fe to e33a326.
Since the library you wrote uses the org.apache.maven package, it needs to move into this repo (package protected?) or into another package or something before maven core can depend on it.
I've added a way to opt out using
Thank you!
@elharo can I go ahead with this PR?
I'm still not convinced this is a good idea, but if it's going in, there are still some code issues to be addressed.
api/maven-api-core/src/main/java/org/apache/maven/api/feature/Features.java (3 outdated threads, resolved)
maven-core/src/test/java/org/apache/maven/project/ProjectBuilderTest.java (outdated, resolved)
maven-model-transform/src/main/java/org/apache/maven/model/transform/stax/BufferingParser.java (outdated, resolved)
maven-stax-xinclude/src/main/java/org/apache/maven/stax/xpointer/ElementPointerPart.java (2 outdated threads, resolved)
maven-stax-xinclude/src/main/java/org/apache/maven/stax/xpointer/InvalidXPointerException.java (outdated, resolved)
maven-stax-xinclude/src/main/java/org/apache/maven/stax/xpointer/XMLElementEvaluator.java (outdated, resolved)
Well, I tried to kick off a discussion on dev@, but there was not much interest there... The only concerns were the same ones raised here around security, and while I agree it could have been a problem, I think those concerns have been addressed in the PR. The idea of xinclude/XML entities is slightly redundant with POM mixins, so I would have expected that to be raised... This discussion was started in mid-August, definitely not the best time to get people involved.
FYI, just from a simple user's perspective, it seems peculiar that "experimental" ( Typically I would expect a feature/option that a software developer considers "experimental" to be something that a user has to opt in to enable, not opt out to disable. I'm not sure if that means these new features should be disabled by default, or if the options should be renamed to better indicate they aren't considered "experimental"?
@gnodet Thanks for your work. Any plans to resolve the requested changes?
One possibility would be to create an extension to support this feature. This should be possible as the entities/xinclude are only processed at build time and the consumer pom is flattened. |
Mostly nits.
More generally, rereading this I'm struck by how much work this is. You've provided a nearly complete XInclude implementation. The maintenance burden on this code is likely to be high. It feels like this belongs in the parser, or perhaps a separate library on top of the parser, and all Maven should do when reading a pom is flip the flag to turn on XInclude support.
Maven's XML parsing is already quite non-standard and avoids the normal parsers like Xerces or the JDK's because of decisions made 20 years ago when the easier path was not so obvious. This is unfortunately baked into a lot of the code and even the public APIs, so I'm not sure how much of that can be repaired now.
Nonetheless, I do wonder if this code belongs here instead of in the parser.
maven-core/src/test/java/org/apache/maven/project/ProjectBuilderTest.java (outdated, resolved)
maven-core/src/test/java/org/apache/maven/project/ProjectBuilderTest.java (resolved)
@@ -246,8 +246,7 @@ void testReadInvalidPom() throws Exception {
     assertThat(pex.getResults().get(0).getProblems().size(), greaterThan(0));
     assertThat(
             pex.getResults(),
-            contains(projectBuildingResultWithProblemMessage(
-                    "Received non-all-whitespace CHARACTERS or CDATA event in nextTag()")));
+            contains(projectBuildingResultWithProblemMessage("expected START_TAG or END_TAG, not CHARACTERS")));
I wouldn't go further than asserting that the message is non-null. Messages are informative, not part of the stable API.
} | ||
|
||
XMLStreamReader parser; | ||
// We only support xml entities and xinclude when reading a file in strict mode |
nit: external general entities and XInclude
maven-model-builder/src/main/java/org/apache/maven/model/io/DefaultModelReader.java (resolved)
@@ -45,6 +45,12 @@ public interface ModelReader {
     */
    String INPUT_SOURCE = "org.apache.maven.model.io.inputSource";

    /**
     * Name of the property used to store a boolean {@code true} if XInclude supports
supports --> support
/**
 * Evaluates the XPointer on the root Element and returns the resulting Element or null.
 *
 * @return an Element from the resultant evaluation of the root Element or null if evaluation fails
delete resultant
int endChar;

// Find an NCName if it exists?
startChar = schemeData.indexOf("/");
declare here
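The element() scheme parsing discussed in this thread (an optional NCName before the first "/", then a 1-based child sequence counting only element children) can be sketched on top of plain DOM. This is not the PR's `ElementPointerPart`; the class and method names are hypothetical, and it assumes IDs are carried by an attribute literally named `id`, a simplification, since XPointer relies on DTD- or schema-declared IDs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of XPointer element() scheme evaluation over a DOM tree. */
final class ElementSchemeSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates scheme data such as "/1/2" or "intro/3"; returns null if nothing matches. */
    static Element evaluate(Document doc, String schemeData) {
        // An optional NCName precedes the first '/', followed by the child sequence.
        int slash = schemeData.indexOf('/');
        String ncName = slash < 0 ? schemeData : schemeData.substring(0, slash);
        String childSequence = slash < 0 ? "" : schemeData.substring(slash + 1);

        // Without an NCName the child sequence starts at the document node.
        Node current = ncName.isEmpty() ? doc : findById(doc, ncName);
        if (current == null) {
            return null;
        }
        if (!childSequence.isEmpty()) {
            for (String step : childSequence.split("/")) {
                int index;
                try {
                    index = Integer.parseInt(step);
                } catch (NumberFormatException e) {
                    return null; // malformed step, e.g. "a" or an empty segment
                }
                current = nthChildElement(current, index);
                if (current == null) {
                    return null;
                }
            }
        }
        return current instanceof Element ? (Element) current : null;
    }

    /** Returns the index-th (1-based) element child of parent, ignoring text and comments. */
    private static Element nthChildElement(Node parent, int index) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && ++count == index) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Simplified ID lookup: an attribute literally named "id" (assumption, see above). */
    private static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (id.equals(e.getAttribute("id"))) {
                return e;
            }
        }
        return null;
    }
}
```

For `<a><b/><c><d id='x'/><e/></c></a>`, `element(/1/2/2)` selects `e` and `element(x)` selects `d`.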
package org.apache.maven.stax.xinclude;

/**
 * This class represents Exceptions that can happen during parsing an XPointer Expression.
during --> while
 * This class represents the data type NCName use for XML non-colonized names.
 */
@SuppressWarnings("checkstyle:MagicNumber")
class NCName {
This is a lot to reinvent and maintain for one function. Maybe we need this to avoid extra dependencies, but I'm sure this already exists in Xerces, XML Commons, or any of several other projects.
I reused the woodstox methods.
@@ -62,6 +64,18 @@ public static boolean buildConsumer(@Nullable Session session) {
        return buildConsumer(session != null ? session.getUserProperties() : null);
    }

    public static boolean xinclude(@Nullable Properties userProperties) {
        return doGet(userProperties, XINCLUDE, true);
I would still advocate for this being off by default (opt-in) instead of on by default (opt-out) since XML parsers have a long history of security vulnerabilities surrounding this stuff.
Done
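The opt-in default requested in this thread comes down to the default value used when the flag is absent. A minimal sketch, assuming the flag is read from user properties as `java.util.Properties`; the key `example.feature.xinclude` and the helper names are hypothetical and are not the PR's `Features` API.

```java
import java.util.Properties;

/** Sketch of an opt-in feature flag read from user properties. */
final class FeatureFlagSketch {
    // Hypothetical key for illustration; not the property name used by the PR.
    static final String XINCLUDE_KEY = "example.feature.xinclude";

    /** Returns the boolean value of key, or defaultValue when unset. */
    static boolean flag(Properties userProperties, String key, boolean defaultValue) {
        if (userProperties == null) {
            return defaultValue;
        }
        String value = userProperties.getProperty(key);
        // Boolean.parseBoolean is case-insensitive and treats anything but "true" as false.
        return value != null ? Boolean.parseBoolean(value.trim()) : defaultValue;
    }

    /** Opt-in: XInclude / external entities stay disabled unless explicitly enabled. */
    static boolean xinclude(Properties userProperties) {
        return flag(userProperties, XINCLUDE_KEY, false);
    }
}
```

With this shape, a user would opt in by setting the (hypothetical) property to `true`, for example via `-D` on the command line, and an unset or unrecognized value keeps the feature off.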
They should all be fixed now.
We switched to a standard StAX parser in 4.x. The XML parser is now Woodstox, which is a fully conformant parser. Unfortunately, I don't know of any StAX parser that provides support for XInclude, so I ended up writing one, borrowing the XPointer implementation from Apache Woden. I'd rather keep it separate from the parser so that we can eventually switch to Aalto, which is slightly faster than Woodstox (there were a few problems when I tried it initially). As for the location, I originally put it in an external repository (https://github.com/gnodet/stax-xinclude), but you asked me to move it back into this repository. I agree, this library is independent of Maven and would be better outside. I can maintain it as a personal project (and rename the package) if you think that's better; I don't really mind.
A personal project won't really work, but is there any way this can go into Woodstox or the Apache XML project or something like that?
Umm. Maybe I was referenced by mistake, but I'm not sure of my relevance to this? (If this is in the context of the Apache Xerces project: while I'm on the PMC, I'm a committer on the C++ side; I stopped using the Java version a long time ago and now use the JDK exclusively for that.) As a maintainer of software that uses Maven itself, I certainly have "opinions", but I think you all addressed the main one, which is to turn this off by default. All I can say is, please don't ever enable this by default. Ever. I say that as somebody who has worked with XML for 25 years and has been fixing security bugs in its use for about 20. Maybe that's why you asked me?
The stigma surrounding xinclude is really quite infuriating. Is there a feature more maligned than this? The disparity between its notoriety and its simplicity is almost poetic. The fact is that pom files already compose in at least 3 other ways, so if there's some "security principle" at play, it's already broken. It could be argued the whole purpose of pom files is to compose. I'm also not sure that xincludes will really work in practice, but I've yet to see why not.
Yes, entities, deservedly so. That was actually more my concern. I have no background on what this issue is about or why I was asked about it, but apologies for not clarifying which of the two I was expressing the opinion about. I know very little about XInclude as it's very poorly supported in general; it came too late, so I've generally ignored it. My opinion is that it's not sensible for Maven to even consider including a one-off implementation of any XML standard addendum. But that's a decision for its maintainers.
I remember thinking about XInclude at some point wrt Woodstox, but it seemed rather complicated to support for a streaming parser. So it would likely need to be some processor on top of generic StAX (or the SAX interface), in which case Woodstox as well as any other compliant implementation would work. I don't think I personally have time to work on such a thing, although I can see how it'd be useful.
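The "processor on top of generic StAX" idea can be illustrated with the standard event API alone, so any compliant parser could sit underneath. This is a deliberately minimal sketch, not the stax-xinclude library: the class name is hypothetical, it handles only whole-document `parse="xml"` includes, ignores `xpointer`, `xi:fallback` and `xml:base` fixup, drops document-level events (so no XML declaration is written), uses a crude depth limit instead of real cycle detection, and looks documents up in an in-memory map that stands in for a resolver.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

/** Sketch: replaces xi:include elements with the included document's content. */
final class XIncludeSketch {
    static final QName INCLUDE = new QName("http://www.w3.org/2001/XInclude", "include");
    static final QName HREF = new QName("href");
    static final int MAX_DEPTH = 10; // crude guard against include cycles

    private final XMLInputFactory inputs = XMLInputFactory.newInstance();
    private final Map<String, String> documents; // hypothetical stand-in for a resolver

    XIncludeSketch(Map<String, String> documents) {
        this.documents = documents;
    }

    String process(String xml) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        copy(inputs.createXMLEventReader(new StringReader(xml)), writer, 0);
        writer.flush();
        return out.toString();
    }

    private void copy(XMLEventReader reader, XMLEventWriter writer, int depth) throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            int type = event.getEventType();
            if (type == XMLStreamConstants.START_DOCUMENT
                    || type == XMLStreamConstants.END_DOCUMENT
                    || type == XMLStreamConstants.DTD) {
                continue; // document-level events would be invalid inside an element
            }
            if (event.isStartElement() && event.asStartElement().getName().equals(INCLUDE)) {
                Attribute href = event.asStartElement().getAttributeByName(HREF);
                skipToEndOfElement(reader); // drops any xi:fallback children
                String included = href == null ? null : documents.get(href.getValue());
                if (included == null || depth >= MAX_DEPTH) {
                    throw new XMLStreamException(
                            "Cannot include " + (href == null ? "(no href)" : href.getValue()));
                }
                copy(inputs.createXMLEventReader(new StringReader(included)), writer, depth + 1);
            } else {
                writer.add(event);
            }
        }
    }

    /** Consumes events up to and including the end tag of the element just started. */
    private static void skipToEndOfElement(XMLEventReader reader) throws XMLStreamException {
        int level = 1;
        while (level > 0) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) {
                level++;
            } else if (e.isEndElement()) {
                level--;
            }
        }
    }
}
```

Because the filter only consumes and produces `XMLEvent`s, swapping Woodstox for Aalto (or the JDK parser) would not affect it, which is the separation argued for in the thread.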
This should become a separate Maven extension.
An extension will be available at
Cool, will check it out!
This PR is built on top of #1245 which can be integrated separately. This feature could be opt-in or even extracted as a separate extension (hence the PR to give the needed support).
Note that the tests do ensure that any POM installed/deployed is a standalone one, i.e. the xinclude / entities are inlined during the consumer pom transformation step.
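To illustrate what "inlined" means for entities, here is a minimal sketch using the standard StAX API: the parser expands the internal entity, and re-serializing the events without the DTD yields a standalone document. This is not Maven's consumer POM transformer; the class name is hypothetical, it ignores namespaces (real POMs declare the POM namespace), comments and processing instructions, and it disables external entities so nothing is fetched from outside the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Sketch: expands internal entities and writes the document back without its DTD. */
final class EntityInliner {
    static String inline(String xml) throws XMLStreamException {
        XMLInputFactory in = XMLInputFactory.newInstance();
        in.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true); // expand &name;
        in.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false); // no external fetches
        XMLStreamReader r = in.createXMLStreamReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    w.writeStartElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    w.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    // Entity replacement text arrives here as ordinary character data.
                    w.writeCharacters(r.getText());
                    break;
                default:
                    // The DTD event and everything else is dropped from the output.
                    break;
            }
        }
        w.flush();
        r.close();
        return out.toString();
    }
}
```

For `<!DOCTYPE project [<!ENTITY v "1.0">]><project><version>&v;</version></project>`, the output contains `<version>1.0</version>` and no `DOCTYPE`.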