-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise exception on duplicate keys #426
base: master
Are you sure you want to change the base?
Conversation
@yuki24 could we please merge this? |
We should warn it before raising an exception. This change breaks the user application. |
I agree with @hsbt. The current functionality has been like this forever. Besides that, I don't think this is the correct place to fix it. These stream handlers are used for building ASTs. I think it would be more appropriate to fix this where the AST is converted to a Ruby object. Would you mind fixing it in that place? Here's a starting point: diff --git a/lib/psych/visitors/to_ruby.rb b/lib/psych/visitors/to_ruby.rb
index a922f90..9ca2d9a 100644
--- a/lib/psych/visitors/to_ruby.rb
+++ b/lib/psych/visitors/to_ruby.rb
@@ -348,6 +348,8 @@ module Psych
end
val = accept(v)
+ raise Psych::Exception if hash.key? key
+
if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
case v
when Nodes::Alias, Nodes::Mapping The above patch makes the test in this PR pass, but there is another test in the suite that fails and I'm not sure why. |
Btw, adding this check is going to have a performance impact on loading hashes. It might be nice if there is a way we can let people who care more about performance and less about being true to the spec. Maybe introduce a |
Shouldn't this be enforced at the libyaml level? For the JRuby extension, we could enable this strict behavior by setting options to the SnakeYAML library we use: https://www.javadoc.io/doc/org.yaml/snakeyaml/1.20/org/yaml/snakeyaml/LoaderOptions.html It seems odd to enforce this at the Ruby level when it's part of the YAML spec. |
@headius does that actually impact the AST, or just the loader? I don't know SnakeYaml very well, but it doesn't look like we're using We could also do this while the AST is being created, but this PR seems to do post-processing on the AST and that is definitely not the right place to do it regardless. |
@tenderlove That's a good point... it appears the LoaderOptions are applied when loading YAML into an in-memory graph of Java objects (maps, lists, etc). From what I can read in the code, it's applied at the point the MappingNode gets processed into a Map. They don't do a separate scan of the node's contents. Instead they check each key inserted into the resulting map to see if something else was there already. |
Ok. So they're doing basically what I suggested to do. IOW the events and AST don't care that there are duplicate keys, but the thing that creates objects does. |
@tenderlove so are the next steps…
? What about @hsbt’s concern? Strict mode seems reasonable, but might expand the scope of this PR a lot. I’m happy to do some more work on this but I could use some more design direction since it’s my first time working in this repo. |
@getaaron yep, I think those are the next steps. The initialize method for the |
I forgot to this for Psych 5. It's good time to show deprecate warnings. |
The YAML spec is extremely clear that duplicate keys are forbidden. A few examples:
and:
and:
and:
For these reasons, psych should raise an exception when trying to parse YAML with duplicate keys. It should be considered a bug that this ever worked, so I don't think we need to worry about backwards compatibility to support out-of-spec YAML parsing.
This PR adds functionality to raise an exception on duplicates, and a test for it. Closes #79.