Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise exception on duplicate keys #426

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

getaaron
Copy link

The YAML spec is extremely clear that duplicate keys are forbidden. A few examples:

an unordered association of unique keys to values

and:

keys are unordered and must be unique

and:

The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique

and:

YAML mappings require key uniqueness

For these reasons, psych should raise an exception when trying to parse YAML with duplicate keys. It should be considered a bug that this ever worked, so I don't think we need to worry about backwards compatibility to support out-of-spec YAML parsing.

This PR adds functionality to raise an exception on duplicates, and a test for it. Closes #79.

@getaaron
Copy link
Author

@yuki24 could we please merge this?

@hsbt
Copy link
Member

hsbt commented Mar 25, 2020

We should warn it before raising an exception. This change breaks the user application.

@tenderlove
Copy link
Member

I agree with @hsbt. The current functionality has been like this forever.

Besides that, I don't think this is the correct place to fix it. These stream handlers are used for building ASTs. I think it would be more appropriate to fix this where the AST is converted to a Ruby object. Would you mind fixing it in that place?

Here's a starting point:

diff --git a/lib/psych/visitors/to_ruby.rb b/lib/psych/visitors/to_ruby.rb
index a922f90..9ca2d9a 100644
--- a/lib/psych/visitors/to_ruby.rb
+++ b/lib/psych/visitors/to_ruby.rb
@@ -348,6 +348,8 @@ module Psych
           end
           val = accept(v)
 
+          raise Psych::Exception if hash.key? key
+
           if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
             case v
             when Nodes::Alias, Nodes::Mapping

The above patch makes the test in this PR pass, but there is another test in the suite that fails and I'm not sure why.

@tenderlove
Copy link
Member

Btw, adding this check is going to have a performance impact on loading hashes. It might be nice if there is a way we can let people who care more about performance and less about being true to the spec. Maybe introduce a strict mode?

@headius
Copy link
Contributor

headius commented May 8, 2020

Shouldn't this be enforced at the libyaml level?

For the JRuby extension, we could enable this strict behavior by setting options to the SnakeYAML library we use:

https://www.javadoc.io/doc/org.yaml/snakeyaml/1.20/org/yaml/snakeyaml/LoaderOptions.html

It seems odd to enforce this at the Ruby level when it's part of the YAML spec.

@tenderlove
Copy link
Member

@headius does that actually impact the AST, or just the loader? I don't know SnakeYaml very well, but it doesn't look like we're using LoaderOptions at all.

We could also do this while the AST is being created, but this PR seems to do post-processing on the AST and that is definitely not the right place to do it regardless.

@headius
Copy link
Contributor

headius commented May 8, 2020

@tenderlove That's a good point... it appears the LoaderOptions are applied when loading YAML into an in-memory graph of Java objects (maps, lists, etc). From what I can read in the code, it's applied at the point the MappingNode gets processed into a Map.

https://bitbucket.org/asomov/snakeyaml/src/70ff9d8b181f7e1aff9eb7349b2ca288cf308496/src/main/java/org/yaml/snakeyaml/constructor/SafeConstructor.java#lines-83:118

They don't do a separate scan of the node's contents. Instead they check each key inserted into the resulting map to see if something else was there already.

@tenderlove
Copy link
Member

They don't do a separate scan of the node's contents. Instead they check each key inserted into the resulting map to see if something else was there already.

Ok. So they're doing basically what I suggested to do. IOW the events and AST don't care that there are duplicate keys, but the thing that creates objects does.

@getaaron
Copy link
Author

getaaron commented May 8, 2020

@tenderlove so are the next steps…

  • Keep my test
  • Replace my implementation with your patch
  • investigate the other failing test you mentioned

?

What about @hsbt’s concern? Strict mode seems reasonable, but might expand the scope of this PR a lot.

I’m happy to do some more work on this but I could use some more design direction since it’s my first time working in this repo.

@tenderlove
Copy link
Member

@getaaron yep, I think those are the next steps. The initialize method for the ToRuby visitor already takes some options. I think we could use that to configure the strategy we need. I pushed a commit here that implements it (but I haven't tested it)

@hsbt
Copy link
Member

hsbt commented Jan 27, 2023

I forgot to this for Psych 5. It's good time to show deprecate warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Psych does not complain about duplicate keys
4 participants