Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoting strings y/n when dumping? #443

Closed
tisba opened this issue Mar 25, 2020 · 15 comments · Fixed by #515
Closed

Quoting strings y/n when dumping? #443

tisba opened this issue Mar 25, 2020 · 15 comments · Fixed by #515

Comments

@tisba
Copy link

tisba commented Mar 25, 2020

Hey there 👋

I'm trying to wrap my head around booleans and YAML. We encountered the issue, when we tried to parse a YAML back generated by YAML.dump().

Environment: Ruby 2.7.0p0,

Example:

{"n" => "y"}.to_yaml

generates

---
n: y

When we parse this e.g. using https://github.com/go-yaml/yaml it parses as (written as Ruby):

{ false => true }

When I look at https://yaml.org/type/bool.html, I would assume that the expected generated YAML should be more like this a String rendered:

---
'n': 'y'

Am I missing something?

Some more experiments, in case that's any help, show that y, Y, n and N are not quoted as required:

# taken from https://yaml.org/type/bool.html
v = %w(y Y yes Yes YES n N no No NO true True TRUE false False FALSE on On ON off Off OFF)

v.each { |x| puts YAML.dump(x) }
# --- y
# --- Y
# --- 'yes'
# --- 'Yes'
# --- 'YES'
# --- n
# --- N
# --- 'no'
# --- 'No'
# --- 'NO'
# --- 'true'
# --- 'True'
# --- 'TRUE'
# --- 'false'
# --- 'False'
# --- 'FALSE'
# --- 'on'
# --- 'On'
# --- 'ON'
# --- 'off'
# --- 'Off'
# --- 'OFF'
@esotericpig
Copy link

According to the YAML spec, keys can be non-strings:

Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. This is non-trivial since YAML allows various ways to format a given scalar content. For example, the integer eleven can be written as “013” (octal) or “0xB” (hexadecimal). If both forms are used as keys in the same mapping, only a YAML processor which recognizes integer formats would correctly flag the duplicate key as an error.

As you can see here, mapping keys can be integers (and therefore, they can also be booleans, etc.).

If you want a string, you must quote it:

Psych.load("---\n1: 2\n")            # => {1=>2}
Psych.load("---\n'1': '2'\n")        # => {"1"=>"2"}
Psych.load("---\ntrue: false\n")     # => {true=>false}
Psych.load("---\n'true': 'false'\n") # => {"true"=>"false"}

This is all functioning correctly, per the spec.

Now, in YAML 1.1, a boolean can be y|Y|yes|Yes|YES|n|N|no|No|NO |true|True|TRUE|false|False|FALSE |on|On|ON|off|Off|OFF.

In YAML 1.2, a boolean can only be true|false (case-insensitive).

Psych only allows YAML 1.1, so actually it's broken and produces a string:

# Incorrect according to spec!
Psych.load("%YAML 1.1\n---\nn: y\n") # => {"n"=>"y"}

# Correct
Psych.load("%YAML 1.1\n---\nno: yes\n") # => {false=>true}
Psych.load("%YAML 1.1\n---\non: off\n") # => {true=>false}

So actually there's a minor bug in Psych.

Anyway, to fix your issue, you must single/double quote the key, per the YAML spec.

Or, you could use YAML 1.2 for go-yaml, but this will raise an error with Psych:

%YAML 1.2
---
n: y

@esotericpig
Copy link

esotericpig commented Apr 12, 2020

Forgot to talk about dump, which is also broken:

# Broken! Should add quotes!
puts ({'y'    => 'n'    }).to_yaml # => y: n

puts ({'yes'  => 'no'   }).to_yaml # => 'yes': 'no'
puts ({'true' => 'false'}).to_yaml # => 'true': 'false'
puts ({true   => false  }).to_yaml # => true: false

I tried fixing it in a hacky way, but doesn't work:

require 'psych'

class NotBool < String
  def encode_with(coder)
    coder.scalar = self
    #coder.scalar = "'#{self}'" # This doesn't work either
    coder.style = Psych::Nodes::Scalar::SINGLE_QUOTED
    coder.tag = nil
  end
end

# Doesn't add quotes
puts ({NotBool.new('y') => NotBool.new('n')}).to_yaml

The only way I could get it to work is the super hacky way:

require 'yaml'

x = {'_notbool_y_notbool_' => '_notbool_n_notbool_'}

puts x.to_yaml.gsub('_notbool_',"'")

In short, never dump a String containing only y or n until it's fixed.

@esotericpig
Copy link

According to test_boolean.rb (test_y and test_n), I guess this isn't a bug. It says that Syck treats y/n as strings on purpose.

However, IMO, that's broken and not following YAML 1.1 spec, and therefore makes potentially incompatible YAML files for other YAML parsers to consume. It also seems to be bug to not allow adding quotes using the coder/encode_with.

It looks pretty easy to change the Ruby code though. Maybe I'll write a pull request and the Psych team can decide if want it or not.

@tisba
Copy link
Author

tisba commented Apr 21, 2020

@esotericpig The problem in our case is, that part of the data we serialise is user-provided. So we cannot "not" support "y", "n" etc.

The workaround for us was in this case to update the Golang library which in the newer version only supports YAML 1.2.

The solution before was what you did: Generate a unique string and replace it in the output. That's actually very messy, but it works.

I could understand when this won't be accepted as a general change, because it could lead to problems with applications relying on this buggy behaviour. But at least a "strict" flag or something like that for dump and load would be nice.

@tisba
Copy link
Author

tisba commented Jun 7, 2020

Oh and actually, that's also not following the YAML 1.0 spec. "y" should has to be encoded as a string while true maybe encoded as y (without quotes).

@tisba
Copy link
Author

tisba commented Jun 7, 2020

I'm not sure I'm reading #448 correctly, @esotericpig. It looks like it only changes parsing YAML, not generating, right? So the actual issue won't be fixed?

@esotericpig
Copy link

Oh and actually, that's also not following the YAML 1.0 spec. "y" should has to be encoded as a string while true maybe encoded as y (without quotes).

According to YAML 1.1 spec, true and y without quotes are translated to the language booleans. And yes, "y" with quotes should be translated to just a string. My wording was probably confusing, sorry.

I'm not sure I'm reading #448 correctly, @esotericpig. It looks like it only changes parsing YAML, not generating, right? So the actual issue won't be fixed?

Even though I only changed one spot in the code, it affects both. I tested it again to make sure:

irb(main)> {running: 'y'}.to_yaml
=> "---\n:running: 'y'\n"
irb(main)> {running: true}.to_yaml
=> "---\n:running: true\n"

It correctly quotes the y. You'll never do this in code (unless it's a variable):

{running: y}.to_yaml

And here is parsing:

irb(main)> Psych.load("---\n:running: true\n")
=> {:running=>true}
irb(main)> Psych.load("---\n:running: y\n")
=> {:running=>true}
irb(main)> Psych.load("---\n:running: 'y'\n")
=> {:running=>"y"}

===

Unfortunately, it looks like I need to change the code to use a strict flag instead after code review, but not sure if I need to wait for pull/426 first.

@alexjfisher
Copy link

This is similar to #387 where to_yaml doesn't quote things like some mac addresses. Parsers that correctly implement the (somewhat stupid) 1.1 spec then load them as integers.

@tenderlove
Copy link
Member

In short, never dump a String containing only y or n until it's fixed.

Yes, we should put quotes around "y" and "n" when dumping (regardless of "strictness"). Loading seems a bit more tricky because people might be expecting the current behavior.

Looks like I did some work here related to #426. If someone could incorporate that in to a PR I would merge it.

tenderlove added a commit that referenced this issue Aug 4, 2021
'y' and 'n' are kind of ambiguous.  Syck treated y and n literals in
YAML documents as strings.  But this is not what the YAML 1.1 spec says.
YAML 1.1 says they should be treated as booleans.  When we're dumping
documents, we know it's a string, so adding quotes will eliminate the
"ambiguity" in the emitted document

Fixes #443
@simi
Copy link
Contributor

simi commented Sep 8, 2022

What about this case? Would it be possible to quote it as well?

YAML.dump('A' => '2020-12-31')
=> "---\nA: 2020-12-31\n"

YAML.load(YAML.dump('A' => '2020-12-31'))
=> Tried to load unspecified class: Date (Psych::DisallowedClass)

@tisba
Copy link
Author

tisba commented Sep 8, 2022

That is obviously a bug too. You should create a new dedicated issue for that.

@simi
Copy link
Contributor

simi commented Sep 8, 2022

@tisba I'll try to prepare PR.

@tisba
Copy link
Author

tisba commented Sep 9, 2022

I cannot reproduce your issue with 4.0.3, @simi. This works fine for me

YAML.load(YAML.dump('A' => '2020-12-31'))
 => {"A"=>"2020-12-31"}

@simi
Copy link
Contributor

simi commented Sep 9, 2022

I cannot reproduce your issue with 4.0.3, @simi. This works fine for me

YAML.load(YAML.dump('A' => '2020-12-31'))
 => {"A"=>"2020-12-31"}

I have found out the problem is related actually to result of Date.strptime('2020-12-31', '%F', Date::GREGORIAN). In some environments that can fail and since that string is not considered date, it is not properly quoted.

My problem was caused by https://github.com/travisjeffery/timecop/blob/864bcf16f70f31a8e125cc3135889067fdd2a373/lib/timecop/time_extensions.rb#L48-L51. Patching stdlib is wrong, I think there's nothing we can improve on Psych side actually.

ColinDKelley added a commit to Invoca/psych that referenced this issue Jul 19, 2023
ColinDKelley added a commit to Invoca/psych that referenced this issue Jul 19, 2023
ColinDKelley added a commit to Invoca/psych that referenced this issue Jul 19, 2023
@ColinDKelley
Copy link
Contributor

ColinDKelley commented Jul 19, 2023

@tisba: JFYI, in PR #641, I've added a test case that iterates across exactly the list you point to ensure that all are properly quoted by psych so as not to cause errors if interpreted under the YAML 1.1 or earlier spec. (In our case it was kubernetes--kubectl apply in particular--that misinterpreted a Y string and converted it to true.)

Some more experiments, in case that's any help, show that y, Y, n and N are not quoted as required:

# taken from https://yaml.org/type/bool.html
v = %w(y Y yes Yes YES n N no No NO true True TRUE false False FALSE on On ON off Off OFF)
...

tenderlove added a commit that referenced this issue Jan 17, 2024
issue #443: quote Y and N when dumping
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

6 participants