Regression: Could not parse YAML metadata #4959

Closed
rriemann opened this issue Oct 8, 2018 · 6 comments

Comments

@rriemann

rriemann commented Oct 8, 2018

The following problem does not occur in Pandoc 2.2.1 and occurs in all recent versions starting with Pandoc 2.2.2.

Minimal Example

test.yml contains:

---
reason: 'Was geht?

'
---

This is how libyaml embedded by the Ruby programming language outputs strings with trailing newline "\n". The file can be produced with this ruby command:

ruby -r yaml -e 'puts Hash({"reason" => "Was geht?\n"}).to_yaml + "---"'

In my actual setup, I generate metadata and use it in a document. For the minimal example, I just output the metadata in JSON AST format.

pandoc test.yml -t json

Expected Output

The output with pandoc 2.2.1 is:

pandoc-2.2.1/bin/pandoc test.yml -t json
{"blocks":[],"pandoc-api-version":[1,17,4,2],"meta":{"reason":{"t":"MetaBlocks","c":[{"t":"Plain","c":[{"t":"Str","c":"Was"},{"t":"Space"},{"t":"Str","c":"geht?"}]}]}}}

Erroneous Output

Pandoc 2.2.2 and higher gives a different output:

pandoc-2.2.2/bin/pandoc test.yml -t json
[WARNING] Could not parse YAML metadata at line 1 column 1: :2:18: Unexpected '
  '
{"blocks":[],"pandoc-api-version":[1,17,5,1],"meta":{}}

As you can see, the metadata is empty.

The cause is certainly linked to the dependency change to HsYAML. @hvr, could you kindly help determine whether the syntax in test.yml is actually supported?

@jgm
Owner

jgm commented Oct 8, 2018 via email

@jgm
Owner

jgm commented Oct 9, 2018

Testing directly with HsYAML:

Data.YAML> decodeNode' failsafeSchemaResolver False False (fromStringLazy "foo: 'hi\n'")
Left ":1:8: Unexpected '\n'"
Data.YAML> decodeNode' failsafeSchemaResolver False False (fromStringLazy "foo: 'hi\n '")
Right [Doc (Mapping Nothing (fromList [(Scalar (SUnknown Nothing "foo"),Scalar (SStr "hi "))]))]
Data.YAML> decodeNode' failsafeSchemaResolver False False (fromStringLazy "'hi\n'")
Right [Doc (Scalar (SStr "hi "))]

@rriemann
Author

rriemann commented Oct 9, 2018

So the Ruby library is based on the C library libyaml, which does not support YAML 1.2 yet.
Upstream Bug report: yaml/libyaml#20

I could not find out whether my test file is YAML 1.2 compliant.

@jgm
Owner

jgm commented Oct 10, 2018

I don't think there's much more we can do about this on the pandoc side. If you find there's a bug in HsYAML, you should report it there.

@jgm jgm closed this as completed Oct 10, 2018
@hvr
Contributor

hvr commented Oct 10, 2018

I'm pretty confident that

- 'Was geht?

'

or

reason: 'Was geht?

'

are in fact not valid YAML 1.2.

If you look at section 7.3.2. Single-Quoted Style, you'll notice the rules

[123]  nb-ns-single-in-line     ::=  ( s-white* ns-single-char )*
[124]  s-single-next-line(n)    ::=  s-flow-folded(n) ( ns-single-char nb-ns-single-in-line ( s-single-next-line(n) | s-white* ) )?
[125]  nb-single-multi-line(n)  ::=  nb-ns-single-in-line ( s-single-next-line(n) | s-white* )

Rules [124] and [125] take an n parameter, which tracks the relative indentation level and encodes the general rule that a node must be indented at least one space more than the block node that contains it. In particular, the s-flow-folded(n) production requires at least n spaces of leading indentation before the non-space content of a continuation line.

So if, for example, the - (YAML sequence indicator) is at n = 0, then the single-quoted scalar inside that block collection is at level n = 1 or deeper, and its continuation lines need at least one space of indentation.
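
The indentation rule described above can be approximated with a rough check. The following is only a heuristic sketch in Ruby, not a YAML parser: the helper name underindented_continuations is made up for illustration, and it assumes that apostrophes appear in the file only as single-quote delimiters (or as the doubled '' escape), never in plain scalars, double-quoted scalars, or comments.

```ruby
# Hypothetical helper (not part of pandoc, HsYAML, or Psych).
# Returns the 1-based numbers of continuation lines of single-quoted scalars
# that are not indented deeper than the line that opened the scalar.
# Assumption: apostrophes only ever delimit single-quoted scalars.
def underindented_continuations(yaml)
  problems = []
  open_indent = nil # indentation of the line that opened a still-open scalar
  yaml.each_line.with_index(1) do |line, number|
    text = line.chomp
    indent = text[/\A */].length
    # Blank lines inside the scalar are fine; non-blank ones need more indent.
    if open_indent && !text.strip.empty? && indent <= open_indent
      problems << number
    end
    # An odd number of apostrophes toggles the open/closed state; the ''
    # escape contributes two and leaves the parity unchanged.
    next unless text.count("'").odd?
    open_indent = open_indent ? nil : indent
  end
  problems
end

p underindented_continuations("---\nreason: 'Was geht?\n\n'\n---\n")  # flags the lone closing quote
p underindented_continuations("---\nreason: 'Was geht?\n\n '\n---\n") # closing quote indented by one space
```

The first call reports the line holding the unindented closing quote from the minimal example; the second, with the closing quote indented by one space, reports nothing.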


PS: As it turns out, there's a negative test in the YAML testsuite at http://matrix.yaml.io/sheet/invalid.html#QB6E which expects a compliant YAML parser to fail on

---
quoted: "a
b
c"

@rriemann
Author

Thanks for the explanation, @hvr.

I am reporting my findings here for those running into similar issues. I used the YAML 1.2 compliant library ruamel.yaml to find a YAML 1.2 compliant fix for the example metadata file. One solution (there may be others) is:

---
reason: "Was geht?\n"
---

What is different?

  1. use of double quotation marks
  2. use of the escape sequence "\n" for the newline instead of two actual line breaks
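
The two differences listed above can also be produced directly when generating the file. This is a minimal sketch, not something used in the thread: the helper name front_matter is hypothetical, it handles only a flat Hash with plain-safe String keys and String values, and it relies on the assumption that a JSON string literal is also a valid YAML 1.2 double-quoted scalar.

```ruby
require 'json'

# Hypothetical helper: renders a flat Hash (String keys, String values) as a
# YAML metadata block. Each value is written as a double-quoted scalar by
# reusing JSON string escaping, so a newline becomes the two characters \n.
# Assumption: keys need no quoting of their own.
def front_matter(meta)
  lines = meta.map { |key, value| "#{key}: #{value.to_json}" }
  "---\n#{lines.join("\n")}\n---\n"
end

puts front_matter({ "reason" => "Was geht?\n" })
# ---
# reason: "Was geht?\n"
# ---
```

Whether this is preferable to post-processing the to_yaml output depends on how much of YAML the metadata uses; nested structures would need more than this sketch.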

My solution is to produce my file with Ruby and then fix this one problem manually with regular expressions. Of course, with a different feature set used in the YAML file, other problems may occur that also need manual treatment. So I hope that in the long run, a YAML 1.2 compliant Ruby lib becomes available.

# fix YAML 1.2 compatibility for pandoc > 2.2.1, see https://stackoverflow.com/a/30049447/1407622
sed -r -z -i "s/: '([^']+)\n\n'/: \"\1\\\n\"/g" test.yml
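
For setups without GNU sed (the -z flag is a GNU extension), the same substitution can be sketched in Ruby. The helper name below is hypothetical, and the regular expression simply mirrors the sed expression above, including its limitation: double quotes or backslashes inside the captured text are not escaped.

```ruby
# Hypothetical helper mirroring the sed expression above.
# Rewrites  key: 'text<newline><newline>'  as  key: "text\n"
# where \n in the result is the two-character escape sequence.
# Limitation: does not escape " or \ inside the captured text.
def fix_trailing_newline_scalars(yaml)
  yaml.gsub(/: '([^']+)\n\n'/) { ": \"#{Regexp.last_match(1)}\\n\"" }
end

input = "---\nreason: 'Was geht?\n\n'\n---\n"
puts fix_trailing_newline_scalars(input)
# ---
# reason: "Was geht?\n"
# ---
```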

jennybc added a commit to tidyverse/reprex that referenced this issue Nov 3, 2018
From Pandoc 2.2.2, HsYAML is the new YAML parser and it has a stricter
interpretation of YAML 1.2. My previous indentation was causing an
error that resulted, I believe, in the pandoc_args being ignored (?).

I detected this via a test failure re: standard output. The parsing
error was showing up on standard output in a case where standard output
should have been empty.

https://stackoverflow.com/questions/31839686/wrapping-a-list-over-multiple-lines-yaml

jgm/pandoc#4959 (comment)