Proposal: Documented normalized YAML style #77

Open
ssbarnea opened this issue Sep 30, 2020 · 10 comments

Comments

@ssbarnea

It would really be nice if we could get some official and clear guidelines, ones that document a canonical format to follow, at least for those that want to be strict.

This style reference should serve as the reference for implementing style checkers, linters and formatters. The goal should be to avoid alternatives, so the result would be predictable, minimizing the chances of introducing

One example of a formatting rule that needs to be addressed is whether the document-start marker (---) should appear at the start of a file. The current spec allows both, creating a point of divergence.

Background info copied from original ticket

I used to assume that ideally --- should be at the start of any document, but I received multiple complaints along the lines of "why? what is the benefit?". I may be biased: while I use YAML a lot, it is always for configuration files/management, and I have never encountered a real need for multiple documents per file.

Especially for linters and formatters, it is essential to get help from the spec team in order to define clear behaviors. This is important because predictable formatting avoids divergence and endless discussions during code reviews: john: you forgot the ---! alan: why did you add the ---? ...

Please note that I am not proposing making one option or another a syntax error; I am proposing documenting a canonical/normalized format. A good example of a similar issue is Unicode, where the same text could be written in multiple forms, causing serious platform compatibility bugs. Normalized forms were introduced to address that. Luckily, for YAML it should be much easier to do something like that.

Basically, this is about avoiding undetermined desired behavior wherever the spec says "optional". If document-start is optional at the beginning of the file, we should also state the desired normalized form, with or without it.

@perlpunk
Member

perlpunk commented Sep 30, 2020

I think guidelines are good, yes :)

There might be differences depending on the context.
For example, a YAML processor should by default always output --- at the start of the document.

However, the default for a linter or formatter might be different.

I personally think having --- on top by default is a good thing. It visually marks something as YAML.

For the dumper output we have the YAML Test Suite.
Wherever there is a dump.yaml in the test, this should be the default Dumper output.
But the data might change; so far we have been concentrating on parsing.

I'm currently working on yamltidy, and I'll try to add extra test cases for this context.
The advantage of yamltidy over yamllint (in this case) is that it can automatically add --- on top of the files, so if it was forgotten, people don't have to add it manually. (Note that I haven't implemented this feature yet.)
People simply run yamltidy after making changes and before submitting their PRs.
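
A minimal sketch of what such an automatic fix could look like, in plain Python. This is not yamltidy's implementation (the feature is not implemented there yet); `ensure_document_start` is a hypothetical helper name, and the line-based check is an assumption that only looks past leading blank lines and comments.

```python
def ensure_document_start(text):
    """Prepend '---' unless the first significant line already starts a document.

    Hypothetical helper: a line-based heuristic, not a real YAML parser.
    """
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment-only lines
        if line.startswith("%"):
            return text  # a directive must already be followed by '---'
        if line == "---" or line.startswith(("--- ", "---\t")):
            return text  # explicit document-start marker is present
        return "---\n" + text  # first content line has no marker
    return text  # no content at all: leave the file untouched
```

For instance, `ensure_document_start("name: Sue\n")` gains a leading `---` line, while a file that already starts with `---` (possibly after comments) is returned unchanged.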

There are also things that yamltidy won't be able to fix, for example mis-indented comments, where it's not clear how they should be indented instead.

I hope yamllint and yamltidy will be used more in the future, and I also hope that we can agree on some sane defaults.

@The-Compiler

My $0.02: It mostly depends on whether you think of YAML as a format for serialization or as a config format.

No idea if this is universally true, but at least in my bubble, the primary use I see for YAML is as a configuration format - think Travis, GitHub Actions, various linters, etc. etc.

I believe this is the most likely usage of YAML a first-timer is going to see. I've never seen the documentation of any of those tools mention the --- marker - heck, even yamllint itself doesn't.

Thus, I believe the --- is rather confusing and I'd argue it's redundant for configuration which is typically stored in a single .yml / .yaml file anyways.

@ingydotnet
Member

Re canonicalization, we are working on a guide for that. I'm fairly sure you'll see it before the end of the year, and possibly quite soon.

Re --- (document header), it will always be optional for "simple" single document streams. We are planning to make it mandatory for top level nodes that are scalars, or have an anchor or tag. These are less common situations, and it makes things more clear.

Linters should never error on the lack of a header, but may offer suggestions to use it.
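
A rough sketch of how a linter could apply this rule as a suggestion rather than an error, assuming PyYAML's event API (`yaml.parse`, whose `DocumentStartEvent` carries an `explicit` flag). `header_suggestions` and the message wording are hypothetical; the rule encoded here is the planned one described above, not something the current spec requires.

```python
import yaml


def header_suggestions(text):
    """Suggest '---' for implicit documents whose root node is a scalar
    or carries an anchor or tag. Returns messages; never raises for a
    missing header."""
    suggestions = []
    events = list(yaml.parse(text))
    for index, event in enumerate(events):
        if not isinstance(event, yaml.DocumentStartEvent) or event.explicit:
            continue  # only implicit document starts are of interest
        root = events[index + 1]  # first node event of this document
        is_scalar = isinstance(root, yaml.ScalarEvent)
        has_props = (getattr(root, "anchor", None) is not None
                     or getattr(root, "tag", None) is not None)
        if is_scalar or has_props:
            line = event.start_mark.line + 1
            suggestions.append(
                f"line {line}: consider adding '---' before this document")
    return suggestions
```

Simple top-level mappings and sequences produce no suggestions; a bare top-level scalar, or a root node with an anchor or tag, produces one.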

@ssbarnea
Author

As long as the recommendation can be made into code that does not need AI to make a decision, I will be happy with it.

As "style" is renowned for creating endless debates, I hope that this guide will learn from others' experience, start small, and grow slowly (so users gradually get used to the rules). Otherwise, the risk is that nobody will adopt the rules.

My favorite success example is https://github.com/psf/black#pragmatism which, despite encountering lots of issues, managed to win over more and more Python projects. Some decisions will need surveys of how users prefer to do things in the wild (indentation rules). Black had a rough decision to make between double and single quoting, one that is still causing debates on various projects.

@ingydotnet
Member

I'll add to this that I agree with @perlpunk that seeing --- at the top of a text is a good confirmation that it is YAML. I strongly support making it mandatory in places that help people recognize unfamiliar texts as YAML. Here are some examples:

# Simple top level collections are (usually) easily recognizable as YAML
name: Sue
color: blue

# or
- one
- two

# scalars however often aren't recognizable as YAML
Roses are blue,
People are too...

# quotes don't help much
"Is this YAML?"

# literals don't look as good without ---
|
  true = false
  
# or folds
>
  this and
  that

# or tags and anchors
&this
!foo
thing

Preferred for these edge cases is:

---
  Roses are blue,
  People are too...

--- "Is this YAML?"

--- |
  true = false
  
--- >
  this and
  that

--- &this !foo thing

@ingydotnet
Member

ingydotnet commented Sep 30, 2020

@ssbarnea I think there is a slight disconnect here. You want to know the best practices for humans writing YAML. We are thinking more about the best defaults for YAML dumpers/emitters.

I would suggest for starters that you use a libyaml based dumper as a preliminary guideline. It is quite close to what we are leaning towards for best practices.

Here's a Python (pyyaml usually uses libyaml, and has the same defaults when it doesn't) one liner you could use for this purpose:

python -c 'import sys, yaml; print(yaml.dump(yaml.unsafe_load(sys.stdin.read()), default_flow_style=False))'

That will read YAML on stdin and write libyaml's preferred style of YAML on output. For example:

echo '{foo: [1,2,3]}' | python -c 'import sys, yaml; print(yaml.dump(yaml.unsafe_load(sys.stdin.read()), default_flow_style=False))'

This (load and dump) is what I have done for a yaml-tidy solution for years. Of course this loses comments. Some YAML frameworks out there have support for preserving comments.
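
The same load-and-dump approach as a small function, for reference. This variant is a sketch that swaps in `yaml.safe_load_all` and `yaml.safe_dump_all` (so plain data only, no custom tags) and passes `explicit_start=True` so that --- is emitted before each document; `tidy` is a hypothetical name, and comments are still lost.

```python
import yaml


def tidy(text):
    """Load every document in `text` and dump it back in block style.

    Assumption: the input contains only standard YAML types, since the
    safe loader rejects custom tags. Comments are not preserved.
    """
    documents = list(yaml.safe_load_all(text))
    return yaml.safe_dump_all(
        documents,
        default_flow_style=False,  # block style for collections
        explicit_start=True,       # write '---' before each document
    )
```

Calling `tidy('{foo: [1,2,3]}')` returns the block-style form of the same data, and multi-document input stays multi-document.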

@ssbarnea
Author

Yes, I am interested only in the human side. AFAIK, if it were about a machine-only format, I would have been using JSON/XML/protobuf/... ;)

One deal breaker here is the fact that libyaml has no support for keeping comments, something that is very important for the human side.

Maybe I should rephrase the proposal and make it clear that it is about files that are already valid YAML, not about files that do not load as YAML.

@perlpunk
Member

perlpunk commented Sep 30, 2020

One deal breaker here is the fact that libyaml has no support for keeping comments, something that is very important for the human side.

FWIW, yamltidy is based on libyaml.
While the parsing events do not have whitespace or comment information, it is possible to keep them by looking at the parsing event start and end positions and the corresponding lines in the original content.
That's what I'm doing right now in yamltidy.
I hope I can add more features soon.
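
To illustrate the idea (this is not yamltidy's code), here is a sketch using PyYAML's pure-Python parser rather than libyaml bindings. It assumes that any `#` found in the gaps between consecutive event positions starts a comment; `find_comments` is a hypothetical name, and comments that sit inside an event's own span (for example on a block scalar header line) are missed.

```python
import re

import yaml


def find_comments(text):
    """Recover comments from the gaps between parsing event positions.

    Returns a list of (zero_based_line, comment_text) tuples.
    """
    comments = []
    pos = 0  # end offset of the last event seen so far
    for event in yaml.parse(text):
        start = event.start_mark.index
        if start > pos:
            # Text between two events holds only indicators, whitespace
            # and comments, so a '#' here is assumed to start a comment.
            for match in re.finditer(r"#[^\n]*", text[pos:start]):
                offset = pos + match.start()
                line = text.count("\n", 0, offset)
                comments.append((line, match.group().rstrip()))
        pos = max(pos, event.end_mark.index)
    return comments
```

A formatter could then re-attach each recovered comment to the corresponding line of its output; how to do that in the ambiguous cases (such as mis-indented comments) is left open here.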

@ingydotnet
Member

@ssbarnea here's one way you can tidy YAML, keeping comments, with a one line script:

★ ~ $ python3 -m venv venv
★ ~ $ source venv/bin/activate
(venv) ★ ~ $ pip install ruamel.yaml
...
Successfully installed ruamel.yaml-0.16.12 ruamel.yaml.clib-0.2.2
(venv) ★ ~ $ cat test.yaml 
# comment

  foo: # comment
   - bar    # comment
   - baz      # comment
(venv) ★ ~ $ python -c 'import sys, ruamel.yaml; yaml = ruamel.yaml.YAML(); yaml.dump(yaml.load(sys.stdin.read()), sys.stdout)' < test.yaml 
# comment

foo:   # comment
- bar       # comment
- baz         # comment
(venv) ★ ~ $ 

@ssbarnea
Author

I know; I have been using ruamel for a good number of years with https://github.com/ansible/ansible-lint
