Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can comments appear before commas in in-line arrays? #766

Closed
Validark opened this issue Aug 14, 2020 · 12 comments
Closed

Can comments appear before commas in in-line arrays? #766

Validark opened this issue Aug 14, 2020 · 12 comments

Comments

@Validark
Copy link

Arrays can span multiple lines. A terminating comma (also called trailing comma) is ok after the last value of the array. There can be an arbitrary number of newlines and comments before a value and before the closing bracket.

My question is this: Can comments appear before commas in in-line arrays?

integers3 = [
  1,
  2 # is this a valid toml document?
  , 3
]

According to the ABNF file, the above is not a valid toml document.

However, tomlplusplus parses the above document just fine: https://godbolt.org/z/f6MG38

@marzer
Copy link
Contributor

marzer commented Aug 14, 2020

This is a good question. Given that I've recently verified toml++'s compliance against the two canonical TOML compliance test suites (BurntSushi's and iarna's), this implicitly means that any other parser passing those might still have a discrepancy here as well.

My hope is that this part of the spec is just a case of unclear wording and this should actually be valid, as it would be in just about any other language with single-line comments. As far as the parser is concerned it's much simpler to allow this than to prohibit it. It's also not unheard-of for people to use leading commas instead of trailing commas when splitting things over multiple lines, as odd as it looks.

@eksortso
Copy link
Contributor

eksortso commented Aug 14, 2020

I'm inclined to agree with @marzer here. SQL coders tend to use preceding commas in select clauses, and I've written enough SQL to understand why.

The ABNF would be easy to change. Replace ws with ws-comment-newline on two lines, then shorten up the lines with further refinement. The language of the spec is oddly specific: comments are allowed before values, and before closing brackets. That ought to be changed so that comments (with newlines) are allowed between values, commas, and brackets

@Validark
Copy link
Author

I agree, every major language I'm aware of allows this. The spec likely just forgot to mention this case. Perhaps it should instead say something like:

"Arrays can span multiple lines. A terminating comma (also called trailing comma) is permitted after the last value of the array. Any number of newlines and comments may precede values, commas, and the closing bracket."

@ChristianSi
Copy link
Contributor

As it stands, the spec (both written as ABNF) is unambiguous: newlines and comments may appear before values and before the closing bracket, BUT NOT before commas.

This also means that this (odd, but occasionally used) style is forbidden even in the absence of comments:

# THIS IS (as of now) NOT A VALID TOML DOC!
values = [
    1
  , 2
  , 3
]

I suppose this feature was modeled after TOML's key/value pairs – values must start on the line where the corresponding key ends, with no line breaks or comments allowed before or after the equals sign. Array elements and the commas that follow them were seen as belonging together in the same way. It makes certainly sense for key/value pairs, but for array elements?? Personally, I'm inclined to follow the other commenters here and plead for the relaxation of this restriction. As long as a comma appears, it should not matter whether it's preceded or followed by line breaks and comments.

This means I'm in favor of changing the ABNF as suggested by @eksortso and the written spec as suggested by @Validark.

One thing to consider, though: this is a nontrivial change, since it will force all compliant parsers to modify their behavior (allowing things that were formerly forbidden). RC 2 was recently released and I had hoped it could be promoted to 1.0 Final without further changes. But this change means that an RC 3 is called for before we can hope for Final. Well, I guess it can't be helped...

@Validark
Copy link
Author

One thing to consider, though: this is a nontrivial change, since it will force all compliant parsers to modify their behavior (allowing things that were formerly forbidden).

I believe most spec-compliant parsers are likely to already allow this behavior. The spec NEEDS to be updated one way or the other. If newlines should not be allowed before commas in TOML documents, that would need to be added to the spec so implementations actually know they need to ban this. As it sits, I am guessing people like @marzer just assumed the case being discussed here is fine, as it should be.

ToruNiina added a commit to ToruNiina/toml11 that referenced this issue Aug 16, 2020
replace ws by ws_comment_newline, as suggested.
discussed here: toml-lang/toml/issues/766
@ToruNiina
Copy link
Contributor

I'm writing this comment because the commit is linked to here. It is not merged yet but seems to work fine.

Actually, the current version of toml11, another C++ implementation, does not allow this. It only allows whitespaces, as described in the ABNF. There could be more implementations that are aware of this case. As @ChristianSi pointed out, the current spec does not allow comments before a comma explicitly and the toml.abnf clearly inhibit this. So currently there is no reason to allow it. Although the toml.abnf says it is not authoritative yet, it is so unambiguous, any implementation that refers to it could check this case.

I'm rather positive to this change because it looks legal. But there could be some implementations that does not allow this currently.

@marzer
Copy link
Contributor

marzer commented Aug 16, 2020

As it sits, I am guessing people like @marzer just assumed the case being discussed here is fine, as it should be.

Yup, guilty as charged. I'm very surprised this wasn't legal.

eksortso added a commit to eksortso/toml that referenced this issue Aug 16, 2020
Per toml-lang#766 this change updates the specification to allow comments with newlines before commas in arrays.
@eksortso
Copy link
Contributor

PR #767 is introduced to update the specification. I don't know if it's worth adding examples to toml.md to make this clear, but my intuition say it's not.

@eksortso
Copy link
Contributor

I agree, every major language I'm aware of allows this. The spec likely just forgot to mention this case. Perhaps it should instead say something like:

"Arrays can span multiple lines. A terminating comma (also called trailing comma) is permitted after the last value of the array. Any number of newlines and comments may precede values, commas, and the closing bracket."

Hope you don't mind, @Validark, but I used your exact text in the PR. It was too perfect to rewrite.

@Validark
Copy link
Author

@eksortso Feel free!

@pradyunsg
Copy link
Member

Whee! Nice catch @Validark! ^>^

I think it definitely makes sense to allow this as valid TOML.

@eksortso
Copy link
Contributor

eksortso commented Sep 3, 2020

Since #767 is merged, we can now say yes to the original question. @Validark, if you're happy with the update, cold you close this issue?

@Validark Validark closed this as completed Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants