Skip to content
This repository has been archived by the owner on Jul 9, 2023. It is now read-only.

Minimalist YAML parser with support for borrowed string slices #9

Open
dtolnay opened this issue Jan 21, 2019 · 31 comments
Open

Minimalist YAML parser with support for borrowed string slices #9

dtolnay opened this issue Jan 21, 2019 · 31 comments

Comments

@dtolnay
Copy link
Owner

dtolnay commented Jan 21, 2019

In dtolnay/serde-yaml#94 I would like to move serde-yaml off of yaml-rust as a YAML backend and to a much simpler backend with support for a minimum viable subset of YAML syntax.

I think the minimal API for this would be:

pub enum Yaml<'a> {
    // strings, booleans, numbers, nulls, all treated the same
    Scalar(&'a str),

    // flow style like `[x, x, x]`
    // or block style like:
    //     - x
    //     - x
    Sequence(Vec<Yaml<'a>>),

    // flow style like `{x: X, x: X}`
    // or block style like:
    //     x: X
    //     x: X
    Mapping(Vec<Entry<'a>>),
}

pub struct Entry<'a> {
    key: Yaml<'a>,
    value: Yaml<'a>,
}

pub fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>;

The long tail of seldom used YAML features need not be supported. That means no tags, anchors, aliases, countless distinct string types, ...

@ghost
Copy link

ghost commented Jan 22, 2019

In python world there's strictyaml with similar goal, if you want to check what elements are needed

@dtolnay
Copy link
Owner Author

dtolnay commented Jan 24, 2019

Thanks! That looks like a super nicely done library with similar motivations as this project.

@jakeschurch
Copy link

jakeschurch commented Feb 6, 2019

@dtolnay I think I would like to take a crack at this

@dtolnay
Copy link
Owner Author

dtolnay commented Feb 17, 2019

Just checking in -- any progress on this @jakeschurch? Is there anything you are stuck on?

@jakeschurch
Copy link

Thanks for checking in @dtolnay -- so far things are going well - will def. Check in if I have any questions or problems

@dtolnay
Copy link
Owner Author

dtolnay commented Feb 17, 2019

Sounds good. I noticed that you forked serde-yaml and made some big changes to its Value type. I would recommend sticking to a 100% self contained fn parse<'a>(input: &'a str) -> Result<Yaml<'a>> instead -- serde_yaml::Value is a frontend type and shouldn't need to change at all when we later swap backends.

@jakeschurch
Copy link

OK. Sounds like I need to read a little more into the code.

@dtolnay
Copy link
Owner Author

dtolnay commented Feb 18, 2019

Less into the code. 😉

Nothing in serde_yaml is relevant to implementing this function.

@jakeschurch
Copy link

Sorry! Just meant what parts of the code now hook up to yaml-rust so I can have the signatures in mind implementing the back-end

@jakeschurch
Copy link

hey @dtolnay just wanted to let you know that I haven't gotten around to implementing this yet cause I'm super busy with work - whatever you want to do, let me know -- sorry to keep you waiting

@pickfire
Copy link

pickfire commented May 13, 2019

I believe it would be nice to support the additional feature using feature gates. I will be trying this as it is a blocker for me to implement yaml2json.

@pickfire
Copy link

pickfire commented Jun 12, 2019

I added my implementation in https://github.com/pickfire/mini-yaml, very experimental. Currently can parse certain flow scalar, flow/block sequence, flow mapping and block implicit mapping. Still figuring out how to parse implicit flow mapping and maybe later need to check the spec again, maybe later can add serialization and deserialization.

@dtolnay
Copy link
Owner Author

dtolnay commented Jun 12, 2019

Nice, looks like this is on the right track. When you wrap up the initial set of features it would be good to write a tool to programmatically partition the yaml-test-suite based on ones that are supported by your library and ones that are not supported, and check those into your repo as two separate directories. That will make it easy to scan through and tell whether the set of supported cases is sufficient for what someone needs, or if they require some feature that is in the unsupported set.

@pickfire
Copy link

@dtolnay Thanks a lot for sharing that link. By the way, do you think I should implement it in a function way instead of using an iterator like what yaml-rust does? Maybe it might help in streaming serialization/deserialization?

@dtolnay
Copy link
Owner Author

dtolnay commented Jun 14, 2019

I recommend not doing stream style. Stream support would be important for a lossy deserializer, for example one that deserializes 1.00 as an f64 by default. Like if the input document is:

---
n: 1.00

then a non-streaming lossy deserializer interpreting this as yaml::Map(map! { yaml::String("n") => yaml::Number(1.0f64) }) would be problematic because the caller may ultimately need the data as struct Input { n: String } in which the correct deserialization would have n = "1.00".

Since the design in this thread is specced to produce lossless &'a str for all yaml scalars, there shouldn't need to be a stream API.

@pickfire
Copy link

@dtolnay Looks a lot harder than I think, I probably will not be continuing the project.

If anyone is interested can take a look at https://github.com/pickfire/mini-yaml, the implementation is a bit bad but the main issue is support for implicit block mapping and block and flow scalar. I believe supporting strictyaml would be better than yaml since it's too broad for start.

@Stupremee
Copy link

I will try to create a crate based on the StrictYAML syntax.

@dtolnay
Copy link
Owner Author

dtolnay commented Nov 2, 2019

Sounds good. Thanks! The API design in #9 (comment) should work just as well for StrictYAML.

@Stupremee
Copy link

Stupremee commented Nov 2, 2019

To make it simpler to write a parser, I thought about only implementing a super simple and minimalistic version of yaml.

These would be all the possibilities:

key1: 'value'
key2: "value"
key3: value
list:
  - one
  - two
list2: [foo, bar]
object:
  some: "thing"
  foo: bar
object2: {a: one, b: two}

value can also be a number
Would this be okay?

@dtolnay
Copy link
Owner Author

dtolnay commented Nov 2, 2019

Yes that sounds fine. If someone needs more syntax support they can extend your library.

@Stupremee
Copy link

Should the library support serializing and deserializing?

@pickfire
Copy link

pickfire commented Nov 6, 2019

@Stupremee Serializing and deserializing could be added later but it should be designed with that in mind, taking a look at serde_json and serde_cbor would help but the design of yaml-rust is way too different from those.

@dtolnay
Copy link
Owner Author

dtolnay commented Nov 6, 2019

@Stupremee⁠—Personally for this issue I really only care about one function.

fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>

Once that is in a good state, it would be reasonable to add the reverse as well. That would be as simple as impl Display for Yaml<'_>.

@nathanwhit
Copy link

@dtolnay I noticed this has been sitting for a while, so I decided to pick this up. The repository with my work up to this point is here. Support for the bare-bones syntax you described is already implemented, and I do have a basic level of testing (~40 unit tests, but I plan to add more). I haven't published the crate yet, as I feel it needs a bit more polish and more thorough testing first, but tentatively a release doesn't seem too far off. If you have any spare time, feedback on the current state of the crate would be wonderful.

@NilsIrl
Copy link

NilsIrl commented Feb 14, 2020

to a much simpler backend with support for a minimum viable subset of YAML syntax

How would someone that needs these "non-minimalist features" do?

@pickfire
Copy link

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

@NilsIrl
Copy link

NilsIrl commented Feb 14, 2020

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

I understand that very well, but if this is supposed to replace https://github.com/chyh1990/yaml-rust for https://github.com/dtolnay/serde-yaml then it can be problematic.

@dtolnay
Copy link
Owner Author

dtolnay commented Jul 5, 2020

@nathanwhit it looks like the feedback I filed on https://github.com/nathanwhit/minimal-yaml were mostly resolved. Any plan on what polish or testing are missing still before it would be released?

@nathanwhit
Copy link

@dtolnay I'm comfortable releasing it in its current state. I wouldn't be too surprised if there are some edge cases being handled improperly, but there shouldn't be any major issues at this point. I've gone ahead and published the crate here.

@sanbox-irl
Copy link

@dtolnay how are we doing on this?

@qm3ster
Copy link

qm3ster commented Oct 29, 2021

I would like to note that StrictYAML actually doesn't allow "flow style" (other than in dirty_load mode, which also allows and interprets "node anchors and refs"), so the following would not be valid:

list2: [foo, bar]
object2: {a: one, b: two}

I wonder if this means that in non-dirty_load mode there are less characters that have to be avoided in strings.
Will I finally be able to write

action_jq: 'if . then "ON" else "OFF" end | {state_l1: .}'

unquoted?

action_jq: if . then "ON" else "OFF" end | {state_l1: .}

Currently, neither the serde_yaml nor VS Code's YAML syntax higlighting manage to survive a non-leading opening { in an unqioted string, despite there not being any possible interpretation.
But fully giving up on flow style would ublock all of []{} anywhere, and that may be a worthwhile tradeoff.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants