Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why from_reader expect that source will live at least the same time as result? #12

Open
Mingun opened this issue Sep 23, 2024 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@Mingun
Copy link
Contributor

Mingun commented Sep 23, 2024

The method from_reader accepts Read + 'de

serde-yaml-ng/src/de.rs

Lines 87 to 93 in e158e71

pub fn from_reader<R>(rdr: R) -> Self
where
R: io::Read + 'de,
{
let progress = Progress::Read(Box::new(rdr));
Deserializer { progress }
}

which mean that any references in type R (the type of reader) which implements Read must outlive 'de. The practical aspects of this restriction is unclear. When we read from std::io::Read data is always copied from R to the internal buffer, so it cannot be linked to the lifetime of any references in R:
Progress::Read(mut rdr) => {
let mut buffer = Vec::new();
if let Err(io_error) = rdr.read_to_end(&mut buffer) {
return Err(error::new(ErrorImpl::Io(io_error)));
}
Cow::Owned(buffer)
}

Even if not use read_to_end, the methods in Read trait does not allow to link lifetime of output buffer with lifetimes of the reader (the R type). They are two completely unrelated lifetimes.

One practical limitation of such restriction is that this code does not compile (in this case that is not so bad, because it anyway will return deserialization error in runtime, because the data cannot be borrowed from a reader):

#[test]
fn test() {
  let data = {
    let yaml = "test".to_string();
    <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap()
  };
  assert_eq!(data, "test");
}

Error:

error[E0597]: `yaml` does not live long enough
   --> src\parser\mod.rs:908:66
    |
906 |   let data = {
    |       ---- borrow later stored here
907 |     let yaml = "test".to_string();
    |         ---- binding `yaml` declared here
908 |     <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap()
    |                                                                  ^^^^ borrowed value does not live long enough
909 |   };
    |   - `yaml` dropped here while still borrowed

@dtolnay, you add this implementation in 4da5934. Could you explain, why you use Read + 'de instead of just Read (which assumes Read + 'static) or some different lifetime Read + 'reader?

My question about this because in other fork, serde_yml @sebastienrousseau changed the reader in a such way, that this test will success while it is failed in serde_yaml_ng:

// serde_yml = "0.0.12"
#[test]
fn yml() {
  let yaml = "test".to_string();
  let data = <&str>::deserialize(serde_yml::Deserializer::from_reader(yaml.as_bytes())).unwrap();
  // Failed with this error, because first 2 bytes somehow was corrupted,
  // but in many real cases can pass. For example, if change the input to
  // (add two spaces):
  // let yaml = "  test".to_string();
  // ---- parser::yml stdout ----
  // thread 'parser::yml' panicked at src\parser\mod.rs:901:3:
  // assertion failed: `(left == right)`
  // 
  // Diff < left / right > :
  // <♀ st
  // >test
  assert_eq!(data, "test");
}

// serde_yaml_ng = "0.10.0"
#[test]
fn yaml_ng() {
  let yaml = "test".to_string();
  // ---- parser::yaml_ng stdout ----
  // thread 'parser::yaml_ng' panicked at src\parser\mod.rs:908:93:
  // called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"test\", expected a borrowed string", line: 1, column: 1)
  let data = <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap();
  assert_eq!(data, "test");
}

It seems to me that the change is obviously incorrect, because you borrowed from the internal buffer of the serde_yml::Deserializer, but Rust borrowing rules allows to write such code and I does not yet find a way to crash it (which is definitely should be possible). It should be possible if we will try to change yaml via the deserialized data, but in that case need that data have a &mut str type which doesn't implement Deserialize trait. So it seems that we can change the string only using unsafe conversion of its type, which couldn't be considered as a true demonstration of a problem.

Unfortunately, @sebastienrousseau make this change in one single god commit without any explanations why it was changed. (Digression: the borrowing is possible, if repr field of the Scalar would be Some (it also very unpleasant why this filed is not included in debug representation while it has so much influence on results of deserialization)).

The changes in serde_yml that makes the test above work in those lines (filling repr field):
https://github.com/sebastienrousseau/serde_yml/blob/24187306a30cc972780449e843ccc7b6f82b6086/src/libyml/parser.rs#L256-L269
https://github.com/sebastienrousseau/serde_yml/blob/24187306a30cc972780449e843ccc7b6f82b6086/src/libyml/parser.rs#L308-L315
He only changes &Cow<input, str> to &input Cow<input, str> in convert_event function and the borrow checker became happy. I do not understand why test does not crashed, and if this change has some rationaly, is it possible to apply it to this fork? Actually, I dig into this rabbit hole because I want to use

#[derive(Deserialize)]
#[serde(from = "&str")]
struct Path(Vec<String>);
impl<'a> From<&'a str> for Path {
  fn from(path: &'a str) -> Self {
    Self(path.split("::").map(|s| s.to_owned()).collect())
  }
}

This code works with serde_yml and does not work with serde_yaml_ng.

@acatton
Copy link
Owner

acatton commented Sep 23, 2024

This is late, and I've only skimed through this ticket, I need to look into details, however, I wanted to mention this part

@dtolnay, you add this implementation in 4da5934. Could you explain, why you use Read + 'de instead of just Read (which assumes Read + 'static) or some different lifetime Read + 'reader?

David Tolnay is the original author of https://github.com/dtolnay/serde-yaml, which he abandoned, and I forked. Please refer to the "Why?" section of the README.md.

The life of a open source library author and maintainer is hard, and obviously, David didn't want to deal with this charge of work, I don't want us to bother him, the maintenance of this library is my burden, and mine alone.

Let's not mention him, and let's not ping him, let's leave him and the original authors alone, and in peace. Let's figure this out together :) .

Anyway, thanks for your report, and I'll take a look at it once I had some rest.

@Mingun
Copy link
Contributor Author

Mingun commented Sep 23, 2024

David Tolnay is the original author of https://github.com/dtolnay/serde-yaml, which he abandoned, and I forked. Please refer to the "Why?" section of the README.md.

I know that, as you might have guessed, I looked at the history, not to mention what I had previously used serde_yaml.

The life of a open source library author and maintainer is hard, and obviously, David didn't want to deal with this charge of work, I don't want us to bother him, the maintenance of this library is my burden, and mine alone.

I think, that asking questions costs nothing, especially considering what kind of question. I don't think anyone can get into his head and answer for him. Of course, he is free not to answer.

@acatton
Copy link
Owner

acatton commented Oct 9, 2024

I think, that asking questions costs nothing, especially considering what kind of question. I don't think anyone can get into his head and answer for him. Of course, he is free not to answer.

I understand, if you would have told me ~10 years ago I would have agreed with you. However, now that I have a few projects that I have abandoned, I would understand how annoying and noisy it is, when your github notifications or your inbox is full of people mentioning your for help on stuff you don't want to deal with anymore.

So this is why I'm trying to mindful with David :) . But anyway, it was an request, not a demand, you're free to think otherwise 🙂

@acatton
Copy link
Owner

acatton commented Oct 9, 2024

Lets close the side discussion from above, and go back to the original issue of this thread.

I looked at the code, and read your analysis, Thanks you for analyzing this @Mingun . I think you're right. I would be opened to a pull request to remove the life time. I think it was a mistake, and while it might breaking backward compatibility (some people might be explicitly using these lifetimes), I think it's fine for 99% of users.

I will eventually take on this task if you nobody has taken it before me.

@acatton acatton added enhancement New feature or request good first issue Good for newcomers labels Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants