Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc-json: Guard agains 2 commits making the same modification to FORMAT_VERSION #94591

Open
aDotInTheVoid opened this issue Mar 4, 2022 · 15 comments
Labels
A-rustdoc-json Area: Rustdoc JSON backend T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@aDotInTheVoid
Copy link
Member

If merge A and merge B both try to increase the version from 10 to 11, git will helpfully merge both of them, resulting in the version 11 having both changes. If a nightly is released in the time between A and B are merged, then both will claim to have the save version, but with different features.

This almost happened with #94009 and #94150, but was prevented by manual noticing and intervension, which is not ideal.

We should figure out a way such that if 2 commits both attempt to merge the same increments, git will raise this as a merge conflict, so this situation wont occor again.

A way to do this would be to store in a comment the logical "latest feature"

Eg: PR Foo

-// Latest Feature: Initial release
-const FORMAT_VERSION = 1;
+// Latest Feature: Foo
+const FORMAT_VERSION = 2;

PR Bar:

-// Latest Feature: Initial release
-const FORMAT_VERSION = 1;
+// Latest Feature: Bar
+const FORMAT_VERSION = 2;

If Foo is merged, when Bar is attempted to be merged, the following conflict will occor

<<<<<<< HEAD
// Latest Feature: Foo
=======
// Latest Feature: Bar
>>>>>>> bar
const FORMAT_VERSION = 2;

This could work, but will have a problem if two features are given the same name, and then we're back to where we were. If someone knows some git-trickery to get around that, that would be great, but if not, I still thing doing this (or something to this effect) would be worthwhile.

cc @Dylan-DPC @CraftSpider

@rustbot modify labels: +T-rustdoc +A-rustdoc-json

@rustbot rustbot added A-rustdoc-json Area: Rustdoc JSON backend T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Mar 4, 2022
@Dylan-DPC
Copy link
Member

cc @rust-lang/rustdoc

@GuillaumeGomez
Copy link
Member

I'm not sure what the best solution would be here... Adding a tidy check on the version number wouldn't work in this case unless you rebased your PR. Maybe adding this comment would be better, not sure...

@est31
Copy link
Member

est31 commented Mar 4, 2022

You could make a const FORMAT_CHANGELOG = ["Initial version", "Done foo", "Done bar", ...];, and then require PRs that change the version to add to that list. Then you could have const FORMAT_VERSION = FORMAT_CHANGELOG.len();. Or you could alternatively do it with PR numbers, but those are not known upon filing.

@camelid
Copy link
Member

camelid commented Mar 4, 2022

Do we definitely need a format version?

@aDotInTheVoid
Copy link
Member Author

Anacdotaly it is used to detect when the json moves, which does happen quite a bit:

https://github.com/awslabs/smithy-rs/blob/d823f61156577ab42590709627906d1dc35a5f49/tools/api-linter/src/cargo.rs#L78-L96

Iff and when where talking about stabilization, we should reconsider, but I think it works well for now

@aDotInTheVoid
Copy link
Member Author

aDotInTheVoid commented Mar 5, 2022

cc @hkmatsumoto @jdisanti @Urgau @maan2003 @akiross @Enselic: Do you use FORMAT_VERSION, and if not why? Would you like a fuller changelog, just a number, or nothing at all?

@notriddle
Copy link
Contributor

This could work, but will have a problem if two features are given the same name

Instead of names, pull request URLs?

@Urgau
Copy link
Member

Urgau commented Mar 5, 2022

Do you use FORMAT_VERSION, and if not why?

No, not yet; I just didn't found the time and the right way to get it but the solution from smithy-rs is quite simple so I've use it.

Would you like a fuller changelog, just a number, or nothing at all?

Yes, a changelog would be useful but I don't think it's not required. The format number is good no need for more.
I would just like to have a recommended way to get and check it. The smithy-rs solution could maybe be integrated in the crate?

#[derive(Deserialize)]
struct CrateFormatVersion {
    format_version: u32,
}

@camelid
Copy link
Member

camelid commented Mar 5, 2022

Instead of names, pull request URLs?

I think the issue with that is that you can't determine it before opening a PR, but we could have people just update that after opening their PR maybe?

@Enselic
Copy link
Member

Enselic commented Mar 6, 2022

I currently do not make direct use of FORMAT_VERSION, but I think there is value in keeping it anyway.

There are several reasons I don't make direct use of it:

  • It does not mean anything to regular users of my tools. If I say "you need FORMAT_VERSION 12" no one knows what that means. Whereas if I say "you need at least nightly-2022-03-04" people know what that means.
  • In the past it JSON was sometimes changed without a bump in FORMAT_VERSION, which made nightly version the most accurate identifier anyway
  • Sometimes it would be unnecessarily strict to enforce identical FORMAT_VERSIONs. For example, FORMAT_VERSION = 12 output can be read by code that assume FORMAT_VERSION = 11, because 12 only added fields/properties to existing structs. So requiring FORMAT_VERSION to be the same would be unnecessarily limiting.

That said, I think there is value in keeping FORMAT_VERSION, because it:

  • Signals that the JSON representation is subject to change
  • Gives a quick indication of what kind of JSON you are dealing with if you are encountering some JSON without additional context

The risk of two features having the same logical name is low, but can be eliminated by scoping with username:

// Latest Feature: Foo by Enselic

I'm not a big fan of having to force push directly after creating a PR to update with the PR number.

I think the single number FORMAT_VERSION suffices for now. No need for a changelog for example. But I agree that when it is time to prepare rustdoc JSON for stabilization, there is a clear need for something more formal. Maybe semver, although it is not clear to me we could ever change MAJOR in such a version scheme when rustdoc JSON reaches stable.

@akiross
Copy link

akiross commented Mar 7, 2022

Regarding my usage: no, I don't use FORMAT_VERSION yet, simply because I tailored my tool on a specific version and I did not care at the time of writing, but if I had to make the tool public (which is planned) I'd definitely depend on it.

Changelog is probably unnecessary for my purposes, but I agree on the fact that something more structured than a single number could be more useful on the long run, and semver could make sense... Although this does not (deterministically) solve the problem of conflicting increments in the number if it remains confined in the code.

Regarding the problem itself, I don't see a solid solution which doesn't use git-related information (e.g. the hash of the commit that last changed a particular file or that changed the FORMAT_VERSION value). Automation could be put in place here, but it's on a different level, not on the code (e.g. a hook that prevents a merge if the FORMAT_VERSION was changed in two different commits with the same value).

On a higher level, if this kind of feature is needed at all... Typically if a software library produces different outputs given the same input, its semver should change... So, in principle, if that happened to rustdoc's (json) output, wouldn't be this a change in rustdoc's version? OTOH, having the version number in the output json moves the issue of parsing the json to the consumer level, which could support multiple versions, which might be desirable for various reasons. It seems to me that a "clean solution" would be to change rustdoc's semver if the json output changes, but also include that same version in the produced json file.

... But I'm sure this solution would conflict with many other things I'm not aware of, so just take it as my humble and ignorant perspective.

@jdisanti
Copy link

jdisanti commented Mar 7, 2022

Do you use FORMAT_VERSION, and if not why? Would you like a fuller changelog, just a number, or nothing at all?

I use FORMAT_VERSION to be notified of changes to the JSON format so that I can make sure my tool is working correctly with those changes. If you wanted to use a list of feature names, that works for me too since I could just compare the last known feature name with the end of that list.

Thinking about it more though, I don't absolutely need a format version or change log since I have a comprehensive test suite that should detect any breaking changes to the format that actually impact my tool.

@aDotInTheVoid
Copy link
Member Author

This is at risk of happening again with #99688 and #99787 for the 16->17 change

@aDotInTheVoid
Copy link
Member Author

This is at risk of happening with #127289 and #127276 for the 30->31 change. Both of these PR's are mine and were spawned from the same zulip thread, which is why there's overlapping changes to such a low-traffic area of the codebase

@its-the-shrimp
Copy link
Contributor

One solution to this could be releasing breaking changes to rustdoc-json-types only alongside a new compiler version itself, i.e. with the new 6-week cycle, if there is any change to be made, but such an approach would require stabilising the existence of JSON as an output format of rustdoc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests