Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate non-reproducible build #27

Open
sethp opened this issue Mar 21, 2023 · 6 comments
Open

Investigate non-reproducible build #27

sethp opened this issue Mar 21, 2023 · 6 comments

Comments

@sethp
Copy link
Contributor

sethp commented Mar 21, 2023

What we know: when @dougli1sqrd tried to build b5a9c5c (pre-#19), he observed build failures, e.g.:

failed to resolve on #[interrupt]

and

let qspi = Spi::new_quad_send_only(
   |                     ^^^^^^^^^^^^^^^^^^
   |                     |
   |                     function or associated item not found in `Spi<'_, _>`

Spi::new_quad_send_only is a function we've added in our rustbox/esp-hal fork, and the "failed to resolve" indicates an older version of the #[interrupt] proc macro than I was using, so it seemed like he was building against a stale version of the upstream repository. Moving his Cargo.lock aside, the next cargo build successfully produced an artifact.

Why it is?

@sethp
Copy link
Contributor Author

sethp commented Mar 21, 2023

Ah, a clue:

git checkout b5a9c5cb7e41
cp Cargo.lock.eric Cargo.lock
cargo tree | grep -E 'esp-hal-common|esp32c3-hal'

produced:

warning: Patch `esp-hal-common v0.7.1 (https://github.com/rustbox/esp-hal?rev=716934cd#716934cd)` was not used in the crate graph.
Patch `esp32c3-hal v0.7.0 (https://github.com/rustbox/esp-hal?rev=716934cd#716934cd)` was not used in the crate graph.
Check that the patched package version and available features are compatible
with the dependency requirements. If the patch has a different version from
what is locked in the Cargo.lock file, run `cargo update` to use the new
version. This may also occur with an optional dependency that is not enabled.
├── esp32c3-hal v0.5.0
│   ├── esp-hal-common v0.5.0

and, sure enough, the Cargo.lock pins e.g. esp-hal-common to v0.5.0 upstream:

[[package]]
name = "esp-hal-common"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"

@sethp
Copy link
Contributor Author

sethp commented Mar 21, 2023

Well, a more targeted solution seems to be cargo update -p esp32c3-hal, which offers:

    Removing esp-hal-common v0.5.0
      Adding esp-hal-common v0.7.1 (https://github.com/rustbox/esp-hal?rev=716934cd#716934cd)
...
    Removing esp32c3-hal v0.5.0
      Adding esp32c3-hal v0.7.0 (https://github.com/rustbox/esp-hal?rev=716934cd#716934cd)

among others

@sethp
Copy link
Contributor Author

sethp commented Mar 21, 2023

Still unknown: where did that v0.5.0 in the lock file come from?

@dougli1sqrd
Copy link
Contributor

Wow love this trail of investigation! Yeah, also curious where v0.5.0 comes from. Did I have that in a Cargo.toml before an update to Cargo.toml, updating that dep?

@sethp
Copy link
Contributor Author

sethp commented Mar 21, 2023

Hmm, I wonder? I can't find a complete story that's represented in the git history: in c75f742 we moved from explicitly providing the esp-hal-common transitive dependency (which confused rust-analyzer):

esp-hal-common = { features = [
"esp32c3",
], git = "https://github.com/rustbox/esp-hal", branch = "vgaterm" }
esp-println = { version = "0.3.1", features = ["esp32c3"] }
esp32c3-hal = { features = [
"direct-boot",
], git = "https://github.com/rustbox/esp-hal", branch = "vgaterm" }

to selecting version "*" (which, more on that in a minute) and using a [patch.crates-io] to override the version:

esp32c3-hal = { features = [
"direct-boot",
], version = "*" }

[patch.crates-io]
esp32c3-hal = { git = "https://github.com/rustbox/esp-hal", branch = "vgaterm" }
esp-hal-common = { git = "https://github.com/rustbox/esp-hal", branch = "vgaterm" }

though that change didn't show up on the keyboard-help branch until 6cb9a63 (Mar 13).

I can't quite form that into a coherent narrative: from the vgaterm branch in our fork is v0.4.0, so it seems you should've jumped from v0.4.0 to v0.7.0 as the esp-hal "major" version.

The time before that we touched Cargo.toml was 8f00fe0 , which was roughly contemporaneous with 0.5.0 being the latest version of the esp-hal stuff, which ran from esp-rs/esp-hal@d03c267 (Jan 26th) to esp-rs/esp-hal@4e88e48 (Feb 10th), but that fact doesn't help much unless we have a way to explain why cargo would use a (really) stale version of the crates-io repository?

I guess now's as good a time as any to compare cargo versions? I'm experimenting with

cargo 1.70.0-nightly (7d3033d2e 2023-03-08)

aforementioned "more on "*""

I haven't been able to confirm this yet (experimental ideas definitely welcome), but I have a suspicion that "*" is giving us some grief. The purpose of "*" there is to say "don't pick a version, defer to the patch'd git source," but maybe it only appears to be doing that and is actually very broken.

I don't think there's any way you would've set the version to v0.5.0 on purpose, because that's also missing the Spi::new_quad_send_only function. But, cargo's default behavior is maximal version selection (see my other comment for the deets), and "*" means "any version," so there's a possible world where this happened:

  1. you checked out the Cargo.toml from 6cb9a63, but since Cargo.lock was still gitignore'd at that time, your local state persisted through the tree update. Meaning the lock file you had was generated from e.g. 8f00fe0 or the moral equivalent, where via cargo generate-lock-file I've confirmed that cargo seems to lock to "v0.4.0" from the vgaterm branch in our git source (as we asked it).
  2. when cargo re-ran dependency resolution after introducing the new Cargo.toml, it, for whatever reason, thought v0.5.0 was the latest upstream version of the esp-hal packages (despite this happening in mid-March)
  3. because we'd asked for whatever version ("*"), and cargo default to maximal, it said "ok, here's your 0.5.0!" and writes that to the lock file
  4. only at this point in the story does it start to consider the [patch.crates-io], wherein it finds a new source for esp-hal, and traversing to that sees a Cargo.toml representing v0.7.0

That would produce the stable state we observed: the lock file says esp32c3-hal = "*" is v0.5.0, and because "esp32c3-hal v0.5.0" is a totally different package to cargo from "esp32c3-hal v0.7.0",1 it says "couldn't apply that patch to any crate I found in the graph, guess it doesn't matter!" and then immediately fail to compile.

I'd really like to validate that story before going too far down the "possible fixes" road; but it's past time to hit "comment" already.

Footnotes

  1. I still find this very confusing, but I think the key in the [patch.crates-io] section is more or less ignored by cargo. So from our perspective we said "please override the hal from the repository with ours", but from cargo's we might as well have written nom = { package="serde", git = "..."}; it'll ignore the nom string completely, and follow the git link to figure out what it should even try to replace.

@sethp
Copy link
Contributor Author

sethp commented Mar 22, 2023

The test procedure I've got is:

  1. Replace Cargo.lock with an older version (I haven't been 100% consistent, but they're all pointing to 0.4.0 sourced from our fork)
  2. Modify Cargo.toml to point to one of the commits here: https://github.com/rustbox/esp-hal/tree/test/patch-version-behind-crates-io-latest
    1. esp-rs/esp-hal@0251645 corresponds to esp32c3-hal v0.6.0 ("older" than what state my cargo is currently resolving for the crates.io repository)
    2. esp-rs/esp-hal@25fef6a corresponds to esp32c3-hal v0.0.0 ("does not exist")
    3. esp-rs/esp-hal@6c13ec1 corresponds to esp32c3-hal v0.8.0 ("newer")
  3. run cargo tree, which implicitly will update the lockfile and tell me what versions it's selected.

(encoded as a script: https://gist.github.com/sethp/cc6a065fe3c73a2efa2872c40ad81489#file-test-sh ). I did turn off the patch for esp-hal-common in Cargo.toml, as the path = "../esp-hal-common" in esp32c3-hal seemed to be enough for it to get picked up.

The good news is that in each of these, "*" solves the problem it was intended to, i.e. it resolves to the patched source version in each of the above scenarios. I can't find another specifier that behaves that way; i.e. any number I pick works just as long as the patched revision is a "major"-version match and bombs out with some "unused" nonsense otherwise.

Trying again with the likely version of cargo you were using:

rustup install nightly-2023-01-12
rustup toolchain link eric $(rustc +nightly-2023-01-12 --print sysroot)
./test.sh +eric tree

Yields the same results. I didn't look too hard, but one difference that might be worth following up on between our versions is that mine is using the sparse registry protocol by default; it was noticeable in my testing just how much longer it took to update the crates.io index.

Some ideas for what to try next:

  1. Re-testing with esp-hal-common set to the same version as esp32c3-hal (above it was all v0.7.1), +/- also adjusting the "version" requirement on esp32c3-hal to see what effect that has vs. path
  2. Regenerating Cargo.lock.saved from Cargo.toml in a different commit, using the earlier version of cargo, to see if that's (meaningfully) different
  3. Possibly, mucking with the registry / git repo state to try and reset it to model the edge transition you followed: it seems probable that you had a very stale crates.io index (circa the 0.5.0 release maybe?) that got updated concurrently with pulling down new objects from our fork as well

I suppose the good news is that 2 is v. easy, and 1 should model the most common flow we expect (i.e. esp-rs/esp-hal releasing a new version into crates.io ahead of ours). So if those results are both negative, I think we can at least say we aren't mis-using cargo in an obvious way? And that it's at least somewhat reasonable to expect this to continue to work?

Also, we'll be in a slightly better position if this happens again now that we've got an auditable history for Cargo.lock: at least we won't have to guess what that piece of the "before" state looked like! For the same reason, we'll be in an even better position yet if we tackle #14. I wonder if there are other easy wins for pinning down the big ball o' mutable state that is cargo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants