-
-
Notifications
You must be signed in to change notification settings - Fork 631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JVM lockfile invalidation headers and verification #14185
JVM lockfile invalidation headers and verification #14185
Conversation
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…; deserializes more centrally # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…eading toml lockfiles # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…e to avoid a circuilar import later # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…e to avoid a circuilar import later [ci skip-rust] [ci skip-build-wheels] # Conflicts: # src/python/pants/jvm/goals/coursier.py # src/python/pants/jvm/resolve/coursier_fetch.py
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
…ckfile-invalidation [ci skip-rust]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Conflicts: # 3rdparty/jvm/global_scalac_plugins.lock # 3rdparty/jvm/testprojects.lockfile # src/python/pants/backend/codegen/avro/java/avro-tools.default.lockfile.txt # src/python/pants/backend/codegen/avro/java/rules.py # src/python/pants/backend/codegen/protobuf/scala/rules.py # src/python/pants/backend/codegen/protobuf/scala/scalapbc.default.lockfile.txt # src/python/pants/backend/codegen/thrift/scrooge/rules.py # src/python/pants/backend/codegen/thrift/scrooge/scrooge.default.lockfile.txt # src/python/pants/backend/java/dependency_inference/java_parser.lockfile # src/python/pants/backend/java/lint/google_java_format/google_java_format.default.lockfile.txt # src/python/pants/backend/java/lint/google_java_format/rules.py # src/python/pants/backend/scala/dependency_inference/scala_parser.lockfile # src/python/pants/backend/scala/lint/scalafmt/rules.py # src/python/pants/backend/scala/lint/scalafmt/scalafmt.default.lockfile.txt # src/python/pants/backend/scala/subsystems/scalatest.default.lockfile.txt # src/python/pants/backend/scala/test/scalatest.py # src/python/pants/jvm/goals/coursier.py # src/python/pants/jvm/resolve/coursier_fetch.py # src/python/pants/jvm/resolve/jvm_tool.py # src/python/pants/jvm/test/junit.default.lockfile.txt # src/python/pants/jvm/test/junit.py # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
resolved_serialized_lockfile = resolved_lockfile.to_serialized() | ||
metadata = JVMLockfileMetadata.new(requirements) | ||
resolved_serialized_lockfile = metadata.add_header_to_lockfile( | ||
resolved_serialized_lockfile, regenerate_command="./pants generate-lockfiles" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: check if this is correct after Eric's changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be correct once the code gets moved, but presently it isn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm close to putting up this change, so this is fine.
|
||
contents = toml.loads(lockfile_str) | ||
entries = tuple( | ||
CoursierLockfileEntry.from_json_dict(entry) for entry in (contents["entries"]) | ||
) | ||
metadata = JVMLockfileMetadata.from_lockfile(lockfile_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was written before I factored out the TOML-parsing code in #14175, this will fail if the metadata header block is not in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure if we want to add a super-quick deprecation cycle here, but I don't believe we've put the contents of #14175 in a release yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't, no need to handle deprecation.
|
||
if lockfile.metadata and not lockfile.metadata.is_valid_for(requirements): | ||
raise ValueError( | ||
f"Lockfile for {request.options_scope} was generated with different " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling code needs to be better (at least as good as Python's :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, the Python tool lockfile error message code is a hundred lines of code with carefully specified error messages. We should probably tackle this as a separate issue, as this PR is already pretty big.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
assert ( | ||
Path(rule_runner.build_root, "3rdparty/jvm/default.lock").read_bytes() | ||
== HAMCREST_EXPECTED_LOCKFILE.to_serialized() | ||
compare_lockfiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the test -- we're now deserializing the file on disk and comparing to the structure we defined in the test, rather than checking that the structure is byte-for-byte correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better :)
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
lockfile_path = jvm.resolves[request.resolve] | ||
|
||
# If the lockfile hasn't changed, don't overwrite it. | ||
existing_lockfile_digest_contents = await Get(DigestContents, PathGlobs([lockfile_path])) | ||
if ( | ||
existing_lockfile_digest_contents | ||
and resolved_lockfile_serialized == existing_lockfile_digest_contents[0].content | ||
and resolved_serialized_lockfile == existing_lockfile_digest_contents[0].content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I chose different variable names in #14175. Whoops :)
@@ -449,13 +456,22 @@ async def fetch_with_coursier(request: CoursierFetchRequest) -> FallibleClasspat | |||
# TODO: Loading this per JvmArtifact. | |||
lockfile = await Get(CoursierResolvedLockfile, CoursierResolveKey, request.resolve) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: The following code is for validating user lockfiles, and is necessarily done per JvmArtifact
, because of how we bubble a resolved classpath up to the consuming code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be good to put in a comment. To clarify, that is because this is the rule for CoursierFetchRequest
?
requirement = ArtifactRequirement.from_jvm_artifact_target(request.component.representative) | ||
|
||
if lockfile.metadata and not lockfile.metadata.is_valid_for([requirement]): | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError
seems to be the exception we've been using in this part of the codebase. Let me know if there's anything more appropriate here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably define a new error type, but NBD.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Tagging @Eric-Arellano for familiarity with how we handle lockfiles on the Python side, and @stuhood for signoff on actual JVM functionality |
from pants.util.logging import LogLevel | ||
from pants.util.strutil import pluralize | ||
|
||
logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: not necessary
resolved_serialized_lockfile = resolved_lockfile.to_serialized() | ||
metadata = JVMLockfileMetadata.new(requirements) | ||
resolved_serialized_lockfile = metadata.add_header_to_lockfile( | ||
resolved_serialized_lockfile, regenerate_command="./pants generate-lockfiles" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm close to putting up this change, so this is fine.
assert ( | ||
Path(rule_runner.build_root, "3rdparty/jvm/default.lock").read_bytes() | ||
== HAMCREST_EXPECTED_LOCKFILE.to_serialized() | ||
compare_lockfiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better :)
|
||
contents = toml.loads(lockfile_str) | ||
entries = tuple( | ||
CoursierLockfileEntry.from_json_dict(entry) for entry in (contents["entries"]) | ||
) | ||
metadata = JVMLockfileMetadata.from_lockfile(lockfile_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't, no need to handle deprecation.
@@ -449,13 +456,22 @@ async def fetch_with_coursier(request: CoursierFetchRequest) -> FallibleClasspat | |||
# TODO: Loading this per JvmArtifact. | |||
lockfile = await Get(CoursierResolvedLockfile, CoursierResolveKey, request.resolve) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be good to put in a comment. To clarify, that is because this is the rule for CoursierFetchRequest
?
requirement = ArtifactRequirement.from_jvm_artifact_target(request.component.representative) | ||
|
||
if lockfile.metadata and not lockfile.metadata.is_valid_for([requirement]): | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably define a new error type, but NBD.
lockfiles=(tool.resolved_lockfile(),), | ||
lockfiles=(lockfile,), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh huh, this was violating the engine by using open()
inside lockfile_content
lockfile = request.lockfile | ||
requirements = await Get( | ||
ArtifactRequirements, | ||
GatherJvmCoordinatesRequest( | ||
request.artifact_inputs, f"[{request.options_scope}].artifacts" | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to reason about if this should be requesting JvmLockfileRequest
or not, and then looking at the request.artifacts
. It seems wrong for us to be computing the two requirements in distinct ways. But I'm having a hard time following everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had some difficulty here. Let's discuss what a refactor would look like elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, not a blocker, but this is odd. I think that I might have caused some of the oddity though, because global scalac_plugins
have a lockfile, but don't have an associated subsystem. And so ValidatedJvmToolLockfileRequest
cannot take a subsystem as an argument, because not all requests have one.
Would it make sense to have this PR only add the metadata to the lockfiles, but not yet do anything with them? I'm having a hard time reasoning about how the validation happens, and it might change once I land very soon unifying I agree with your intuition that we would likely benefit from refactoring the JVM lockfile code. So I'm thinking it might make sense to do that before changing any consumption code. Regardless of what we do there, the writing of metadata looks great and seems ready to land. Wdyt? |
I'm OK with adding the invalidation headers in a separate PR, but I'm hesitant to defer the remainder of the PR for long. |
…ckfile-invalidation # Conflicts: # src/python/pants/jvm/goals/lockfile.py
def _header_dict(self) -> dict[Any, Any]: | ||
"""Produce a dictionary to be serialized into the lockfile header. | ||
|
||
Subclasses should call `super` and update the resulting dictionary. | ||
""" | ||
|
||
d = super()._header_dict() | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why override this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to override it in subclasses, and you need to keep super
in place. This override here is defensive work to make sure that people who eventually add things into _header_dict
in this class don't do it the wrong way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I find that it is preferable in OO to avoid APIs that require overriding concrete methods, and to stick to completely abstract (or empty) methods or fields. But not a blocker.
Following discussion with @Eric-Arellano, we're probably good to merge this as-is. Can you review please, @stuhood? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over DM, there are some known issues:
- Don't use
tool.resolved_lockfiles()
- Figure out how to get this working with scalac_plugins.py
- Figure out how the consumption rule should be computing the
ArtifactRequirements
But this gets us closer to where we want, and it's fine to fast-follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
It would be good to update the title to indicate that this is only for tool lockfiles (afaict).
class LockfileScope(Enum): | ||
JVM = "jvm" | ||
PYTHON = "python" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Eric-Arellano : Not a blocker for this change, but this seems like something that should be computed from one of the @union
s involved here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe which metadata class to create when generating a lockfile, but nothing else.
lockfile = request.lockfile | ||
requirements = await Get( | ||
ArtifactRequirements, | ||
GatherJvmCoordinatesRequest( | ||
request.artifact_inputs, f"[{request.options_scope}].artifacts" | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, not a blocker, but this is odd. I think that I might have caused some of the oddity though, because global scalac_plugins
have a lockfile, but don't have an associated subsystem. And so ValidatedJvmToolLockfileRequest
cannot take a subsystem as an argument, because not all requests have one.
def _header_dict(self) -> dict[Any, Any]: | ||
"""Produce a dictionary to be serialized into the lockfile header. | ||
|
||
Subclasses should call `super` and update the resulting dictionary. | ||
""" | ||
|
||
d = super()._header_dict() | ||
return d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I find that it is preferable in OO to avoid APIs that require overriding concrete methods, and to stick to completely abstract (or empty) methods or fields. But not a blocker.
@stuhood This has support for user lockfiles now, too. |
So it does! I think that that means that |
…thRequest` (#14241) To consume a tool lockfile, we before would: 1. Read the lockfile, either by: 1. Using `ValidatedJvmToolLockfileRequest`, which only works if it's a `JvmToolBase` 2. Using `GenerateJvmLockfile -> CoursierResolvedLockfile` rule, which we used with `scalac_plugins.py` 2. Pass the lockfile to `ToolClasspathRequest -> ToolClasspath` The rule in 1.2 was not using the validation added in #14185, and it was generally confusing to have two code paths. This unifies everything by having `ToolClasspathRequest -> ToolClasspath` now handle reading, validating, and then fetching the lockfile. Reading and validating a lockfile requires the same metadata we need for generating a lockfile with `generate-lockfiles` via `GenerateJvmLockfile`. So, we reuse that type for the sake of DRY. [ci skip-rust]
This adds support for JVM lockfile invalidation headers. Most of the infrastructure is shared with the equivalent Python support code.
Closes #13373
Notes for reviewers:
The lockfile format now adds a header of the same form as our Python lockfiles (which have been thoroughly bikeshedded at this stage). Once you've seen one lockfile with contents in the
valid_for_requirements
field, you've seen them all, and feel free to skip.There's no new tests, but both the tool and user lockfile invalidation code seems to be hit pretty thoroughly by all the existing tests (note the numerous failures in the commit history :)). If there's any specific behaviour in
lockfile_metadata.py
that you feel needs testing, please note it in the review.