-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmake: Add make_cpio
, a script for generating reproducible cpio(5) archives
#67
base: master
Are you sure you want to change the base?
Conversation
2bba379
to
6a8d23e
Compare
I'm not a fan of maintaining our own If you weren't aware, there is already a script in seL4_tools/misc that strips all of the metadata out of a cpio archive: https://github.com/seL4/seL4_tools/blob/master/misc/cpio-strip.c. This used to be used but required the user to independently build and add it to their path before it would be used. We could do this again and use it as a fallback if the --reproducible check fails. The lack of seL4_tools/cmake-tool/helpers/cpio.cmake Line 46 in 2a6e245
If touch -d @0 isn't supported on different versions of touch, then using |
Does this not make our system more robust though? I think pulling complexity out of Cmake into more manageable tools does simplify things to some extent. For one Python is much more accessible than Cmake. Also, we already maintain a lot of python code, I don't see how using it in the build system will change anything. |
@@ -0,0 +1,221 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a dependency on libarchive. Is this dependency encoded somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it's the same as the dependency here: https://github.com/seL4/seL4/blob/master/tools/python-deps/setup.py#L39
I don't agree that it does. The motivation for this change is that a different
It appears that this change is doing more than changing CMake to python. It's also removing the use of the There are a couple of ways that the CMake part can be cleaned up, but while it's purpose is still constructing arguments and calls to other programs within build rules, or templating files, then it's still within the domain that CMake was designed for. |
So I suppose to resolve this it would fair to rewrite the cmake logic in python and call out to the original tool, or just leave this pr for now. |
At 2021-04-13 12:46:41 +10:00, @kent-mcleod wrote:
Given the complexity rolled up in the build system to discover the specific behaviours and foibles to drive cpio(1) appropriately, I'm not sure that I agree that it does what we want. At 2021-04-13 12:46:41 +10:00, @kent-mcleod wrote:
I'd argue that it'd be neater to generate the archive correctly to start with, rather than having to fix it after the fact. At 2021-04-13 12:46:41 +10:00, @kent-mcleod wrote:
If we went down this route, we should first fix the build system to allow compiling and running binary code in (what GNU tools call) the build environment. (AFAICT CMake cannot do this, which strikes me as a serious problem.) At 2021-04-13 14:29:02 +10:00, @oliver-wm wrote:
I wholeheartedly agree with this sentiment. My stance is that build systems should be responsible for orchestration and, otherwise, should be as simple, predictable, and fast as possible. That we have introduced so much complexity into our build system often makes it an exciting adventure to use, debug, and extend. Preferring to push the more complex constructions like this out into, say, Python, makes the build system and process simpler --- and leverages a language that's perhaps more widely understood. At 2021-04-14 11:24:30 +10:00, @kent-mcleod wrote:
No, that wasn't the motivation for the change. Whilst the initial work was trying to handle the specific behaviour, the more general point is that this portion of the build system encodes a non-trivial behaviour --- the sort of behaviour that could be pushed out into a tool that explicitly solves that problem --- and this change took the opportunity to build the appropriate tool for that behaviour. For one thing, I can test the Python (and C, which I would be landing were it not for the build-vs-host issue noted above) implementations of this behaviour. I cannot do the same with this CMake implementation. Similarly, it is much easier to identify failures and their causes, and debug them, given a Python implementation. Having seen a failure partway through during the execution of the computed command, and the resulting (surprisingly fiddly to debug) output and resulting state, I would consider the approach in CMake to be unmaintainable anyway. At 2021-04-14 11:24:30 +10:00, @kent-mcleod wrote:
If the concern is that the code introduced in this PR does not exactly translate the existing CMake rules --- insofar as it is not the invocation of a smorgasbord of external utilities --- I do not see that as a problem, and, indeed, I would consider that result to be problematic in and of itself. For one thing, we lose the guesswork about how to invoke cpio(1) and provide it appropriate flags to produce a desired result --- we instead have a tool that we know will do that. For another, we target the libarchive interface, which has explicit mechanisms for doing what we are attempting to do. For example: we create a temporary directory and copy a file (and subsequently must clean up) for, amongst other things, ensuring it has a specific pathname --- but we could instead just add the file with the desired pathname. Ditto ownership, timestamping, and the other various properties that the cpio(5) newc format stores and for which we want explicit values. To specifically address At 2021-04-14 11:24:30 +10:00, @kent-mcleod wrote:
Given the desired behaviour is producing reproducible cpio(5) archives, this is no more than we already need to know about the format. At 2021-04-14 11:24:30 +10:00, @kent-mcleod wrote:
At what point, then, do we leave the domain that CMake was designed for? As can probably be guessed, I'd argue that our existing mechanism for building these archives does exceed the bounds of what we should have in the build system. (Indeed, I'd argue any case where we have to compute a command-line to be passed to a shell is probably pushing the boundary between what should be in the build system proper and what should instead be pushed out into an external tool --- for example, the adjacent fragment that encapsulates a cpio(5) file in an ELF artifact, though this is a case much closer to the templating scenario described and where I'm also on the fence.) //~j |
6a8d23e
to
ad1e9a2
Compare
Move the process of constructing an archive out into a new Python script `make_cpio.py', and simplify the mechanism by directly using libarchive to generate a file with the desired properties. (I've previously run afoul of GNU cpio(1)'s `--reproducible' flag and the detection machinery thereof; and of subtle and tricky-to-debug build failures that occur partway through the generated command line. I opted to rework the CMake into Python; the result is that the CMake here simplifies to the invocation of a single, fairly simple tool with appropriate arguments, instead of the composition of a fairly involved shell script.) r2: `cmake-format'. r3: rebased, and reworded commit message to more clearly explain the rationale for the change, as this was open to misinterpretation. Signed-off-by: Jashank Jeremy <[email protected]>
ad1e9a2
to
2eb669c
Compare
It is pushing the boundary, but it's still chaining together a sequence of applications of external tools. Replacing these tools with our own implementations means taking on the additional obligation and burden of maintaining these implementations. The attached python script appears to depend on APIs not provided by the library it is using and has a couple other quirks it needs to handle itself which makes it hard for me to see how it's not increasing the total complexity involved to create a cpio archive.
cpio is a pretty common utility and it's command line interface removes a lot of the "guess work" required to create a working CPIO archive compared to knowing how to correctly create one at a lower abstraction level: https://github.com/seL4/seL4_tools/pull/67/files#diff-812e264539c59c5dcb3760f14192c8c009f3813a50afdaed61727bd42a20da2eR123-R133 I appreciate that you took the time to write out an alternative implementation, but I still think it's less work to maintain a few lines of CMake argument passing than a 200 line python CPIO archive creation utility script. Most of this complexity is hidden behind Also, it would be better to put effort towards removing CPIO archives as a dependency altogether. Most of the places that CPIO archives are used in the user level libraries we maintain are for containing ELF files where it would be better to load them into memory at 4k or larger page alignments which isn't possible inside a CPIO archive. |
Recently I've been running in a few issues with missing features and flexibility of |
I ran afoul of the
--reproducible
flag --- a GNU cpio(1) extension, as far as I can tell --- and opted to rework the CMake glue here, moving the "generate-an-archive" logic out into Python.(This PR follows more recent discussion with @oliver-wm about pushing out some intricate CMake constructs as stand-alone scripts.)