Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create .changes files for reproducible builds #3062

Closed

Conversation

JanZerebecki
Copy link
Member

@JanZerebecki JanZerebecki commented Feb 20, 2024

release rpm files currently do not have .changes files, but these are needed to extract SOURCE_DATE_EPOCH for reproducible builds.

Use the time stamp of the last commit to the 000package-groups
package as the date for the changes entry.

@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2024

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (9919e02) 28.31% compared to head (d3dbe51) 28.30%.
Report is 3 commits behind head on master.

❗ Current head d3dbe51 differs from pull request most recent head 7c68a33. Consider uploading reports for the commit 7c68a33 to get more accurate results

Files Patch % Lines
pkglistgen/tool.py 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3062      +/-   ##
==========================================
- Coverage   28.31%   28.30%   -0.02%     
==========================================
  Files          86       86              
  Lines       14799    14807       +8     
==========================================
  Hits         4191     4191              
- Misses      10608    10616       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@Vogtinator Vogtinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only needed for 000release-packages? If so, this probably fits better into

self.commit_package(release_dir)
, where it should use the current datetime and only write it if there was actually a change to commit.

date = datetime.strptime(product_version, "%Y%m%d")
except ValueError:
logging.warning(
"Warning: This should only happen in staging. Can not parse product version %s as date, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also used for ALP, BCI, Leap, SLE, ... which do not use a date as version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a problem...

@JanZerebecki
Copy link
Member Author

Like this PR is now, AFAIK the changes file gets copied when running PRODUCT_SERVICE to each release package, afterwards it is only used in 000release-packages.

The current date, meaning wall-clock, can not be used as it is not reproducible, meaning know from source instead of depending on when the build was started.

@DimStar77
Copy link
Contributor

DimStar77 commented Feb 21, 2024

In all fairness: reproducibility on a package that changes its version daily sounds like not exactly something we should care for :) (at least on TW - Leap/SLE might be different and have a few days of the same files)

@JanZerebecki
Copy link
Member Author

Right, we can just use the wall-clock of generation time until the move to git when we can use a date from a commit.

@Vogtinator
Copy link
Member

The current date, meaning wall-clock, can not be used as it is not reproducible, meaning know from source instead of depending on when the build was started.

i.e. you want to make the pkglistgen output reproducible and not just the generated package sources?
Not sure whether we actually need that, but it's pretty much already there with

use the current datetime and only write it if there was actually a change to commit.

As that way there will not be any change commited if the generated output does not change.

@JanZerebecki
Copy link
Member Author

Ah it is also easy to get the date of the last commit to 000package-groups from OBS. Like this ok?

@JanZerebecki JanZerebecki force-pushed the sde-create-release-changes branch 2 times, most recently from d3dbe51 to 7c68a33 Compare February 21, 2024 12:11
@@ -717,6 +720,14 @@ def update_and_solve_target(
if not product_version:
# for stagings the product version doesn't matter (I hope)
product_version = '1'
with open(product_dir + "/release.changes", 'w', encoding="utf-8") as f:
date = package_source_changed(api.apiurl, project, group).replace(tzinfo=timezone.utc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the changelog date should really state when the change was done and that's datetime.now()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

AFAIK it was not done now, only the build was done now. It may also have been rebuilt resulting in no change if the input was the same. So I think it was done when the source (the group package) was commited (or earlier). I feel it is similar to that we don't change the changelog when we rebuild rpms. I do think we gain no information by storing when it was built.

The idea of reproducible builds is that the build date doesn't matter as the output will be the same no matter at what time the build is done. The advantage of reproducible builds is that to security review you only review the source and not the build output, to verify the build output you only need to bit wise compare it to a rebuild. Embedding $now would prevent that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3062 (comment): Only if a change was actually performed, add a changes entry which contains datetime.now().

package_source_changed(api.apiurl, project, group) is constant except for manual changes, so the current code will end up using a backdated changelog entry, resulting in mtimes which don't really make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh. Instead of just the date any input to the release package changed, a different date that seems useful to me is when any source package in the whole distro last changed. Sadly that is probably too work intensive to compute with OBS-vcs, but it hopefully will be easy when we move to git.

Your suggestion to only change the date when the output changes, but then to $now does not make sense to me, because that is neither of those and non-reproducible (by the definition of https://reproducible-builds.org/ ).

Any date is always sort of backdated due to latency until being observed. Latency from one CPU sending to it being observed, e.g. by an output device like a disk or another CPU core in a multi core machine. Time is not intuitive like you imply. So IMHO the date being backdated, doesn't make it make less sense, even when it is further back by magnitudes.

It seems to me that it being reproducible and knowing when a human with osc commit triggered the change and defined the full outcome of the computation has more value than knowing what clock the CPU perceived when it did the computation.

I don't know of any case where using $now here would actually break something, it just hinders making this build step reproducible, but at least not the next ones.

If you prefer I can change it to always $now and when we move to any distro from obs-vcs to git, we can discuss again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3064 for the alternative with $now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion to only change the date when the output changes, but then to $now does not make sense to me, because that is neither of those and non-reproducible (by the definition of https://reproducible-builds.org/ ).

It is perfectly reproducible as the date is saved. When I do osc vc, the same happens and it's fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There I'm not talkling about the build output of the release rpm, but the source of it, which is the build output of this script.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out in #3064 (comment) I missed that some of the input is from 000update-repos . I added that. It seems 000product is not an input. Did I miss anything else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the binary list of the FTP tree is part of the input. Any name that disappears is added to the weakremovers.inc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah it gets that from the solv modules. sigh. Then there is no easy way around it. Thank you.

@JanZerebecki
Copy link
Member Author

Note for which ever solution is picked: the %changelog in the spec.in files in 000package-groups probably needs to be removed before the following run.

release rpm files currently do not have .changes files, but these are
needed to extract SOURCE_DATE_EPOCH for reproducible builds.

Use the newest time stamp of the last commit of all inputs as the date
for the changes entry. All the inputs are are in the packages
000package-groups or 000update-repos.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants