Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group transactions: fail on missing non-optional packages #1848

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AdamWill
Copy link
Contributor

@AdamWill AdamWill commented Sep 2, 2022

If any default or mandatory package listed in a group does not
exist, raise dnf.exceptions.MarkingErrors.

This is a small change with a big impact and a lot of history.
In the early days, dnf treated both 'package doesn't exist'
and 'package exists but is not installable' as fatal errors.

Shortly after Fedora and RHEL switched from yum to dnf, this was
changed, because historically yum had not behaved this way, and
our existing comps definitions had lots of missing packages;
conditional and arch-specific comps entries also were not
properly handled by all tools, so cleaning up comps was not
possible.

dnf was switched for a while to treat neither as fatal, which
turned out to be too permissive, so eventually we gave it the
same behaviour yum used to have, in #1038.

These days, conditional and arch-specific comps entries work. I
have just cleaned up Fedora's comps file for Rawhide, so there
should be no 'missing' mandatory or default packages in any
group: https://pagure.io/fedora-comps/pull-request/767

I don't know if RHEL's or CentOS's comps have been cleaned up,
but if not, this presents an excellent opportunity to do it.

Links to the history here:
https://bugzilla.redhat.com/show_bug.cgi?id=1292892
https://bugzilla.redhat.com/show_bug.cgi?id=1427365
https://bugzilla.redhat.com/show_bug.cgi?id=1461539
#1038

Signed-off-by: Adam Williamson [email protected]

@pep8speaks
Copy link

pep8speaks commented Sep 2, 2022

Hello @AdamWill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-09-08 15:52:06 UTC

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 2, 2022

I considered making this optional, but that involves adding a new config option, and I thought heck, why don't we make it a flag day? Distros should fix their comps before including dnf with this change. There's a check-missing script in the fedora comps repo now which makes finding these lines to fix them much easier, and could easily be adapted to use on CentOS / RHEL.

Note I would not recommend using its ability to just rewrite the file: run it in info-only mode and check the lines manually. You may need to add appropriate replacements for packages that have been renamed or superseded, rather than just removing them.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 2, 2022

Tagging relevant folks: @nirik @sgallagher @mikem23 @poncovka

Please tag anyone else who would be interested, I'm not sure who are the best CentOS / RHEL folks to bring in.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 2, 2022

Ugh, forgot to update the tests. After lunch!

@AdamWill AdamWill force-pushed the missing-strict branch 4 times, most recently from 77f4690 to 9b59a13 Compare September 2, 2022 23:44
@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 3, 2022

Ugh, looks like I need to find where these integration tests live. I'll do that another day.

@m-blaha
Copy link
Member

m-blaha commented Sep 5, 2022

You can find integration tests in this repo: https://github.com/rpm-software-management/ci-dnf-stack
Currently this patch causes these scenarios to fail:

@poncovka
Copy link

poncovka commented Sep 5, 2022

Looks good to me. It means that base.resolve() will be able to raise also dnf.exceptions.MarkingErrors, right? Anaconda will need to add an exception handler for that, but that should be easy.

@rvykydal, we might want to test this change with our kickstart tests to see if there is any impact on the installation.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 6, 2022

Looks good to me. It means that base.resolve() will be able to raise also dnf.exceptions.MarkingErrors, right? Anaconda will need to add an exception handler for that, but that should be easy.

Yeah, that's correct. You'd need to handle it in dnf_manager.py resolve_selection(), like you handle DepsolveError currently. You do already have a _handle_marking_errors used by several other things in there, so it should be pretty simple, and that would respect the existing config settings that exist for ignoring missing packages.

But, heh, that leads me to realize I should change the PR a bit, you can probably see why. Revision incoming...

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 6, 2022

hmm, actually, there's still an issue there :\ _finalize_comps_trans is the first thing resolve() does, so if we raise an exception there, we won't do the rest of resolve(), and the transaction won't be in a fit state for anaconda to continue doing stuff with it.

If we (on the DNF side) want to allow for things to catch the exception but decide to go ahead with the operation, we'd have to do this...differently...somehow. Not sure what might be the best design. Or we could just say we don't allow that, and tell anaconda to never use ignore_missing_packages=True for _handle_marking_errors in this case.

@poncovka
Copy link

poncovka commented Sep 6, 2022

Shouldn't the error be about a broken group instead of a missing package in this case? Should the installer allow to use a group with missing non-optional packages, or should it block the installation? If it is a blocking issue, we need to report it as a broken spec, instead of a missing spec. Then we don't have to care about ignore_missing_packages either.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 6, 2022

Well, I guess I looked for the error that's most similar to the existing warning, which is why I went with a 'missing package'-type error.

We could make it any exception you like, really, I guess I'd defer to the dnf folks on what it 'should' be. As I see it, on the anaconda side, we could easily just not allow ignore_missing_packages for this case, like we already do for enable_modules and disable_modules - you could just have resolve_selection call _handle_marking_errors without passing in self._ignore_missing_packages, so it'd always be False and this'd always be a fatal error.

"Should the installer allow to use a group with missing non-optional packages, or should it block the installation?"

Well, my intent in pushing this PR is to make it block the installation by default, because I definitely want that behaviour for 'installations' as parts of image composes and so on. But for an actual end-user deployment installation, I could see that maybe someone would want to be able to configure it to be non-fatal. Still, I wouldn't cry too much if we just didn't allow that.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 6, 2022

ah, hmm. the integration tests point up an interesting problem: this patch makes it fatal if you exclude a package from a group install. e.g. with this PR applied:

[adamw@toolbox dnf (missing-strict)]$ sudo dnf groupinstall -x libreoffice-calc deepin-desktop-office
Last metadata expiration check: 1:55:52 ago on Tue Sep  6 14:12:12 2022.
Error: Problems in request:
missing packages: libreoffice-calc

That's surprising, and I guess I was surprised because that implies the current code prints a warning about a 'missing package' if you exclude a package from a group install, which already seems like a bug.

Unfortunately I don't immediately see an easy way to handle this. Does anyone else?

edit: well, hmm. I suppose you could say this is just a case of "don't do that" - we could say it's fine to disallow excluding mandatory/default packages from groups. But that feels a bit awkward semantically, when the definition of 'default' is supposed to be that you can remove that package and the group is still considered to be 'installed'. So with that we'd sort of be saying "you can install the group and then remove the package and the group is still installed, but you can't exclude the package at group install time". Which feels weird. :(

It would feel less weird if the change did not apply to 'default' packages, but only mandatory and conditional. But I kinda want the change to apply to default packages, honestly. For the case I'm mainly concerned with - building our media - it seems desirable. For instance, firewalld is only default in the @core group, but if firewalld somehow got retired or renamed and we didn't notice, I would kinda want composes to blow up so we did notice and take appropriate steps. There are plenty of other similar cases. Hrrmmm.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 7, 2022

Thinking about some of the test issues and some of the feedback I've got on this, I'm kinda coming round to the idea that maybe this shouldn't be fatal for interactive CLI use. What I really want it to be fatal for is image builds. So I'm gonna take some time tomorrow and see if I can rework it in such a way that CLI usage will be as it is now, but we can have anaconda and any other tool that does group installs as part of image builds get the info that groups were missing packages, which would allow them to treat that as a fatal error.

I don't think this necessarily needs to be a configuration setting at DNF level, that doesn't feel like the right fix.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 7, 2022

Well, I couldn't sleep on it, so I did a very rough version of a possible approach. This - letting callers pass in a mutable object for the function to modify as a way of signalling information - isn't really a typical way to do things in Python, but it doesn't seem realistic to change the return value of resolve().

Another option I guess would be to raise an exception at the end of resolve() in the case there are missing group packages, and let the CLI catch it and handle it, but that would still change existing behaviour for all callers of resolve(). This way is I guess a bit less Pythonic, but if I thought it through right, it shouldn't change current behaviour at all, but allow us to adjust callers of resolve() the way we want.

I didn't test this at all yet or revise the tests, so there may be typos or something. I'll come back to it in the morning.

@j-mracek
Copy link
Contributor

j-mracek commented Sep 7, 2022

Thank you very much for the change proposal. with the proposal there are several questions that must be answered.

  1. Is it a really good time and place to make the change in DNF?
    We are going to replace DNF by DNF5. It is more suitable to have the new behavior in a different product rather then to modify behavior of the present one. DNF5 will be released to Fedora 38 and in Fedora 39 it should replace DNF.
  2. What about splitting the strict option into two - the first one will allow skipping absent packages, the second one uninstallable packages?
  3. Shall we handle the optional group differently in comparison to other comps types?
  4. Does the change requires Fedora system wide change? Because it will affect also anaconda it would be wise to consider it.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 7, 2022

Thank you very much for the change proposal. with the proposal there are several questions that must be answered.

1. Is it a really good time and place to make the change in DNF?
   We are going to replace DNF by DNF5. It is more suitable to have the new behavior in a different product rather then to modify behavior of the present one. DNF5 will be released to Fedora 38 and in Fedora 39 it should replace DNF.

2. What about splitting the `strict` option into two - the first one will allow skipping absent packages, the second one uninstallable packages?

3. Shall we handle the optional group differently in comparison to other comps types?

4. Does the change requires Fedora system wide change? Because it will affect also anaconda it would be wise to consider it.

1 - yeah, that's worth considering. At the least we should look at how dnf5 behaves here and make sure it's consistent. I guess I'm a little sceptical about the F39 timeframe for dnf5 given all the stuff that will have to be changed, and I'd kinda like to have this change for F38, but it's not crucial. We certainly could just kinda lump this in with the dnf5 switchover instead.

2 - That was actually the first thing I considered and starting coding, but I just didn't really love it. The more I work on / think about it, the less this really feels like a config option. The latest idea (an optional arg to resolve() to set the desired behaviour) just feels more appropriate to me. What do you think?

3 - This makes sense to me just because of the meaning of 'optional'. Just thinking about the word, it's semantically weird to ever treat an optional thing being missing as a fatal error, right? And that certainly is how it feels to me in the specific case of comps. When fixing up Fedora's comps entries, I didn't actually bother fixing missing 'optional' entries for the same reason - it just feels to me like something 'optional' not being found is not really a problem. If others disagree we could change this, of course, it's technically easy, and I could go back and fix Fedora's comps again also.

4 - I think it depends to an extent on the implementation. The original implementation might possibly have needed one. The current one, I don't think so unless I wrote it wrong, since nothing should change for existing code with the current approach; you have to actively change consuming code to pass the new arg to resolve() to see any behaviour change.

@j-mracek
Copy link
Contributor

j-mracek commented Sep 7, 2022

If we will handle each group by a different set of rules then it will be tricky to provide general configuration that will modify the behavior for particular set of group packages.

First of all - we have two type of users - a) using commandline b) using APi and both must be able to achieve what they need.

If we will split strict option (in DNF5) to two configuration options ten they can be also applicable for package install and group install, but from API they can be overridden for particular job request. Why option can be handy? Skipping unavailable packages for package install command was the behavior of YUM-3 in RHEL7. Personally I don't like it dnf behaves more strictly. When people did not like it we recommend to use strict=False, but it also skips broken packages. The splitting make only sense when behavior for all comps group will e identical.

When some group package types will behave differently to each other then the configuration must be valid only for particular type of group packages therefore it will require to pass a pair (skip not available packages for mandatory packages).

Why it must be somehow configurable - users run into a problem and they must have an option how to easily resolve the issue. It is why dnf have possibility to use strict=False, best=False and so on.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 7, 2022

I guess my take on it is a question: why would a CLI user ever want this behaviour to be strict?

If they want to not install when a group is missing a package, okay, they can look at the warning that gets printed and choose to cancel the transaction. They don't need dnf to crash or error out.

That's why my current thinking is that the CLI should just continue to behave as it does - i.e. loosely - and we should just make it possible for API users to achieve the 'strict' behaviour, because I think that's the only place it's really useful. If we're only concerned with API users, we don't really need a config option and all the baggage that comes with it.

@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 7, 2022

So a slight twist on the current proposal would be to do this instead:

def resolve(self, allow_erasing=False, missing_raise=False):

and then if missing_raise was set True, raise an exception if any group packages were missing.

That avoids the somewhat unusual use of a dict for transferring information. But it does have a few downsides too: it re-introduces the issue of whether we should fail on 'optional' packages, which the dict approach avoids (because in the dict approach we just give the caller the information and leave it up to the caller what to do with it), and it also means we have to decide where to raise the exception - right after _finalize_comps_trans()? Or right at the end of resolve() so a caller could catch the exception and still decide to proceed?

@j-mracek
Copy link
Contributor

j-mracek commented Sep 8, 2022

Anyway I have a great news. The functionality is nearly implemented in DNF5. DNF5 provides such an information about problems, hints during resolve() of requested operation reports in structure. Then it will be up to user of API to make a decision whether it should continue or not (See get_resolve_log() in https://github.com/rpm-software-management/dnf5/blob/main/include/libdnf/base/transaction.hpp). For groups it is not yet finished, but there is a clear requirements to be able to track missing group packages. The information should contain type of package and group that the package belongs if it is possible.

This modifies `base.resolve()` so a caller may pass in a dict.
If so, the dict will be populated with information on any
'missing' group packages: that is, cases where a group that
was included in the transaction specifies a package name, but
no package by that name could be found in the repositories.

The dict has four keys matching the four libcomps package
'types', and the value for each key is a list of the missing
package names of that type (if any).

The intent of this change is to enable callers to treat
missing group packages as a fatal error (or do anything else
they like with the information), without changing default
behaviour, API, or the behaviour of the dnf CLI.

Links to some of the the history behind this change:
https://bugzilla.redhat.com/show_bug.cgi?id=1292892
https://bugzilla.redhat.com/show_bug.cgi?id=1427365
https://bugzilla.redhat.com/show_bug.cgi?id=1461539
rpm-software-management#1038

Signed-off-by: Adam Williamson <[email protected]>
@AdamWill
Copy link
Contributor Author

AdamWill commented Sep 8, 2022

Anyway I have a great news. The functionality is nearly implemented in DNF5. DNF5 provides such an information about problems, hints during resolve() of requested operation reports in structure. Then it will be up to user of API to make a decision whether it should continue or not (See get_resolve_log() in https://github.com/rpm-software-management/dnf5/blob/main/include/libdnf/base/transaction.hpp). For groups it is not yet finished, but there is a clear requirements to be able to track missing group packages. The information should contain type of package and group that the package belongs if it is possible.

Cool, that sounds like basically exactly how I'd want it to work. I was trying to figure this out but I find it much harder to poke my way around an unfamiliar C++ codebase than an unfamiliar Python one, so hadn't got there yet.

@j-mracek
Copy link
Contributor

j-mracek commented Sep 9, 2022

@AdamWill When group install will be implemented (merged) in DNF5, I will ping you with kind request for tuning the interface and the behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants