Allow project owners to set status markers ("deprecated", "archived", etc) to their PyPI projects #16844

Open
facutuesca opened this issue Oct 7, 2024 · 8 comments


@facutuesca
Contributor

facutuesca commented Oct 7, 2024

What's the problem this feature will solve?

Currently, PyPI has no standard way for maintainers to indicate that a project has been deprecated, archived, etc.
This information is (sometimes) included in the project's description, but there's no standard way of setting (or retrieving) it short of checking manually.

If owners can mark a project as deprecated, archived, etc. on PyPI and this information is exposed through the web UI and the index APIs, downstream consumers can monitor this information to make better decisions about their supply chain.

Describe the solution you'd like

Warehouse recently added support for marking a project as quarantined: #16179.
This involved adding a new field to the Project model:

import enum

class LifecycleStatus(enum.StrEnum):
    QuarantineEnter = "quarantine-enter"
    QuarantineExit = "quarantine-exit"

We could model the new project statuses as lifecycle statuses. For example:

import enum

class LifecycleStatus(enum.StrEnum):
    QuarantineEnter = "quarantine-enter"
    QuarantineExit = "quarantine-exit"
    Deprecated = "deprecated"
    Archived = "archived"
    Finished = "finished"

The project owner should only be able to set the new statuses (not the quarantine-related ones), and only while the project is not quarantined. Something like:

stateDiagram-v2
    direction LR
    [*] --> UserStates
    state UserStates {
        None
        Archived
        Deprecated
        Finished
    }
    UserStates --> UserStates
    UserStates --> QuarantineEnter
    QuarantineEnter --> QuarantineExit
    QuarantineExit --> QuarantineEnter
    QuarantineExit --> UserStates
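A minimal Python sketch of the transition rules in the state diagram, assuming the extended LifecycleStatus enum proposed here. The helpers `is_allowed_transition` and `owner_can_set` are hypothetical (not Warehouse code), and `None` stands for "no status set". It uses a `str`-mixin enum rather than `enum.StrEnum` only so it also runs on Python versions before 3.11.

```python
import enum
from typing import Optional


class LifecycleStatus(str, enum.Enum):
    # Same members and values as the proposed enum.
    QuarantineEnter = "quarantine-enter"
    QuarantineExit = "quarantine-exit"
    Deprecated = "deprecated"
    Archived = "archived"
    Finished = "finished"


# The "UserStates" box from the diagram; None means no status is set.
USER_STATES = frozenset(
    {None, LifecycleStatus.Deprecated, LifecycleStatus.Archived, LifecycleStatus.Finished}
)


def is_allowed_transition(
    current: Optional[LifecycleStatus], new: Optional[LifecycleStatus]
) -> bool:
    """Return True if the diagram has an edge from `current` to `new`."""
    if current == LifecycleStatus.QuarantineEnter:
        # A quarantined project can only leave quarantine.
        return new == LifecycleStatus.QuarantineExit
    if current in USER_STATES or current == LifecycleStatus.QuarantineExit:
        # Not quarantined: move between user states, or enter quarantine.
        return new in USER_STATES or new == LifecycleStatus.QuarantineEnter
    return False


def owner_can_set(
    current: Optional[LifecycleStatus], new: Optional[LifecycleStatus]
) -> bool:
    """Owners may only pick user states, and only when the edge exists."""
    return new in USER_STATES and is_allowed_transition(current, new)
```

Keeping the quarantine rules in the transition check means the owner-facing check only has to add "the target must be a user state".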

The UI for setting the status can be a simple drop-down:
[Screenshots: mock-up of the status drop-down]

And the UI for the project's main page can be similar to the quarantined one:
[Screenshot: mock-up of the project page status banner, styled like the quarantine notice]

Additional context

  • There have been previous requests for an "archived" status (Add option to archive project #4021; Feature - archive a project #15950), and for an unsupported/deprecated status (Ability to mark a version of a package as deprecated or unsupported #345).
  • This issue suggests three new statuses; here is a more complete description of each:
    • “Finished”: the project is “complete” and not receiving active feature development;
    • “Archived”: the project is inactive and not receiving further updates of any kind;
    • “Deprecated”: the project has been obsoleted or replaced in functionality by another project.
  • Initially, the project status could affect only the web UI and the metadata included in the index APIs. We could later add status-dependent restrictions on project actions (e.g., disallowing uploads for deprecated projects), but this is optional and will probably need more discussion.
  • The backend changes required for this feature are fairly simple, since the LifecycleStatus field already exists. Most of the work should be UI-related.

cc @miketheman @woodruffw

@facutuesca added the "feature request" and "requires triaging" labels on Oct 7, 2024
@miketheman
Member

Here's an even older issue for context: #345
There's also conversations like https://discuss.python.org/t/reporting-outdated-unmaintained-projects-on-pypi/2264 and https://discuss.python.org/t/marking-packages-on-pypi-as-unmaintained-or-obsolete/37742
You may wish to explore those topics and links for historical context. I'll admit I'm not up to speed on them myself; these are just the ones I found with a quick search that appeared relevant.

LifecycleStatus was intended to be extended exactly as you're describing, so yay!

I think there are a lot of ideas out there about what these statuses mean, how they are used, and what behaviors they would trigger, especially when it comes to client-side warnings/errors. So this is probably part of a larger conversation about what these lifecycle statuses mean to the wider ecosystem. See https://peps.python.org/pep-0592/ for an example of how yanking was adopted into the ecosystem.

And naming things is hard, so I'd encourage some deeper thought into what these names mean, and whether or not they convey the appropriate meaning.
See https://github.com/pypa/trove-classifiers/blob/4e5d35f7a09b57f0a2b65a3dd471c001e5bc7478/src/trove_classifiers/__init__.py#L9-L11 for trove classifiers folks use today.

Another consideration is whether these state changes change anything in Journals (I don't think they should, but quarantine explicitly does to help inform mirrors of the removal/addition) and Project Events (probably should add some history to record who and when these changes were made).

@woodruffw
Member

so I'd encourage some deeper thought into what these names mean, and whether or not they convey the appropriate meaning.

Yeah, it'd be good to sample the community a bit and see if they find these names intuitive/understandable. My thinking is that we'd also add documentation for these statuses on docs.pypi.org, and link to those docs from the respective project headers.

To wit: calling @sethmlarson, @webknjaz, and @ofek to the opinions phone 🙂

Another consideration is whether these state changes change anything in Journals (I don't think they should, but quarantine explicitly does to help inform mirrors of the removal/addition) and Project Events (probably should add some history to record who and when these changes were made).

Yep, big +1 to recording these state changes in the project events, and I'm a slight -1 on including them in the journal (for the reasons you've said 🙂)
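A rough sketch of that split, assuming hypothetical `StatusChangeEvent` and `ProjectHistory` types (Warehouse's actual project event and journal models look different): every status change is appended to the project's event history with the actor and a timestamp, while the mirror-facing journal is deliberately left alone.

```python
import datetime
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class StatusChangeEvent:
    """Hypothetical audit record for one status change."""
    project: str
    actor: str                    # who made the change
    old_status: Optional[str]
    new_status: Optional[str]
    timestamp: datetime.datetime  # when it was made (UTC)


@dataclass
class ProjectHistory:
    events: List[StatusChangeEvent] = field(default_factory=list)  # project events
    journal: List[str] = field(default_factory=list)               # mirror-facing journal

    def set_status(
        self,
        project: str,
        actor: str,
        old: Optional[str],
        new: Optional[str],
        now: Optional[datetime.datetime] = None,
    ) -> None:
        now = now or datetime.datetime.now(datetime.timezone.utc)
        # Always record who changed the status and when.
        self.events.append(StatusChangeEvent(project, actor, old, new, now))
        # Deliberately no journal entry: unlike quarantine, a status marker
        # does not hide or restore any files that mirrors would need to track.
```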

@webknjaz
Member

webknjaz commented Oct 7, 2024

Here's an unstructured brain dump from me.
Should this include/inherit things that trove classifiers already expose? There are 7 development statuses already. My understanding is that the difference is in the classifiers being dist metadata while the suggested metadata bit is project-global and is not bound to a specific version, right? Wouldn't it be useful to mark a subset of old versions as deprecated, as opposed to an entire project? I think Tidelift values such metadata, among other things. Perhaps it's reasonable to ask them to share their experience? By the way, if this new metadata is different/conflicting compared to the trove classifiers, which one is supposed to be the source of truth? Should using the trove classifiers be deprecated in favor of the new statuses? (If so, all of the old statuses would need to be represented too.)
Also, how would the dependency resolvers be expected to take this metadata into account? I think that the deprecation status might need to be coupled with some redirect when a maintainer knows which project/fork is going to keep being maintained. And the successor project might need to expose a backreference similar to Obsoletes-Dist in dist metadata.

@facutuesca
Contributor Author

Should this include/inherit things that trove classifiers already expose? There are 7 development statuses already. My understanding is that the difference is in the classifiers being dist metadata while the suggested metadata bit is project-global and is not bound to a specific version, right? Wouldn't it be useful to mark a subset of old versions as deprecated, as opposed to an entire project?

Yes, the trove classifiers are metadata for a specific version of a package, whereas the suggested statuses are project-global. With regard to marking a subset of old versions as deprecated: I'm not sure this would be a good fit for anything other than the "deprecated" status. That is, other statuses like "archived" or "finished/done" usually refer to an entire project, and the main goal of this feature is to give maintainers an easy way of communicating this to downstream users.

By the way, if this new metadata is different/conflicting compared to the trove classifiers, which one is supposed to be the source of truth? Should using the trove classifiers be deprecated in favor of the new statuses? (If so, all of the old statuses would need to be represented too.)

The way I understand it, there's no conflict: a trove classifier of Development Status :: 3 - Alpha for pkg==0.0.5 communicates that version 0.0.5 of pkg is an alpha version. If that was the last version the maintainer released, and they come back years later and mark the project as "Archived", then both pieces of information are correct: the project has been archived, and the last released version was an alpha.
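To make the "no conflict" point concrete, here is a toy model, assuming hypothetical `Release` and `Project` classes: classifiers live on each release and are frozen at upload time, while the proposed status lives on the project and can be changed later without a new upload.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Release:
    version: str
    classifiers: List[str]  # per-release dist metadata, fixed at upload time


@dataclass
class Project:
    name: str
    releases: Dict[str, Release] = field(default_factory=dict)
    status: Optional[str] = None  # hypothetical project-global marker


def describe(project: Project, version: str) -> str:
    """Combine one release's Development Status with the project's status."""
    dev_status = next(
        (c for c in project.releases[version].classifiers
         if c.startswith("Development Status ::")),
        "no Development Status classifier",
    )
    return f"{project.name}=={version}: {dev_status}; project status: {project.status or 'none'}"


pkg = Project("pkg", {"0.0.5": Release("0.0.5", ["Development Status :: 3 - Alpha"])})
pkg.status = "archived"  # set years later; no new release needed
print(describe(pkg, "0.0.5"))
# pkg==0.0.5: Development Status :: 3 - Alpha; project status: archived
```

Both facts are reported side by side, and neither overrides the other.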

Given that, it wouldn't make sense to deprecate trove classifiers in favor of the new statuses. The new statuses are only meant to describe an (unlikely to change) end status for a project (deprecation/archival/etc), whereas trove classifiers are still needed for release-specific information of the entire project lifecycle.

Also, how would the dependency resolvers be expected to take this metadata into account? I think that the deprecation status might need to be coupled with some redirect when a maintainer knows which project/fork is going to keep being maintained. And the successor project might need to expose a backreference similar to Obsoletes-Dist in dist metadata

I think this feature can be useful without prescribing how dependency resolvers use this information (if they do at all). The fact that the information can be queried by API already makes it useful for users monitoring this kind of information for their supply chain security, and showing it in the web UI helps users that visit the project's PyPI page.
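For example, a downstream supply-chain monitor could be quite small. This sketch assumes the status is exposed under a key named `project-status` in each project's JSON response from the index API; that key name is purely hypothetical, since this issue does not specify how the status would be serialized. Responses are passed in as strings to keep the sketch offline.

```python
import json
from typing import Dict, Optional

# Hypothetical key name; the issue doesn't say how the status would be serialized.
STATUS_KEY = "project-status"


def status_from_response(body: str) -> Optional[str]:
    """Extract the (assumed) status field from one project's JSON response."""
    return json.loads(body).get(STATUS_KEY)


def audit(responses: Dict[str, str]) -> Dict[str, str]:
    """Map each dependency that has any status marker set to that status."""
    report = {}
    for project, body in responses.items():
        status = status_from_response(body)
        if status is not None:
            report[project] = status
    return report


deps = {
    "pkg-a": '{"name": "pkg-a"}',
    "pkg-b": '{"name": "pkg-b", "project-status": "archived"}',
}
print(audit(deps))  # {'pkg-b': 'archived'}
```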

I'm unsure about adding an extra field for specifying a replacement package. I think that type of information is prone to decay, and in any case it could be included in the README or project description. I don't think dependency resolvers would be able to use the information about a replacement package anyway, since there would have to be a guarantee of being a 1:1 drop-in replacement.

@sethmlarson
Contributor

Thanks for opening this issue, here are some thoughts:

@miketheman: LifecycleStatus was intended to be extended exactly as you're describing, so yay!

Should a project's quarantine state be kept separate from its user-defined lifecycle status? Quarantining is a process outside of anything user-controlled, and in my mind it should be reversible without disturbing the project.

Trove classifiers

IMO, trove classifiers were not the right mechanism for capturing project lifecycle information: because updating them requires publishing a new version, the information is either out of date or tedious to maintain.

There are better mechanisms: pre-releases are captured by the version number, and "mature" can be inferred from a project's age and number of users. Something akin to "archived" or "deprecated" outside of packaging metadata makes sense to me, so this feels like the right layer to host this information.

I don't think this system should try to capture everything that trove classifiers do, if only because much of that information applies to individual releases rather than to the project as a whole.

Yanking, client-side warnings/errors

The yanking PEP doesn't mention anything about project lifecycle, instead focusing on "stopping the bleeding" for broken versions.

Similar to yanking, I could definitely see an installer warning about a deprecated project. Deprecation implies that the project intends its audience to take action (either adopting another project or removing the dependency).

"Archived" is the language GitHub uses. I think we should pick one state among archived/completed/finished so we don't confuse users or maintainers about what it means. I think it also makes sense to warn users in this case? I'm not quite sure why I'm mentally more hung up on warning users in this case than for deprecation.

I think archiving implies that there will be no more security fixes from upstream. That matters to users of open source who are trying to comply with regulations, so this information would be helpful if a medium or large project were marked as archived.

Resolving

I don't think any of these statuses should have an effect on resolvers.

@miketheman removed the "requires triaging" label on Oct 11, 2024
@miketheman
Member

Another link: #1506

@facutuesca
Copy link
Contributor Author

And naming things is hard, so I'd encourage some deeper thought into what these names mean, and whether or not they convey the appropriate meaning.

After thinking a bit, I believe the most important thing is to add a mechanism for maintainers to signal to their users that they should re-evaluate their use of a project.

The suggested names are different "flavors" of this signal that give users more context. But maybe we should start by defining them, rather than by naming them.

I propose the following two categories to start the discussion:

  • Category 1: Users are strongly encouraged to stop using the project. Users should not expect any further updates, not even security or bug fixes.

  • Category 2: Users should evaluate if they want to stop using the project. Users should not expect any new features, but bug fixes might be addressed, and security issues will be addressed.

Using the original names, Deprecated and Archived would fit into Category 1 (strong suggestion to stop using), and Finished would fit into Category 2 (suggestion to evaluate use).
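One way a client tool might act on these two categories, as a hedged sketch: the status strings match the proposed enum values, while `Category`, `CATEGORY_BY_STATUS`, `advice`, and the message text are all hypothetical.

```python
import enum
from typing import Optional


class Category(enum.Enum):
    STOP_USING = 1    # Category 1: no further updates, not even security or bug fixes
    EVALUATE_USE = 2  # Category 2: no new features; security issues will still be addressed


# Mapping proposed above: Deprecated and Archived -> Category 1, Finished -> Category 2.
CATEGORY_BY_STATUS = {
    "deprecated": Category.STOP_USING,
    "archived": Category.STOP_USING,
    "finished": Category.EVALUATE_USE,
}


def advice(status: Optional[str]) -> str:
    """Hypothetical message a tool might show for a dependency's status."""
    category = CATEGORY_BY_STATUS.get(status)
    if category is Category.STOP_USING:
        return "strongly consider replacing this project"
    if category is Category.EVALUATE_USE:
        return "evaluate whether to keep using this project"
    return "no action suggested"
```

Defining behavior per category rather than per name means a status could later be renamed without changing what tools do with it.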

Is that a fair summary? What things should be added or removed?

@woodruffw
Member

I like those categories!

As a related thought: something PyPI could do is email a project's owners after a designated period of inactivity (2-3 years?) and nudge them to consider adding a status label. That would have to be carefully balanced to avoid annoying users/spamming people, however.
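A sketch of what that nudge check might look like, assuming a hypothetical `should_nudge` helper, a two-year threshold picked from the "2-3 years?" range suggested above, and a rule of at most one nudge per threshold period to limit spam.

```python
import datetime
from typing import Optional

# Hypothetical threshold; the thread suggests "2-3 years?" without settling on one.
INACTIVITY_THRESHOLD = datetime.timedelta(days=2 * 365)


def should_nudge(
    last_release: datetime.date,
    status: Optional[str],
    last_nudged: Optional[datetime.date],
    today: datetime.date,
) -> bool:
    """Decide whether to email owners suggesting they set a status marker."""
    if status is not None:
        return False  # owners have already set a status
    if today - last_release < INACTIVITY_THRESHOLD:
        return False  # project is still recently active
    if last_nudged is not None and today - last_nudged < INACTIVITY_THRESHOLD:
        return False  # avoid spamming: at most one nudge per threshold period
    return True
```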
