Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endpoint.undeploy_all() has an edge case where it doesn't undeploy all models #1441

Closed
jasonbrancazio opened this issue Jun 17, 2022 · 6 comments
Assignees
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@jasonbrancazio
Copy link

https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/models.py#L1567 finds the model ids to undeploy by looking at the traffic_split.keys()

It is possible to get into a state where an Endpoint has a deployed model with no traffic split. In that state, the model id will not appear in traffic_split.keys(). undeploy_all will return successfully, but Endpoint.delete() will fail because models without traffic split are still deployed to the endpoint.

A better approach would be to use something like [m.id for m in endpoint.list_models()] to get the list of keys. Perhaps it would be [m.id for m in self._gca_resource.list_models()], but I'm not sure.

@product-auto-label product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label Jun 17, 2022
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Jun 17, 2022
@ivanmkc
Copy link
Contributor

ivanmkc commented Jun 22, 2022

I was just going to post but I see someone else found this bug.

This is indeed a bug.

self._gca_resource.deployed_models:

[id: "8204437019208712192"
model: "projects/1012616486416/locations/us-central1/models/5329293277411672064"
display_name: "train-automl-flowers"
create_time {
  seconds: 1654461627
  nanos: 611158000
}
automatic_resources {
  min_replica_count: 1
  max_replica_count: 1
}
disable_container_logging: true
]

self._gca_resource.traffic_split

{}

The problematic code is here:

    def undeploy_all(self, sync: bool = True) -> "Endpoint":
        """Undeploys every model deployed to this Endpoint.

        Args:
            sync (bool):
                Whether to execute this method synchronously. If False, this method
                will be executed in concurrent Future and any downstream object will
                be immediately returned and synced when the Future has completed.
        """
        self._sync_gca_resource()

        models_to_undeploy = sorted(  # Undeploy zero traffic models first
            self._gca_resource.traffic_split.keys(),
            key=lambda id: self._gca_resource.traffic_split[id],
        )

        for deployed_model in models_to_undeploy:
            self._undeploy(deployed_model_id=deployed_model, sync=sync)

        return self

@ivanmkc
Copy link
Contributor

ivanmkc commented Jun 22, 2022

I believe the fix is to replace:

        models_to_undeploy = sorted(  # Undeploy zero traffic models first
            self._gca_resource.traffic_split.keys(),
            key=lambda id: self._gca_resource.traffic_split[id],
        )

with

models_to_undeploy = [models.id for models in self._gca_resource.deployed_models]

@ivanmkc ivanmkc added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Jun 22, 2022
@rosiezou
Copy link
Contributor

@ivanmkc
Copy link
Contributor

ivanmkc commented Nov 30, 2022

@rosiezou The PR you mention doesn't fix the underlying bug in the SDK. That was just a workaround as the SDK was not working.

@ivanmkc ivanmkc reopened this Nov 30, 2022
@jaycee-li jaycee-li self-assigned this Jan 3, 2023
copybara-service bot pushed a commit that referenced this issue Jan 4, 2023
copybara-service bot pushed a commit that referenced this issue Jan 5, 2023
copybara-service bot pushed a commit that referenced this issue Jan 6, 2023
copybara-service bot pushed a commit that referenced this issue Jan 6, 2023
nayaknishant pushed a commit to nayaknishant/python-aiplatform that referenced this issue Jan 20, 2023
@ivanmkc
Copy link
Contributor

ivanmkc commented Jan 25, 2023

@jaycee-li I'm pretty sure this is still a bug. Can you double-check? Happy to chat offline.

@jaycee-li
Copy link
Contributor

The issue is resolved per offline discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

5 participants