Skip to content

Conversation

@fengzhou-msft
Copy link
Member

@fengzhou-msft fengzhou-msft commented Nov 12, 2020

Description

In the response of DELETE long running operation, some service (observed in some corner case of servicebus, update: confirmed in swagger review session that more services have this behavior) may return 404 while GETting operation results when the resource is successfully deleted (normally it should return 200 with no response body).

To handle the 404 case, this PR adds similar logic to the error handling in wait --deleted command:

if getattr(ex, 'status_code', None) == 404:
if wait_for_deleted:
return None

Testing Guide

Without this PR, azdev test test_sb_alias --live will fail when deleting the secondary namespace.
With this PR, it can succeed.

History Notes

[Component Name 1] BREAKING CHANGE: az command a: Make some customer-facing breaking change.
[Component Name 2] az command b: Add some customer-facing feature.


This checklist is used to make sure that common guidelines for a pull request are followed.

@yonzhan
Copy link
Collaborator

yonzhan commented Nov 12, 2020

Core

@jiasli
Copy link
Member

jiasli commented Feb 8, 2021

Is GET called on the service itself or the operation status (Azure-AsyncOperation/Location)?

except ClientException as client_exception:
from azure.cli.core.commands.arm import handle_long_running_operation_exception
self.cli_ctx.get_progress_controller().stop()
if getattr(client_exception, 'status_code', None) == 404 and 'delete' in self.cli_ctx.data['command']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we add this hack, Python SDK and its users will still suffers from this non-standard behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an open issue in https://github.com/Azure/azure-rest-api-specs to track this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue in our repo #15906 for service team to deal with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a disucssion with swagger reviewers, there are some existing services using 404 in LRO. For new swaggers, there should be validation tasks to catch it but for existing ones as long as the json is not modified with PRs, it will not be caught.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should force service team to fix it, instead of patching it from our side, otherwise Python SDK and its users will still suffers from this non-standard behavior.

@fengzhou-msft
Copy link
Member Author

Is GET called on the service itself or the operation status (Azure-AsyncOperation/Location)?

The error is thrown during long running operation, so I think it's called on the operation status.

        # Delete Namespace - primary
>       self.cmd('servicebus namespace delete --resource-group {rg} --name {namespacenameprimary}')

azure-cli/src/azure-cli/azure/cli/command_modules/servicebus/tests/latest/test_servicebus_alias_commands.py:163: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
azure-cli/src/azure-cli-testsdk/azure/cli/testsdk/base.py:167: in cmd
    return execute(self.cli_ctx, command, expect_failure=expect_failure).assert_with_checks(checks)
azure-cli/src/azure-cli-testsdk/azure/cli/testsdk/base.py:238: in __init__
    self._in_process_execute(cli_ctx, command, expect_failure=expect_failure)
azure-cli/src/azure-cli-testsdk/azure/cli/testsdk/base.py:301: in _in_process_execute
    raise ex.exception
env/lib/python3.8/site-packages/knack/cli.py:233: in invoke
    cmd_result = self.invocation.execute(args)
azure-cli/src/azure-cli-core/azure/cli/core/commands/__init__.py:659: in execute
    raise ex
azure-cli/src/azure-cli-core/azure/cli/core/commands/__init__.py:722: in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
azure-cli/src/azure-cli-core/azure/cli/core/commands/__init__.py:715: in _run_job
    six.reraise(*sys.exc_info())
env/lib/python3.8/site-packages/six.py:703: in reraise
    raise value
azure-cli/src/azure-cli-core/azure/cli/core/commands/__init__.py:704: in _run_job
    result = LongRunningOperation(cmd_copy.cli_ctx, 'Starting {}'.format(cmd_copy.name))(result)
azure-cli/src/azure-cli-core/azure/cli/core/commands/__init__.py:988: in __call__
    handle_long_running_operation_exception(client_exception)

@houk-ms
Copy link
Contributor

houk-ms commented Apr 22, 2021

Soft-delete and purge operation on managed-hsm also suffer from this bad behavior. It just returns 404 when the delete operation completes.

I think we should have this logic supported in CLI core to handle these cases. Maybe also add some log info to expose these cases. Of course, at the same time, we should report these bad behiviors to service team.

Copy link
Contributor

@houk-ms houk-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants