Skip to content

Commit

Permalink
Automatically retrying with auth_type=azure-cli when constructing `…
Browse files Browse the repository at this point in the history
…workspace_clients` on Azure (#1650)

## Changes
- Automatically retrying with `auth_type=azure-cli` when constructing
`workspace_clients` on Azure
- Resolved TODO items for `AccountWorkspaces`
- Added relevant suggestions in `troubleshooting.md`

### Linked issues
<!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes,
fixed, resolve, resolves, resolved. See
https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
-->

Closes #1574, closes #1430

### Functionality 

- [x] added relevant user documentation

### Tests
<!-- How is this tested? Please see the checklist below and also
describe any other relevant tests -->

- [x] manually tested
- [x] verified on staging environment (screenshot attached)
  • Loading branch information
nkvuong authored May 7, 2024
1 parent c79c9aa commit 1350995
Show file tree
Hide file tree
Showing 6 changed files with 126 additions and 46 deletions.
9 changes: 6 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,13 @@ Databricks Labs UCX
![UCX by Databricks Labs](docs/logo-no-background.png)

The companion for upgrading to Unity Catalog. After [installation](#install-ucx), ensure to [trigger](#ensure-assessment-run-command) the [assessment workflow](#assessment-workflow),
so that you'll be able to [scope the migration](docs/assessment.md) and execute the [group migration workflow](#group-migration-workflow).
[`<installation_path>/README`](#readme-notebook) contains further instructions and explanations of these workflows.
so that you'll be able to [scope the migration](docs/assessment.md) and execute the [group migration workflow](#group-migration-workflow).

After installation, [`<installation_path>/README`](#readme-notebook) contains further instructions and explanations of these workflows.
Then you can execute [table migration workflow](#table-migration-workflow).
More workflows, like notebook code migration is coming in the future releases.

More workflows, like notebook code migration is coming in the future releases.

UCX exposes a number of command line utilities accessible via `databricks labs ucx`.

For questions, troubleshooting or bug fixes, please see our [troubleshooting guide](docs/troubleshooting.md) or submit [an issue](https://github.com/databrickslabs/ucx/issues).
Expand Down
12 changes: 12 additions & 0 deletions docs/troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,18 @@ By following these steps, you should be able to effectively locate, access, and
- Ensure you do not forget the `--profile <CONFIGPROFILENAME>` to authenticate. If the databricks command cannot authenticate, you may receive a lengthy stack traceback
- Validate your login with `databricks auth env --profile <CONFIGPROFILENAME>` which should print out a json structure having about 8 different keys and values and secrets

If Azure CLI has already been installed and authenticated, but you see the following error when running ucx commands:

`14:50:33 ERROR [d.labs.ucx] In order to obtain AAD token, Please run azure cli to authenticate.`

Resolve this in macOS by running the command with an explicit auth type set: `DATABRICKS_AUTH_TYPE=azure-cli databricks labs ucx ...`.
To resolve this issue in Windows, proceed with the following steps:

1. Open `%userprofile%` (the path like `C:\Users\<username>`)
2. Open `.databrickscfg`
3. Change the profile definition by setting `auth_type=azure-cli`
4. Save, close the file, re-run `databricks labs ucx ...`

### Resolving common errors on UCX install

#### Error on installing the ucx inventory database
Expand Down
61 changes: 54 additions & 7 deletions src/databricks/labs/ucx/account/workspaces.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,16 +25,30 @@ def _workspaces(self):
yield workspace

def client_for(self, workspace: Workspace) -> WorkspaceClient:
return self._ac.get_workspace_client(workspace)
try:
# get_workspace_client will raise an exception since it calls config.init_auth()
return self._ac.get_workspace_client(workspace)
except (PermissionDenied, NotFound, ValueError) as err:
# on azure, we can retry with azure-cli auth
if self._ac.config.is_azure and self._ac.config.auth_type != "azure-cli":
current_auth_type = self._ac.config.auth_type
self._ac.config.auth_type = "azure-cli"
try:
ws = self._ac.get_workspace_client(workspace)
except (PermissionDenied, NotFound, ValueError) as exc:
raise PermissionDenied(f"Failed to create client for {workspace.deployment_name}: {exc}") from exc
finally:
self._ac.config.auth_type = current_auth_type
return ws
raise PermissionDenied(f"Failed to create client for {workspace.deployment_name}: {err}") from err

def workspace_clients(self, workspaces: list[Workspace] | None = None) -> list[WorkspaceClient]:
"""
Return a list of WorkspaceClient for each configured workspace in the account
:return: list[WorkspaceClient]
"""
# TODO: move _can_administer() from account installer over here
if workspaces is None:
workspaces = self._workspaces()
workspaces = self.get_accessible_workspaces()
clients = []
for workspace in workspaces:
ws = self.client_for(workspace)
Expand Down Expand Up @@ -72,6 +86,35 @@ def create_account_level_groups(self, prompts: Prompts, workspace_ids: list[int]
self._add_members_to_acc_group(self._ac, acc_group.id, group_name, valid_group)
logger.info(f"Group {group_name} created in the account")

def get_accessible_workspaces(self) -> list[Workspace]:
"""
Get all workspaces that the user has access to
:return: list[Workspace]
"""
accessible_workspaces = []
for workspace in self._ac.workspaces.list():
if self._can_administer(workspace):
accessible_workspaces.append(workspace)
return accessible_workspaces

def _can_administer(self, workspace: Workspace) -> bool:
try:
# check if user has access to workspace
ws = self.client_for(workspace)
except (PermissionDenied, NotFound, ValueError) as err:
logger.warning(f"{workspace.deployment_name}: Encounter error {err}. Skipping...")
return False
current_user = ws.current_user.me()
if current_user.groups is None:
return False
# check if user is a workspace admin
if "admins" not in [g.display for g in current_user.groups]:
logger.warning(
f"{workspace.deployment_name}: User {current_user.user_name} is not a workspace admin. Skipping..."
)
return False
return True

def _try_create_account_groups(
self, group_name: str, acc_groups: dict[str | None, list[ComplexValue] | None]
) -> Group | None:
Expand Down Expand Up @@ -115,7 +158,8 @@ def _add_members_to_acc_group(
schemas=[PatchSchema.URN_IETF_PARAMS_SCIM_API_MESSAGES_2_0_PATCH_OP],
)

def _chunks(self, lst, chunk_size):
@staticmethod
def _chunks(lst, chunk_size):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), chunk_size):
yield lst[i : i + chunk_size]
Expand Down Expand Up @@ -158,7 +202,8 @@ def _load_workspace_groups(self, prompts, workspace, all_workspaces_groups):
all_workspaces_groups[group_name] = full_workspace_group
logger.info(f"Found a total of {len(all_workspaces_groups)} groups to migrate to the account")

def _is_group_out_of_scope(self, group: Group) -> bool:
@staticmethod
def _is_group_out_of_scope(group: Group) -> bool:
if group.display_name in {"users", "admins", "account users"}:
logger.debug(f"Group {group.display_name} is a system group, ignoring")
return True
Expand All @@ -170,7 +215,8 @@ def _is_group_out_of_scope(self, group: Group) -> bool:
return True
return False

def _has_same_members(self, group_1: Group, group_2: Group) -> bool:
@staticmethod
def _has_same_members(group_1: Group, group_2: Group) -> bool:
ws_members_set_1 = set([m.display for m in group_1.members] if group_1.members else [])
ws_members_set_2 = set([m.display for m in group_2.members] if group_2.members else [])
return not bool((ws_members_set_1 - ws_members_set_2).union(ws_members_set_2 - ws_members_set_1))
Expand All @@ -190,7 +236,8 @@ def _get_account_groups(self) -> dict[str | None, list[ComplexValue] | None]:
logger.info(f"{len(acc_groups)} account groups found")
return acc_groups

def _safe_groups_get(self, interface, group_id) -> Group | None:
@staticmethod
def _safe_groups_get(interface, group_id) -> Group | None:
try:
if not group_id:
return None
Expand Down
32 changes: 1 addition & 31 deletions src/databricks/labs/ucx/install.py
Original file line number Diff line number Diff line change
Expand Up @@ -611,36 +611,6 @@ def _get_safe_account_client(self) -> AccountClient:
account_id = self.prompts.question("Please provide the Databricks account id")
return AccountClient(host=host, account_id=account_id, product="ucx", product_version=__version__)

def _can_administer(self, workspace: Workspace):
# TODO: move to AccountWorkspaces
try:
# check if user is a workspace admin
ws = self.account_client.get_workspace_client(workspace)
current_user = ws.current_user.me()
if current_user.groups is None:
return False
if "admins" not in [g.display for g in current_user.groups]:
logger.warning(
f"{workspace.deployment_name}: User {current_user.user_name} is not a workspace admin. Skipping..."
)
return False
# check if user has access to workspace
except (PermissionDenied, NotFound, ValueError) as err:
logger.warning(f"{workspace.deployment_name}: Encounter error {err}. Skipping...")
return False
return True

def _get_accessible_workspaces(self):
"""
Get all workspaces that the user has access to
"""
# TODO: move to AccountWorkspaces
accessible_workspaces = []
for workspace in self.account_client.workspaces.list():
if self._can_administer(workspace):
accessible_workspaces.append(workspace)
return accessible_workspaces

def _get_installer(self, workspace: Workspace) -> WorkspaceInstaller:
workspace_client = self.account_client.get_workspace_client(workspace)
logger.info(f"Installing UCX on workspace {workspace.deployment_name}")
Expand All @@ -650,7 +620,7 @@ def install_on_account(self):
ctx = AccountContext(self._get_safe_account_client())
default_config = None
confirmed = False
accessible_workspaces = self._get_accessible_workspaces()
accessible_workspaces = self.account_workspaces.get_accessible_workspaces()
msg = "\n".join([w.deployment_name for w in accessible_workspaces])
installed_workspaces = []
if not self.prompts.confirm(
Expand Down
6 changes: 4 additions & 2 deletions tests/unit/account/test_aggregate.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
import logging
from unittest.mock import create_autospec

from databricks.sdk import Workspace, WorkspaceClient
from databricks.sdk.service import iam, sql

from databricks.labs.ucx.account.aggregate import AccountAggregate
from databricks.labs.ucx.account.workspaces import AccountWorkspaces

from databricks.labs.ucx.config import WorkspaceConfig
from databricks.labs.ucx.contexts.workspace_cli import WorkspaceContext
from databricks.sdk.service import sql


def test_basic_readiness_report_no_workspaces(acc_client, caplog):
Expand Down Expand Up @@ -67,6 +68,7 @@ def test_readiness_report_ucx_installed(acc_client, caplog):
),
statement_id='123',
)
ws.current_user.me.return_value = iam.User(user_name="[email protected]", groups=[iam.ComplexValue(display="admins")])

ctx = WorkspaceContext(ws).replace(config=WorkspaceConfig(inventory_database="something", warehouse_id="1234"))
account_aggregate_obj = AccountAggregate(account_ws, workspace_context_factory=lambda _: ctx)
Expand Down
52 changes: 49 additions & 3 deletions tests/unit/account/test_workspaces.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,11 @@
import io
import json

from unittest.mock import create_autospec
import pytest

import pytest
from databricks.labs.blueprint.installation import Installation, MockInstallation
from databricks.labs.blueprint.tui import MockPrompts
from databricks.sdk import WorkspaceClient
from databricks.sdk import AccountClient, WorkspaceClient
from databricks.sdk.errors import NotFound, ResourceConflict
from databricks.sdk.service import iam
from databricks.sdk.service.iam import ComplexValue, Group, ResourceMeta, User
Expand Down Expand Up @@ -448,3 +447,50 @@ def test_create_acc_groups_should_not_throw_if_acc_grp_exists(acc_client):

acc_client.groups.create.assert_called_with(display_name="de")
acc_client.groups.patch.assert_not_called()


def test_get_accessible_workspaces():
acc = create_autospec(AccountClient)
acc.workspaces.list.return_value = [
Workspace(workspace_name="foo", workspace_id=123, workspace_status_message="Running", deployment_name="abc"),
Workspace(workspace_name="bar", workspace_id=456, workspace_status_message="Running", deployment_name="def"),
Workspace(workspace_name="bar", workspace_id=789, workspace_status_message="Running", deployment_name="def"),
Workspace(workspace_name="bar", workspace_id=000, workspace_status_message="Running", deployment_name="def"),
]
acc.config.is_azure = True
acc.config.auth_type = "databricks-cli"

# admin in workspace 1
ws1 = create_autospec(WorkspaceClient)
ws1.current_user.me.return_value = iam.User(user_name="[email protected]", groups=[iam.ComplexValue(display="admins")])
# not an admin in workspace 2
ws2 = create_autospec(WorkspaceClient)
ws2.current_user.me.return_value = iam.User(user_name="[email protected]")
# not an admin in workspace 3
ws3 = create_autospec(WorkspaceClient)
ws3.current_user.me.return_value = iam.User(user_name="[email protected]", groups=[iam.ComplexValue(display="users")])

def get_workspace_client(workspace) -> WorkspaceClient:
match workspace.workspace_id:
case 123:
return ws1
case 456:
return ws2
case 789:
return ws3
case _:
raise ValueError("unexpected workspace id")

acc.get_workspace_client.side_effect = get_workspace_client

account_workspaces = AccountWorkspaces(acc)
assert len(account_workspaces.get_accessible_workspaces()) == 1
ws1.current_user.me.assert_called_once()
ws2.current_user.me.assert_called_once()
# get_workspace_client should be called once for 123, 456, 789, then twice for workspace 000
assert acc.get_workspace_client.call_count == 5

acc.config.is_azure = False
acc.config.auth_type = "databricks-cli"
account_workspaces = AccountWorkspaces(acc)
assert len(account_workspaces.get_accessible_workspaces()) == 1

0 comments on commit 1350995

Please sign in to comment.