The group migration workflow migrates groups from the workspace level to the account level. It helps you upgrade all Databricks workspace assets: Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions that are set on the workspace level, Secret scopes, Notebooks, Directories, Repos, and Files.
It ensures that all the necessary groups are available in the workspace with the correct permissions, and removes any unnecessary groups and permissions.
The tasks in the group migration workflow depend on the output of the assessment workflow and can be executed in sequence to ensure a successful migration.
The output of each task is stored in Delta tables in the $inventory_database schema.
The group migration workflow can be executed multiple times to ensure that all the groups are migrated successfully and that all the necessary permissions are assigned.
- crawl_groups: This task scans all groups for the local group migration scope.
- rename_workspace_local_groups: This task renames workspace-local groups by adding a ucx-renamed- prefix. This step avoids conflicts with account-level groups that may have the same name as workspace-local groups.
- reflect_account_groups_on_workspace: This task adds matching account groups to this workspace. The matching account-level groups must already exist for this step to succeed. This step ensures that the account-level groups are available in the workspace for assigning permissions.
- apply_permissions_to_account_groups: This task assigns the full set of permissions of the original group to the account-level one. It covers workspace-local permissions for all entities, including Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions, Secret Scopes, Notebooks, Directories, Repos, and Files. This step ensures that the account-level groups have the permissions they need to manage the entities in the workspace.
- validate_groups_permissions: This task validates that all the crawled permissions are applied correctly to the destination groups.
- delete_backup_groups: This task removes all workspace-level backup groups, along with their permissions. Run it only after confirming that the workspace-local migration succeeded for all the groups involved. This step cleans up the workspace by removing unnecessary groups and permissions.
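The rename and reflect steps can be pictured with a small, self-contained sketch. It does not call any Databricks API: `plan_rename_and_reflect` and the sample group names are hypothetical, and the sketch assumes that groups are matched by identical display name.

```python
RENAMED_PREFIX = "ucx-renamed-"  # prefix named in the task description


def plan_rename_and_reflect(workspace_groups, account_groups, prefix=RENAMED_PREFIX):
    """Return (renames, skipped) for a set of workspace-local group names.

    renames maps each workspace-local name to its temporary backup name, but
    only when an account-level group with the same name already exists.
    skipped lists workspace-local groups that have no account-level match.
    """
    account_names = set(account_groups)
    renames, skipped = {}, []
    for name in sorted(workspace_groups):
        if name in account_names:
            renames[name] = f"{prefix}{name}"
        else:
            skipped.append(name)
    return renames, skipped


renames, skipped = plan_rename_and_reflect(
    workspace_groups={"analysts", "engineers", "temp-contractors"},
    account_groups={"analysts", "engineers", "finance"},
)
print(renames)
print(skipped)
```

Groups that land in `skipped` are the ones for which the reflect step would have nothing to add, which is why the matching account-level groups need to exist beforehand.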
The MigratedGroup class represents a group that has been migrated from one name to another and stores information about the original and new names, as well as the group's members, external ID, and roles. The MigrationState class holds the state of the migration process and provides methods for getting the target principal and temporary name for a given group name.
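As a rough illustration of the two classes described above, the following sketch models them with plain dataclasses. The class names `MigratedGroupSketch` and `MigrationStateSketch`, their field names, and the sample data are assumptions made for this example; they are not the library's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigratedGroupSketch:
    """Hypothetical record of one group moving from workspace to account."""

    name_in_workspace: str
    name_in_account: str
    temporary_name: str
    members: Optional[str] = None
    external_id: Optional[str] = None
    roles: Optional[str] = None


class MigrationStateSketch:
    """Hypothetical lookup from a workspace group name to its migration targets."""

    def __init__(self, groups):
        self._by_workspace_name = {g.name_in_workspace: g for g in groups}

    def get_target_principal(self, name):
        group = self._by_workspace_name.get(name)
        return group.name_in_account if group else None

    def get_temporary_name(self, name):
        group = self._by_workspace_name.get(name)
        return group.temporary_name if group else None


state = MigrationStateSketch(
    [MigratedGroupSketch("de-team", "data-engineers", "ucx-renamed-de-team")]
)
print(state.get_target_principal("de-team"))
print(state.get_temporary_name("de-team"))
print(state.get_target_principal("unknown-group"))
```

Returning `None` for an unknown group lets callers skip permissions that belong to principals outside the migration scope.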
The GroupManager class is a CrawlerBase subclass that manages groups in a Databricks workspace. It provides methods for renaming groups, reflecting account groups on the workspace, deleting original workspace groups, and validating group membership. The class also provides methods for listing workspace and account groups, getting group details, and deleting groups.
The GroupMigrationStrategy abstract base class defines the interface for a strategy that generates a list of MigratedGroup objects based on a mapping between workspace and account groups.
The MatchingNamesStrategy, MatchByExternalIdStrategy, RegexSubStrategy, and RegexMatchStrategy classes are concrete implementations of this interface. See group name conflicts for more details.
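To make the idea of a matching strategy more concrete, here is a minimal sketch of two of them written as plain functions. The function names, signatures, and group names are hypothetical; the real strategy classes may work differently, and this only illustrates matching by identical name versus matching after a regular-expression substitution.

```python
import re


def match_by_name(workspace_names, account_names):
    """Pair groups whose display names are identical."""
    return {name: name for name in workspace_names if name in account_names}


def match_by_regex_sub(workspace_names, account_names, pattern, replacement):
    """Rewrite each workspace name with re.sub, then look it up in the account."""
    matches = {}
    for name in workspace_names:
        candidate = re.sub(pattern, replacement, name)
        if candidate in account_names:
            matches[name] = candidate
    return matches


workspace = {"ws_analysts", "ws_ops", "finance"}
account = {"analysts", "finance"}

by_name = match_by_name(workspace, account)
by_regex = match_by_regex_sub(workspace, account, r"^ws_", "")
print(by_name)
print(by_regex)
```

In this example the substitution strips a `ws_` prefix, so `ws_analysts` is paired with the account group `analysts`, while `ws_ops` stays unmatched because no `ops` group exists at the account level.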
The ConfigureGroups class provides a command-line interface for configuring the group migration process during installation. It prompts the user to enter information about the group migration strategy, such as the renamed group prefix, regular expressions for matching and substitution, and a list of groups to migrate. The class also provides methods for validating user input and setting class variables based on the user's responses.
The module lets you crawl, save, and apply permissions for clusters, tables and UDFs (User-Defined Functions), secret scopes, entitlements, and dashboards.
To use the module, you can create a PermissionManager instance by calling the factory method, which sets up the necessary AclSupport objects for different types of objects in the workspace. Once the instance is created, you can call the inventorize_permissions method to crawl and save the permissions for all objects to the inventory database in the permissions table.
The apply_group_permissions method allows you to apply the permissions to a list of account groups, while the verify_group_permissions method verifies that the permissions are valid.
The AclSupport objects define how to crawl, save, and apply permissions for specific types of objects in the workspace:
- get_crawler_tasks: A method that returns a list of callables that crawl and return permissions for the objects supported by the AclSupport instance. This method should be implemented to provide the necessary logic for crawling permissions for the specific type of object supported.
- get_apply_task: A method that returns a callable that applies the permissions for a given Permissions object to a destination group, based on the group's migration state. The callable should not have any shared mutable state, ensuring thread safety and reducing the risk of bugs.
- get_verify_task: A method that returns a callable that verifies that the permissions for a given Permissions object are applied correctly to the destination group. This method can be used to ensure that permissions are applied as expected, helping to improve the reliability and security of your Databricks workspace.
- object_types: An abstract method that returns a set of strings representing the object types that the AclSupport instance supports. This method should be implemented to provide the necessary information about the object types supported by the AclSupport class.
The Permissions dataclass is used to represent the permissions for a specific object type and ID. The dataclass includes a raw attribute that contains the raw permission data as a string, providing a convenient way to work with the underlying permission data.
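The interface and the Permissions record described above can be sketched with an abstract base class and a toy in-memory implementation. Everything here is illustrative: `PermissionsSketch`, `AclSupportSketch`, `InMemoryAclSupport`, the `target_for` lookup that stands in for the migration state, and the sample ACL are assumptions for this example rather than the library's real signatures.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class PermissionsSketch:
    object_id: str
    object_type: str
    raw: str  # raw permission data serialized as a string


class AclSupportSketch(ABC):
    @abstractmethod
    def get_crawler_tasks(self): ...

    @abstractmethod
    def get_apply_task(self, item, target_for): ...

    @abstractmethod
    def get_verify_task(self, item, target_for): ...

    @abstractmethod
    def object_types(self): ...


class InMemoryAclSupport(AclSupportSketch):
    """Toy implementation backed by a dict: {object_id: {principal: level}}."""

    def __init__(self, acls):
        self._acls = acls

    def object_types(self):
        return {"toy-object"}

    def _crawl(self, object_id):
        return PermissionsSketch(object_id, "toy-object", json.dumps(self._acls[object_id]))

    def get_crawler_tasks(self):
        # One callable per object, so a caller could run them in parallel.
        for object_id in list(self._acls):
            yield partial(self._crawl, object_id)

    def _apply(self, item, target_for):
        # Copy each crawled permission level to the mapped target principal.
        for principal, level in json.loads(item.raw).items():
            target = target_for(principal)
            if target is not None:
                self._acls[item.object_id][target] = level
        return True

    def get_apply_task(self, item, target_for):
        return partial(self._apply, item, target_for)

    def _verify(self, item, target_for):
        current = self._acls[item.object_id]
        for principal, level in json.loads(item.raw).items():
            target = target_for(principal)
            if target is not None and current.get(target) != level:
                raise ValueError(f"{target} lacks {level} on {item.object_id}")
        return True

    def get_verify_task(self, item, target_for):
        return partial(self._verify, item, target_for)


support = InMemoryAclSupport({"cluster-1": {"ws-analysts": "CAN_USE"}})
target_for = {"ws-analysts": "analysts"}.get  # stand-in for the migration state
items = [task() for task in support.get_crawler_tasks()]
for item in items:
    support.get_apply_task(item, target_for)()
    support.get_verify_task(item, target_for)()
print(support._acls)
```

Each task is returned as a callable that carries its own arguments, which is one way to keep the tasks free of shared mutable state.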
The GenericPermissionsSupport class is a concrete implementation of the AclSupport interface for migrating permissions on various objects in a Databricks workspace. It is designed to be flexible and support almost any object type in the workspace:
- clusters
- cluster policies
- instance pools
- sql warehouses
- jobs
- pipelines
- serving endpoints
- experiments
- registered models
- token usage
- password usage
- feature tables
- notebooks
- workspace folders
It takes in an instance of the WorkspaceClient class, a list of Listing objects, and a verify_timeout parameter in its constructor. The Listing objects are responsible for listing the objects in the workspace, and the GenericPermissionsSupport class uses these listings to crawl the ACL permissions for each object.
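The role of a listing can be illustrated with a short sketch that pairs a listing function with the attribute holding each object's ID. `ListingSketch`, its constructor parameters, and the fake listing functions are hypothetical stand-ins; in a real workspace the listing function would be a client call instead.

```python
from types import SimpleNamespace


class ListingSketch:
    """Pairs a listing function with the attribute that holds each object's ID."""

    def __init__(self, list_func, id_attribute, object_type):
        self._list_func = list_func
        self._id_attribute = id_attribute
        self._object_type = object_type

    def __iter__(self):
        for item in self._list_func():
            yield self._object_type, str(getattr(item, self._id_attribute))


def fake_list_clusters():
    # Stand-in for an API call that lists clusters.
    return [SimpleNamespace(cluster_id="c-1"), SimpleNamespace(cluster_id="c-2")]


def fake_list_jobs():
    # Stand-in for an API call that lists jobs.
    return [SimpleNamespace(job_id=42)]


listings = [
    ListingSketch(fake_list_clusters, "cluster_id", "clusters"),
    ListingSketch(fake_list_jobs, "job_id", "jobs"),
]
objects_to_crawl = [pair for listing in listings for pair in listing]
print(objects_to_crawl)
```

Each `(object_type, object_id)` pair is what a generic crawler would then use to fetch the ACL for that object.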
The _apply_grant method applies the ACL permission to the target principal in the database, and the _verify method checks if the ACL permission in the Grant object matches the ACL permission for that object and principal in the database. If the ACL permission does not match, the method raises a ValueError with an error message. The get_verify_task method takes in a Permissions object and returns a callable object that calls the _verify method with the object type, object ID, and Grant object from the Permissions object.
The _safe_get_permissions and _safe_update_permissions methods are used to safely get and update the permissions for a given object type and ID, respectively. These methods handle exceptions that may occur during the API calls and log appropriate warning messages.
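The "safe" pattern described above can be sketched as a wrapper that catches an expected error, logs a warning, and returns `None` so that a crawl can continue. The `NotFoundError` class, the `safe_call` helper, and `fake_get_permissions` are hypothetical and only stand in for real API errors and calls.

```python
import logging

logger = logging.getLogger(__name__)


class NotFoundError(Exception):
    """Stand-in for an API error raised when an object no longer exists."""


def safe_call(func, *args, description="API call"):
    """Run func(*args); on NotFoundError log a warning and return None."""
    try:
        return func(*args)
    except NotFoundError as err:
        logger.warning("%s skipped: %s", description, err)
        return None


def fake_get_permissions(object_type, object_id):
    # Stand-in for a permissions lookup against the workspace.
    if object_id == "deleted-cluster":
        raise NotFoundError(f"{object_type}/{object_id} does not exist")
    return {"object_id": object_id, "acl": ["CAN_MANAGE"]}


found = safe_call(fake_get_permissions, "clusters", "c-1", description="get permissions")
missing = safe_call(fake_get_permissions, "clusters", "deleted-cluster", description="get permissions")
print(found)
print(missing)
```

Only the expected error type is caught here, so unexpected failures still surface instead of being silently swallowed.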
Reflected in RedashPermissionsSupport. See examples for more details on how to use it as a library.
ScimSupport is an AclSupport implementation that creates a snapshot of all the groups in the workspace, including their display name, id, meta, roles, and entitlements.
The _is_item_relevant method checks if a permission item is relevant to the current migration state. The get_crawler_tasks method returns an iterator of partial functions for crawling the permissions of each group in the snapshot. It checks if the group has any roles or entitlements and returns a partial function to crawl the corresponding property.
See examples for more details on how to use it as a library.
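The way crawler tasks are produced per group property can be sketched as a generator of `functools.partial` objects. The dictionary-based group snapshot and the names `crawl_property` and `scim_crawler_tasks` are assumptions for illustration only.

```python
from functools import partial


def crawl_property(group, property_name):
    """Return the (group id, property name, value) triple for one property."""
    return group["id"], property_name, group[property_name]


def scim_crawler_tasks(groups):
    """Yield one deferred crawl per non-empty roles or entitlements property."""
    for group in groups:
        if group.get("roles"):
            yield partial(crawl_property, group, "roles")
        if group.get("entitlements"):
            yield partial(crawl_property, group, "entitlements")


snapshot = [
    {"id": "1", "roles": ["role-a"], "entitlements": []},
    {"id": "2", "roles": ["role-b"], "entitlements": ["workspace-access"]},
    {"id": "3"},
]
results = [task() for task in scim_crawler_tasks(snapshot)]
print(results)
```

Groups without roles or entitlements produce no tasks at all, so nothing is crawled for them.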
SecretScopesSupport is a concrete implementation of the AclSupport interface for crawling ACLs of all secret scopes, applying and verifying ACLs, and checking if a Permissions object is relevant to the current migration state. It simplifies the process of managing permissions on secret scopes by checking if the ACLs have been applied correctly, and if not, automatically reapplying them.
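The "check, and reapply if needed" behaviour can be sketched as a loop bounded by a timeout. This is only one possible shape for such a loop: `apply_and_verify`, `FlakyScope`, and the timing values are hypothetical and do not reflect the library's implementation.

```python
import time


def apply_and_verify(apply, read_back, expected, timeout=5.0, interval=0.01):
    """Apply a permission and re-apply it until read_back() returns expected."""
    deadline = time.monotonic() + timeout
    while True:
        apply()
        if read_back() == expected:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"permission {expected!r} was not applied in {timeout}s")
        time.sleep(interval)


class FlakyScope:
    """Toy ACL store that silently drops the first write."""

    def __init__(self):
        self.acl = None
        self.writes = 0

    def put_acl(self, permission):
        self.writes += 1
        if self.writes > 1:
            self.acl = permission


scope = FlakyScope()
applied = apply_and_verify(lambda: scope.put_acl("READ"), lambda: scope.acl, "READ")
print(applied, scope.writes)
```

The toy store loses its first write, so the loop needs a second attempt before the read-back matches.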
The TableAclSupport class is initialized with an instance of GrantsCrawler and SqlBackend classes, along with a verify_timeout parameter.
The class offers methods for crawling table ACL permissions, applying and verifying ACL permissions, and checking if a Permissions object is relevant to the current migration state.
The get_crawler_tasks method returns an iterator of callable objects, each of which returns a Permissions object for a specific table ACL permission.
The _from_reduced method creates a Grant object for each set of folded actions, and the get_apply_task method applies the ACL permission in the Permissions object to the target principal in the MigrationState object.
Furthermore, the _apply_grant method applies the ACL permission to the target principal in the database, while the _verify method checks if the ACL permission in the Grant object matches the ACL permission for that object and principal in the database. The get_verify_task method calls the _verify method with the object type, object ID, and Grant object from the Permissions object.
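The notion of folded actions can be illustrated by grouping individual grants that share a principal and object into a single entry. The tuple layout, the `fold_actions` helper, and the sample grants are assumptions made for this sketch; the actual Grant objects and folding logic may differ.

```python
from collections import defaultdict


def fold_actions(grants):
    """Fold (principal, object_type, object_key, action) rows by their first three fields.

    Returns a dict whose values are the sorted action types joined by ", ".
    """
    folded = defaultdict(set)
    for principal, object_type, object_key, action in grants:
        folded[(principal, object_type, object_key)].add(action)
    return {key: ", ".join(sorted(actions)) for key, actions in folded.items()}


grants = [
    ("analysts", "TABLE", "hive_metastore.sales.orders", "SELECT"),
    ("analysts", "TABLE", "hive_metastore.sales.orders", "MODIFY"),
    ("engineers", "DATABASE", "hive_metastore.sales", "USAGE"),
]
folded = fold_actions(grants)
print(folded)
```

Folding reduces the number of statements to apply, since each principal and object pair is handled once with its full set of actions.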
Use the DEBUG notebook to troubleshoot issues.
The code snippets below can help with troubleshooting. Install databricks-sdk on the cluster before running them.
- Find workspace-local groups that are eligible for migration to the account:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import iam
ws = WorkspaceClient()
workspace_groups = [
    g
    for g in ws.groups.list(attributes='id,displayName,meta')
    if g.meta.resource_type == "WorkspaceGroup"
]
print(f'Found {len(workspace_groups)} workspace-local groups')
account_groups = [
    iam.Group.from_dict(r)
    for r in ws.api_client.do(
        "get",
        "/api/2.0/account/scim/v2/Groups",
        query={"attributes": "id,displayName,meta,members"},
    ).get("Resources", [])
]
account_groups = [g for g in account_groups if g.display_name not in ["users", "admins", "account users"]]
print(f"Found {len(account_groups)} account groups")
ws_group_names = {_.display_name for _ in workspace_groups}
ac_group_names = {_.display_name for _ in account_groups}
group_names = list(ws_group_names.intersection(ac_group_names))
print(f"Found {len(group_names)} groups to migrate")
- Recover workspace-local groups from backup groups from within a debug notebook:
from databricks.labs.ucx.workspace_access.groups import GroupManager
from databricks.labs.ucx.config import GroupsConfig
group_manager = GroupManager(ws, GroupsConfig(auto=True))
group_manager.ws_local_group_deletion_recovery()
- Recover Table ACL from $inventory.grants to $inventory.permissions:
from databricks.labs.ucx.hive_metastore import GrantsCrawler, TablesCrawler
from databricks.labs.ucx.workspace_access.manager import PermissionManager
from databricks.labs.ucx.workspace_access.tacl import TableAclSupport
from databricks.labs.ucx.framework.crawlers import RuntimeBackend
sql_backend = RuntimeBackend()
inventory_schema = cfg.inventory_database
tables = TablesCrawler(sql_backend, inventory_schema)
grants = GrantsCrawler(tables)
tacl = TableAclSupport(grants, sql_backend)
permission_manager = PermissionManager(sql_backend, inventory_schema, [tacl])
permission_manager.inventorize_permissions()
- Create a migration state just for account groups:
from databricks.labs.ucx.workspace_access.manager import PermissionManager
from databricks.labs.ucx.workspace_access.groups import GroupMigrationState
from databricks.labs.ucx.framework.crawlers import StatementExecutionBackend
collected_groups = []
...
migration_state = GroupMigrationState()
for group in collected_groups:
    migration_state.add(None, None, group)
sql_backend = StatementExecutionBackend(ws, cfg.warehouse_id)
permission_manager = PermissionManager.factory(ws, sql_backend, cfg.inventory_database)
permission_manager.apply_group_permissions(migration_state, destination="account")
- Recover permissions from a debug notebook with logs:
import logging
from logging.handlers import TimedRotatingFileHandler
databricks_logger = logging.getLogger("databricks")
databricks_logger.setLevel(logging.DEBUG)
ucx_logger = logging.getLogger("databricks.labs.ucx")
ucx_logger.setLevel(logging.DEBUG)
log_file = "/Workspace/Users/[email protected]/recovery.log"
# Files are available in the workspace only once their handlers are closed,
# so we rotate the log file every 5 minutes (matching interval=5 below).
#
# See https://docs.python.org/3/library/logging.handlers.html#logging.handlers.TimedRotatingFileHandler
file_handler = TimedRotatingFileHandler(log_file, when="M", interval=5)
log_format = "%(asctime)s %(levelname)s [%(name)s] {%(threadName)s} %(message)s"
log_formatter = logging.Formatter(fmt=log_format, datefmt="%H:%M:%S")
file_handler.setFormatter(log_formatter)
file_handler.setLevel(logging.DEBUG)
databricks_logger.addHandler(file_handler)
sql_backend = StatementExecutionBackend(ws, cfg.warehouse_id)
try:
    permission_manager = PermissionManager.factory(ws, sql_backend, cfg.inventory_database)
    permission_manager.apply_group_permissions(migration_state, destination="account")
finally:
    # IMPORTANT: close the handler so the log file becomes visible in the workspace.
    file_handler.close()