Find Usage of DBFS Mounts in the Workspace and Repo #391

Closed
Tracked by #1085
dipankarkush-db opened this issue Oct 6, 2023 · 2 comments
Labels
feat/crawler migrate/code Abstract Syntax Trees and other dark magic migrate/jobs Step 5 - Upgrading Jobs for External Tables migrate/volumes migrate from raw DBFS mounts to UC Volumes

Comments

@dipankarkush-db
Contributor

Historically, customers have used mount points to connect their Databricks workspaces to their chosen cloud object storage, and many chose to access their data through file paths rather than tables. For these long-time customers, migration is extremely manual, slow, and disruptive. This drastically reduces the likelihood that legacy customers will upgrade their environments to Unity Catalog, which over time will lead to increased churn toward other platforms that do not require such manual refactoring. Until now, we have not been able to offer these customers an easier path forward.

@zpappa

zpappa commented Oct 9, 2023

Functional Requirements

  • Crawler must find usages of mount paths in SQL and Spark commands (Python and SQL)
  • Crawler must find usages of mount paths in Python dbutils.fs commands
  • Crawler must find usages of mount paths in shell (%sh) commands
  • Crawler must find usages of mount paths in FS (%fs) commands
  • Crawler must inventory mount paths and where they are used: notebook, query, cell, and line
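A minimal sketch of the notebook side of these requirements, assuming the notebook is available as a Python notebook exported in Databricks SOURCE format, where cells are separated by `# COMMAND ----------` and the lines of magic cells (`%sql`, `%sh`, `%fs`) are prefixed with `# MAGIC `. The names `find_mount_usages` and `MountUsage` and the regular expression are hypothetical. A regex is only a heuristic stand-in for the AST-based analysis a real crawler would likely need; it will miss paths that are built dynamically, for example through string concatenation.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed export format: a Databricks SOURCE-format Python notebook.
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"
MAGIC_COMMAND = "# MAGIC %"

# Heuristic: /mnt/... optionally prefixed by the dbfs: scheme or the /dbfs FUSE root.
MOUNT_PATH_RE = re.compile(r"(?:dbfs:|/dbfs)?/mnt/[^\s'\"`),;]+")


@dataclass
class MountUsage:
    mount_path_to_file: str       # path exactly as written in the code
    notebook_command_type: str    # python, sql, sh, fs, ...
    notebook_command_offset: int  # 0-based index of the cell in the notebook
    line_number: int              # 1-based line within the cell


def _command_type(first_line: str) -> str:
    """Magic cells start with '# MAGIC %<lang>'; anything else is treated as Python."""
    if first_line.startswith(MAGIC_COMMAND):
        rest = first_line[len(MAGIC_COMMAND):].split()
        return rest[0] if rest else "python"
    return "python"


def find_mount_usages(source: str) -> List[MountUsage]:
    """Scan every cell of an exported notebook for mount-path literals."""
    usages: List[MountUsage] = []
    for offset, cell in enumerate(source.split(CELL_SEPARATOR)):
        if offset == 0:
            # The export header is not part of the first cell.
            cell = cell.replace(NOTEBOOK_HEADER, "", 1)
        lines = cell.strip("\n").splitlines()
        if not lines:
            continue
        command_type = _command_type(lines[0])
        for line_number, line in enumerate(lines, start=1):
            for match in MOUNT_PATH_RE.finditer(line):
                usages.append(MountUsage(match.group(0), command_type, offset, line_number))
    return usages
```

In this sketch, offsets are 0-based cell indices and line numbers are 1-based within each cell, which keeps the "offset" and "line number" fields of the table distinct. Python `dbutils.fs` calls are reported with the `python` command type because they live in Python cells.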

Persisted Table

mount_path_usages

Field                     Comment
mount_path_to_file        the entire mount path as used in code, e.g. /mnt/mount/myfile.txt
mount_path                the mount point, e.g. /mnt/mount
path_to_file              path to the notebook in the workspace, or relative to the root of the repo; empty for a query
git_repo                  the Git repo in which the usage was found
query_id                  ID of the query; empty for a notebook
notebook_command_type     command language: python, sql, sh, fs, etc.
notebook_command_offset   offset of the command (cell) within the notebook
line_number               line number (not offset) of the usage within the query, file, or notebook cell
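One way to model a row of mount_path_usages and to derive mount_path from mount_path_to_file, sketched in Python. The class and function names are hypothetical. The sketch assumes the caller supplies the list of known mount points (in a workspace it could be collected from the `mountPoint` attribute of the entries returned by `dbutils.fs.mounts()`), and it picks the longest matching mount point so that nested mounts resolve correctly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MountPathUsage:
    """One row of the proposed mount_path_usages table (hypothetical model)."""
    mount_path_to_file: str                        # e.g. /mnt/mount/myfile.txt, as written in code
    mount_path: str                                # e.g. /mnt/mount
    path_to_file: str = ""                         # notebook/file path; empty for a query
    git_repo: str = ""
    query_id: str = ""                             # empty for a notebook
    notebook_command_type: str = ""                # python, sql, sh, fs, ...
    notebook_command_offset: Optional[int] = None  # None outside notebooks
    line_number: int = 0


def _normalize(path: str) -> str:
    """Strip the dbfs: scheme or /dbfs FUSE prefix so every path compares as /mnt/..."""
    for prefix in ("dbfs:", "/dbfs"):
        if path.startswith(prefix + "/mnt/"):
            return path[len(prefix):]
    return path


def resolve_mount_path(path_as_used: str, mount_points: List[str]) -> Optional[str]:
    """Return the longest known mount point that prefixes the path, or None."""
    path = _normalize(path_as_used)
    best: Optional[str] = None
    for mount_point in mount_points:
        mount_point = mount_point.rstrip("/")
        # Require a '/' boundary so /mnt/mount does not match /mnt/mountain.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return best
```

For example, with mount points `/mnt/mount` and `/mnt/mount/nested`, the path `dbfs:/mnt/mount/nested/x.csv` resolves to `/mnt/mount/nested`, while `/mnt/mountain/x` resolves to nothing and could be flagged for review.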

Implementation Details

  • An existing implementation in uc-job-upgrade already works and needs to be ported into UCX

Considerations

  • Provide a way to scan only selected branches in Git
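A minimal sketch of branch selection, assuming branches are chosen with shell-style glob patterns (an assumption; the issue does not say how branches would be specified). `select_branches` and its parameters are hypothetical; listing and checking out branches is left to whatever Git tooling the crawler uses.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def select_branches(
    branches: List[str],
    include: List[str],
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep branches that match any include pattern and no exclude pattern."""
    exclude = exclude or []
    return [
        branch
        for branch in branches
        if any(fnmatchcase(branch, pattern) for pattern in include)
        and not any(fnmatchcase(branch, pattern) for pattern in exclude)
    ]
```

With `include=["main", "release/*"]` and `exclude=["*-rc"]`, the branches `main`, `release/1.0`, `release/1.0-rc`, and `feature/x` narrow down to `main` and `release/1.0`. Note that in `fnmatch`, `*` also matches `/`, so `release/*` covers nested branch names as well.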

@zpappa zpappa moved this from Triage to Refined in UCX Oct 9, 2023
@zpappa zpappa moved this from In Progress to Todo in UCX (weekly) - DO NOT USE THIS BOARD Oct 9, 2023
@pohlposition pohlposition moved this from Refined to Month Backlog in UCX Oct 9, 2023
@nfx nfx added the feat/cli CLI commands label Dec 6, 2023
@nfx nfx moved this from Month Backlog to Triage in UCX Dec 6, 2023
@nfx nfx moved this from Triage to Quarter Backlog in UCX Dec 6, 2023
@nfx nfx added the migrate/code Abstract Syntax Trees and other dark magic label Mar 25, 2024
@nfx nfx added migrate/volumes migrate from raw DBFS mounts to UC Volumes migrate/jobs Step 5 - Upgrading Jobs for External Tables and removed enhancement New feature or request to be discussed feat/cli CLI commands labels Apr 22, 2024
@nfx nfx changed the title Find Usage of Mount Points in the Workspace and Repo Find Usage of DBFS Mounts in the Workspace and Repo Apr 22, 2024
@nfx nfx removed the step/assessment go/uc/upgrade - Assessment Step label Apr 22, 2024
@nfx
Collaborator

nfx commented Apr 24, 2024

Duplicate of #1133

@nfx nfx marked this as a duplicate of #1133 Apr 24, 2024
@nfx nfx closed this as completed Apr 24, 2024
@github-project-automation github-project-automation bot moved this from Quarter Backlog to Archive in UCX Apr 24, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants