Create inventory of service principals and direct files access in Azure for a Spark session #310

Closed
Tracked by #1085
dipankarkush-db opened this issue Sep 28, 2023 · 5 comments · Fixed by #326
Labels
feat/crawler migrate/code Abstract Syntax Trees and other dark magic migrate/jobs Step 5 - Upgrading Jobs for External Tables step/assessment go/uc/upgrade - Assessment Step

Comments

@dipankarkush-db
Contributor

In Azure, data access can be authorized with service principals that are configured through Spark session settings.

Spark session-level settings:
spark.conf.set("fs.azure.account.auth.type","OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type","org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id","")
spark.conf.set("fs.azure.account.oauth2.client.secret",dbutils.secrets.get(scope="",key=""))
spark.conf.set("fs.azure.account.oauth2.client.endpoint","https://login.microsoftonline.com//oauth2/token")

We need a feature in the tool that creates an inventory of all service principals and direct file/mount point accesses currently in use in the workspace, along with the objects configured in the Spark session.

@pohlposition pohlposition added step/assessment go/uc/upgrade - Assessment Step migrate/external go/uc/upgrade SYNC EXTERNAL TABLES step labels Sep 28, 2023
@nfx nfx added this to the 1 week milestone Oct 2, 2023
@zpappa zpappa removed this from the 1 week milestone Oct 2, 2023
@nfx nfx added this to UCX Oct 3, 2023
@nfx nfx moved this to Active Backlog in UCX Oct 3, 2023
@nfx nfx closed this as completed in #326 Oct 8, 2023
@github-project-automation github-project-automation bot moved this from Active Backlog to Archive in UCX Oct 8, 2023
@qziyuan
Contributor

qziyuan commented Mar 7, 2024

I believe this issue is not completed, since we are not crawling SPNs from Spark session-level settings in user code.

@qziyuan qziyuan reopened this Mar 7, 2024
@github-project-automation github-project-automation bot moved this from Archive to Refined in UCX Mar 7, 2024
@nfx nfx moved this from Refined to Quarter Backlog in UCX Mar 7, 2024
@nfx
Collaborator

nfx commented Mar 7, 2024

@qziyuan we can start working on it once we complete the table migration critical path

@nfx nfx added the migrate/code Abstract Syntax Trees and other dark magic label Mar 7, 2024
@nfx
Collaborator

nfx commented Mar 7, 2024

this is not in our critical path, as we can crawl for SPN usage directly through Azure APIs - https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/azure/access.py#L23-L31

@nfx nfx added migrate/jobs Step 5 - Upgrading Jobs for External Tables and removed migrate/external go/uc/upgrade SYNC EXTERNAL TABLES step labels Apr 15, 2024
@nfx nfx moved this from Quarter Backlog to Design in UCX Jul 4, 2024
@JCZuurmond
Member

> this is not in our critical path, as we can crawl for SPN usage directly through Azure APIs - https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/azure/access.py#L23-L31

@dipankarkush-db: Could you confirm that ucx already covers your question? If the service principal is used through Spark settings, we find it through its permissions on storage accounts. We might not find its credentials, but we recommend using access connectors instead.

If it does cover your question, then we need to rewrite this issue so that it is about removing these Spark configuration settings during code migration, since authentication is handled by UC.
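To make the "remove these settings during code migration" idea concrete, here is a small sketch, not UCX's implementation. The function name `strip_azure_auth_settings` is hypothetical. It assumes one statement per line and only touches top-level statements, so a nested block can never be left with an empty body.

```python
import ast


def strip_azure_auth_settings(source: str) -> str:
    """Drop top-level `<obj>.conf.set("fs.azure.account...", ...)` statements."""
    lines_to_drop = set()
    # Only top-level statements are considered (see the assumptions above).
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            continue
        call = stmt.value
        func = call.func
        is_conf_set = (
            isinstance(func, ast.Attribute)
            and func.attr == "set"
            and isinstance(func.value, ast.Attribute)
            and func.value.attr == "conf"
        )
        if not is_conf_set or not call.args:
            continue
        key = call.args[0]
        if (
            isinstance(key, ast.Constant)
            and isinstance(key.value, str)
            and key.value.startswith("fs.azure.account.")
        ):
            # A call may span several physical lines; drop all of them.
            lines_to_drop.update(range(stmt.lineno, stmt.end_lineno + 1))
    kept = [
        line
        for number, line in enumerate(source.splitlines(keepends=True), start=1)
        if number not in lines_to_drop
    ]
    return "".join(kept)
```

Working on line ranges rather than re-generating the code from the tree keeps comments and formatting of the remaining lines untouched.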

@nfx
Collaborator

nfx commented Jul 17, 2024

Closing in favour of #2021

@nfx nfx closed this as completed Jul 17, 2024
@github-project-automation github-project-automation bot moved this from Design to Archive in UCX Jul 17, 2024