Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a crawler for creating an inventory of Azure Service Principals #326

Merged
merged 75 commits into from
Oct 8, 2023

Conversation

dipankarkush-db
Copy link
Contributor

@dipankarkush-db dipankarkush-db commented Sep 28, 2023

No description provided.

@codecov
Copy link

codecov bot commented Sep 28, 2023

Codecov Report

Merging #326 (f57bc26) into main (d7cc195) will increase coverage by 0.32%.
The diff coverage is 84.79%.

@@            Coverage Diff             @@
##             main     #326      +/-   ##
==========================================
+ Coverage   83.48%   83.80%   +0.32%     
==========================================
  Files          31       31              
  Lines        2555     2810     +255     
  Branches      448      532      +84     
==========================================
+ Hits         2133     2355     +222     
- Misses        319      332      +13     
- Partials      103      123      +20     
Files Coverage Δ
src/databricks/labs/ucx/runtime.py 50.53% <43.75%> (-2.03%) ⬇️
src/databricks/labs/ucx/assessment/crawlers.py 82.86% <87.44%> (+9.20%) ⬆️

src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
tests/unit/assessment/test_assessment.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, _list_all_cluster_with_spn_in_spark_conf in AzureServicePrincipalCrawler is a bit better. see other comments

src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/assessment/crawlers.py Outdated Show resolved Hide resolved
tests/unit/assessment/test_assessment.py Outdated Show resolved Hide resolved
tests/unit/assessment/test_assessment.py Show resolved Hide resolved
@gitguardian
Copy link

gitguardian bot commented Sep 29, 2023

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id Secret Commit Filename
8289420 Generic High Entropy Secret 784c7f9 tests/unit/assessment/test_assessment.py View secret
8289420 Generic High Entropy Secret 020f631 tests/unit/assessment/test_assessment.py View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny comment

tests/integration/assessment/test_assessment.py Outdated Show resolved Hide resolved
@dipankarkush-db
Copy link
Contributor Author

This looks mostly OK, but I don't see any scanning of init scripts to see if the same is being set in init scripts.
Here's what I think is missing:

  1. Check cluster scoped init scripts for spark config
  2. Check global init scripts for spark config

Do we want to handle scanning the init scripts (cluster and global) separately? There could be other entries in the init scripts (such as installation of packages) that we might be interested and its content are relevant for all cloud platform. We could run a dependent job after the current jobs to capture the details from init scripts and if any matching spark config for Azure is found then append to the cluster, job and Azure SPN tables. If agreed, then i could create a new issue.

Added the following -

  1. Azure SPN present in cluster scope init scripts for job and cluster and if yes then flag those clusters
  2. Check all the global init scripts and flag all those which have Azure SPN
  3. List of global init script is also added in the dashboard

Did not do - List of all Azure SPNs from all the init scripts. Lets discuss about it. It makes sense to me to add a separate issue for the same and merge this one.

@dipankarkush-db
Copy link
Contributor Author

This looks mostly OK, but I don't see any scanning of init scripts to see if the same is being set in init scripts.

Here's what I think is missing:

  1. Check cluster scoped init scripts for spark config
  2. Check global init scripts for spark config
image

src/databricks/labs/ucx/mixins/fixtures.py Outdated Show resolved Hide resolved
src/databricks/labs/ucx/mixins/fixtures.py Show resolved Hide resolved
@nfx nfx enabled auto-merge October 8, 2023 15:50
@nfx nfx changed the title Added a Crawler for creating an inventory of Azure service principals Added a crawler for creating an inventory of Azure Service Principals Oct 8, 2023
@nfx nfx merged commit 994c127 into main Oct 8, 2023
@nfx nfx mentioned this pull request Oct 12, 2023
nfx added a commit that referenced this pull request Oct 12, 2023
# Version changelog

## 0.4.0

* Added exception handling for secret scope not found.
([#418](#418)).
* Added a crawler for creating an inventory of Azure Service Principals
([#326](#326)).
* Added check if account group already exists during failure recovery
([#446](#446)).
* Added checking for index out of range.
([#429](#429)).
* Added hyperlink to UCX releases in the main readme
([#408](#408)).
* Added integration test to check backup groups get deleted
([#387](#387)).
* Added logging of errors during threadpool operations.
([#376](#376)).
* Added recovery mode for workspace-local groups from temporary groups
([#435](#435)).
* Added support for migrating Legacy Table ACLs from workspace-local to
account-level groups
([#412](#412)).
* Added detection for installations of unreleased versions
([#399](#399)).
* Decoupled `PermissionsManager` from `GroupMigrationToolkit`
([#407](#407)).
* Enabled debug logging for every job task run through a file, which is
accessible from both workspace UI and Databricks CLI
([#426](#426)).
* Ensured that table exists, even when crawlers produce zero records
([#373](#373)).
* Extended test suite for HMS->HMS TACL migration
([#439](#439)).
* Fixed handling of secret scope responses
([#431](#431)).
* Fixed `crawl_permissions` task to respect 'workspace_start_path'
config ([#444](#444)).
* Fixed broken logic in `parallel` module and applied hardened error
handling design for parallel code
([#405](#405)).
* Fixed codecov.io reporting
([#403](#403)).
* Fixed integration tests for crawlers
([#379](#379)).
* Improved README.py and logging messages
([#433](#433)).
* Improved cleanup for workspace backup groups by adding more retries on
errors ([#375](#375)).
* Improved dashboard queries to show unsupported storage types.
([#398](#398)).
* Improved documentation for readme notebook
([#257](#257)).
* Improved test coverage for installer
([#371](#371)).
* Introduced deterministic `env_or_skip` fixture for integration tests
([#396](#396)).
* Made HMS & UC fixtures return `CatalogInfo`, `SchemaInfo`, and
`TableInfo` ([#409](#409)).
* Merge `workspace_access.Crawler` and `workspace_access.Applier`
interfaces to `workspace_access.AclSupport`
([#436](#436)).
* Moved examples to docs
([#404](#404)).
* Properly isolated integration testing for workflows on an existing
shared cluster ([#414](#414)).
* Removed thread pool for any IAM Group removals and additions
([#394](#394)).
* Replace plus char with minus in version tag for GCP dev installation
of UCX ([#420](#420)).
* Run integration tests on shared clusters for a faster devloop
([#397](#397)).
* Show difference between serverless and PRO warehouses during
installation ([#385](#385)).
* Split `migrate-groups` workflow into three different stages for
reliability ([#442](#442)).
* Use groups instead of usernames in code owners file
([#389](#389)).
@nfx nfx deleted the fix/azure-SPNs-inventory branch October 17, 2023 22:44
FastLee pushed a commit that referenced this pull request Oct 25, 2023
# Version changelog

## 0.4.0

* Added exception handling for secret scope not found.
([#418](#418)).
* Added a crawler for creating an inventory of Azure Service Principals
([#326](#326)).
* Added check if account group already exists during failure recovery
([#446](#446)).
* Added checking for index out of range.
([#429](#429)).
* Added hyperlink to UCX releases in the main readme
([#408](#408)).
* Added integration test to check backup groups get deleted
([#387](#387)).
* Added logging of errors during threadpool operations.
([#376](#376)).
* Added recovery mode for workspace-local groups from temporary groups
([#435](#435)).
* Added support for migrating Legacy Table ACLs from workspace-local to
account-level groups
([#412](#412)).
* Added detection for installations of unreleased versions
([#399](#399)).
* Decoupled `PermissionsManager` from `GroupMigrationToolkit`
([#407](#407)).
* Enabled debug logging for every job task run through a file, which is
accessible from both workspace UI and Databricks CLI
([#426](#426)).
* Ensured that table exists, even when crawlers produce zero records
([#373](#373)).
* Extended test suite for HMS->HMS TACL migration
([#439](#439)).
* Fixed handling of secret scope responses
([#431](#431)).
* Fixed `crawl_permissions` task to respect 'workspace_start_path'
config ([#444](#444)).
* Fixed broken logic in `parallel` module and applied hardened error
handling design for parallel code
([#405](#405)).
* Fixed codecov.io reporting
([#403](#403)).
* Fixed integration tests for crawlers
([#379](#379)).
* Improved README.py and logging messages
([#433](#433)).
* Improved cleanup for workspace backup groups by adding more retries on
errors ([#375](#375)).
* Improved dashboard queries to show unsupported storage types.
([#398](#398)).
* Improved documentation for readme notebook
([#257](#257)).
* Improved test coverage for installer
([#371](#371)).
* Introduced deterministic `env_or_skip` fixture for integration tests
([#396](#396)).
* Made HMS & UC fixtures return `CatalogInfo`, `SchemaInfo`, and
`TableInfo` ([#409](#409)).
* Merge `workspace_access.Crawler` and `workspace_access.Applier`
interfaces to `workspace_access.AclSupport`
([#436](#436)).
* Moved examples to docs
([#404](#404)).
* Properly isolated integration testing for workflows on an existing
shared cluster ([#414](#414)).
* Removed thread pool for any IAM Group removals and additions
([#394](#394)).
* Replace plus char with minus in version tag for GCP dev installation
of UCX ([#420](#420)).
* Run integration tests on shared clusters for a faster devloop
([#397](#397)).
* Show difference between serverless and PRO warehouses during
installation ([#385](#385)).
* Split `migrate-groups` workflow into three different stages for
reliability ([#442](#442)).
* Use groups instead of usernames in code owners file
([#389](#389)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Create inventory of service principals and direct files access in Azure for a Spark session
3 participants