Tidy up whitespace (#1923)
## Changes

This PR follows on from #1778 by cleaning up files that use tabs, have
trailing whitespace, or are missing a newline at the end of the file. Fixing
these in one go makes other PRs less noisy. (Sample files used in tests
have been left alone and are not affected by this PR.)
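
The kind of cleanup described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for this PR: the function name `tidy_whitespace`, the 4-space tab width, and the choice to drop trailing blank lines are all assumptions.

```python
def tidy_whitespace(text: str, tab_width: int = 4) -> str:
    """Return text with tabs expanded, trailing whitespace removed,
    and exactly one newline at the end of the file.

    Hypothetical helper: the tab width and the handling of trailing
    blank lines are assumptions, not taken from the PR.
    """
    # Expand tabs, then strip trailing spaces (and any stray "\r") per line.
    lines = [line.expandtabs(tab_width).rstrip() for line in text.split("\n")]
    # Drop blank lines at the end so the file ends with a single newline.
    while lines and not lines[-1]:
        lines.pop()
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    # repr() makes the whitespace changes visible.
    print(repr(tidy_whitespace("x = 1  \n\ty = 2")))
```

Applying such a function to every tracked file except test samples would produce a diff like the one below, where most hunks differ only in invisible characters.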
asnare authored Jul 5, 2024
1 parent 5209d63 commit 6198a28
Showing 124 changed files with 270 additions and 271 deletions.
2 changes: 1 addition & 1 deletion .codegen.json
@@ -11,4 +11,4 @@
"pytest -n 4 --cov src --cov-report=xml --timeout 30 tests/unit --durations 20"
]
}
}
}
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -3,7 +3,7 @@ contact_links:
- name: General Databricks questions
url: https://help.databricks.com/
about: Issues related to Databricks and not related to UCX

- name: UCX Documentation
url: https://github.com/databrickslabs/ucx/tree/main/docs
about: Documentation about UCX
1 change: 0 additions & 1 deletion .github/ISSUE_TEMPLATE/feature.yml
@@ -33,4 +33,3 @@ body:
description: Add any other context, references or screenshots about the feature request here.
validations:
required: false

2 changes: 1 addition & 1 deletion .github/codecov.yml
@@ -7,4 +7,4 @@ coverage:
patch:
default:
target: auto
threshold: 0.5%
threshold: 0.5%
2 changes: 1 addition & 1 deletion .github/dependabot.yml
@@ -7,4 +7,4 @@ updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
interval: "daily"
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -7,7 +7,7 @@

Resolves #..

### Functionality
### Functionality

- [ ] added relevant user documentation
- [ ] added new CLI command
2 changes: 1 addition & 1 deletion .github/workflows/no-cheat.yml
@@ -27,4 +27,4 @@ jobs:
if [ "${CHEAT}" -ne 0 ]; then
echo "Do not cheat the linter: ${CHEAT}"
exit 1
fi
fi
8 changes: 4 additions & 4 deletions .github/workflows/release.yml
@@ -22,12 +22,12 @@ jobs:
cache: 'pip'
cache-dependency-path: '**/pyproject.toml'
python-version: '3.10'

- name: Build wheels
run: |
pip install hatch==1.9.4
hatch build
- name: Draft release
uses: softprops/action-gh-release@v2
with:
@@ -38,11 +38,11 @@ jobs:
- uses: pypa/gh-action-pypi-publish@release/v1
name: Publish package distributions to PyPI

- name: Sign artifacts with Sigstore
uses: sigstore/[email protected]
with:
inputs: |
dist/databricks_*.whl
dist/databricks_*.tar.gz
release-signing-artifacts: true
release-signing-artifacts: true
22 changes: 11 additions & 11 deletions docs/external_hms_glue.md
@@ -10,38 +10,38 @@ External Hive Metastore Integration
* [Additional Considerations](#additional-considerations)
<!-- TOC -->

UCX works with both the default workspace metastore, or an external Hive metastore. This document outlines the current
UCX works with both the default workspace metastore, or an external Hive metastore. This document outlines the current
integration and how to set up UCX to work with your existing external metastore.

# Installation

The setup process follows the following steps

- UCX scan existing cluster policies, and Databricks SQL data access configuration for Spark configurations key that
- UCX scan existing cluster policies, and Databricks SQL data access configuration for Spark configurations key that
enables external Hive metastore:
- Spark config `spark.databricks.hive.metastore.glueCatalog.enabled=true` - for Glue Catalog
- Spark config containing prefixes `spark.sql.hive.metastore` - for external Hive metastore
- If a matching cluster policy is identified, UCX prompts the user with the following message:
_We have identified one or more cluster policies set up for an external metastore.
Would you like to set UCX to connect to the external metastore?_
- Selecting **Yes** will display a list of the matching policies and allow the user to select the appropriate policies.
- The chosen policy will be used as the template to set up UCX job clusters via a new policy. UCX will clone the
- The chosen policy will be used as the template to set up UCX job clusters via a new policy. UCX will clone the
necessary Spark configurations and data access configurations, e.g. Instance Profile over to this new policy.
- When prompted for an inventory database, please specify a new name instead of the default `ucx` to avoid conflict.
This is because the inventory database will be created in the external metastore, which is shared across multiple workspaces.
- UCX **DOES NOT** update the data access configuration for SQL Warehouses. This is because Databricks SQL settings apply
- UCX **DOES NOT** update the data access configuration for SQL Warehouses. This is because Databricks SQL settings apply
to all warehouses in a workspace, and can introduce unexpected changes to existing workload.
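
The detection step in the list above can be illustrated with a small sketch. It is not UCX's implementation: the function name `external_metastore_kind` and the flat key/value mapping it takes are assumptions made for the example, while the two Spark configuration names come from the text.

```python
from typing import Mapping, Optional

# Spark configuration names quoted in the documentation above.
GLUE_KEY = "spark.databricks.hive.metastore.glueCatalog.enabled"
HMS_PREFIX = "spark.sql.hive.metastore"


def external_metastore_kind(spark_conf: Mapping[str, str]) -> Optional[str]:
    """Classify a Spark configuration as 'glue', 'hive', or None.

    Hypothetical helper: it takes an already flattened mapping of Spark
    config keys to values, which is an assumption about the input shape.
    """
    # Glue Catalog is enabled by a single boolean flag.
    if spark_conf.get(GLUE_KEY, "").strip().lower() == "true":
        return "glue"
    # Any key starting with the prefix indicates an external Hive metastore.
    if any(key.startswith(HMS_PREFIX) for key in spark_conf):
        return "hive"
    return None


if __name__ == "__main__":
    print(external_metastore_kind({"spark.sql.hive.metastore.jars": "maven"}))
```

Checking the Glue flag first is a design choice for the sketch; the source does not say which takes precedence when both kinds of key are present.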

**Note**
As UCX uses both job clusters and SQL Warehouses, it is important to ensure that both are configured to use the same
As UCX uses both job clusters and SQL Warehouses, it is important to ensure that both are configured to use the same
external Hive metastore. If the SQL Warehouses are not configured for external Hive metastore, please manually update
the data access configuration. See [Enable data access configuration](https://learn.microsoft.com/en-us/azure/databricks/admin/sql/data-access-configuration) for more details

[[back to top](#external-hive-metastore-integration)]

# Manual Override

If the workspace does not have a cluster policy or SQL data access configuration for external Hive metastore, there are
If the workspace does not have a cluster policy or SQL data access configuration for external Hive metastore, there are
two options to manually enable this:
- *Pre-installation*: create a custer policy with the appropriate Spark configuration and data access for external metastore:
- See the following documentation pages for more details: [Glue catalog](https://docs.databricks.com/en/archive/external-metastores/aws-glue-metastore.html) and [External Hive Metastore](https://learn.microsoft.com/en-us/azure/databricks/archive/external-metastores/external-hive-metastore).
@@ -70,13 +70,13 @@ following the post-installation steps above.

# Assessment Workflow

Once UCX is set up with external Hive metastore the assessment workflow will scan tables & views from the external
Once UCX is set up with external Hive metastore the assessment workflow will scan tables & views from the external
Hive metastore instead of the default workspace metastore.

If the external Hive metastore is shared between multiple workspaces, please specify a different inventory
database name for each UCX installation. This is to avoid conflicts between the inventory databases.

As the inventory database is stored in the external Hive metastore, it can only be queried from a cluster or SQL warehouse
As the inventory database is stored in the external Hive metastore, it can only be queried from a cluster or SQL warehouse
with external Hive metastore configuration. The assessment dashboard will also fail if the SQL warehouse is not configured correctly.

[[back to top](#external-hive-metastore-integration)]
@@ -91,14 +91,14 @@ metastore is redundant and will be a no-op.

# Additional Considerations

If a workspace is set up with multiple external Hive metastores, you will need to plan the approach carefully. Below are
If a workspace is set up with multiple external Hive metastores, you will need to plan the approach carefully. Below are
a few considerations to keep in mind:
- You can have multiple UCX installations in a workspace, each set up with a different external Hive metastore. As the
SQL data access configuration is shared across the entire workspace, you will need to manually update them when running
each UCX installation.
- You can uninstall UCX and reinstall it with a different external Hive metastore. This still requires manual updates to
the SQL data access configuration, but it is a cleaner approach.
- You can manually modify the cluster policy and SQL data access configuration to point to the correct external Hive
- You can manually modify the cluster policy and SQL data access configuration to point to the correct external Hive
metastore, after UCX has been installed. This is the most flexible approach, but requires manual intervention.

[[back to top](#external-hive-metastore-integration)]
[[back to top](#external-hive-metastore-integration)]
4 changes: 2 additions & 2 deletions docs/group_name_conflict.md
@@ -19,7 +19,7 @@ Choose how to map the workspace groups:
[3] Match by External ID
[4] Regex Substitution
[5] Regex Matching
Enter a number between 0 and 5:
Enter a number between 0 and 5:
```

The user then input the Prefix/Suffix/Regular Expression.
@@ -41,4 +41,4 @@ Group Translation Scenarios:
| Prefix | prefix: [Prefix] | ^ | [Prefix] | [EMPTY] | data_engineers --> prod_data_engineers |
| Suffix | suffix: [Prefix] | $ | [Suffix] | [EMPTY] | data_engineers --> data_engineers_prod |
| Substitution | Search Regex: [Regex]<br/>Replace Text:[Replacement_Text] | [WS_Regex] | [ [Replacement_Text] | [Empty] | corp_tech_data_engineers --> prod_data_engineers |
| Partial Lookup | Workspace Regex: [WS_Regex]<br/> Account Regex: [Acct Regex] | [WS_Regex] | [Empty] | [Acct_Regex] | data_engineers(12345) --> data_engs(12345) |
| Partial Lookup | Workspace Regex: [WS_Regex]<br/> Account Regex: [Acct Regex] | [WS_Regex] | [Empty] | [Acct_Regex] | data_engineers(12345) --> data_engs(12345) |
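
The translation scenarios in the table above can be reproduced with plain regular expressions. The sketch below is illustrative only and is not UCX's code: the function names, the example patterns, and the rule that a partial lookup compares the text matched by each regex are all assumptions.

```python
import re
from typing import Iterable, Optional


def rename_by_substitution(workspace_group: str, search_regex: str, replacement: str) -> str:
    """Apply one regex substitution to a workspace group name.

    Prefix and suffix are special cases: search for "^" or "$".
    """
    return re.sub(search_regex, replacement, workspace_group, count=1)


def match_by_partial_lookup(
    workspace_group: str,
    account_groups: Iterable[str],
    ws_regex: str,
    acct_regex: str,
) -> Optional[str]:
    """Return the account group whose regex match equals the workspace group's.

    Hypothetical matching rule: the text matched by ws_regex in the workspace
    name must equal the text matched by acct_regex in the account name.
    """
    ws_match = re.search(ws_regex, workspace_group)
    if ws_match is None:
        return None
    for candidate in account_groups:
        acct_match = re.search(acct_regex, candidate)
        if acct_match is not None and acct_match.group(0) == ws_match.group(0):
            return candidate
    return None


if __name__ == "__main__":
    print(rename_by_substitution("data_engineers", "^", "prod_"))
    print(rename_by_substitution("data_engineers", "$", "_prod"))
    print(rename_by_substitution("corp_tech_data_engineers", "^corp_tech", "prod"))
    print(match_by_partial_lookup(
        "data_engineers(12345)",
        ["data_scientists(999)", "data_engs(12345)"],
        r"\(\d+\)",
        r"\(\d+\)",
    ))
```

Treating prefix and suffix as substitutions on `^` and `$` mirrors the Workspace Regex column of the table, so one function covers the first three rows.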