Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent 0-based line tracking for linters #1855

Merged
merged 17 commits into from
Jun 7, 2024
Merged

Conversation

asnare
Copy link
Contributor

@asnare asnare commented Jun 6, 2024

Changes

This PR includes the following changes with respect to how line information is tracked and reported by the linting code:

  • We now use 0-based positions everywhere, instead of a mix of 0-based and 1-based. This particular choice is because that's also the convention used by the LSP protocol.
  • An off-by-one error has been fixed in the way the notebook parser tracks cell position offsets.
  • Existing tests have been checked and updated to verify the 0-based line positions.
  • A new test for cell parsing has been introduced.

For notebooks, there is an issue of what the position information means: for now the line position will refer to the 0-based line number in the underlying notebook file. (There is a related issue with respect to tracking column positions for MAGIC cells, but that is unaddressed by this PR.)

Linked issues

Blocks #1843.

asnare added 3 commits June 6, 2024 16:58
…racking.

Lines are now 1-based everywhere instead of a mix of 0-based and 1-based.

In addition an off-by-one error in the offset tracking for notebook cells has been fixed.
@asnare asnare self-assigned this Jun 6, 2024
Copy link

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 89.28571% with 6 lines in your changes missing coverage. Please review.

Project coverage is 89.26%. Comparing base (a867412) to head (5211d32).

Files Patch % Lines
...databricks/labs/ucx/source_code/linters/imports.py 81.81% 2 Missing ⚠️
...databricks/labs/ucx/source_code/notebooks/cells.py 80.00% 1 Missing and 1 partial ⚠️
src/databricks/labs/ucx/source_code/graph.py 85.71% 1 Missing ⚠️
...databricks/labs/ucx/source_code/linters/pyspark.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1855      +/-   ##
==========================================
- Coverage   89.27%   89.26%   -0.01%     
==========================================
  Files          95       95              
  Lines       12007    12022      +15     
  Branches     2108     2110       +2     
==========================================
+ Hits        10719    10732      +13     
- Misses        877      879       +2     
  Partials      411      411              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything has to be zero-based

src/databricks/labs/ucx/source_code/base.py Outdated Show resolved Hide resolved
@asnare asnare changed the title Consistent 1-based line tracking for linters Consistent 0-based line tracking for linters Jun 7, 2024
@asnare asnare marked this pull request as ready for review June 7, 2024 14:21
@asnare asnare requested review from a team and larsgeorge-db June 7, 2024 14:21
@asnare asnare requested a review from nfx June 7, 2024 14:22
Copy link

github-actions bot commented Jun 7, 2024

❌ 188/189 passed, 1 failed, 23 skipped, 3h53m2s total

❌ test_job_linter_some_notebook_graph_with_problems: AssertionError: assert {'second_note.../mnt/foo/bar'} == {'second_note.../mnt/foo/bar'} (17.707s)
AssertionError: assert {'second_note.../mnt/foo/bar'} == {'second_note.../mnt/foo/bar'}
  
  Extra items in the left set:
  'second_notebook:3 [dbfs-usage] Deprecated file system path in call to: /mnt/something'
  'some_file.py:0 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/foo/bar'
  'some_file.py:0 [dbfs-usage] Deprecated file system path in call to: /mnt/foo/bar'
  'second_notebook:3 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/something'
  Extra items in the right set:
  'some_file.py:1 [dbfs-usage] Deprecated file system path in call to: /mnt/foo/bar'
  'some_file.py:1 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/foo/bar'
  'second_notebook:4 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/something'
  'second_notebook:4 [dbfs-usage] Deprecated file system path in call to: /mnt/something'
  
  Full diff:
    {
  -     'second_notebook:4 [dbfs-usage] Deprecated file system path in call to: '
  ?                      ^
  +     'second_notebook:3 [dbfs-usage] Deprecated file system path in call to: '
  ?                      ^
        '/mnt/something',
  -     'second_notebook:4 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: '
  ?                      ^
  +     'second_notebook:3 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: '
  ?                      ^
        'references is deprecated: /mnt/something',
  -     'some_file.py:1 [dbfs-usage] Deprecated file system path in call to: '
  ?                   ^
  +     'some_file.py:0 [dbfs-usage] Deprecated file system path in call to: '
  ?                   ^
        '/mnt/foo/bar',
  -     'some_file.py:1 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: '
  ?                   ^
  +     'some_file.py:0 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: '
  ?                   ^
        'references is deprecated: /mnt/foo/bar',
    }
[gw1] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/main
14:27 INFO [databricks.labs.ucx.mixins.fixtures] Job: https://DATABRICKS_HOST#job/533656757416295
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added job fixture: CreateResponse(job_id=533656757416295)
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook
14:27 DEBUG [databricks.labs.ucx.source_code.linters.files] Resolving unknown import: some_file
14:28 WARNING [databricks.labs.ucx.source_code.jobs] Found job problems:
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook:3 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/something
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook:3 [dbfs-usage] Deprecated file system path in call to: /mnt/something
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/some_file.py:0 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/foo/bar
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/some_file.py:0 [dbfs-usage] Deprecated file system path in call to: /mnt/foo/bar
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/main
14:27 INFO [databricks.labs.ucx.mixins.fixtures] Job: https://DATABRICKS_HOST#job/533656757416295
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added job fixture: CreateResponse(job_id=533656757416295)
14:27 DEBUG [databricks.labs.ucx.mixins.fixtures] added notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook
14:27 DEBUG [databricks.labs.ucx.source_code.linters.files] Resolving unknown import: some_file
14:28 WARNING [databricks.labs.ucx.source_code.jobs] Found job problems:
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook:3 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/something
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook:3 [dbfs-usage] Deprecated file system path in call to: /mnt/something
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/some_file.py:0 [direct-filesystem-access] The use of TEST_SCHEMA dbfs: references is deprecated: /mnt/foo/bar
/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/some_file.py:0 [dbfs-usage] Deprecated file system path in call to: /mnt/foo/bar
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 job fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] removing job fixture: CreateResponse(job_id=533656757416295)
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 2 notebook fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] removing notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/main
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] removing notebook fixture: /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/linter-5mzL/second_notebook
14:28 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_s7bch: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_s7bch
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_s7bch', metastore_id=None, name='ucx_s7bch', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
14:28 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.RW6h/config.yml) doesn't exist.
14:28 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
14:28 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
14:28 INFO [databricks.labs.ucx.install] Fetching installations...
14:28 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
14:28 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:28 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:28 INFO [databricks.labs.ucx.install] Deleting UCX v0.25.1+2620240607142810 from https://DATABRICKS_HOST
14:28 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_s7bch
14:28 INFO [databricks.labs.ucx.install] Deleting jobs
14:28 ERROR [databricks.labs.ucx.install] No jobs present or jobs already deleted
14:28 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:28 INFO [databricks.labs.ucx.install] Deleting secret scope
14:28 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace user fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 account group fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace group fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
14:28 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_s7bch', metastore_id=None, name='ucx_s7bch', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw1] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python

Running from acceptance #3799

@asnare asnare marked this pull request as draft June 7, 2024 15:07
@asnare asnare marked this pull request as ready for review June 7, 2024 15:21
## Changes

This PR is stacked on #1855 and includes the following changes:

- Updates the fixtures for the functional linter tests so that the
expected offsets are correct.
- Updates the functional tests to also validate the region offsets are
as expected.
- The test itself has also been updated so that test failures are
reported as lists of missing/unexpected comments in the form used by the
functional tests.

### Tests

- [ ] manually tested
- [X] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)
@nfx nfx had a problem deploying to account-admin June 7, 2024 15:53 — with GitHub Actions Failure
Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@nfx nfx merged commit 2389444 into main Jun 7, 2024
7 of 8 checks passed
@nfx nfx deleted the fix/consistent-region-positions branch June 7, 2024 17:19
nfx added a commit that referenced this pull request Jun 7, 2024
* Added migration for Python linters from `ast` (standard library) to `astroid` package ([#1835](#1835)). In this release, the Python linters have been migrated from the `ast` package in the standard library to the `astroid` package, version 3.2.2 or higher, with minimal inference implementation. This change includes updates to the `pyproject.toml` file to include `astroid` as a dependency and bump the version of `pylint`. No changes have been made to user documentation, CLI commands, workflows, or tables. Testing has been conducted through the addition of unit tests. This update aims to improve the functionality and accuracy of the Python linters.
* Added workflow linter for delta live tables task ([#1825](#1825)). In this release, there are updates to the `_register_pipeline_task` method in the `jobs.py` file. The method now checks for the existence of the pipeline and its libraries, and registers each notebook or jar library found in the pipeline as a task. If the library is a Maven or file type, it will raise a `DependencyProblem` as it is not yet implemented. Additionally, new functions and tests have been added to improve the quality and functionality of the project, including a workflow linter for Delta Live Tables (DLT) tasks and a linter that checks for issues with specified DLT tasks. A new method, `test_workflow_linter_dlt_pipeline_task`, has been added to test the workflow linter for DLT tasks, verifying the correct creation and functioning of the pipeline task and checking the building of the dependency graph for the task. These changes enhance the project's ability to ensure the proper configuration and correctness of DLT tasks and prevent potential issues.
* Consistent 0-based line tracking for linters ([#1855](#1855)). 0-based line tracking has been consistently implemented for linters in various files and methods throughout the project, addressing issue [#1855](#1855). This change includes removing direct filesystem references in favor of using the Unity Catalog for table migration and format changes. It also updates comments and warnings to improve clarity and consistency. In particular, the spark-table.py file has been updated to ensure that the spark.log.level is set correctly for UC Shared Clusters, and that the Spark Driver JVM is no longer accessed directly. The new file, simple_notebook.py, demonstrates the consistent line tracking for linters across different cell types, such as Python, Markdown, SQL, Scala, Shell, Pip, and Python (with magic commands). These changes aim to improve the accuracy and reliability of linters, making the codebase more maintainable and adaptable.

Dependency updates:

 * Updated sqlglot requirement from <24.2,>=23.9 to >=23.9,<25.1 ([#1856](#1856)).
@nfx nfx mentioned this pull request Jun 7, 2024
nfx added a commit that referenced this pull request Jun 7, 2024
* Added migration for Python linters from `ast` (standard library) to
`astroid` package
([#1835](#1835)). In this
release, the Python linters have been migrated from the `ast` package in
the standard library to the `astroid` package, version 3.2.2 or higher,
with minimal inference implementation. This change includes updates to
the `pyproject.toml` file to include `astroid` as a dependency and bump
the version of `pylint`. No changes have been made to user
documentation, CLI commands, workflows, or tables. Testing has been
conducted through the addition of unit tests. This update aims to
improve the functionality and accuracy of the Python linters.
* Added workflow linter for delta live tables task
([#1825](#1825)). In this
release, there are updates to the `_register_pipeline_task` method in
the `jobs.py` file. The method now checks for the existence of the
pipeline and its libraries, and registers each notebook or jar library
found in the pipeline as a task. If the library is a Maven or file type,
it will raise a `DependencyProblem` as it is not yet implemented.
Additionally, new functions and tests have been added to improve the
quality and functionality of the project, including a workflow linter
for Delta Live Tables (DLT) tasks and a linter that checks for issues
with specified DLT tasks. A new method,
`test_workflow_linter_dlt_pipeline_task`, has been added to test the
workflow linter for DLT tasks, verifying the correct creation and
functioning of the pipeline task and checking the building of the
dependency graph for the task. These changes enhance the project's
ability to ensure the proper configuration and correctness of DLT tasks
and prevent potential issues.
* Consistent 0-based line tracking for linters
([#1855](#1855)). 0-based
line tracking has been consistently implemented for linters in various
files and methods throughout the project, addressing issue
[#1855](#1855). This change
includes removing direct filesystem references in favor of using the
Unity Catalog for table migration and format changes. It also updates
comments and warnings to improve clarity and consistency. In particular,
the spark-table.py file has been updated to ensure that the
spark.log.level is set correctly for UC Shared Clusters, and that the
Spark Driver JVM is no longer accessed directly. The new file,
simple_notebook.py, demonstrates the consistent line tracking for
linters across different cell types, such as Python, Markdown, SQL,
Scala, Shell, Pip, and Python (with magic commands). These changes aim
to improve the accuracy and reliability of linters, making the codebase
more maintainable and adaptable.

Dependency updates:

* Updated sqlglot requirement from <24.2,>=23.9 to >=23.9,<25.1
([#1856](#1856)).
Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants