[FEATURE]: Scan all notebooks and python files attached to every job and render a dashboard #1539

Closed
1 task done
Tracked by #1085
nfx opened this issue Apr 24, 2024 · 2 comments · Fixed by #1741
Labels
feat/viz vizualizing UCX progress as a redash/lakeview dashboard feat/workflow triggered as a Databricks Job managed by UCX migrate/code Abstract Syntax Trees and other dark magic

Comments

@nfx
Collaborator

nfx commented Apr 24, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

  1. Jobs may contain multiple notebook tasks.
  2. Notebook tasks may be written in Python or SQL.
  3. Tasks can also be Python files and packages.
  4. Python files may live on DBFS or in a Git repo.

Proposed Solution

Scan notebooks and Python files as dependencies of a job, then refresh a dashboard with the results, linking each result to the notebook or file in the same workspace.
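
As a rough illustration of the first half of this proposal, the sketch below lists the sources of a job that would need scanning. The `Task` and `Job` classes, their fields, and `sources_to_scan` are hypothetical stand-ins invented for this example; they are not the UCX or Databricks SDK types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Hypothetical, simplified stand-in for a job task."""
    task_key: str
    notebook_path: Optional[str] = None
    python_file: Optional[str] = None
    wheel_package: Optional[str] = None


@dataclass
class Job:
    """Hypothetical, simplified stand-in for a job with many tasks."""
    job_id: int
    tasks: list[Task] = field(default_factory=list)


def sources_to_scan(job: Job) -> list[tuple[str, str, str]]:
    """Return (task_key, kind, location) for every task source worth scanning."""
    found = []
    for task in job.tasks:
        if task.notebook_path:
            found.append((task.task_key, "notebook", task.notebook_path))
        if task.python_file:
            found.append((task.task_key, "python_file", task.python_file))
        if task.wheel_package:
            found.append((task.task_key, "package", task.wheel_package))
    return found


# Made-up example job covering the task kinds from the problem statement
job = Job(job_id=1, tasks=[
    Task("ingest", notebook_path="/Workspace/etl/ingest"),
    Task("transform", python_file="dbfs:/scripts/transform.py"),
    Task("publish", wheel_package="my_pkg"),
])
for row in sources_to_scan(job):
    print(row)
```

Keeping the task key next to each location is what would let a dashboard row point back to both the job task and the source file.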

Additional Context

No response

@nfx nfx added migrate/code Abstract Syntax Trees and other dark magic feat/viz vizualizing UCX progress as a redash/lakeview dashboard feat/workflow triggered as a Databricks Job managed by UCX labels Apr 24, 2024
@nfx nfx added this to UCX Apr 24, 2024
@github-project-automation github-project-automation bot moved this to Triage in UCX Apr 24, 2024
@nfx nfx changed the title from "[FEATURE]: Scan all notebooks and python files attached to a job and render a dashboard which shows problems linking to source files" to "[FEATURE]: Scan all notebooks and python files attached to every job and render a dashboard which shows problems linking to source files" Apr 24, 2024
@nfx nfx moved this from Triage to Active Backlog in UCX Apr 24, 2024
@nfx nfx self-assigned this May 2, 2024
nfx added a commit that referenced this issue May 7, 2024
```mermaid
flowchart TD
    job -->|has many| job_task
    job_task -.-> notebook_task
    job_task -.-> wheel_task 

    job -.-> git_source

    job_task -.->|execute on| interactive_cluster
    interactive_cluster -.-> library

    job_task -.-> library
    library -.-> wheel_on_dbfs
    library -.-> wheel_on_wsfs
    library -.-> wheel_on_volumes
    library -.-> egg_on_dbfs
    library -.-> egg_on_wsfs
    library -.-> pypi
    wheel_task -.-> wheel_on_dbfs
    wheel_task -.-> wheel_on_wsfs

    wheel_on_dbfs -.-> python_file
    wheel_on_wsfs -.-> python_file
    egg_on_dbfs -.-> python_file
    egg_on_wsfs -.-> python_file
    pypi -.-> python_file
    wsfs_file -.-> python_file
    python_file -.->|import| python_file
    notebook_task -.-> notebook
    notebook -.->|import| python_file
    notebook -.->|can run| notebook

    job_task -.-> dependency_graph
    python_file --> dependency_graph
    notebook --> dependency_graph

    git_source -.-> python_file
    git_source -.-> notebook
    lint_local_code_cli --> dependency_graph

    workflow_linter --> dependency_graph
    workflow_linter -.-> job_problems
    dependency_graph -.-> job_problems
    job_problems -.->|viz| redash_dashboard
```
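
One way to read the diagram is as a graph walk: each job task has a root source, the graph follows imports and notebook runs from that root, and a linter reports problems per visited source. The sketch below is a minimal illustration under that reading. `DependencyGraph`, `collect_problems`, the `lint` callable, and all paths are hypothetical names made up for this example, not the classes added by the PR.

```python
from collections import defaultdict
from typing import Callable


class DependencyGraph:
    """Hypothetical minimal graph: nodes are source paths, edges are imports or runs."""

    def __init__(self) -> None:
        self._edges: dict[str, list[str]] = defaultdict(list)

    def add(self, parent: str, child: str) -> None:
        self._edges[parent].append(child)

    def walk(self, root: str) -> list[str]:
        """Depth-first walk that visits each node once, so import cycles terminate."""
        seen: set[str] = set()
        order: list[str] = []
        stack = [root]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # reversed() so children are popped in the order they were added
            stack.extend(reversed(self._edges.get(node, [])))
        return order


def collect_problems(
    graph: DependencyGraph,
    task_roots: dict[str, str],
    lint: Callable[[str], list[str]],
) -> list[tuple[str, str, str]]:
    """Return (task_key, path, message) for every problem reachable from a task."""
    problems = []
    for task_key, root in task_roots.items():
        for path in graph.walk(root):
            for message in lint(path):
                problems.append((task_key, path, message))
    return problems


graph = DependencyGraph()
graph.add("notebook_a", "utils.py")    # notebook imports a Python file
graph.add("notebook_a", "notebook_b")  # notebook runs another notebook
graph.add("utils.py", "helpers.py")    # Python file imports a Python file
graph.add("helpers.py", "utils.py")    # circular import, handled by walk()

# Stand-in linter: a fixed lookup instead of real source analysis
known = {"helpers.py": ["uses a dbfs:/ path"]}
problems = collect_problems(graph, {"task_1": "notebook_a"}, lambda p: known.get(p, []))
print(problems)
```

The visited-set matters because the diagram allows notebooks to run notebooks and Python files to import each other, so cycles are possible.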

This PR adds a baseline for linting workflows.

Related to:
- #1542 
- #1541
- #1540
- #1539
- #1382
- #1204
- #1203
- #1085

closes #1559
closes #1468
closes #1286
@JCZuurmond
Member

@JCZuurmond : Split out dashboard creation

@JCZuurmond JCZuurmond changed the title from "[FEATURE]: Scan all notebooks and python files attached to every job and render a dashboard which shows problems linking to source files" to "[FEATURE]: Scan all notebooks and python files attached to every job and render a dashboard" May 22, 2024
@JCZuurmond
Member

Updated the title, as part of this issue was split out to #1740.

@nfx nfx closed this as completed in #1741 May 22, 2024
nfx added a commit that referenced this issue May 22, 2024
#1741)

## Changes
Show the code problems found by the `experimental-workflow-linter`
workflow in the migration dashboard.

### Linked issues
Resolves #1539 

### Functionality 

- [ ] added relevant user documentation
- [ ] added new CLI command
- [ ] modified existing command: `databricks labs ucx ...`
- [ ] added a new workflow
- [x] modified existing workflow: `experimental-workflow-linter`
- [ ] added a new table
- [ ] modified existing table: `...`

### Tests

- [ ] manually tested
- [ ] added unit tests
- [ ] added integration tests
- [ ] verified on staging environment (screenshot attached)

---------

Co-authored-by: Serge Smertin <[email protected]>
@github-project-automation github-project-automation bot moved this from Active Backlog to Archive in UCX May 22, 2024