Detect dbutils.notebook.run #1284
* add sample with RUN cell
* fix issue where non-PI comments preceding a language PI would prevent language PI detection
* main:
  * remove `isort` (databrickslabs#1280)
  * Addressed Issue with Disabled Feature in certain regions (databrickslabs#1275)
  * Improve documentation (databrickslabs#1162)
  * Add roadmap workflows and tasks to Table Migration Workflow document (databrickslabs#1274)
  * Fix integration test with new DeployedWorkflows (databrickslabs#1250)
  * Document troubleshooting guide (databrickslabs#1226)
  * Split `DeployedWorkflows` out of `WorkflowsDeployment` (databrickslabs#1248)
  * Inject `_TASKS` via constructor to `WorkflowsDeployment` instead of a global variable (databrickslabs#1247)
  * Decouple `InstallState` from `WorkspaceDeployment` constructor
  * Add document for table migration workflow (databrickslabs#1229)
  * Decouple `InstallState` from `WorkflowsDeployment` constructor (databrickslabs#1246)
* main: Build notebook dependency graph for `%run` cells (databrickslabs#1279)
  Conflicts:
  * src/databricks/labs/ucx/source_code/notebook.py
  * tests/unit/source_code/test_notebook.py
Co-authored-by: Cor <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1284       +/-   ##
==========================================
- Coverage   90.02%   89.23%    -0.79%
==========================================
  Files          62       70        +8
  Lines        7430     8242      +812
  Branches     1335     1454      +119
==========================================
+ Hits         6689     7355      +666
- Misses        470      589      +119
- Partials      271      298       +27
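The percentages follow directly from the line counts. Codecov treats partially covered lines as not covered, so coverage is hits / (hits + misses + partials), and the three counts add up to the Lines row. The sketch below reproduces both figures. Rounding down to two decimals is an assumption based on Codecov's default `round: down` setting; the report is consistent with it, since 7355 / 8242 is 89.238%.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage as a percentage: fully hit lines over all tracked lines.

    Assumption: the result is rounded *down* to two decimals, which is
    Codecov's default and matches the report above.
    """
    tracked = hits + misses + partials  # equals the "Lines" row
    ratio = Decimal(hits * 100) / Decimal(tracked)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_percent(hits=6689, misses=470, partials=271)  # main
head = coverage_percent(hits=7355, misses=589, partials=298)  # #1284
print(base, head, head - base)  # 90.02 89.23 -0.79
```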
class MatchingVisitor(ast.NodeVisitor):

    def __init__(self, node_type: type, match_nodes: list[tuple[str, type]]):
        self.matched_nodes: list[ast.AST] = []
If something is public, let's expose it as a method; that would make it easier to refactor.
done
class PythonLinter(ASTLinter, Linter):
    def lint(self, code: str) -> Iterable[Advice]:
        self.parse(code)
        nodes = self.locate(ast.Call, [("run", ast.Attribute), ("notebook", ast.Attribute), ("dbutils", ast.Name)])
What if the code is `self._dbutils.notebook.run(...)`?
I agree it's theoretically possible, but I can't think of any benefit for users in keeping a private local copy of a public API.
To your point, we can't address these edge cases for now. To be honest, I can think of thousands of them, such as:
run_notebook = dbutils.notebook.run
.../...
run_notebook('some notebook')
I have created ticket #1334 for that.
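The `locate` call above matches a fixed attribute chain that ends in the bare name `dbutils`. The standalone sketch below reproduces that idea with a hypothetical helper, `is_dbutils_notebook_run`, which is not the ucx API. It shows why the `self._dbutils` and alias forms raised in this thread are not detected.

```python
import ast


def is_dbutils_notebook_run(node: ast.AST) -> bool:
    """Return True only for calls of the exact shape dbutils.notebook.run(...).

    Hypothetical helper mirroring the chain ("run", Attribute),
    ("notebook", Attribute), ("dbutils", Name) passed to locate() above.
    """
    if not isinstance(node, ast.Call):
        return False
    run = node.func
    if not (isinstance(run, ast.Attribute) and run.attr == "run"):
        return False
    notebook = run.value
    if not (isinstance(notebook, ast.Attribute) and notebook.attr == "notebook"):
        return False
    base = notebook.value
    # The chain must end in a plain name; self._dbutils is an Attribute.
    return isinstance(base, ast.Name) and base.id == "dbutils"


def count_matches(code: str) -> int:
    return sum(is_dbutils_notebook_run(n) for n in ast.walk(ast.parse(code)))


print(count_matches("dbutils.notebook.run('./child', 60)"))        # 1
print(count_matches("self._dbutils.notebook.run('./child', 60)"))  # 0
print(count_matches("run_notebook = dbutils.notebook.run\nrun_notebook('./child')"))  # 0
```

Catching the aliased form would require tracking assignments (data flow), not just matching the shape of each call. That is why it is a separate piece of work.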
def __init__(self):
    self._module: ast.Module | None = None

def parse(self, code: str):
    self._module = ast.parse(code)
Suggested change:
- def __init__(self):
-     self._module: ast.Module | None = None
- def parse(self, code: str):
-     self._module = ast.parse(code)
+ def __init__(self, code: str):
+     self._module = ast.parse(code)
This way we either fail during initialisation or end up with a valid state.
I respectfully disagree. A constructor should not perform processing, especially when that processing may fail under uncontrolled conditions (source code received from outside).
A factory is just fine.
Actually, the reason for this is that `ASTLinter` is a base class for `PythonLinter`, which also needs to follow the `Linter` conventions. Changing that now.
done
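One way to follow the factory suggestion is to keep the constructor free of processing and move the failure-prone `ast.parse` call into a classmethod. The sketch below is an illustration only: the name `from_code` is hypothetical, because the thread does not show the final API.

```python
import ast


class ASTLinter:
    """Holds an already-parsed module, so every instance is in a valid state."""

    def __init__(self, module: ast.Module):
        # No processing in the constructor: it only stores what it is given.
        self._module = module

    @classmethod
    def from_code(cls, code: str):
        # Parsing source received from outside may raise SyntaxError. That
        # happens here, in the factory, before any instance exists.
        return cls(ast.parse(code))


class PythonLinter(ASTLinter):
    """Subclasses inherit the factory; cls(...) builds the subclass type."""


linter = PythonLinter.from_code("x = 1")
print(type(linter).__name__)  # PythonLinter

try:
    PythonLinter.from_code("def broken(:")
except SyntaxError as error:
    print(f"rejected untrusted source: {error.msg}")
```

Because the factory calls `cls(...)` rather than `ASTLinter(...)`, a subclass such as `PythonLinter` can still implement a `Linter`-style interface while sharing the parsing step.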
@@ -94,7 +150,8 @@ def is_runnable(self) -> bool:
            statements = parse_sql(self._original_code)
            return len(statements) > 0
        except SQLParseError:
            return False
            sqlglot_logger.warning(f"Failed to parse SQL using 'sqlglot': {self._original_code}")
Let's also log the sqlglot error itself, so that we can report issues to the sqlglot project.
done
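Logging the parser's own exception next to the SQL is what makes an upstream bug report possible. The sketch below makes two assumptions so that it runs without sqlglot installed. `SQLParseError` is a stand-in exception class, and `parse_sql` is passed in as a callable; in this PR they presumably come from sqlglot. The logger name is also an assumption.

```python
import logging
from collections.abc import Callable, Sequence

# Assumption: a module-level logger named after the parser it reports on.
sqlglot_logger = logging.getLogger("sqlglot")


class SQLParseError(Exception):
    """Stand-in for the SQL parser's error type, so this sketch runs without sqlglot."""


def is_runnable(code: str, parse_sql: Callable[[str], Sequence[object]]) -> bool:
    try:
        statements = parse_sql(code)
        return len(statements) > 0
    except SQLParseError as e:
        # exc_info attaches the parser's exception (message and traceback)
        # to the log record, alongside the SQL that failed.
        sqlglot_logger.warning("Failed to parse SQL using 'sqlglot': %s", code, exc_info=e)
        return False


def failing_parser(code: str) -> Sequence[object]:
    raise SQLParseError(f"Invalid expression: {code!r}")


print(is_runnable("SELECT 1", lambda code: [code]))  # True
print(is_runnable("SELEC 1", failing_parser))        # False, after logging a warning
```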
self._new_cell = args[3]
# PI stands for Processing Instruction
# pylint: disable=invalid-name
self._requires_isolated_PI = args[3]
Suggested change:
- self._requires_isolated_PI = args[3]
+ self._requires_isolated_processing_instruction = args[3]
:)
Fixed the warning, but wouldn't people reading this code be informed enough by the comment?
if cell_language.requires_isolated_pi:
    if line.startswith(LANGUAGE_PREFIX):
Suggested change:
- if cell_language.requires_isolated_pi:
-     if line.startswith(LANGUAGE_PREFIX):
+ if cell_language.requires_isolated_pi and line.startswith(LANGUAGE_PREFIX):
def _make_runnable(self, lines: list[str], cell_language: CellLanguage):
    prefix = f"{self.comment_prefix} {MAGIC_PREFIX} "
    prefix_len = len(prefix)
    # pylint: disable=too-many-nested-blocks
Suggested change:
- # pylint: disable=too-many-nested-blocks
This one can be avoided.
done
@@ -225,6 +292,41 @@ def make_cell(lines_: list[str]):

        return cells

    def _make_runnable(self, lines: list[str], cell_language: CellLanguage):
Suggested change:
- def _make_runnable(self, lines: list[str], cell_language: CellLanguage):
+ def _unwrap_magic(self, lines: list[str], cell_language: CellLanguage):
I think this is a better name for it.
done
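Databricks exports the non-Python cells of a `.py` notebook with each line wrapped as `# MAGIC <line>`. The sketch below shows the core of unwrapping such lines using a single flat `if`/`elif`/`else`, which is the kind of restructuring that makes the `too-many-nested-blocks` suppression unnecessary. It is a simplified, hypothetical helper and not the `_unwrap_magic` from this PR, which handles more cases, such as language processing instructions. The value `"MAGIC"` for `MAGIC_PREFIX` is an assumption.

```python
MAGIC_PREFIX = "MAGIC"  # assumption: value of the MAGIC_PREFIX constant in the snippet above


def unwrap_magic(lines: list[str], comment_prefix: str = "#") -> list[str]:
    """Strip the '<comment_prefix> MAGIC ' wrapper from each line (hypothetical helper)."""
    prefix = f"{comment_prefix} {MAGIC_PREFIX} "
    bare = prefix.rstrip()  # empty lines inside a magic cell have no trailing space
    unwrapped = []
    for line in lines:
        # One flat branch per case instead of nested if blocks.
        if line.startswith(prefix):
            unwrapped.append(line[len(prefix):])
        elif line == bare:
            unwrapped.append("")
        else:
            unwrapped.append(line)  # not wrapped: keep unchanged
    return unwrapped


print(unwrap_magic(["# MAGIC %sql", "# MAGIC SELECT 1", "# MAGIC"]))
# ['%sql', 'SELECT 1', '']
```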
            line = f"{cell_language.comment_prefix} {COMMENT_PI}{line}"
        lines[i] = line

def make_unrunnable(self, code: str, cell_language: CellLanguage) -> str:
Suggested change:
- def make_unrunnable(self, code: str, cell_language: CellLanguage) -> str:
+ def as_magic(self, code: str, cell_language: CellLanguage) -> str:
This might be a better name.
done
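As the new name suggests, `as_magic` goes in the opposite direction: it wraps code so that it is stored in the notebook as `MAGIC`-prefixed comment lines. The sketch below is hypothetical. The free-function form, the `comment_prefix` parameter (standing in for `CellLanguage`), and the `"MAGIC"` constant are all assumptions, not the ucx code.

```python
MAGIC_PREFIX = "MAGIC"  # assumption: same constant as used when unwrapping


def as_magic(code: str, comment_prefix: str = "#") -> str:
    """Wrap every line of `code` as '<comment_prefix> MAGIC <line>' (hypothetical helper)."""
    prefix = f"{comment_prefix} {MAGIC_PREFIX}"
    # Empty lines are written without a trailing space after MAGIC.
    wrapped = [f"{prefix} {line}" if line else prefix for line in code.split("\n")]
    return "\n".join(wrapped)


print(as_magic("%sql\nSELECT 1"))
# # MAGIC %sql
# # MAGIC SELECT 1
```

The comment prefix is a parameter because SQL notebooks use `--` rather than `#`, which is presumably why the method in the PR takes a `CellLanguage`.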
def matched_nodes(self):
    return self._matched_nodes

# pylint: disable=invalid-name
Suggested change:
- # pylint: disable=invalid-name
+ # visit_Call is the invalid naming convention, but it is required for NodeVisitor
+ # pylint: disable=invalid-name
done
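Taken together, these review points (a read-only `matched_nodes` accessor, and `visit_Call` keeping the name that `NodeVisitor` dispatches on) suggest a matching visitor along the lines below. This is a sketch under assumptions. The constructor signature follows the snippet quoted earlier, but the matching logic is a guess at how `match_nodes` is interpreted: (name, type) pairs read from the outermost attribute inward. It is not the ucx implementation.

```python
import ast


class MatchingVisitor(ast.NodeVisitor):
    """Collects call nodes whose callee matches an attribute chain.

    Assumption: match_nodes lists (name, type) pairs from the outermost
    attribute inward, e.g. [("run", ast.Attribute), ("notebook", ast.Attribute),
    ("dbutils", ast.Name)] for dbutils.notebook.run(...).
    """

    def __init__(self, node_type: type, match_nodes: list[tuple[str, type]]):
        self._node_type = node_type
        self._match_nodes = match_nodes
        self._matched_nodes: list[ast.AST] = []

    @property
    def matched_nodes(self) -> list[ast.AST]:
        # Exposed read-only, so the internal storage can be refactored freely.
        return self._matched_nodes

    # visit_Call breaks the naming convention, but NodeVisitor dispatches on it
    # pylint: disable=invalid-name
    def visit_Call(self, node: ast.Call):
        if isinstance(node, self._node_type) and self._matches(node.func):
            self._matched_nodes.append(node)
        self.generic_visit(node)  # keep walking, e.g. into call arguments

    def _matches(self, node: ast.AST) -> bool:
        for name, expected_type in self._match_nodes:
            if not isinstance(node, expected_type):
                return False
            if isinstance(node, ast.Attribute):
                if node.attr != name:
                    return False
                node = node.value  # step inward: a.b.c -> a.b
            elif isinstance(node, ast.Name):
                if node.id != name:
                    return False
            else:
                return False
        return True


source = "dbutils.notebook.run('./a', 60)\nprint(dbutils.notebook.run('./b', 60))\nself._dbutils.notebook.run('./c', 60)"
visitor = MatchingVisitor(ast.Call, [("run", ast.Attribute), ("notebook", ast.Attribute), ("dbutils", ast.Name)])
visitor.visit(ast.parse(source))
print(len(visitor.matched_nodes))  # 2
```

The call nested inside `print(...)` is found only because `visit_Call` calls `generic_visit`. Without that call, the visitor would stop at the outer `print` call.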
@@ -38,6 +51,43 @@ def build_dependency_graph(self, parent: DependencyGraph):
        raise NotImplementedError()


class PythonLinter(Linter):
Move this to `astlinter.py` (renaming it to `python_analysis.py`).
done
path = cls.get_dbutils_notebook_run_path_arg(node)
if isinstance(path, ast.Constant):
    return Advisory(
        'notebook-auto-migrate',
Suggested change:
- 'notebook-auto-migrate',
+ 'dbutils-notebook-run-literal',
done
        node.end_col_offset or 0,
    )
return Advisory(
    'notebook-manual-migrate',
Suggested change:
- 'notebook-manual-migrate',
+ 'dbutils-notebook-run-dynamic',
done
* main:
  * Adding CSV, JSON and include path in mounts (databrickslabs#1329)
  * Add missing step sync-workspace-info (databrickslabs#1330)
  * disable annotation-unchecked mypy warning (databrickslabs#1331)
  * Use service factory to resolve object dependencies (databrickslabs#1209)
lgtm
* main: Integrate detection of notebook dependencies (databrickslabs#1338)
  Conflicts:
  * src/databricks/labs/ucx/source_code/notebook.py
merged
Linked issues
Resolves #1200