Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve catch-all handling and avoid some pylint suppressions #1919

Merged
merged 28 commits into from
Jul 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
2f7ec58
support magic commands in python cells
ericvergnaud Jun 13, 2024
e0796d6
address comments
ericvergnaud Jun 14, 2024
9a7c1bd
address comments
ericvergnaud Jun 14, 2024
fe7de63
Merge branch 'main' into support-magic-commands-in-python-cells
ericvergnaud Jun 14, 2024
f24ab05
fix merge issues
ericvergnaud Jun 14, 2024
5433d90
raise problem with unsupported magics
ericvergnaud Jun 14, 2024
13258d5
parse magic commands when linting
ericvergnaud Jun 14, 2024
5327656
remove cyclic imports
ericvergnaud Jun 17, 2024
330bcc6
move 'magic command' code to 'cells'
ericvergnaud Jun 17, 2024
8509813
Merge branch 'main' into support-magic-commands-in-python-cells
ericvergnaud Jun 17, 2024
a6a13c2
support ! as line magic prefix
ericvergnaud Jun 18, 2024
a862663
Merge branch 'main' into support-magic-commands-in-python-cells
ericvergnaud Jun 18, 2024
24851a2
Cache the regex used for some magic detection.
asnare Jun 19, 2024
da7133a
Access protected member under test via a friend class.
asnare Jun 19, 2024
5fce3ba
Target python parsing errors more narrowly, and test the way they're …
asnare Jun 19, 2024
2a169e1
Annotate some test fixture types.
asnare Jun 19, 2024
df1878c
Type hint the fixture.
asnare Jun 19, 2024
4f31198
Log exception on internal error for analysis and support.
asnare Jun 19, 2024
ef94c88
When an internal error occurs, return all commands/problems so far in…
asnare Jun 19, 2024
a42b314
Don't bother checking the logs; log-capturing seems to be flaky.
asnare Jun 19, 2024
274a2a9
Remove unused import.
asnare Jun 19, 2024
ef3ac4f
Add a (deliberately) failing test to highlight something that we don'…
asnare Jun 19, 2024
33de8a1
Merge branch 'main' into support-magic-commands-in-python-cells-no-cheat
asnare Jul 3, 2024
d023f05
Restore file from main; this branch does not intend to modify this file.
asnare Jul 3, 2024
63f6e49
Merge branch 'main' into support-magic-commands-in-python-cells-no-cheat
asnare Jul 3, 2024
3473b9f
Merge branch 'main' into support-magic-commands-in-python-cells-no-cheat
asnare Jul 9, 2024
6006d56
Merge branch 'main' into support-magic-commands-in-python-cells-no-cheat
asnare Jul 9, 2024
193a4d8
Merge branch 'main' into support-magic-commands-in-python-cells-no-cheat
nfx Jul 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions src/databricks/labs/ucx/source_code/notebooks/cells.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
from pathlib import Path

from astroid import Call, Const, ImportFrom, Name, NodeNG # type: ignore
from astroid.exceptions import AstroidSyntaxError # type: ignore
from sqlglot import parse as parse_sql, ParseError as SQLParseError

from databricks.sdk.service.workspace import Language
Expand Down Expand Up @@ -403,7 +404,8 @@ def build_graph_from_python_source(self, python_code: str) -> list[DependencyPro
problems: list[DependencyProblem] = []
try:
tree = Tree.normalize_and_parse(python_code)
except Exception as e: # pylint: disable=broad-except
except AstroidSyntaxError as e:
logger.debug(f"Could not parse Python code: {python_code}", exc_info=True)
problems.append(DependencyProblem('parse-error', f"Could not parse Python code: {e}"))
return problems
syspath_changes = SysPathChange.extract_from_tree(self._context.session_state, tree)
Expand Down Expand Up @@ -485,11 +487,11 @@ def extract_from_tree(
nodes = tree.locate(Call, [("magic_command", Name)])
for command in cls._make_commands_for_magic_command_call_nodes(nodes):
commands.append(command)
return commands, problems
except Exception as e: # pylint: disable=broad-except
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this one more narrow?

logger.debug(f"Internal error while checking magic commands in tree: {tree.root}", exc_info=True)
problem = problem_factory('internal-error', f"While checking magic commands: {e}", tree.root)
problems.append(problem)
return [], problems
return commands, problems

@classmethod
def _make_commands_for_magic_command_call_nodes(cls, nodes: list[Call]):
Expand Down Expand Up @@ -527,8 +529,11 @@ def build_dependency_graph(self, graph: DependencyGraph) -> list[DependencyProbl
return [DependencyProblem("library-install-failed", "Missing arguments after 'pip install'")]
return graph.register_library(*argv[2:]) # Skipping %pip install

@staticmethod
def _split(code) -> list[str]:
# Cache re-used regex (and ensure issues are raised during class init instead of upon first use).
_splitter = re.compile(r"(?<!\\)\n")

@classmethod
def _split(cls, code: str) -> list[str]:
"""Split pip cell code into multiple arguments

Note:
Expand All @@ -537,7 +542,7 @@ def _split(code) -> list[str]:
Sources:
https://docs.databricks.com/en/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-commands
"""
match = re.search(r"(?<!\\)\n", code)
match = cls._splitter.search(code)
if match:
code = code[: match.start()] # Remove code after non-escaped newline
code = code.replace("\\\n", " ")
Expand Down
2 changes: 1 addition & 1 deletion tests/unit/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ def __repr__(self):


@pytest.fixture
def mock_path_lookup():
def mock_path_lookup() -> PathLookup:
return MockPathLookup()


Expand Down
3 changes: 2 additions & 1 deletion tests/unit/source_code/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from databricks.labs.ucx.source_code.known import KnownList
from databricks.labs.ucx.source_code.linters.files import ImportFileResolver, FileLoader
from databricks.labs.ucx.source_code.notebooks.loaders import NotebookLoader, NotebookResolver
from databricks.labs.ucx.source_code.path_lookup import PathLookup
from databricks.labs.ucx.source_code.python_libraries import PythonLibraryResolver


Expand Down Expand Up @@ -48,7 +49,7 @@ def extended_test_index():


@pytest.fixture
def simple_dependency_resolver(mock_path_lookup):
def simple_dependency_resolver(mock_path_lookup: PathLookup) -> DependencyResolver:
allow_list = KnownList()
library_resolver = PythonLibraryResolver(allow_list)
notebook_resolver = NotebookResolver(NotebookLoader())
Expand Down
63 changes: 63 additions & 0 deletions tests/unit/source_code/notebooks/test_cells.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,15 @@
from databricks.labs.ucx.source_code.linters.files import FileLoader, ImportFileResolver
from databricks.labs.ucx.source_code.linters.python_ast import Tree
from databricks.labs.ucx.source_code.notebooks.cells import CellLanguage, PipCell, PipMagic, MagicCommand
from databricks.labs.ucx.source_code.notebooks.cells import (
GraphBuilder,
PythonCell,
)
from databricks.labs.ucx.source_code.notebooks.loaders import (
NotebookResolver,
NotebookLoader,
)
from databricks.labs.ucx.source_code.path_lookup import PathLookup
from databricks.labs.ucx.source_code.python_libraries import PythonLibraryResolver
from databricks.labs.ucx.source_code.known import KnownList

Expand Down Expand Up @@ -167,6 +172,58 @@ def test_pip_cell_build_dependency_graph_handles_multiline_code():
graph.register_library.assert_called_once_with("databricks")


def test_graph_builder_parse_error(
simple_dependency_resolver: DependencyResolver, mock_path_lookup: PathLookup
) -> None:
"""Check that internal parsing errors are caught and logged."""
# Fixture.
dependency = Dependency(FileLoader(), Path(""))
graph = DependencyGraph(dependency, None, simple_dependency_resolver, mock_path_lookup, CurrentSessionState())
graph.new_graph_builder_context()
builder = GraphBuilder(graph.new_graph_builder_context())

# Run the test.
problems = builder.build_graph_from_python_source("this is not valid python")

# Check results.
assert [
problem
for problem in problems
if problem.code == "parse-error" and problem.message.startswith("Could not parse Python code")
]


def test_parses_python_cell_with_magic_commands(simple_dependency_resolver, mock_path_lookup):
code = """
a = 'something'
%pip install databricks
b = 'else'
"""
cell = PythonCell(code, original_offset=1)
dependency = Dependency(FileLoader(), Path(""))
graph = DependencyGraph(dependency, None, simple_dependency_resolver, mock_path_lookup, CurrentSessionState())
problems = cell.build_dependency_graph(graph)
assert not problems


@pytest.mark.xfail(reason="Line-magic as an expression is not supported ", strict=True)
def test_python_cell_with_expression_magic(
simple_dependency_resolver: DependencyResolver, mock_path_lookup: PathLookup
) -> None:
"""Line magic (%) can be used in places where expressions are expected; check that this is handled."""
# Fixture
code = "current_directory = %pwd"
cell = PythonCell(code, original_offset=1)
dependency = Dependency(FileLoader(), Path(""))
graph = DependencyGraph(dependency, None, simple_dependency_resolver, mock_path_lookup, CurrentSessionState())

# Run the test
problems = cell.build_dependency_graph(graph)

# Verify there were no problems.
assert not problems


@pytest.mark.parametrize(
"code,split",
[
Expand All @@ -193,6 +250,12 @@ def test_pip_cell_build_dependency_graph_handles_multiline_code():
],
)
def test_pip_magic_split(code, split):
# Avoid direct protected access to the _split method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oke, that is an interesting workaround. I am not settled on the protected access in test cases, I like them sometimes, when it is clear unit testable functionality, like here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When code has been well-designed with testing in mind and makes good use of functional composition, I agree it can often be avoided.

However in practice most codebases don't function that way and the fixtures for unit tests are (by necessity) tightly coupled to the implementation of whatever is being tested. This means better-than-public access is frequently required.

My preferred approach here would be to relax the linting rules so that unit tests are allowed to access protected attributes. The current situation is that:

  • There are reasons for doing it.
  • The linting rules don't allow it, and there is no effective override due to the CI checks.
  • Combined, these lead to code such as this that is worse than allowing it in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with allowing the protected access in the tests. I ran into the same problem. Though, I think it good practice to nudge the test to test the public API's in the first place, but also have the option to narrowly test non public API's

class _PipMagicFriend(PipMagic):
@classmethod
def split(cls, code: str) -> list[str]:
return cls._split(code)

assert PipMagic._split(code) == split # pylint: disable=protected-access


Expand Down