-
Notifications
You must be signed in to change notification settings - Fork 294
Creating on_warning_callback arg for better warning management #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
78 commits
Select commit
Hold shift + click to select a range
5fdf852
addding slack warning
giovannicorsetti 65e4adc
remove unused import
giovannicorsetti bd957da
add error handling for parsing function
giovannicorsetti b4122bd
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 14ff9b0
format
giovannicorsetti ec5cdf1
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 79c3308
handling case where tests are not defined
giovannicorsetti 2456fa9
Merge remote-tracking branch 'origin/main'
giovannicorsetti 0506c9d
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 7ae0dda
creating unified function for WARN and ERROR
giovannicorsetti ff4d2eb
Merge remote-tracking branch 'origin/main'
giovannicorsetti 48dbef7
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 5f789d2
Slack message containing warnings
giovannicorsetti 01499c2
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 9cea7d6
make line shorter
giovannicorsetti a7926a1
Creating on_warning_callback argument (TODO: document)
giovannicorsetti 831f14b
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] cdeaef7
patch for render.py
giovannicorsetti f208954
Merge remote-tracking branch 'origin/main'
giovannicorsetti aa7492e
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 9f16455
improving robustness & adding local doc
giovannicorsetti e6d4c79
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] abc215e
adding warn documentation
giovannicorsetti c97c7bc
Merge branch 'main' into main
CorsettiS 5014aff
slightly change documentation of functions
giovannicorsetti 054f93d
Improve warn function performance & make it more readable
giovannicorsetti 5a19af4
merge current master into it
giovannicorsetti 0fcc04b
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 867f4fb
merge with version 0.6
giovannicorsetti 00a3ee5
implementing changes for local version
giovannicorsetti 3fd4d79
Merge remote-tracking branch 'origin/main'
giovannicorsetti 7eafcdb
using black for formatting
giovannicorsetti 0464d19
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 45b74ba
fix crash with on_warning_callback on docker & k8s and add note about…
giovannicorsetti 5eba1fe
Merge remote-tracking branch 'origin/main'
giovannicorsetti 6d86cd3
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 660c1f2
quick patch
giovannicorsetti daa780b
Merge remote-tracking branch 'origin/main'
giovannicorsetti 73fdc5c
Merge branch 'main' into main
CorsettiS 05db490
Merge branch 'main' into main
CorsettiS 64c7ba6
solving new branch conflict
giovannicorsetti f018428
Merge remote-tracking branch 'origin/main'
giovannicorsetti 7ae00fb
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] c78ea6d
solving new branch conflict patch_1
giovannicorsetti 155adc9
Merge remote-tracking branch 'origin/main'
giovannicorsetti 5c80ad1
Update warn_parsing.py
CorsettiS ccfb239
Merge branch 'main' into main
CorsettiS 111f405
reset raw_customers.csv
giovannicorsetti 2add5d8
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] a78d551
reset raw_orders.csv
giovannicorsetti 8f05fee
Merge remote-tracking branch 'origin/main'
giovannicorsetti d862dd9
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] d665193
Update cosmos/providers/dbt/core/operators/local.py
CorsettiS 1658fd1
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] f514982
format local.py
giovannicorsetti a450c55
format adapted_subprocesshook.py
giovannicorsetti 837e5c8
format warn_parsing.py
giovannicorsetti ff864d1
format raw_orders.csv
giovannicorsetti e2c4bb4
add extra note consideration
giovannicorsetti 23310d2
Merge branch 'main' into main
CorsettiS 9912f34
add extra note consideration 2
giovannicorsetti b24e34c
add second note to documentation
giovannicorsetti 545b875
Merge branch 'main' into main
CorsettiS a4fd279
Merge branch 'main' into main
CorsettiS 3fb0aec
Merge branch 'main' into main
jlaneve 6b7addd
Merge branch 'main' into main
CorsettiS 7280ca8
renaming subprocesshook to fulloutputsubprocesshook + adding descript…
giovannicorsetti 58268fd
Merge remote-tracking branch 'origin/main'
giovannicorsetti 3efd844
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] c8c4deb
adding tests for warning functions
giovannicorsetti 9d172eb
Merge remote-tracking branch 'origin/main'
giovannicorsetti 0000b9e
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] 329462a
improving consistency of tests
giovannicorsetti 3774add
Merge remote-tracking branch 'origin/main'
giovannicorsetti 2e62efe
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] e5bfcfb
Merge branch 'main' into main
CorsettiS d61313b
renaming all SubprocessResult for FullOutputSubprocessResult
giovannicorsetti 07ff0d3
🎨 [pre-commit.ci] Auto format from pre-commit.com hooks
pre-commit-ci[bot] File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
108 changes: 108 additions & 0 deletions
108
cosmos/providers/dbt/core/utils/adapted_subprocesshook.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| # This hook has been refined from the Airflow SubprocessHook, offering an added functionality to the original. | ||
| # It presents an alternative option to return the complete command output, as opposed to solely the last line from | ||
| # stdout or stderr. This option proves to be highly beneficial for any text analysis that depends on the stdout or | ||
| # stderr output of a dbt command. | ||
| from __future__ import annotations | ||
|
|
||
| import contextlib | ||
| import os | ||
| import signal | ||
| from collections import namedtuple | ||
| from subprocess import PIPE, STDOUT, Popen | ||
| from tempfile import TemporaryDirectory, gettempdir | ||
|
|
||
| from airflow.hooks.base import BaseHook | ||
|
|
||
| FullOutputSubprocessResult = namedtuple( | ||
| "FullOutputSubprocessResult", ["exit_code", "output", "full_output"] | ||
| ) | ||
|
|
||
|
|
||
| class FullOutputSubprocessHook(BaseHook): | ||
| """Hook for running processes with the ``subprocess`` module.""" | ||
|
|
||
| def __init__(self) -> None: | ||
| self.sub_process: Popen[bytes] | None = None | ||
| super().__init__() | ||
|
|
||
| def run_command( | ||
| self, | ||
| command: list[str], | ||
| env: dict[str, str] | None = None, | ||
| output_encoding: str = "utf-8", | ||
| cwd: str | None = None, | ||
| ) -> FullOutputSubprocessResult: | ||
| """ | ||
| Execute the command. | ||
|
|
||
| If ``cwd`` is None, execute the command in a temporary directory which will be cleaned afterwards. | ||
| If ``env`` is not supplied, ``os.environ`` is passed | ||
|
|
||
| :param command: the command to run | ||
| :param env: Optional dict containing environment variables to be made available to the shell | ||
| environment in which ``command`` will be executed. If omitted, ``os.environ`` will be used. | ||
| Note, that in case you have Sentry configured, original variables from the environment | ||
| will also be passed to the subprocess with ``SUBPROCESS_`` prefix. See | ||
| :doc:`/administration-and-deployment/logging-monitoring/errors` for details. | ||
| :param output_encoding: encoding to use for decoding stdout | ||
| :param cwd: Working directory to run the command in. | ||
| If None (default), the command is run in a temporary directory. | ||
| :return: :class:`namedtuple` containing: | ||
| ``exit_code`` | ||
| ``output``: the last line from stderr or stdout | ||
| ``full_output``: all lines from stderr or stdout. | ||
| """ | ||
| self.log.info("Tmp dir root location: \n %s", gettempdir()) | ||
| log_lines = [] | ||
| with contextlib.ExitStack() as stack: | ||
| if cwd is None: | ||
| cwd = stack.enter_context(TemporaryDirectory(prefix="airflowtmp")) | ||
|
|
||
| def pre_exec(): | ||
| # Restore default signal disposition and invoke setsid | ||
| for sig in ("SIGPIPE", "SIGXFZ", "SIGXFSZ"): | ||
| if hasattr(signal, sig): | ||
| signal.signal(getattr(signal, sig), signal.SIG_DFL) | ||
| os.setsid() | ||
|
|
||
| self.log.info("Running command: %s", command) | ||
|
|
||
| self.sub_process = Popen( | ||
| command, | ||
| stdout=PIPE, | ||
| stderr=STDOUT, | ||
| cwd=cwd, | ||
| env=env if env or env == {} else os.environ, | ||
| preexec_fn=pre_exec, | ||
| ) | ||
|
|
||
| self.log.info("Output:") | ||
| line = "" | ||
|
|
||
| if self.sub_process is None: | ||
| raise RuntimeError("The subprocess should be created here and is None!") | ||
| if self.sub_process.stdout is not None: | ||
| for raw_line in iter(self.sub_process.stdout.readline, b""): | ||
| line = raw_line.decode( | ||
| output_encoding, errors="backslashreplace" | ||
| ).rstrip() | ||
| # storing the warn & error lines to be used later | ||
| log_lines.append(line) | ||
| self.log.info("%s", line) | ||
|
|
||
| self.sub_process.wait() | ||
|
|
||
| self.log.info( | ||
| "Command exited with return code %s", self.sub_process.returncode | ||
| ) | ||
| return_code: int = self.sub_process.returncode | ||
|
|
||
| return FullOutputSubprocessResult( | ||
| exit_code=return_code, output=line, full_output=log_lines | ||
| ) | ||
|
|
||
| def send_sigterm(self): | ||
| """Sends SIGTERM signal to ``self.sub_process`` if one exists.""" | ||
| self.log.info("Sending SIGTERM signal to process group") | ||
| if self.sub_process and hasattr(self.sub_process, "pid"): | ||
| os.killpg(os.getpgid(self.sub_process.pid), signal.SIGTERM) |
39 changes: 39 additions & 0 deletions
39
cosmos/providers/dbt/core/utils/tests/test_warn_parsing.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| from airflow.hooks.subprocess import SubprocessResult | ||
|
|
||
| from cosmos.providers.dbt.core.utils.warn_parsing import ( | ||
| extract_log_issues, | ||
| parse_output, | ||
| ) | ||
|
|
||
|
|
||
| def test_parse_output() -> None: | ||
| for warnings in range(0, 3): | ||
| output_str = f"Done. PASS=15 WARN={warnings} ERROR=0 SKIP=0 TOTAL=16" | ||
| keyword = "WARN" | ||
| result = SubprocessResult(exit_code=0, output=output_str) | ||
| num_warns = parse_output(result, keyword) | ||
| assert num_warns == warnings | ||
|
|
||
|
|
||
| def test_extract_log_issues() -> None: | ||
| log_list = [ | ||
| "20:30:01 \x1b[33mRunning with dbt=1.3.0\x1b[0m", | ||
| "20:30:03 \x1b[33mFinished running 1 test in 10.31s.\x1b[0m", | ||
| "20:30:02 \x1b[33mWarning in test my_test (models/my_model.sql)\x1b[0m", | ||
| "20:30:02 \x1b[33mSome warning message\x1b[0m", | ||
| "20:30:03 \x1b[33mWarning in test my_second_test (models/my_model.sql)\x1b[0m", | ||
| "20:30:03 \x1b[33mA very different warning message\x1b[0m", | ||
| ] | ||
| test_names, test_results = extract_log_issues(log_list) | ||
| assert "my_test" in test_names | ||
| assert "my_second_test" in test_names | ||
| assert "Some warning message" in test_results | ||
| assert "A very different warning message" in test_results | ||
|
|
||
| log_list_no_warning = [ | ||
| "20:30:01 \x1b[33mRunning with dbt=1.3.0\x1b[0m", | ||
| "20:30:03 \x1b[33mFinished running 1 test in 10.31s.\x1b[0m", | ||
| ] | ||
| test_names_no_warns, test_results_no_warns = extract_log_issues(log_list_no_warning) | ||
| assert test_names_no_warns == [] | ||
| assert test_results_no_warns == [] |
|
CorsettiS marked this conversation as resolved.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| import logging | ||
| import re | ||
| from typing import List, Tuple | ||
|
|
||
| from airflow.hooks.subprocess import SubprocessResult | ||
|
|
||
|
|
||
| def parse_output(result: SubprocessResult, keyword: str) -> int: | ||
| """ | ||
| Parses the DBT test output message and returns the number of errors or warnings. | ||
|
|
||
| :param result: String containing the output to be parsed. | ||
| :param keyword: String representing the keyword to search for in the output (WARN, ERROR). | ||
| :return: An integer value associated with the keyword, or 0 if parsing fails. | ||
|
|
||
| Usage: | ||
| ----- | ||
| output_str = "Done. PASS=15 WARN=1 ERROR=0 SKIP=0 TOTAL=16" | ||
| keyword = "WARN" | ||
| num_warns = parse_output(output_str, keyword) | ||
| print(num_warns) | ||
| # Output: 1 | ||
| """ | ||
| output = result.output | ||
| try: | ||
| num = int(output.split(f"{keyword}=")[1].split()[0]) | ||
| except ValueError: | ||
| logging.error( | ||
| f"Could not parse number of {keyword}s. Check your dbt/airflow version or if --quiet is not being used" | ||
| ) | ||
| return num | ||
|
|
||
|
|
||
| def extract_log_issues(log_list: List[str]) -> Tuple[List[str], List[str]]: | ||
| """ | ||
| Extracts warning messages from the log list and returns them as a formatted string. | ||
|
|
||
| This function searches for warning messages in DBT test. It reverses the log list for performance | ||
| improvement. It extracts and formats the relevant information and appends it to a list of warnings. | ||
|
|
||
| :param log_list: List of strings, where each string is a log line from DBT test. | ||
| :return: two lists of strings, the first one containing the test names and the second one | ||
| containing the test results. | ||
| """ | ||
|
|
||
| def clean_line(line: str) -> str: | ||
| return line.replace("\x1b[33m", "").replace("\x1b[0m", "").strip() | ||
|
|
||
| test_names = [] | ||
| test_results = [] | ||
| pattern1 = re.compile(r"\d{2}:\d{2}:\d{2}\s+Warning in test ([\w_]+).*") | ||
| pattern2 = re.compile(r"\d{2}:\d{2}:\d{2}\s+(.*)") | ||
|
|
||
| for line_index, line in enumerate(reversed(log_list)): | ||
| cleaned_line = clean_line(line) | ||
|
|
||
| if "Finished running" in cleaned_line: | ||
| # No need to keep checking the log lines once all warnings are found | ||
| break | ||
|
|
||
| if "Warning in test" in cleaned_line: | ||
| test_name = pattern1.sub(r"\1", cleaned_line) | ||
| # test_result is on the next line by default | ||
| test_result = pattern2.sub( | ||
| r"\1", clean_line(log_list[-(line_index + 1) + 1]) | ||
| ) | ||
|
|
||
| test_names.append(test_name) | ||
| test_results.append(test_result) | ||
|
|
||
| return test_names, test_results | ||
|
jlaneve marked this conversation as resolved.
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.