Skip to content

Conversation

@yyoean
Copy link
Contributor

@yyoean yyoean commented Oct 29, 2025

Add exclusion of zeros in diagnose_matrix

Summary by Sourcery

Enable optional exclusion of zero values in diagnose_matrix by adding a new environment variable and adapting the abnormality detection logic and method invocation accordingly.

New Features:

  • Introduce DEEPEP_DIAGNOSE_EXCLUDING_ZEROS environment variable to control zero exclusion in diagnose_matrix

Enhancements:

  • Extend diagnose_matrix to branch between including or excluding zeros when computing means and detecting abnormal columns, rows, and points
  • Propagate the exclusion flag from class initialization into diagnose_matrix calls

Documentation:

  • Document the new exclusion setting in the Diagnose class docstring

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 29, 2025

Reviewer's Guide

This PR introduces an optional toggle to exclude zero entries when diagnosing matrices, by reading a new environment variable into the Diagnose class, extending the diagnose_matrix API, and implementing a parallel computation path that filters out zeros in mean and abnormal-value calculations.

Class diagram for updated Diagnose class and diagnose_matrix method

classDiagram
class Diagnose {
  - thres_col: float
  - thres_row: float
  - thres_point: float
  - excluing_zeros: int
  + __init__(...)
  + diagnose_matrix(...)
}
Diagnose : diagnose_matrix static
Diagnose : excluing_zeros int
Diagnose : thres_col float
Diagnose : thres_row float
Diagnose : thres_point float
Loading

Flow diagram for diagnose_matrix zero exclusion logic

flowchart TD
    A["Start diagnose_matrix"] --> B{excluing_zeros == 0?}
    B -- Yes --> C["Include zeros in mean and abnormal calculations"]
    B -- No --> D["Exclude zeros in mean and abnormal calculations"]
    C --> E["Detect abnormal columns, rows, points (including zeros)"]
    D --> F["Detect abnormal columns, rows, points (excluding zeros)"]
    E --> G["Return results"]
    F --> G["Return results"]
Loading

File-Level Changes

Change Details Files
Add exclusion toggle configuration and propagate it
  • Expose DEEPEP_DIAGNOSE_EXCLUDING_ZEROS in the docstring
  • Read and store its value as self.excluing_zeros in init
  • Pass self.excluing_zeros into diagnose_matrix in _gather_diagnose_stats_internal
src/deepxtrace/diagnose.py
Extend diagnose_matrix signature to accept exclusion flag
  • Append excluing_zeros parameter with default 0 to diagnose_matrix declaration
  • Update internal calls and doc comments to include the new parameter
src/deepxtrace/diagnose.py
Implement conditional zero‐exclusion logic in diagnose_matrix
  • Wrap existing column/row/point detection under if excluing_zeros==0
  • Add else branch that computes means excluding zeros using list comprehensions
  • Initialize z-score arrays only for non-zero indices and mask zeros
  • Detect abnormal points only on non-zero entries and preserve suppress-rows/cols logic
src/deepxtrace/diagnose.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @yyoean, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the diagnose_matrix functionality by providing a configurable option to exclude zero values from the statistical calculations used to identify abnormal columns, rows, and individual points. This allows for more flexible and potentially more accurate anomaly detection in matrices where zero values might represent missing data or non-contributing elements that should not influence the statistical analysis.

Highlights

  • New Feature: Zero Exclusion in Diagnosis: Introduced an option to exclude zero values when calculating means and identifying abnormal columns, rows, and individual points within the diagnose_matrix function.
  • Configurable Behavior: The new zero exclusion behavior is controlled by a new environment variable, DEEPEP_DIAGNOSE_EXCLUDING_ZEROS, allowing users to toggle this feature.
  • Refactored Anomaly Detection Logic: The diagnose_matrix function now contains conditional logic to perform statistical calculations either including or excluding zero values, providing more flexible anomaly detection.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • There’s a typo in the parameter/name excluing_zeros—it should be spelled excluding_zeros to match the environment variable and avoid confusion.
  • The include-zeros and exclude-zeros branches share nearly identical logic; consider refactoring common pieces into helper functions to reduce duplication and simplify maintenance.
  • Update the diagnose_matrix docstring to document the new excluding_zeros parameter and clarify whether it expects a boolean or integer flag.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There’s a typo in the parameter/name `excluing_zeros`—it should be spelled `excluding_zeros` to match the environment variable and avoid confusion.
- The include-zeros and exclude-zeros branches share nearly identical logic; consider refactoring common pieces into helper functions to reduce duplication and simplify maintenance.
- Update the diagnose_matrix docstring to document the new `excluding_zeros` parameter and clarify whether it expects a boolean or integer flag.

## Individual Comments

### Comment 1
<location> `src/deepxtrace/diagnose.py:165` </location>
<code_context>
             os.getenv(
                 "DEEPEP_DIAGNOSE_THRESHOLD_POINT",
                 5.0))
+        self.excluing_zeros = int(os.getenv("DEEPEP_DIAGNOSE_EXCLUDING_ZEROS", 0))

         # Initialize the diagnose
</code_context>

<issue_to_address>
**issue (typo):** Typo in variable name 'excluing_zeros'.

Please update all instances to 'excluding_zeros' for consistency.

Suggested implementation:

```python
        self.excluding_zeros = int(os.getenv("DEEPEP_DIAGNOSE_EXCLUDING_ZEROS", 0))

```

```python
        suppress_points_in_strong_rowscols=True, excluding_zeros=0

```

If `excluing_zeros` is used elsewhere in the file (e.g., inside the `diagnose_matrix` method or other methods), those usages should also be updated to `excluding_zeros`.
</issue_to_address>

### Comment 2
<location> `src/deepxtrace/diagnose.py:120` </location>
<code_context>
         DEEPEP_DIAGNOSE_THRESHOLD_COL: determine threshold for abnormal columns. Default 3.0.
         DEEPEP_DIAGNOSE_THRESHOLD_ROW: determine threshold for abnormal rows. Default 3.0.
         DEEPEP_DIAGNOSE_THRESHOLD_POINT: determine threshold for abnormal individual points. Default 5.0.
+        DEEPEP_DIAGNOSE_EXCLUDING_ZEROS: controls whether excluding zeors in diagnose_matrix. Default 0.

     """
</code_context>

<issue_to_address>
**nitpick (typo):** Typo in description: 'zeors' should be 'zeros'.

```suggestion
        DEEPEP_DIAGNOSE_EXCLUDING_ZEROS: controls whether excluding zeros in diagnose_matrix. Default 0.
```
</issue_to_address>

### Comment 3
<location> `src/deepxtrace/diagnose.py:309` </location>
<code_context>
    @staticmethod
    def diagnose_matrix(
        mat, thres_col=3.0, thres_row=3.0, thres_point=5.0,
        suppress_points_in_strong_rowscols=True, excluing_zeros=0
    ):
        """
        Detect abnormal columns, rows, and individual points in a 2D wait-time matrix.

        Arguments:
            mat (np.ndarray): 2D array where mat[i, j] is the waiting time of source i for destination j.
            thres_col (float): Threshold for abnormal columns.
            thres_row (float): Threshold for abnormal rows.
            thres_point (float): Threshold for abnormal individual points.
            suppress_points_in_strong_rowscols (bool): If True, exclude points already in detected abnormal rows/columns.

        Returns:
            dict: {
                "abnormal_cols": List[List[int, float, float]],  # abnormal column indices, [col_index, mean_value, normalized_value]
                "abnormal_rows": List[List[int, float, float]],  # abnormal row indices, [rol_index, mean_value, normalized_value]
                "abnormal_points": List[List[int, int, float, float]]  # abnormal points, [row, col, value, normalized_value]
            }
        """
        if (excluing_zeros == 0): 
            # 1. Check for abnormal columns (including zeros)
            col_means = mat.mean(axis=0)
            # z_col = (col_means - col_means.mean()) / (col_means.std() + 1e-8)
            z_col = col_means / (col_means.mean() + 1e-8)
            abnormal_cols = [
                [j, col_means[j], z_col[j]]
                for j in np.where(z_col > thres_col)[0]
            ]

            # 2. Check for abnormal rows (including zeros)
            row_means = mat.mean(axis=1)
            # z_row = (row_means - row_means.mean()) / (row_means.std() + 1e-8)
            z_row = row_means / (row_means.mean() + 1e-8)
            abnormal_rows = [
                [i, row_means[i], z_row[i]]
                for i in np.where(z_row > thres_row)[0]
            ]

            # 3. Check for abnormal single points (including zeros)
            # z_all = (mat - mat.mean()) / (mat.std() + 1e-8)
            z_all = mat / (mat.mean() + 1e-8)
            # Get all positions with z-score > threshold
            abnormal_points = [
                [i, j, mat[i, j], z_all[i, j]]
                for i in range(mat.shape[0])
                for j in range(mat.shape[1])
                if z_all[i, j] > thres_point
            ]
            # Optionally remove points that are in already detected abnormal rows or columns
            if suppress_points_in_strong_rowscols:
                strong_rows = [row[0] for row in abnormal_rows]
                strong_cols = [col[0] for col in abnormal_cols]
                abnormal_points = [
                    [i, j, v, z] for [i, j, v, z] in abnormal_points
                    if i not in strong_rows and j not in strong_cols
                ]            
        else:    
            # 1. Check for abnormal columns (excluding zeros in columns)
            col_means = np.array([
                mat[:, j][mat[:, j] != 0].mean()  # Calculate mean of non-zero values in column
                if np.any(mat[:, j] != 0)         # If column contains non-zero values
                else 0                            # Else set to 0 (avoid empty column errors)
                for j in range(mat.shape[1])
            ])
            # Calculate normalized values (exclude all-zero columns)
            valid_cols = np.where(col_means != 0)[0]  # Indices of columns with non-zero mean
            z_col = np.zeros_like(col_means)          # Initialize all-zero array
            if len(valid_cols) > 0:
                z_col[valid_cols] = col_means[valid_cols] / (col_means[valid_cols].mean() + 1e-8)
            # Detect abnormal columns (only non-zero columns)
            abnormal_cols = [
                [j, col_means[j], z_col[j]]
                for j in valid_cols
                if z_col[j] > thres_col
            ]

            # 2. Check for abnormal rows (excluding zeros in rows)
            row_means = np.array([
                mat[i, :][mat[i, :] != 0].mean()  # Calculate mean of non-zero values in row
                if np.any(mat[i, :] != 0)         # If row contains non-zero values
                else 0                            # Else set to 0 (avoid empty row errors)
                for i in range(mat.shape[0])
            ])
            # Calculate normalized values (exclude all-zero rows)
            valid_rows = np.where(row_means != 0)[0]  # Indices of rows with non-zero mean
            z_row = np.zeros_like(row_means)          # Initialize all-zero array
            if len(valid_rows) > 0:
                z_row[valid_rows] = row_means[valid_rows] / (row_means[valid_rows].mean() + 1e-8)
            # Detect abnormal rows (only non-zero rows)
            abnormal_rows = [
                [i, row_means[i], z_row[i]]
                for i in valid_rows
                if z_row[i] > thres_row
            ]     

            # 3. Check for abnormal single points (excluding zeros)
            mask = mat != 0  # Create mask for non-zero values
            z_all = np.zeros_like(mat, dtype=float)  # Initialize all-zero array
            if np.any(mask):  # If non-zero values exist
                # Calculate mean of non-zero values (global)
                nonzero_mean = mat[mask].mean()
                z_all[mask] = mat[mask] / (nonzero_mean + 1e-8)  # Normalize only non-zero values
            # Detect abnormal points (non-zero values with z-score > threshold)
            abnormal_points = [
                [i, j, mat[i, j], z_all[i, j]]
                for i in range(mat.shape[0])
                for j in range(mat.shape[1])
                if mask[i, j] and z_all[i, j] > thres_point  # Ensure non-zero and abnormal
            ]
            # Optionally remove points in already detected abnormal rows/columns
            if suppress_points_in_strong_rowscols:
                strong_rows = {row[0] for row in abnormal_rows}  # Use set for faster lookup
                strong_cols = {col[0] for col in abnormal_cols}
                abnormal_points = [
                    [i, j, v, z] for [i, j, v, z] in abnormal_points
                    if i not in strong_rows and j not in strong_cols
                ]

        # 4. Return for automatic processing
        return {
            "abnormal_cols": abnormal_cols,
            "abnormal_rows": abnormal_rows,
            "abnormal_points": abnormal_points
        }

</code_context>

<issue_to_address>
**issue (code-quality):** Low code quality found in Diagnose.diagnose\_matrix - 18% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a useful feature to optionally exclude zero values during matrix diagnosis. However, the implementation introduces several issues that should be addressed. There is a recurring typo ('excluing' instead of 'excluding') that affects readability and consistency. The main logic change in diagnose_matrix has led to significant code duplication, which will make future maintenance difficult. Additionally, there are several areas where performance can be improved by using vectorized NumPy operations instead of list comprehensions. I have provided specific comments and suggestions to resolve these issues.

Comment on lines 388 to 393
row_means = np.array([
mat[i, :][mat[i, :] != 0].mean() # Calculate mean of non-zero values in row
if np.any(mat[i, :] != 0) # If row contains non-zero values
else 0 # Else set to 0 (avoid empty row errors)
for i in range(mat.shape[0])
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the column mean calculation, this implementation for row means is inefficient due to the list comprehension. A vectorized approach will be significantly more performant.

            row_means = np.ma.masked_equal(mat, 0).mean(axis=1).filled(0)

yyoean and others added 6 commits October 29, 2025 19:04
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
]
else:
# 1. Check for abnormal columns (excluding zeros in columns)
col_means = np.array([
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use this method? More efficient and concise.

col_means = np.ma.masked_equal(mat, 0).mean(axis=0).filled(0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced

Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
if i not in strong_rows and j not in strong_cols
]
else:
# 1. Check for abnormal columns (excluding zeros in columns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test cases for include some zeros matrix.

Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
if i not in strong_rows and j not in strong_cols
]
elif excluding_zeros == 1:
# 3. Check for abnormal single points (excluding zeros)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply

Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
mean_val = nonzero_values.mean()
z_all = mat / (mean_val + 1e-8)
else:
mean_val = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove useless the variable mean_val = 0

else:
mean_val = 0
# avoid devide zero
z_all = np.zeros_like(mat)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

z_all = np.zeros_like(mat) is equal z_all = mat ?

Signed-off-by: yyoean <[email protected]>
Copy link
Collaborator

@wangfakang wangfakang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@wangfakang wangfakang merged commit d8cfa21 into antgroup:main Nov 5, 2025
3 checks passed
wangfakang pushed a commit to wangfakang/DeepXTrace that referenced this pull request Nov 6, 2025
* Add exclusion of zeros in diagnose_matrix

* Update src/deepxtrace/diagnose.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

* Update src/deepxtrace/diagnose.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update src/deepxtrace/diagnose.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* autopep8

Signed-off-by: yyoean <[email protected]>

* essential

Signed-off-by: yyoean <[email protected]>

* format

Signed-off-by: yyoean <[email protected]>

* update

Signed-off-by: yyoean <[email protected]>

* simplify

Signed-off-by: yyoean <[email protected]>

* test.py

Signed-off-by: yyoean <[email protected]>

* format

Signed-off-by: yyoean <[email protected]>

---------

Signed-off-by: yyoean <[email protected]>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants