-
Notifications
You must be signed in to change notification settings - Fork 4
Add exclusion of zeros in diagnose_matrix #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR introduces an optional toggle to exclude zero entries when diagnosing matrices, by reading a new environment variable into the Diagnose class, extending the diagnose_matrix API, and implementing a parallel computation path that filters out zeros in mean and abnormal-value calculations. Class diagram for updated Diagnose class and diagnose_matrix methodclassDiagram
class Diagnose {
- thres_col: float
- thres_row: float
- thres_point: float
- excluing_zeros: int
+ __init__(...)
+ diagnose_matrix(...)
}
Diagnose : diagnose_matrix static
Diagnose : excluing_zeros int
Diagnose : thres_col float
Diagnose : thres_row float
Diagnose : thres_point float
Flow diagram for diagnose_matrix zero exclusion logicflowchart TD
A["Start diagnose_matrix"] --> B{excluing_zeros == 0?}
B -- Yes --> C["Include zeros in mean and abnormal calculations"]
B -- No --> D["Exclude zeros in mean and abnormal calculations"]
C --> E["Detect abnormal columns, rows, points (including zeros)"]
D --> F["Detect abnormal columns, rows, points (excluding zeros)"]
E --> G["Return results"]
F --> G["Return results"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Summary of ChangesHello @yyoean, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- There’s a typo in the parameter/name
excluing_zeros—it should be spelledexcluding_zerosto match the environment variable and avoid confusion. - The include-zeros and exclude-zeros branches share nearly identical logic; consider refactoring common pieces into helper functions to reduce duplication and simplify maintenance.
- Update the diagnose_matrix docstring to document the new
excluding_zerosparameter and clarify whether it expects a boolean or integer flag.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There’s a typo in the parameter/name `excluing_zeros`—it should be spelled `excluding_zeros` to match the environment variable and avoid confusion.
- The include-zeros and exclude-zeros branches share nearly identical logic; consider refactoring common pieces into helper functions to reduce duplication and simplify maintenance.
- Update the diagnose_matrix docstring to document the new `excluding_zeros` parameter and clarify whether it expects a boolean or integer flag.
## Individual Comments
### Comment 1
<location> `src/deepxtrace/diagnose.py:165` </location>
<code_context>
os.getenv(
"DEEPEP_DIAGNOSE_THRESHOLD_POINT",
5.0))
+ self.excluing_zeros = int(os.getenv("DEEPEP_DIAGNOSE_EXCLUDING_ZEROS", 0))
# Initialize the diagnose
</code_context>
<issue_to_address>
**issue (typo):** Typo in variable name 'excluing_zeros'.
Please update all instances to 'excluding_zeros' for consistency.
Suggested implementation:
```python
self.excluding_zeros = int(os.getenv("DEEPEP_DIAGNOSE_EXCLUDING_ZEROS", 0))
```
```python
suppress_points_in_strong_rowscols=True, excluding_zeros=0
```
If `excluing_zeros` is used elsewhere in the file (e.g., inside the `diagnose_matrix` method or other methods), those usages should also be updated to `excluding_zeros`.
</issue_to_address>
### Comment 2
<location> `src/deepxtrace/diagnose.py:120` </location>
<code_context>
DEEPEP_DIAGNOSE_THRESHOLD_COL: determine threshold for abnormal columns. Default 3.0.
DEEPEP_DIAGNOSE_THRESHOLD_ROW: determine threshold for abnormal rows. Default 3.0.
DEEPEP_DIAGNOSE_THRESHOLD_POINT: determine threshold for abnormal individual points. Default 5.0.
+ DEEPEP_DIAGNOSE_EXCLUDING_ZEROS: controls whether excluding zeors in diagnose_matrix. Default 0.
"""
</code_context>
<issue_to_address>
**nitpick (typo):** Typo in description: 'zeors' should be 'zeros'.
```suggestion
DEEPEP_DIAGNOSE_EXCLUDING_ZEROS: controls whether excluding zeros in diagnose_matrix. Default 0.
```
</issue_to_address>
### Comment 3
<location> `src/deepxtrace/diagnose.py:309` </location>
<code_context>
@staticmethod
def diagnose_matrix(
mat, thres_col=3.0, thres_row=3.0, thres_point=5.0,
suppress_points_in_strong_rowscols=True, excluing_zeros=0
):
"""
Detect abnormal columns, rows, and individual points in a 2D wait-time matrix.
Arguments:
mat (np.ndarray): 2D array where mat[i, j] is the waiting time of source i for destination j.
thres_col (float): Threshold for abnormal columns.
thres_row (float): Threshold for abnormal rows.
thres_point (float): Threshold for abnormal individual points.
suppress_points_in_strong_rowscols (bool): If True, exclude points already in detected abnormal rows/columns.
Returns:
dict: {
"abnormal_cols": List[List[int, float, float]], # abnormal column indices, [col_index, mean_value, normalized_value]
"abnormal_rows": List[List[int, float, float]], # abnormal row indices, [rol_index, mean_value, normalized_value]
"abnormal_points": List[List[int, int, float, float]] # abnormal points, [row, col, value, normalized_value]
}
"""
if (excluing_zeros == 0):
# 1. Check for abnormal columns (including zeros)
col_means = mat.mean(axis=0)
# z_col = (col_means - col_means.mean()) / (col_means.std() + 1e-8)
z_col = col_means / (col_means.mean() + 1e-8)
abnormal_cols = [
[j, col_means[j], z_col[j]]
for j in np.where(z_col > thres_col)[0]
]
# 2. Check for abnormal rows (including zeros)
row_means = mat.mean(axis=1)
# z_row = (row_means - row_means.mean()) / (row_means.std() + 1e-8)
z_row = row_means / (row_means.mean() + 1e-8)
abnormal_rows = [
[i, row_means[i], z_row[i]]
for i in np.where(z_row > thres_row)[0]
]
# 3. Check for abnormal single points (including zeros)
# z_all = (mat - mat.mean()) / (mat.std() + 1e-8)
z_all = mat / (mat.mean() + 1e-8)
# Get all positions with z-score > threshold
abnormal_points = [
[i, j, mat[i, j], z_all[i, j]]
for i in range(mat.shape[0])
for j in range(mat.shape[1])
if z_all[i, j] > thres_point
]
# Optionally remove points that are in already detected abnormal rows or columns
if suppress_points_in_strong_rowscols:
strong_rows = [row[0] for row in abnormal_rows]
strong_cols = [col[0] for col in abnormal_cols]
abnormal_points = [
[i, j, v, z] for [i, j, v, z] in abnormal_points
if i not in strong_rows and j not in strong_cols
]
else:
# 1. Check for abnormal columns (excluding zeros in columns)
col_means = np.array([
mat[:, j][mat[:, j] != 0].mean() # Calculate mean of non-zero values in column
if np.any(mat[:, j] != 0) # If column contains non-zero values
else 0 # Else set to 0 (avoid empty column errors)
for j in range(mat.shape[1])
])
# Calculate normalized values (exclude all-zero columns)
valid_cols = np.where(col_means != 0)[0] # Indices of columns with non-zero mean
z_col = np.zeros_like(col_means) # Initialize all-zero array
if len(valid_cols) > 0:
z_col[valid_cols] = col_means[valid_cols] / (col_means[valid_cols].mean() + 1e-8)
# Detect abnormal columns (only non-zero columns)
abnormal_cols = [
[j, col_means[j], z_col[j]]
for j in valid_cols
if z_col[j] > thres_col
]
# 2. Check for abnormal rows (excluding zeros in rows)
row_means = np.array([
mat[i, :][mat[i, :] != 0].mean() # Calculate mean of non-zero values in row
if np.any(mat[i, :] != 0) # If row contains non-zero values
else 0 # Else set to 0 (avoid empty row errors)
for i in range(mat.shape[0])
])
# Calculate normalized values (exclude all-zero rows)
valid_rows = np.where(row_means != 0)[0] # Indices of rows with non-zero mean
z_row = np.zeros_like(row_means) # Initialize all-zero array
if len(valid_rows) > 0:
z_row[valid_rows] = row_means[valid_rows] / (row_means[valid_rows].mean() + 1e-8)
# Detect abnormal rows (only non-zero rows)
abnormal_rows = [
[i, row_means[i], z_row[i]]
for i in valid_rows
if z_row[i] > thres_row
]
# 3. Check for abnormal single points (excluding zeros)
mask = mat != 0 # Create mask for non-zero values
z_all = np.zeros_like(mat, dtype=float) # Initialize all-zero array
if np.any(mask): # If non-zero values exist
# Calculate mean of non-zero values (global)
nonzero_mean = mat[mask].mean()
z_all[mask] = mat[mask] / (nonzero_mean + 1e-8) # Normalize only non-zero values
# Detect abnormal points (non-zero values with z-score > threshold)
abnormal_points = [
[i, j, mat[i, j], z_all[i, j]]
for i in range(mat.shape[0])
for j in range(mat.shape[1])
if mask[i, j] and z_all[i, j] > thres_point # Ensure non-zero and abnormal
]
# Optionally remove points in already detected abnormal rows/columns
if suppress_points_in_strong_rowscols:
strong_rows = {row[0] for row in abnormal_rows} # Use set for faster lookup
strong_cols = {col[0] for col in abnormal_cols}
abnormal_points = [
[i, j, v, z] for [i, j, v, z] in abnormal_points
if i not in strong_rows and j not in strong_cols
]
# 4. Return for automatic processing
return {
"abnormal_cols": abnormal_cols,
"abnormal_rows": abnormal_rows,
"abnormal_points": abnormal_points
}
</code_context>
<issue_to_address>
**issue (code-quality):** Low code quality found in Diagnose.diagnose\_matrix - 18% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a useful feature to optionally exclude zero values during matrix diagnosis. However, the implementation introduces several issues that should be addressed. There is a recurring typo ('excluing' instead of 'excluding') that affects readability and consistency. The main logic change in diagnose_matrix has led to significant code duplication, which will make future maintenance difficult. Additionally, there are several areas where performance can be improved by using vectorized NumPy operations instead of list comprehensions. I have provided specific comments and suggestions to resolve these issues.
src/deepxtrace/diagnose.py
Outdated
| row_means = np.array([ | ||
| mat[i, :][mat[i, :] != 0].mean() # Calculate mean of non-zero values in row | ||
| if np.any(mat[i, :] != 0) # If row contains non-zero values | ||
| else 0 # Else set to 0 (avoid empty row errors) | ||
| for i in range(mat.shape[0]) | ||
| ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
src/deepxtrace/diagnose.py
Outdated
| ] | ||
| else: | ||
| # 1. Check for abnormal columns (excluding zeros in columns) | ||
| col_means = np.array([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use this method? More efficient and concise.
col_means = np.ma.masked_equal(mat, 0).mean(axis=0).filled(0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
src/deepxtrace/diagnose.py
Outdated
| if i not in strong_rows and j not in strong_cols | ||
| ] | ||
| else: | ||
| # 1. Check for abnormal columns (excluding zeros in columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test cases for include some zeros matrix.
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
src/deepxtrace/diagnose.py
Outdated
| if i not in strong_rows and j not in strong_cols | ||
| ] | ||
| elif excluding_zeros == 1: | ||
| # 3. Check for abnormal single points (excluding zeros) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
Signed-off-by: yyoean <[email protected]>
src/deepxtrace/diagnose.py
Outdated
| mean_val = nonzero_values.mean() | ||
| z_all = mat / (mean_val + 1e-8) | ||
| else: | ||
| mean_val = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove useless the variable mean_val = 0
src/deepxtrace/diagnose.py
Outdated
| else: | ||
| mean_val = 0 | ||
| # avoid devide zero | ||
| z_all = np.zeros_like(mat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
z_all = np.zeros_like(mat) is equal z_all = mat ?
Signed-off-by: yyoean <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
* Add exclusion of zeros in diagnose_matrix * Update src/deepxtrace/diagnose.py Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> * Update src/deepxtrace/diagnose.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update src/deepxtrace/diagnose.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * autopep8 Signed-off-by: yyoean <[email protected]> * essential Signed-off-by: yyoean <[email protected]> * format Signed-off-by: yyoean <[email protected]> * update Signed-off-by: yyoean <[email protected]> * simplify Signed-off-by: yyoean <[email protected]> * test.py Signed-off-by: yyoean <[email protected]> * format Signed-off-by: yyoean <[email protected]> --------- Signed-off-by: yyoean <[email protected]> Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Add exclusion of zeros in diagnose_matrix
Summary by Sourcery
Enable optional exclusion of zero values in diagnose_matrix by adding a new environment variable and adapting the abnormality detection logic and method invocation accordingly.
New Features:
Enhancements:
Documentation: