Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admin panel: App health check #10546

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

ehconitin
Copy link
Contributor

@ehconitin ehconitin commented Feb 27, 2025

closes twentyhq/core-team-issues#441
whats new -

  • app health with the proposed format
status: 
  system: {
    nodeVersion
  },
  overview: {
    totalWorkspaces,
    criticalWorkspaces,
    workspacesWithPendingMigrations,
    healthDistribution: {
      healthy,
      warning,
      critical,
    }
  },
 problematicWorkspaces: [
    {
      workspaceId,
      severity,
      pendingMigrations,
      issuesSummary: {
        structural,
        data,
        relationship
      }
    }
  ]
  • errorMessage and details seperation -- before we used to send error in details which made it difficult if we want both error and details on the front -- usecase >>
    suppose app health indicator is not healthy but still want to send details
  • stateHistory with timestamp -- this is something I introduced, not sure about this. Basically the thought process was to store the LastState of the details, incase of no connection or timeout errors. This is not yet used on the front, just the endpoint. But it could be used on the front too
  • name unifying ⁠https://discord.com/channels/1130383047699738754/1346454192776155156
  • json tree https://discord.com/channels/1130383047699738754/1346458558048501760

@ehconitin ehconitin self-assigned this Feb 27, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR adds a new app health indicator to the admin panel's health monitoring system, allowing administrators to check workspace metadata migration status.

  • Added app to HealthIndicatorId enum in health-indicator-id.enum.ts for application-level health monitoring
  • Added new AppHealthIndicator class in app.health.ts that checks workspace health issues and pending migrations
  • Integrated app health checks in SettingsAdminIndicatorHealthStatusContent.tsx by adding a case for HealthIndicatorId.app
  • Updated DatabaseAndRedisHealthStatus.tsx to handle app health status with renamed variable isDatabaseRedisORAppDown
  • Added app health indicator configuration in health-indicators.constants.ts with appropriate label and description

8 file(s) reviewed, 3 comment(s)
Edit PR Review Bot Settings | Greptile

@ehconitin
Copy link
Contributor Author

@greptileai

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

(updates since last review)

Based on the provided information, I'll summarize the key changes in this PR that weren't covered in the previous review:

This PR significantly enhances the health monitoring system with improved error handling and detailed health status reporting.

  • Introduced JsonDataIndicatorHealthStatus component to replace DatabaseAndRedisHealthStatus, providing a more generic and reusable JSON tree visualization for health data
  • Separated error messages from details in AdminPanelHealthServiceData DTO, allowing both to be displayed simultaneously when a service is unhealthy
  • Added state history tracking with timestamps via new HealthStateManager utility to maintain last known state during connection issues
  • Created new health issue categorization constants (APP_HEALTH_ISSUE_CATEGORIES, APP_HEALTH_TABLE_ISSUES, APP_HEALTH_COLUMN_ISSUES, APP_HEALTH_RELATION_ISSUES) for better organization of workspace health issues
  • Unified naming conventions by changing paths from 'admin-panel' to 'server-admin' across multiple components

Note: The PR appears unrelated to the linked issue #441 which discusses cell value clearing behavior.

34 file(s) reviewed, 14 comment(s)
Edit PR Review Bot Settings | Greptile

[HealthIndicatorId.app]: {
id: HealthIndicatorId.app,
label: 'App Status',
description: 'Workspace metadata migration status check',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Description could be more comprehensive since this appears to check more than just migrations (system info, workspace health, etc)

},
databaseSize: databaseSize.size,
performance: {
cacheHitRatio: Math.round(parseFloat(cacheHitRatio.ratio)) + '%',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: cacheHitRatio could be NaN if heap_blks_hit + heap_blks_read = 0. Need to handle this edge case.

Suggested change
cacheHitRatio: Math.round(parseFloat(cacheHitRatio.ratio)) + '%',
cacheHitRatio: isNaN(parseFloat(cacheHitRatio.ratio)) ? '0%' : Math.round(parseFloat(cacheHitRatio.ratio)) + '%',

Comment on lines +83 to +88
max: parseInt(maxConnections.max_connections),
utilizationPercent: Math.round(
(parseInt(activeConnections.count) /
parseInt(maxConnections.max_connections)) *
100,
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: utilizationPercent calculation should handle case where max_connections is 0 to prevent division by zero

Suggested change
max: parseInt(maxConnections.max_connections),
utilizationPercent: Math.round(
(parseInt(activeConnections.count) /
parseInt(maxConnections.max_connections)) *
100,
),
max: parseInt(maxConnections.max_connections),
utilizationPercent: parseInt(maxConnections.max_connections) === 0 ? 0 : Math.round(
(parseInt(activeConnections.count) /
parseInt(maxConnections.max_connections)) *
100,
),

Comment on lines +14 to +20
getStateWithAge() {
return this.lastKnownState
? {
...this.lastKnownState,
age: Date.now() - this.lastKnownState.timestamp.getTime(),
}
: 'No previous state available';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Inconsistent return types - returns either an object or string. This can cause type errors in consuming code. Should return null or a consistent object structure.

Suggested change
getStateWithAge() {
return this.lastKnownState
? {
...this.lastKnownState,
age: Date.now() - this.lastKnownState.timestamp.getTime(),
}
: 'No previous state available';
getStateWithAge() {
return this.lastKnownState
? {
...this.lastKnownState,
age: Date.now() - this.lastKnownState.timestamp.getTime(),
}
: null;

Comment on lines 93 to 98
} catch (error) {
const errorMessage =
const message =
error.message === HEALTH_ERROR_MESSAGES.REDIS_TIMEOUT
? HEALTH_ERROR_MESSAGES.REDIS_TIMEOUT
: HEALTH_ERROR_MESSAGES.REDIS_CONNECTION_FAILED;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Error message assignment could be simplified with nullish coalescing operator

return indicator.up({ details });
}

const errorMessage = `Found ${workspaceStats.filter((s) => s.severity === 'warning').length} workspaces with warnings and ${workspaceStats.filter((s) => s.severity === 'critical').length} critical workspaces`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Avoid filtering the array twice by caching the filtered arrays or using reduce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

App health status in admin panel
1 participant