@hsuite/health - Comprehensive System Health Monitoring

🏥 Advanced health monitoring and system diagnostics library for NestJS applications with DAG network monitoring

This library provides health monitoring for NestJS applications: real-time system resource tracking, service health checks, performance metrics collection, and specialized DAG network monitoring with event-driven updates and diagnostics.


✨ Quick Start

Installation

npm install @hsuite/health

Basic Setup

import { Module } from '@nestjs/common';
import { HealthModule } from '@hsuite/health';
import { RedisClientOptions } from 'redis';

const redisOptions: RedisClientOptions = {
  socket: {
    host: 'localhost',
    port: 6379
  },
  password: 'your-redis-password',
  database: 0
};

@Module({
  imports: [
    HealthModule.forRoot(redisOptions)
  ]
})
export class AppModule {}

Health Check Usage

import { Injectable } from '@nestjs/common';
import { HealthService } from '@hsuite/health';

@Injectable()
export class MonitoringService {
  constructor(private healthService: HealthService) {}

  async checkSystemHealth() {
    const health = await this.healthService.check();
    console.log('System status:', health.status);
    return health;
  }
}

🏗️ Architecture

Core Component Areas

🏥 System Health Monitoring

  • Real-time Health Checks - Comprehensive system health validation

  • Service Connectivity - MongoDB, Redis, and microservice monitoring

  • Health Status Aggregation - Multi-service health state management

  • Cached Responses - Efficient health check performance optimization

📊 Resource Metrics Collection

  • CPU Monitoring - Real-time CPU utilization and multi-core tracking

  • Memory Management - Memory usage, availability, and percentage tracking

  • Disk Space Monitoring - Storage utilization and free space alerts

  • Network Metrics - Input/output traffic monitoring and analysis
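To make the multi-core CPU figure concrete, here is a minimal sketch of how utilization can be derived from two snapshots of per-core tick counters. It is an illustration, not the library's actual implementation; the `CpuTimes` shape mirrors the `times` object that Node's `os.cpus()` reports per core, and the function name `cpuUsagePercent` is hypothetical.

```typescript
// Per-core tick counters, same fields as os.cpus()[i].times in Node.js.
interface CpuTimes {
  user: number;
  nice: number;
  sys: number;
  idle: number;
  irq: number;
}

// Hypothetical helper: utilization (0-100) across all cores between two snapshots.
function cpuUsagePercent(before: CpuTimes[], after: CpuTimes[]): number {
  let idleDelta = 0;
  let totalDelta = 0;
  const cores = Math.min(before.length, after.length);

  for (let i = 0; i < cores; i++) {
    const b = before[i];
    const a = after[i];
    const totalBefore = b.user + b.nice + b.sys + b.idle + b.irq;
    const totalAfter = a.user + a.nice + a.sys + a.idle + a.irq;
    idleDelta += a.idle - b.idle;
    totalDelta += totalAfter - totalBefore;
  }

  // No ticks elapsed (or counters reset): report 0 rather than dividing by zero.
  if (totalDelta <= 0) return 0;

  // Busy share = 1 - idle share, rounded to two decimals.
  return Math.round((1 - idleDelta / totalDelta) * 10000) / 100;
}
```

In a Node.js process the snapshots could be taken with `os.cpus().map((core) => core.times)` a short interval apart.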

🌐 DAG Network Monitoring

  • Network Health Tracking - Specialized DAG network status monitoring

  • Event-Driven Updates - Real-time threshold monitoring with events

  • Network Threshold Management - Online/offline status detection

  • Performance Optimization - Efficient network status collection

Performance Features

  • Response Caching - 1-second caching for optimal performance

  • Error Handling - Comprehensive exception management

  • Resource Optimization - Efficient OS utility integration

  • Multi-core Support - Advanced CPU usage calculations
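The 1-second response caching listed above can be pictured with a small time-to-live cache. This is a sketch under stated assumptions rather than the library's code: the class name `TtlCache` and the injectable clock are hypothetical and exist only to make the idea testable.

```typescript
// Hypothetical single-entry cache: reuses a value until its TTL has elapsed.
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh, otherwise calls `load` again.
  get(load: () => T): T {
    const t = this.now();
    if (this.entry !== null && t < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = load();
    this.entry = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}
```

If `T` is a `Promise`, concurrent callers within the TTL share the same in-flight check, which is one way a burst of health probes can be served by a single round of dependency checks.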

Module Structure

src/
├── index.ts                           # Main entry point and exports
├── health.module.ts                   # Dynamic module with Redis configuration
├── health.controller.ts               # REST endpoints for health checks
├── health.service.ts                  # Core health monitoring service
├── interfaces/
│   └── infos.interface.ts             # Health metrics interfaces
├── models/
│   └── infos.model.ts                 # Health metrics model implementations
└── custom/
    └── dag.health.ts                  # DAG network health indicator

🔧 API Reference

Core Health Endpoints

All health endpoints are decorated with @Public(), so they are accessible without authentication.

Health Check Endpoint

GET /health/check

  • Purpose: Comprehensive system health validation

  • Caching: 1-second response caching

  • Monitors: Redis, MongoDB, disk space, memory, DAG network, microservices

System Information Endpoint

GET /health/infos

  • Purpose: Detailed system metrics and resource utilization

  • Caching: 1-second response caching

  • Data: Platform, CPU, memory, disk, network metrics

Health Check Response Schema

interface HealthCheckResult {
  status: 'ok' | 'error';
  info: {
    [key: string]: {
      status: 'up' | 'down';
    };
  };
  error: {
    [key: string]: {
      status: 'up' | 'down';
      message?: string;
    };
  };
  details: {
    [key: string]: {
      status: 'up' | 'down';
      [key: string]: any;
    };
  };
}
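As an illustration of how a consumer might read this schema, the sketch below extracts the indicators that report `down` from the `details` map. The local types only mirror the schema shown above, and the helper name `failingIndicators` is hypothetical.

```typescript
type IndicatorStatus = 'up' | 'down';

// Local mirror of one entry in the `details` map.
interface IndicatorDetail {
  status: IndicatorStatus;
  message?: string;
  [key: string]: unknown;
}

// Local mirror of the response schema; only `details` is needed here.
interface HealthCheckResultLike {
  status: 'ok' | 'error';
  details: Record<string, IndicatorDetail>;
}

// Hypothetical helper: list every indicator that is down, with its message if any.
function failingIndicators(
  result: HealthCheckResultLike
): Array<{ name: string; message: string }> {
  return Object.entries(result.details)
    .filter(([, detail]) => detail.status === 'down')
    .map(([name, detail]) => ({
      name,
      message: detail.message ?? 'no message provided',
    }));
}
```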

System Information Schema

interface IHealthInfos {
  platform: string;    // Operating system platform
  release: string;     // OS version
  machine: string;     // Hardware identifier
  arch: string;        // CPU architecture
  uptime: number;      // System uptime in seconds
  cpu: IHealthInfosCPU;
  memory: IHealthInfosMemory;
  drive: IHealthInfosDrive;
  network: IHealthInfosNetwork;
}

Resource Metrics Tables

CPU Metrics

Property   Type     Description
usage      number   CPU utilization percentage (0-100)
cpus       number   Number of CPU cores
speed      number   CPU clock frequency in MHz

Memory Metrics

Property            Type     Description
totalMemMb          number   Total memory in MB
usedMemMb           number   Used memory in MB
freeMemMb           number   Free memory in MB
usedMemPercentage   number   Memory usage percentage
freeMemPercentage   number   Free memory percentage

Storage Metrics

Property         Type     Description
totalGb          string   Total storage in GB
usedGb           string   Used storage in GB
freeGb           string   Free storage in GB
usedPercentage   string   Storage usage percentage
freePercentage   string   Free storage percentage
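Note that the storage metrics are typed as strings, unlike the CPU and memory metrics. A small conversion step, sketched below, avoids comparing strings against numeric thresholds. The type and function names are hypothetical; only the field names come from the table above.

```typescript
// Mirrors the storage metrics table: every field is a string.
type DriveMetricsLike = {
  totalGb: string;
  usedGb: string;
  freeGb: string;
  usedPercentage: string;
  freePercentage: string;
};

type DriveMetricsNumeric = { [K in keyof DriveMetricsLike]: number };

// Hypothetical helper: returns numeric metrics, or null if any field is not a number.
function parseDriveMetrics(drive: DriveMetricsLike): DriveMetricsNumeric | null {
  const parsed: DriveMetricsNumeric = {
    totalGb: Number.parseFloat(drive.totalGb),
    usedGb: Number.parseFloat(drive.usedGb),
    freeGb: Number.parseFloat(drive.freeGb),
    usedPercentage: Number.parseFloat(drive.usedPercentage),
    freePercentage: Number.parseFloat(drive.freePercentage),
  };
  const allFinite = Object.values(parsed).every((n) => Number.isFinite(n));
  return allFinite ? parsed : null;
}
```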


📖 Guides

Health Monitoring Setup Guide

Set up health monitoring for your application. Covers health indicator configuration, system and resource tracking, service health checks, and real-time alerts and notifications.

DAG Network Monitoring Guide

Implement and monitor DAG network health with event-driven updates. Covers network connectivity, consensus monitoring, network performance tracking, and automated diagnostics.

Performance Optimization Guide

Best practices for optimizing health monitoring performance and resource usage. Covers monitoring efficiency, performance tuning, and scalability improvements.

Alert and Threshold Management Guide

Set up proactive monitoring with alerts and threshold-based notifications. Covers threshold configuration, notification management, escalation procedures, and automated incident response.


🎯 Examples

Comprehensive Health Monitoring Service

import { HealthService, DagHealthIndicator } from '@hsuite/health';
import { Injectable } from '@nestjs/common';

@Injectable()
export class SystemHealthMonitoringService {
  
  constructor(
    private healthService: HealthService,
    private dagHealth: DagHealthIndicator
  ) {}

  async performComprehensiveHealthCheck() {
    try {
      const healthResults = await this.healthService.check();
      const systemMetrics = await this.healthService.infos();
      const dagStatus = await this.dagHealth.isHealthy();

      const report = {
        timestamp: new Date().toISOString(),
        overallStatus: healthResults.status,
        systemHealth: {
          redis: healthResults.details.redis?.status || 'unknown',
          mongodb: healthResults.details.mongodb?.status || 'unknown',
          disk: healthResults.details.disk?.status || 'unknown',
          memory: healthResults.details.memory?.status || 'unknown'
        },
        dagNetwork: {
          status: dagStatus.dag?.status || 'unknown',
          network: dagStatus.dag?.network || 'disconnected'
        },
        resourceUtilization: {
          cpu: {
            usage: systemMetrics.cpu.usage,
            cores: systemMetrics.cpu.cpus,
            frequency: systemMetrics.cpu.speed
          },
          memory: {
            total: systemMetrics.memory.totalMemMb,
            used: systemMetrics.memory.usedMemMb,
            usagePercentage: systemMetrics.memory.usedMemPercentage,
            available: systemMetrics.memory.freeMemMb
          },
          storage: {
            total: systemMetrics.drive.totalGb,
            used: systemMetrics.drive.usedGb,
            usagePercentage: systemMetrics.drive.usedPercentage,
            available: systemMetrics.drive.freeGb
          },
          network: {
            bytesReceived: systemMetrics.network.inputBytes,
            bytesTransmitted: systemMetrics.network.outputBytes
          }
        },
        systemInfo: {
          platform: systemMetrics.platform,
          architecture: systemMetrics.arch,
          uptime: this.formatUptime(systemMetrics.uptime),
          release: systemMetrics.release
        }
      };

      // Check for critical conditions
      await this.evaluateSystemAlerts(report);

      return report;
    } catch (error) {
      throw new Error(`Health monitoring failed: ${error.message}`);
    }
  }

  async evaluateSystemAlerts(report: any) {
    const alerts = [];

    // CPU usage alerts
    if (report.resourceUtilization.cpu.usage > 90) {
      alerts.push({
        severity: 'critical',
        component: 'CPU',
        message: `High CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
        threshold: 90,
        action: 'Scale resources or investigate high load processes'
      });
    } else if (report.resourceUtilization.cpu.usage > 75) {
      alerts.push({
        severity: 'warning',
        component: 'CPU',
        message: `Elevated CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
        threshold: 75,
        action: 'Monitor CPU usage trends'
      });
    }

    // Memory usage alerts
    if (report.resourceUtilization.memory.usagePercentage > 95) {
      alerts.push({
        severity: 'critical',
        component: 'Memory',
        message: `Critical memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
        threshold: 95,
        action: 'Immediate memory cleanup or scaling required'
      });
    } else if (report.resourceUtilization.memory.usagePercentage > 85) {
      alerts.push({
        severity: 'warning',
        component: 'Memory',
        message: `High memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
        threshold: 85,
        action: 'Consider memory optimization or scaling'
      });
    }

    // Storage alerts
    const storageUsage = parseFloat(report.resourceUtilization.storage.usagePercentage);
    if (storageUsage > 95) {
      alerts.push({
        severity: 'critical',
        component: 'Storage',
        message: `Critical disk space: ${storageUsage}% used`,
        threshold: 95,
        action: 'Immediate cleanup or storage expansion required'
      });
    } else if (storageUsage > 85) {
      alerts.push({
        severity: 'warning',
        component: 'Storage',
        message: `Low disk space: ${storageUsage}% used`,
        threshold: 85,
        action: 'Plan for storage cleanup or expansion'
      });
    }

    // Service health alerts
    Object.entries(report.systemHealth).forEach(([service, status]) => {
      if (status !== 'up') {
        alerts.push({
          severity: 'critical',
          component: service.toUpperCase(),
          message: `Service ${service} is ${status}`,
          action: `Investigate ${service} connectivity and restart if necessary`
        });
      }
    });

    // DAG network alerts
    if (report.dagNetwork.status !== 'up') {
      alerts.push({
        severity: 'critical',
        component: 'DAG Network',
        message: `DAG network is ${report.dagNetwork.status}`,
        action: 'Check network connectivity and DAG node status'
      });
    }

    if (alerts.length > 0) {
      await this.processAlerts(alerts);
    }

    return alerts;
  }

  private async processAlerts(alerts: any[]) {
    // Process alerts - send notifications, log, etc.
    for (const alert of alerts) {
      console.warn(`[${alert.severity.toUpperCase()}] ${alert.component}: ${alert.message}`);
      console.warn(`Action: ${alert.action}`);
      
      // In production, implement:
      // - Send notifications (email, Slack, etc.)
      // - Log to monitoring systems
      // - Trigger automated responses
      // - Update dashboards
    }
  }

  private formatUptime(seconds: number): string {
    const days = Math.floor(seconds / 86400);
    const hours = Math.floor((seconds % 86400) / 3600);
    const minutes = Math.floor((seconds % 3600) / 60);
    return `${days}d ${hours}h ${minutes}m`;
  }

  async getHealthSummary() {
    try {
      const health = await this.healthService.check();
      const metrics = await this.healthService.infos();

      return {
        status: health.status,
        uptime: this.formatUptime(metrics.uptime),
        resources: {
          cpu: `${metrics.cpu.usage}% (${metrics.cpu.cpus} cores)`,
          memory: `${metrics.memory.usedMemPercentage}% used (${metrics.memory.freeMemMb}MB free)`,
          storage: `${metrics.drive.usedPercentage}% used (${metrics.drive.freeGb}GB free)`
        },
        services: Object.keys(health.details).reduce<Record<string, string>>((acc, key) => {
          acc[key] = health.details[key].status;
          return acc;
        }, {}),
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      throw new Error(`Health summary generation failed: ${error.message}`);
    }
  }
}
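The repeated if/else chains in `evaluateSystemAlerts` can also be expressed as a threshold table. The sketch below reuses the warning and critical limits from the example above; the names `classify`, `ALERT_THRESHOLDS`, and `evaluateUsage` are hypothetical and not part of @hsuite/health.

```typescript
type Severity = 'ok' | 'warning' | 'critical';
type Component = 'cpu' | 'memory' | 'storage';

interface Threshold {
  warning: number;
  critical: number;
}

interface Finding {
  component: Component;
  value: number;
  severity: Severity;
}

// Limits taken from the example service above (percent used).
const ALERT_THRESHOLDS: Record<Component, Threshold> = {
  cpu: { warning: 75, critical: 90 },
  memory: { warning: 85, critical: 95 },
  storage: { warning: 85, critical: 95 },
};

// Maps a usage value to a severity using strict "greater than" comparisons.
function classify(value: number, threshold: Threshold): Severity {
  if (value > threshold.critical) return 'critical';
  if (value > threshold.warning) return 'warning';
  return 'ok';
}

// Returns one finding per component that is above its warning limit.
function evaluateUsage(usage: Record<Component, number>): Finding[] {
  const components: Component[] = ['cpu', 'memory', 'storage'];
  return components
    .map((component) => ({
      component,
      value: usage[component],
      severity: classify(usage[component], ALERT_THRESHOLDS[component]),
    }))
    .filter((finding) => finding.severity !== 'ok');
}
```

Keeping the limits in one table makes it easier to adjust them per environment without touching the evaluation logic.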

DAG Network Monitoring Service

import { DagHealthIndicator } from '@hsuite/health';
import { Injectable, OnModuleInit } from '@nestjs/common';
import { EventEmitter2 } from '@nestjs/event-emitter';

@Injectable()
export class DAGNetworkMonitoringService implements OnModuleInit {
  
  private networkStatus: string = 'unknown';
  private lastStatusChange: Date = new Date();
  private statusHistory: Array<{ status: string; timestamp: Date }> = [];

  constructor(
    private dagHealth: DagHealthIndicator,
    private eventEmitter: EventEmitter2
  ) {}

  onModuleInit() {
    // Listen for DAG network events
    this.eventEmitter.on('smart_node.monitors.network_threshold_online', this.handleNetworkOnline.bind(this));
    this.eventEmitter.on('smart_node.monitors.network_threshold_offline', this.handleNetworkOffline.bind(this));

    // Start periodic monitoring
    this.startPeriodicMonitoring();
  }

  async checkDAGNetworkHealth() {
    try {
      const health = await this.dagHealth.isHealthy();
      
      const networkStatus = {
        status: health.dag?.status || 'unknown',
        network: health.dag?.network || 'disconnected',
        timestamp: new Date().toISOString(),
        isHealthy: health.dag?.status === 'up',
        details: health.dag
      };

      // Update internal status tracking
      if (networkStatus.status !== this.networkStatus) {
        this.handleStatusChange(networkStatus.status);
      }

      return networkStatus;
    } catch (error) {
      console.error('DAG network health check failed:', error);
      
      if (error.causes) {
        console.error('Health check causes:', error.causes);
      }

      return {
        status: 'error',
        network: 'error',
        timestamp: new Date().toISOString(),
        isHealthy: false,
        error: error.message,
        causes: error.causes
      };
    }
  }

  private handleNetworkOnline(event: any) {
    console.log('DAG Network Online Event:', event);
    // Capture the status before handleStatusChange() overwrites it
    const previousStatus = this.networkStatus;
    this.handleStatusChange('up');
    
    // Emit custom application event
    this.eventEmitter.emit('dag.network.online', {
      timestamp: new Date(),
      previousStatus,
      event: event
    });
  }

  private handleNetworkOffline(event: any) {
    console.log('DAG Network Offline Event:', event);
    // Capture the status before handleStatusChange() overwrites it
    const previousStatus = this.networkStatus;
    this.handleStatusChange('down');
    
    // Emit custom application event
    this.eventEmitter.emit('dag.network.offline', {
      timestamp: new Date(),
      previousStatus,
      event: event
    });
  }

  private handleStatusChange(newStatus: string) {
    const previousStatus = this.networkStatus;
    this.networkStatus = newStatus;
    this.lastStatusChange = new Date();

    // Add to status history
    this.statusHistory.push({
      status: newStatus,
      timestamp: this.lastStatusChange
    });

    // Keep only last 100 status changes
    if (this.statusHistory.length > 100) {
      this.statusHistory = this.statusHistory.slice(-100);
    }

    console.log(`DAG Network status changed: ${previousStatus} -> ${newStatus}`);
    
    // Trigger alerts for status changes
    if (newStatus === 'down') {
      this.triggerNetworkDownAlert(previousStatus);
    } else if (newStatus === 'up' && previousStatus === 'down') {
      this.triggerNetworkRecoveryAlert();
    }
  }

  private triggerNetworkDownAlert(previousStatus: string) {
    const alert = {
      severity: 'critical',
      component: 'DAG Network',
      message: 'DAG network has gone offline',
      previousStatus,
      currentStatus: this.networkStatus,
      timestamp: this.lastStatusChange,
      action: 'Investigate network connectivity and DAG node status'
    };

    console.error('DAG Network Alert:', alert);
    
    // In production, implement:
    // - Send immediate alerts
    // - Log to monitoring systems
    // - Trigger recovery procedures
  }

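  private triggerNetworkRecoveryAlert() {
    // handleStatusChange() has already set lastStatusChange to the recovery time,
    // so measure downtime from the start of the preceding run of 'down' entries
    let downSince = this.lastStatusChange;
    for (let i = this.statusHistory.length - 2; i >= 0 && this.statusHistory[i].status === 'down'; i--) {
      downSince = this.statusHistory[i].timestamp;
    }
    const downtime = this.lastStatusChange.getTime() - downSince.getTime();
    
    const alert = {
      severity: 'info',
      component: 'DAG Network',
      message: 'DAG network has recovered',
      currentStatus: this.networkStatus,
      timestamp: this.lastStatusChange,
      downtime: `${Math.round(downtime / 1000)} seconds`,
      action: 'Verify network stability and check for missed transactions'
    };

    console.log('DAG Network Recovery:', alert);
  }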

  private startPeriodicMonitoring() {
    // Check DAG network health every 30 seconds
    setInterval(async () => {
      try {
        await this.checkDAGNetworkHealth();
      } catch (error) {
        console.error('Periodic DAG health check failed:', error);
      }
    }, 30000);
  }

  getNetworkStatusHistory() {
    return {
      currentStatus: this.networkStatus,
      lastStatusChange: this.lastStatusChange,
      statusHistory: this.statusHistory,
      uptime: this.calculateUptime(),
      downtime: this.calculateDowntime()
    };
  }

  private calculateUptime(): { total: number; percentage: number } {
    const now = Date.now();
    const oneDayAgo = now - (24 * 60 * 60 * 1000);
    
    const recentHistory = this.statusHistory.filter(
      entry => entry.timestamp.getTime() > oneDayAgo
    );

    let upTime = 0;
    for (let i = 0; i < recentHistory.length - 1; i++) {
      const current = recentHistory[i];
      const next = recentHistory[i + 1];
      
      if (current.status === 'up') {
        upTime += next.timestamp.getTime() - current.timestamp.getTime();
      }
    }

    // Add current status time if up
    if (recentHistory.length > 0 && recentHistory[recentHistory.length - 1].status === 'up') {
      upTime += now - recentHistory[recentHistory.length - 1].timestamp.getTime();
    }

    const totalTime = 24 * 60 * 60 * 1000; // 24 hours in ms
    const percentage = (upTime / totalTime) * 100;

    return {
      total: Math.round(upTime / 1000), // seconds
      percentage: Math.round(percentage * 100) / 100
    };
  }

  private calculateDowntime(): { total: number; percentage: number } {
    const uptime = this.calculateUptime();
    const totalDaySeconds = 24 * 60 * 60;
    
    return {
      total: totalDaySeconds - uptime.total,
      percentage: Math.round((100 - uptime.percentage) * 100) / 100
    };
  }
}
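The uptime calculation above can be isolated into a pure function, which makes it easy to test with fixed timestamps. The sketch below is a variant rather than a copy of `calculateUptime`: it assumes the history is sorted by time and clips each interval to the window, so an `up` period that began before the window start is still counted. The names `StatusChange` and `uptimeWithinWindow` are hypothetical.

```typescript
// One status transition; `at` is a timestamp in epoch milliseconds.
interface StatusChange {
  status: string;
  at: number;
}

// Milliseconds spent in the 'up' state between windowStart and windowEnd.
// Assumes `changes` is sorted by `at` in ascending order.
function uptimeWithinWindow(
  changes: StatusChange[],
  windowStart: number,
  windowEnd: number
): number {
  let upMs = 0;
  for (let i = 0; i < changes.length; i++) {
    if (changes[i].status !== 'up') continue;

    // An 'up' period lasts until the next transition, or until the window ends.
    const periodEnd = i + 1 < changes.length ? changes[i + 1].at : windowEnd;
    const from = Math.max(changes[i].at, windowStart);
    const to = Math.min(periodEnd, windowEnd);

    if (to > from) upMs += to - from;
  }
  return upMs;
}
```

Dividing the result by `windowEnd - windowStart` gives the uptime percentage for the window.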

Advanced Redis Configuration Service

import { HealthModule } from '@hsuite/health';
import { Injectable } from '@nestjs/common';
import { RedisClientOptions } from 'redis';

@Injectable()
export class HealthModuleConfigurationService {
  
  createProductionRedisConfig(): RedisClientOptions {
    // Only options defined by node-redis `RedisClientOptions` are used here;
    // ioredis-style options (lazyConnect, maxRetriesPerRequest, ...) do not apply
    return {
      socket: {
        host: process.env.REDIS_HOST || 'redis-cluster.production.com',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        tls: process.env.REDIS_TLS === 'true',
        connectTimeout: 10000,
        // Retry with a capped linear backoff (milliseconds)
        reconnectStrategy: (retries: number) => Math.min(retries * 100, 3000)
      },
      password: process.env.REDIS_PASSWORD,
      database: parseInt(process.env.REDIS_DATABASE || '0', 10),
      // Fail fast instead of queueing commands while disconnected
      disableOfflineQueue: true,
      // Send a PING every 30 seconds to keep the connection alive
      pingInterval: 30000
    };
  }

  createDevelopmentRedisConfig(): RedisClientOptions {
    return {
      socket: {
        host: 'localhost',
        port: 6379,
        connectTimeout: 5000
      },
      database: 0
    };
  }

  createHighAvailabilityRedisConfig(): RedisClientOptions {
    // Sentinel discovery options such as `sentinels` and the master `name` are
    // ioredis options and are not part of node-redis `RedisClientOptions`.
    // This configuration therefore targets a single highly available endpoint
    // (for example a proxy or managed service in front of the replicas).
    return {
      socket: {
        // 'redis-ha.internal' is a placeholder hostname
        host: process.env.REDIS_HOST || 'redis-ha.internal',
        port: parseInt(process.env.REDIS_PORT || '6379', 10),
        connectTimeout: 5000,
        // Reconnect quickly after a failover, backing off up to 2 seconds
        reconnectStrategy: (retries: number) => Math.min(retries * 100, 2000)
      },
      password: process.env.REDIS_PASSWORD,
      disableOfflineQueue: true
    };
  }

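  async validateRedisConfiguration(config: RedisClientOptions): Promise<boolean> {
    // Validate with node-redis so the options are interpreted the same way
    // as the `RedisClientOptions` passed to the module
    const { createClient } = await import('redis');
    const client = createClient(config);
    // Log connection errors instead of letting them surface as unhandled 'error' events
    client.on('error', (err) => console.error('Redis client error:', err));

    try {
      await client.connect();

      // Test basic operations
      await client.ping();
      await client.set('health:test', 'validation', { EX: 10 });
      const result = await client.get('health:test');
      await client.del('health:test');

      return result === 'validation';
    } catch (error) {
      console.error('Redis configuration validation failed:', error);
      return false;
    } finally {
      // Close the socket whether or not validation succeeded
      if (client.isOpen) {
        await client.disconnect();
      }
    }
  }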

  createConfigurationBasedOnEnvironment(): RedisClientOptions {
    const environment = process.env.NODE_ENV || 'development';
    
    switch (environment) {
      case 'production':
        return this.createProductionRedisConfig();
      case 'staging':
        return this.createHighAvailabilityRedisConfig();
      default:
        return this.createDevelopmentRedisConfig();
    }
  }
}

// Usage in app module
import { Module } from '@nestjs/common';

// Providers injected into the factory must be visible to HealthModule, so the
// configuration service is exported from a module listed in `imports`
@Module({
  providers: [HealthModuleConfigurationService],
  exports: [HealthModuleConfigurationService]
})
export class HealthConfigurationModule {}

@Module({
  imports: [
    HealthModule.forRootAsync({
      imports: [HealthConfigurationModule],
      useFactory: async (healthConfig: HealthModuleConfigurationService) => {
        const redisConfig = healthConfig.createConfigurationBasedOnEnvironment();
        
        // Validate configuration before using
        const isValid = await healthConfig.validateRedisConfiguration(redisConfig);
        
        if (!isValid) {
          console.warn('Redis configuration validation failed, using fallback');
          return healthConfig.createDevelopmentRedisConfig();
        }
        
        return redisConfig;
      },
      inject: [HealthModuleConfigurationService]
    })
  ]
})
export class AppModule {}
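When tuning reconnection behaviour for the configurations above, it can help to keep the backoff policy in a small, testable function. The sketch below builds an exponential backoff with a cap and a retry limit. The factory name `makeReconnectStrategy` and its option names are hypothetical; the returned function follows the shape node-redis documents for `socket.reconnectStrategy` (a delay in milliseconds, or an `Error` to stop retrying).

```typescript
interface BackoffOptions {
  baseMs: number;     // delay before the first retry
  maxMs: number;      // upper bound for any single delay
  maxRetries: number; // give up once this many retries have been made
}

// Hypothetical factory: returns a function mapping the retry count to a delay.
function makeReconnectStrategy(
  options: BackoffOptions
): (retries: number) => number | Error {
  return (retries: number) => {
    if (retries >= options.maxRetries) {
      return new Error(`Redis reconnect abandoned after ${retries} retries`);
    }
    // Double the delay on every retry, but never exceed maxMs.
    return Math.min(options.baseMs * Math.pow(2, retries), options.maxMs);
  };
}
```

It could then be assigned as `reconnectStrategy: makeReconnectStrategy({ baseMs: 100, maxMs: 5000, maxRetries: 10 })` inside the `socket` options.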

Health Metrics Analytics Service

import { HealthService } from '@hsuite/health';
import { Injectable } from '@nestjs/common';

@Injectable()
export class HealthMetricsAnalyticsService {
  
  private metricsHistory: Array<{ timestamp: Date; metrics: any }> = [];
  private readonly maxHistorySize = 1000;

  constructor(private healthService: HealthService) {
    // Start collecting metrics every minute
    this.startMetricsCollection();
  }

  async analyzeSystemPerformance(timeRange: { start: Date; end: Date }) {
    try {
      const relevantMetrics = this.metricsHistory.filter(
        entry => entry.timestamp >= timeRange.start && entry.timestamp <= timeRange.end
      );

      if (relevantMetrics.length === 0) {
        throw new Error('No metrics available for the specified time range');
      }

      const analysis = {
        timeRange,
        dataPoints: relevantMetrics.length,
        cpu: this.analyzeCPUMetrics(relevantMetrics),
        memory: this.analyzeMemoryMetrics(relevantMetrics),
        storage: this.analyzeStorageMetrics(relevantMetrics),
        network: this.analyzeNetworkMetrics(relevantMetrics),
        trends: this.calculateTrends(relevantMetrics),
        recommendations: [] as string[]
      };

      // Generate recommendations
      analysis.recommendations = this.generateRecommendations(analysis);

      return analysis;
    } catch (error) {
      throw new Error(`Performance analysis failed: ${error.message}`);
    }
  }

  private startMetricsCollection() {
    setInterval(async () => {
      try {
        const metrics = await this.healthService.infos();
        
        this.metricsHistory.push({
          timestamp: new Date(),
          metrics: metrics
        });

        // Maintain history size
        if (this.metricsHistory.length > this.maxHistorySize) {
          this.metricsHistory = this.metricsHistory.slice(-this.maxHistorySize);
        }
      } catch (error) {
        console.error('Metrics collection failed:', error);
      }
    }, 60000); // Collect every minute
  }

  private analyzeCPUMetrics(metricsData: any[]) {
    const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
    
    return {
      average: this.calculateAverage(cpuValues),
      min: Math.min(...cpuValues),
      max: Math.max(...cpuValues),
      median: this.calculateMedian(cpuValues),
      percentile95: this.calculatePercentile(cpuValues, 95),
      spikes: cpuValues.filter(value => value > 90).length,
      trend: this.calculateTrend(cpuValues)
    };
  }

  private analyzeMemoryMetrics(metricsData: any[]) {
    const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
    
    return {
      average: this.calculateAverage(memoryValues),
      min: Math.min(...memoryValues),
      max: Math.max(...memoryValues),
      median: this.calculateMedian(memoryValues),
      percentile95: this.calculatePercentile(memoryValues, 95),
      criticalEvents: memoryValues.filter(value => value > 95).length,
      trend: this.calculateTrend(memoryValues)
    };
  }

  private analyzeStorageMetrics(metricsData: any[]) {
    const storageValues = metricsData.map(entry => parseFloat(entry.metrics.drive.usedPercentage));
    
    return {
      average: this.calculateAverage(storageValues),
      min: Math.min(...storageValues),
      max: Math.max(...storageValues),
      median: this.calculateMedian(storageValues),
      growth: this.calculateStorageGrowth(metricsData),
      trend: this.calculateTrend(storageValues)
    };
  }

  private analyzeNetworkMetrics(metricsData: any[]) {
    const inputBytes = metricsData.map(entry => entry.metrics.network.inputBytes);
    const outputBytes = metricsData.map(entry => entry.metrics.network.outputBytes);
    
    return {
      input: {
        average: this.calculateAverage(inputBytes),
        peak: Math.max(...inputBytes),
        trend: this.calculateTrend(inputBytes)
      },
      output: {
        average: this.calculateAverage(outputBytes),
        peak: Math.max(...outputBytes),
        trend: this.calculateTrend(outputBytes)
      },
      totalTraffic: {
        average: this.calculateAverage(inputBytes.map((val, i) => val + outputBytes[i])),
        peak: Math.max(...inputBytes.map((val, i) => val + outputBytes[i]))
      }
    };
  }

  private calculateTrends(metricsData: any[]) {
    const timePoints = metricsData.map(entry => entry.timestamp.getTime());
    const startTime = Math.min(...timePoints);
    const endTime = Math.max(...timePoints);
    const duration = endTime - startTime;

    return {
      analysisWindowMinutes: Math.round(duration / (1000 * 60)),
      dataPointsCollected: metricsData.length,
      averageInterval: metricsData.length > 1 ? Math.round(duration / (metricsData.length - 1) / 1000) : 0, // seconds between samples
      systemStability: this.calculateSystemStability(metricsData)
    };
  }

  private generateRecommendations(analysis: any): string[] {
    const recommendations = [];

    // CPU recommendations
    if (analysis.cpu.average > 80) {
      recommendations.push('Consider CPU optimization or scaling - average usage is high');
    }
    if (analysis.cpu.spikes > analysis.dataPoints * 0.1) {
      recommendations.push('Investigate CPU spikes - frequent high usage detected');
    }

    // Memory recommendations
    if (analysis.memory.average > 85) {
      recommendations.push('Memory usage is consistently high - consider optimization');
    }
    if (analysis.memory.criticalEvents > 0) {
      recommendations.push('Critical memory events detected - immediate attention required');
    }

    // Storage recommendations
    if (analysis.storage.average > 85) {
      recommendations.push('Storage usage is high - plan for cleanup or expansion');
    }
    if (analysis.storage.trend > 0.1) {
      recommendations.push('Storage usage is growing rapidly - monitor closely');
    }

    // Network recommendations
    if (analysis.network.totalTraffic.peak > analysis.network.totalTraffic.average * 10) {
      recommendations.push('Network traffic spikes detected - investigate unusual activity');
    }

    return recommendations;
  }

  // Utility calculation methods
  private calculateAverage(values: number[]): number {
    return values.reduce((sum, val) => sum + val, 0) / values.length;
  }

  private calculateMedian(values: number[]): number {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0 
      ? (sorted[mid - 1] + sorted[mid]) / 2 
      : sorted[mid];
  }

  private calculatePercentile(values: number[], percentile: number): number {
    const sorted = [...values].sort((a, b) => a - b);
    const index = Math.ceil((percentile / 100) * sorted.length) - 1;
    return sorted[index];
  }

  private calculateTrend(values: number[]): number {
    if (values.length < 2) return 0;
    
    // Relative change between the averages of the first and second half
    const firstHalf = values.slice(0, Math.floor(values.length / 2));
    const secondHalf = values.slice(Math.floor(values.length / 2));
    
    const firstAvg = this.calculateAverage(firstHalf);
    const secondAvg = this.calculateAverage(secondHalf);
    
    // Avoid dividing by zero when the first half averages to 0
    return firstAvg === 0 ? 0 : (secondAvg - firstAvg) / firstAvg;
  }

  private calculateStorageGrowth(metricsData: any[]): number {
    if (metricsData.length < 2) return 0;
    
    const first = parseFloat(metricsData[0].metrics.drive.usedGb);
    const last = parseFloat(metricsData[metricsData.length - 1].metrics.drive.usedGb);
    
    return last - first; // GB growth
  }

  private calculateSystemStability(metricsData: any[]): string {
    // Calculate coefficient of variation for CPU and memory
    const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
    const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
    
    const cpuStdDev = this.calculateStandardDeviation(cpuValues);
    const memoryStdDev = this.calculateStandardDeviation(memoryValues);
    
    const cpuCV = cpuStdDev / this.calculateAverage(cpuValues);
    const memoryCV = memoryStdDev / this.calculateAverage(memoryValues);
    
    const avgCV = (cpuCV + memoryCV) / 2;
    
    if (avgCV < 0.1) return 'Excellent';
    if (avgCV < 0.2) return 'Good';
    if (avgCV < 0.3) return 'Fair';
    return 'Poor';
  }

  private calculateStandardDeviation(values: number[]): number {
    const avg = this.calculateAverage(values);
    const squaredDiffs = values.map(val => Math.pow(val - avg, 2));
    const avgSquaredDiff = this.calculateAverage(squaredDiffs);
    return Math.sqrt(avgSquaredDiff);
  }

  getMetricsHistory() {
    return {
      totalDataPoints: this.metricsHistory.length,
      oldestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[0].timestamp : null,
      newestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[this.metricsHistory.length - 1].timestamp : null,
      memoryUsage: `${this.metricsHistory.length} entries (max: ${this.maxHistorySize})`
    };
  }
}
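
To make the stability rating concrete, here is a minimal standalone sketch of the same coefficient-of-variation classification, using the 0.1 / 0.2 / 0.3 thresholds from the class above. The free functions and the sample readings are hypothetical and exist only for illustration; they are not part of `@hsuite/health`.

```typescript
// Standalone re-statement of the helpers above, for illustration only.
function average(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function standardDeviation(values: number[]): number {
  const avg = average(values);
  return Math.sqrt(average(values.map(v => (v - avg) ** 2)));
}

// Coefficient of variation: spread relative to the mean (0 when the mean is 0).
function coefficientOfVariation(values: number[]): number {
  const avg = average(values);
  return avg === 0 ? 0 : standardDeviation(values) / avg;
}

function classifyStability(cpu: number[], memory: number[]): string {
  const avgCV = (coefficientOfVariation(cpu) + coefficientOfVariation(memory)) / 2;
  if (avgCV < 0.1) return 'Excellent';
  if (avgCV < 0.2) return 'Good';
  if (avgCV < 0.3) return 'Fair';
  return 'Poor';
}

// Hypothetical CPU and memory percentages
const steady = classifyStability([40, 42, 41, 39], [60, 61, 60, 59]);
const spiky = classifyStability([10, 90, 15, 85], [30, 80, 35, 75]);
console.log(steady, spiky); // Excellent Poor
```

The steady series varies by about 2% of its mean, so it lands in the first bucket; the spiky series varies by more than 50% and falls through to the last one.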

🔗 Integration

Required Dependencies

{
  "@nestjs/common": "^10.4.2",
  "@nestjs/core": "^10.4.2",
  "@hsuite/nestjs-swagger": "^1.0.3",
  "@compodoc/compodoc": "^1.1.23"
}

Module Integration

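import { Module } from '@nestjs/common';
import { HealthModule } from '@hsuite/health';
// SystemHealthMonitoringService, DAGNetworkMonitoringService,
// HealthModuleConfigurationService and HealthMetricsAnalyticsService are the
// example services from this guide; import them from wherever you define them.

@Module({
  imports: [
    HealthModule.forRoot({
      socket: {
        host: process.env.REDIS_HOST || 'localhost',
        port: parseInt(process.env.REDIS_PORT || '6379', 10)
      },
      password: process.env.REDIS_PASSWORD,
      database: parseInt(process.env.REDIS_DATABASE || '0', 10)
    })
  ],
  providers: [
    SystemHealthMonitoringService,
    DAGNetworkMonitoringService,
    HealthModuleConfigurationService,
    HealthMetricsAnalyticsService
  ],
  exports: [
    // Re-export HealthModule so consumers receive the providers it exports
    // (such as HealthService). Nest rejects exporting a provider that this
    // module does not declare itself.
    HealthModule,
    SystemHealthMonitoringService,
    DAGNetworkMonitoringService,
    HealthMetricsAnalyticsService
  ]
})
export class HealthMonitoringModule {}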

Documentation Generation

# Generate comprehensive documentation
npm run compodoc

# Generate documentation with coverage report
npm run compodoc:coverage

Environment Configuration

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your-secure-password
REDIS_DATABASE=0
REDIS_TLS=false

# Health Monitoring Settings
HEALTH_CHECK_INTERVAL=30000
METRICS_COLLECTION_INTERVAL=60000
ALERT_THRESHOLD_CPU=90
ALERT_THRESHOLD_MEMORY=85
ALERT_THRESHOLD_STORAGE=90

# DAG Network Settings
DAG_NETWORK_MONITORING=true
DAG_STATUS_CACHE_TTL=1000
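
Environment variables always arrive as strings, so an application usually parses them once at startup. The sketch below shows one way to do that for the health and DAG settings listed above. The `HealthSettings` shape and the `loadHealthSettings` and `readNumber` helpers are hypothetical names chosen for this example; the source does not say how `@hsuite/health` itself consumes these variables.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface HealthSettings {
  checkIntervalMs: number;
  metricsIntervalMs: number;
  cpuThreshold: number;
  memoryThreshold: number;
  storageThreshold: number;
  dagMonitoring: boolean;
  dagCacheTtlMs: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back when it is missing or not numeric.
function readNumber(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? '', 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Defaults mirror the example values shown in the environment block above.
function loadHealthSettings(env: Env): HealthSettings {
  return {
    checkIntervalMs: readNumber(env, 'HEALTH_CHECK_INTERVAL', 30000),
    metricsIntervalMs: readNumber(env, 'METRICS_COLLECTION_INTERVAL', 60000),
    cpuThreshold: readNumber(env, 'ALERT_THRESHOLD_CPU', 90),
    memoryThreshold: readNumber(env, 'ALERT_THRESHOLD_MEMORY', 85),
    storageThreshold: readNumber(env, 'ALERT_THRESHOLD_STORAGE', 90),
    dagMonitoring: (env['DAG_NETWORK_MONITORING'] ?? 'true').toLowerCase() === 'true',
    dagCacheTtlMs: readNumber(env, 'DAG_STATUS_CACHE_TTL', 1000)
  };
}

// Example: override two values and keep the defaults for the rest.
const settings = loadHealthSettings({
  ALERT_THRESHOLD_CPU: '75',
  DAG_NETWORK_MONITORING: 'false'
});
console.log(settings.cpuThreshold, settings.memoryThreshold, settings.dagMonitoring);
```

In a real application you would pass `process.env` instead of the literal object.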

Integration with HSuite Ecosystem

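// Complete integration with other HSuite modules
import { Injectable, Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { HealthModule, HealthService } from '@hsuite/health';
// AuthService and SmartNetworkService are assumed to be exported by their
// packages; adjust these imports if your versions expose them differently.
import { AuthModule, AuthService } from '@hsuite/auth';
import { SmartNetworkModule, SmartNetworkService } from '@hsuite/smart-network';
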
@Module({
  imports: [
    AuthModule,
    SmartNetworkModule,
    HealthModule.forRootAsync({
      imports: [ConfigModule],
      useFactory: async (configService: ConfigService) => ({
        socket: {
          host: configService.get<string>('REDIS_HOST', 'localhost'),
          // Environment values arrive as strings, so convert before use
          port: Number(configService.get('REDIS_PORT', 6379))
        },
        password: configService.get<string>('REDIS_PASSWORD'),
        database: Number(configService.get('REDIS_DATABASE', 0))
      }),
      inject: [ConfigService]
    })
  ]
})
export class HealthEcosystemModule {}

@Injectable()
export class IntegratedHealthService {
  constructor(
    private healthService: HealthService,
    private authService: AuthService,
    private networkService: SmartNetworkService
  ) {}

  async getComprehensiveSystemStatus() {
    // 1. Get basic health status
    const health = await this.healthService.check();
    
    // 2. Get system metrics
    const metrics = await this.healthService.infos();
    
    // 3. Check authentication service health
    const authHealth = await this.authService.getHealthStatus();
    
    // 4. Check network service health
    const networkHealth = await this.networkService.getNetworkStatus();

    return {
      timestamp: new Date().toISOString(),
      overallStatus: health.status,
      components: {
        system: health.details,
        authentication: authHealth,
        network: networkHealth
      },
      metrics: {
        cpu: metrics.cpu,
        memory: metrics.memory,
        storage: metrics.drive,
        network: metrics.network,
        uptime: metrics.uptime
      },
      platform: {
        os: metrics.platform,
        architecture: metrics.arch,
        release: metrics.release
      }
    };
  }
}

Performance Considerations

📊 Caching Strategy

  • 1-Second Caching - Health check and metrics responses are cached for one second, so bursts of requests do not re-run the underlying checks

  • Memory Management - Efficient resource metrics collection with OS utilities

  • Event-Driven Updates - DAG network status uses events to reduce polling overhead
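
The one-second caching described above can be pictured as a small time-to-live (TTL) cache wrapped around the check function. The following is a minimal sketch of that idea, not the library's internal implementation; the `TtlCache` class and the injectable clock are assumptions made for the example.

```typescript
// Hypothetical single-entry TTL cache; not part of @hsuite/health.
class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | null = null;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so the behaviour can be demonstrated without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value while it is fresh; otherwise run the loader again.
  get(loader: () => T): T {
    const current = this.now();
    if (this.entry !== null && current < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = loader();
    this.entry = { value, expiresAt: current + this.ttlMs };
    return value;
  }
}

// Demonstration with a fake clock and a counter standing in for a health check.
let fakeTime = 0;
let runs = 0;
const cache = new TtlCache<string>(1000, () => fakeTime);
const check = (): string => {
  runs += 1;
  return `ok #${runs}`;
};

const first = cache.get(check);  // runs the check
fakeTime = 500;
const second = cache.get(check); // within 1 second: served from cache
fakeTime = 1000;
const third = cache.get(check);  // TTL elapsed: runs the check again
console.log(first, second, third, runs); // ok #1 ok #1 ok #2 2
```

If `T` is a `Promise`, concurrent callers share the same in-flight promise. A rejected promise would also stay cached until it expires, which may or may not be what you want.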

🔧 Optimization Features

  • Multi-core CPU Calculations - Advanced CPU usage calculations for multi-core systems

  • Efficient Collection - Optimized OS utility integration for resource monitoring

  • Connection Pooling - Redis connection optimization for health checks
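
A common way to compute CPU utilization across cores is to compare two snapshots of per-core time counters and divide the busy time by the total time elapsed. The sketch below illustrates that approach; the source does not describe the exact algorithm used by `@hsuite/health`, so treat this as one possible technique. The `CpuTimes` shape mirrors the `times` object that Node's `os.cpus()` returns for each core, and `cpuUsagePercent` is a hypothetical helper.

```typescript
// Same fields as the `times` object of each entry returned by os.cpus().
interface CpuTimes {
  user: number;
  nice: number;
  sys: number;
  idle: number;
  irq: number;
}

// Aggregate usage across all cores between two snapshots, as a percentage.
function cpuUsagePercent(before: CpuTimes[], after: CpuTimes[]): number {
  let busyDelta = 0;
  let totalDelta = 0;
  const cores = Math.min(before.length, after.length);

  for (let i = 0; i < cores; i++) {
    const b = before[i];
    const a = after[i];
    const idle = a.idle - b.idle;
    const busy = (a.user - b.user) + (a.nice - b.nice) + (a.sys - b.sys) + (a.irq - b.irq);
    busyDelta += busy;
    totalDelta += busy + idle;
  }

  // No elapsed time between snapshots means there is nothing to measure.
  return totalDelta === 0 ? 0 : (busyDelta / totalDelta) * 100;
}

// Hypothetical two-core snapshots (counter values in milliseconds).
const before: CpuTimes[] = [
  { user: 0, nice: 0, sys: 0, idle: 0, irq: 0 },
  { user: 0, nice: 0, sys: 0, idle: 0, irq: 0 }
];
const after: CpuTimes[] = [
  { user: 300, nice: 0, sys: 100, idle: 600, irq: 0 }, // 40% busy
  { user: 100, nice: 0, sys: 0, idle: 900, irq: 0 }    // 10% busy
];
console.log(cpuUsagePercent(before, after)); // 25
```

In practice the two snapshots would come from `os.cpus().map(cpu => cpu.times)` taken a short interval apart.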

🛡️ Error Handling

  • Comprehensive Exception Management - Proper error types and detailed messages

  • Graceful Degradation - Fallback mechanisms for failed health checks

  • Service Isolation - Individual service failures don't affect overall monitoring
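
Service isolation and graceful degradation can be sketched with `Promise.allSettled`, which waits for every check to finish and records failures instead of aborting on the first one. The example below is a hedged illustration of that pattern. The `Check` and `CheckResult` types, the `runIsolatedChecks` helper, and the `'degraded'` status label are all assumptions made for this sketch, not the API of `@hsuite/health`.

```typescript
// Hypothetical types for this sketch.
type CheckResult = { name: string; status: 'up' | 'down'; error?: string };
type Check = { name: string; run: () => Promise<void> };
type Report = { status: 'ok' | 'degraded' | 'error'; results: CheckResult[] };

// Run every check independently; one failure never prevents the others from reporting.
async function runIsolatedChecks(checks: Check[]): Promise<Report> {
  const settled = await Promise.allSettled(checks.map(c => c.run()));

  const results = settled.map((outcome, i): CheckResult => {
    if (outcome.status === 'fulfilled') {
      return { name: checks[i].name, status: 'up' };
    }
    const reason = outcome.reason;
    const error = reason instanceof Error ? reason.message : String(reason);
    return { name: checks[i].name, status: 'down', error };
  });

  const down = results.filter(r => r.status === 'down').length;
  // All up: ok. Some down: degraded. Everything down: error.
  const status = down === 0 ? 'ok' : down === results.length ? 'error' : 'degraded';
  return { status, results };
}

// Demonstration with one healthy and one failing (simulated) dependency.
const demoChecks: Check[] = [
  { name: 'redis', run: async () => {} },
  { name: 'mongodb', run: async () => { throw new Error('connection refused'); } }
];

runIsolatedChecks(demoChecks).then(report => console.log(report.status)); // degraded
```

Because each failure is captured per service, the report still lists the healthy dependencies alongside the failing one.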


🏥 Enterprise Health Monitoring: Comprehensive system diagnostics with real-time resource tracking and specialized DAG network monitoring.

📊 Advanced Analytics: Performance analysis, trend calculation, and intelligent recommendations for system optimization.

🌐 DAG Network Integration: Event-driven network monitoring with threshold management and automated status updates.


Built with ❤️ by the HbarSuite Team

Copyright © 2024 HbarSuite. All rights reserved.
