@hsuite/health - Comprehensive System Health Monitoring
🏥 Advanced health monitoring and system diagnostics library for NestJS applications with DAG network monitoring
Enterprise-grade health monitoring solution providing real-time system resource tracking, service health checks, performance metrics collection, and specialized DAG network monitoring with event-driven updates and comprehensive diagnostics.
✨ Quick Start
Installation
npm install @hsuite/health
Basic Setup
import { HealthModule } from '@hsuite/health';
import { RedisClientOptions } from 'redis';
const redisOptions: RedisClientOptions = {
socket: {
host: 'localhost',
port: 6379
},
password: 'your-redis-password',
database: 0
};
@Module({
imports: [
HealthModule.forRoot(redisOptions)
]
})
export class AppModule {}
Health Check Usage
import { HealthService } from '@hsuite/health';
@Injectable()
export class MonitoringService {
constructor(private healthService: HealthService) {}
async checkSystemHealth() {
const health = await this.healthService.check();
console.log('System status:', health.status);
return health;
}
}
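For detailed metrics alongside the overall status, the same service exposes an infos() method (used throughout the examples below); a minimal sketch:
import { Injectable } from '@nestjs/common';
import { HealthService } from '@hsuite/health';

@Injectable()
export class MetricsSnapshotService {
  constructor(private healthService: HealthService) {}

  // Returns a compact snapshot built from the system metrics payload.
  async snapshot() {
    const infos = await this.healthService.infos();
    return {
      cpuUsage: infos.cpu.usage,                            // percentage (0-100)
      memoryUsedPercentage: infos.memory.usedMemPercentage,
      uptimeSeconds: infos.uptime
    };
  }
}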
🏗️ Architecture
Core Component Areas
🏥 System Health Monitoring
Real-time Health Checks - Comprehensive system health validation
Service Connectivity - MongoDB, Redis, and microservice monitoring
Health Status Aggregation - Multi-service health state management
Cached Responses - Efficient health check performance optimization
📊 Resource Metrics Collection
CPU Monitoring - Real-time CPU utilization and multi-core tracking
Memory Management - Memory usage, availability, and percentage tracking
Disk Space Monitoring - Storage utilization and free space alerts
Network Metrics - Input/output traffic monitoring and analysis
🌐 DAG Network Monitoring
Network Health Tracking - Specialized DAG network status monitoring
Event-Driven Updates - Real-time threshold monitoring with events
Network Threshold Management - Online/offline status detection
Performance Optimization - Efficient network status collection
⚡ Performance Features
Response Caching - 1-second caching for optimal performance
Error Handling - Comprehensive exception management
Resource Optimization - Efficient OS utility integration
Multi-core Support - Advanced CPU usage calculations
Module Structure
src/
├── index.ts # Main entry point and exports
├── health.module.ts # Dynamic module with Redis configuration
├── health.controller.ts # REST endpoints for health checks
├── health.service.ts # Core health monitoring service
├── interfaces/
│ └── infos.interface.ts # Health metrics interfaces
├── models/
│ └── infos.model.ts # Health metrics model implementations
└── custom/
└── dag.health.ts # DAG network health indicator
🔧 API Reference
Core Health Endpoints
All health endpoints are publicly accessible via the @Public() decorator.
Health Check Endpoint
GET /health/check
Purpose: Comprehensive system health validation
Caching: 1-second response caching
Monitors: Redis, MongoDB, disk space, memory, DAG network, microservices
System Information Endpoint
GET /health/infos
Purpose: Detailed system metrics and resource utilization
Caching: 1-second response caching
Data: Platform, CPU, memory, disk, network metrics
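As a quick illustration, both endpoints can be probed over HTTP once the application is running. The base URL below is an assumption; adjust it to your deployment:
// Hypothetical client-side probe (Node 18+ global fetch); the base URL is an assumption.
const BASE_URL = 'http://localhost:3000';

async function probeHealthEndpoints() {
  const check = await fetch(`${BASE_URL}/health/check`).then(res => res.json());
  const infos = await fetch(`${BASE_URL}/health/infos`).then(res => res.json());

  console.log('Overall status:', check.status);       // 'ok' | 'error'
  console.log('CPU usage (%):', infos.cpu?.usage);
  console.log('Free memory (MB):', infos.memory?.freeMemMb);
}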
Health Check Response Schema
interface HealthCheckResult {
status: 'ok' | 'error';
info: {
[key: string]: {
status: 'up' | 'down';
};
};
error: {
[key: string]: {
status: 'up' | 'down';
message?: string;
};
};
details: {
[key: string]: {
status: 'up' | 'down';
[key: string]: any;
};
};
}
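A small helper that consumes this shape and lists the services currently reported as down might look like the following sketch:
// Uses the HealthCheckResult shape defined in the schema above.
function listDownServices(result: HealthCheckResult): string[] {
  return Object.entries(result.details)
    .filter(([, detail]) => detail.status === 'down')
    .map(([service]) => service);
}

// Usage:
// const down = listDownServices(await healthService.check());
// if (down.length > 0) console.warn('Degraded services:', down.join(', '));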
System Information Schema
interface IHealthInfos {
platform: string; // Operating system platform
release: string; // OS version
machine: string; // Hardware identifier
arch: string; // CPU architecture
uptime: number; // System uptime in seconds
cpu: IHealthInfosCPU;
memory: IHealthInfosMemory;
drive: IHealthInfosDrive;
network: IHealthInfosNetwork;
}
Resource Metrics Tables
CPU Metrics (IHealthInfosCPU)
usage (number): CPU utilization percentage (0-100)
cpus (number): Number of CPU cores
speed (number): CPU clock frequency in MHz
Memory Metrics (IHealthInfosMemory)
totalMemMb (number): Total memory in MB
usedMemMb (number): Used memory in MB
freeMemMb (number): Free memory in MB
usedMemPercentage (number): Memory usage percentage
freeMemPercentage (number): Free memory percentage
Drive Metrics (IHealthInfosDrive)
totalGb (string): Total storage in GB
usedGb (string): Used storage in GB
freeGb (string): Free storage in GB
usedPercentage (string): Storage usage percentage
freePercentage (string): Free storage percentage
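Note that drive metrics are reported as strings while CPU and memory metrics are numbers, so disk thresholds need a numeric conversion first; a minimal sketch:
// Drive percentages arrive as strings (see the table above); convert before comparing.
// `infos` is the object returned by healthService.infos(); the threshold is illustrative.
function isDiskSpaceCritical(infos: { drive: { usedPercentage: string } }, thresholdPercent = 90): boolean {
  const used = parseFloat(infos.drive.usedPercentage);
  return !Number.isNaN(used) && used >= thresholdPercent;
}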
📖 Guides
Health Monitoring Setup Guide
Complete guide to setting up health monitoring for your application, covering health indicator configuration, system and resource monitoring, service health checks, and real-time alerts and notifications.
DAG Network Monitoring Guide
Learn how to implement and monitor DAG network health with event-driven updates, covering network connectivity, consensus monitoring, performance tracking, threshold events, and automated diagnostics.
Performance Optimization Guide
Best practices for optimizing health monitoring performance and resource usage, covering monitoring efficiency, resource utilization, performance tuning, and scalability.
Alert and Threshold Management Guide
Set up proactive monitoring with alerts and threshold-based notifications, covering alert systems, threshold configuration, notification management, escalation procedures, and automated incident response.
🎯 Examples
Comprehensive Health Monitoring Service
import { HealthService, DagHealthIndicator } from '@hsuite/health';
import { Injectable } from '@nestjs/common';
@Injectable()
export class SystemHealthMonitoringService {
constructor(
private healthService: HealthService,
private dagHealth: DagHealthIndicator
) {}
async performComprehensiveHealthCheck() {
try {
const healthResults = await this.healthService.check();
const systemMetrics = await this.healthService.infos();
const dagStatus = await this.dagHealth.isHealthy();
const report = {
timestamp: new Date().toISOString(),
overallStatus: healthResults.status,
systemHealth: {
redis: healthResults.details.redis?.status || 'unknown',
mongodb: healthResults.details.mongodb?.status || 'unknown',
disk: healthResults.details.disk?.status || 'unknown',
memory: healthResults.details.memory?.status || 'unknown'
},
dagNetwork: {
status: dagStatus.dag?.status || 'unknown',
network: dagStatus.dag?.network || 'disconnected'
},
resourceUtilization: {
cpu: {
usage: systemMetrics.cpu.usage,
cores: systemMetrics.cpu.cpus,
frequency: systemMetrics.cpu.speed
},
memory: {
total: systemMetrics.memory.totalMemMb,
used: systemMetrics.memory.usedMemMb,
usagePercentage: systemMetrics.memory.usedMemPercentage,
available: systemMetrics.memory.freeMemMb
},
storage: {
total: systemMetrics.drive.totalGb,
used: systemMetrics.drive.usedGb,
usagePercentage: systemMetrics.drive.usedPercentage,
available: systemMetrics.drive.freeGb
},
network: {
bytesReceived: systemMetrics.network.inputBytes,
bytesTransmitted: systemMetrics.network.outputBytes
}
},
systemInfo: {
platform: systemMetrics.platform,
architecture: systemMetrics.arch,
uptime: this.formatUptime(systemMetrics.uptime),
release: systemMetrics.release
}
};
// Check for critical conditions
await this.evaluateSystemAlerts(report);
return report;
} catch (error) {
throw new Error(`Health monitoring failed: ${error.message}`);
}
}
async evaluateSystemAlerts(report: any) {
const alerts = [];
// CPU usage alerts
if (report.resourceUtilization.cpu.usage > 90) {
alerts.push({
severity: 'critical',
component: 'CPU',
message: `High CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
threshold: 90,
action: 'Scale resources or investigate high load processes'
});
} else if (report.resourceUtilization.cpu.usage > 75) {
alerts.push({
severity: 'warning',
component: 'CPU',
message: `Elevated CPU utilization: ${report.resourceUtilization.cpu.usage}%`,
threshold: 75,
action: 'Monitor CPU usage trends'
});
}
// Memory usage alerts
if (report.resourceUtilization.memory.usagePercentage > 95) {
alerts.push({
severity: 'critical',
component: 'Memory',
message: `Critical memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
threshold: 95,
action: 'Immediate memory cleanup or scaling required'
});
} else if (report.resourceUtilization.memory.usagePercentage > 85) {
alerts.push({
severity: 'warning',
component: 'Memory',
message: `High memory usage: ${report.resourceUtilization.memory.usagePercentage}%`,
threshold: 85,
action: 'Consider memory optimization or scaling'
});
}
// Storage alerts
const storageUsage = parseFloat(report.resourceUtilization.storage.usagePercentage);
if (storageUsage > 95) {
alerts.push({
severity: 'critical',
component: 'Storage',
message: `Critical disk space: ${storageUsage}% used`,
threshold: 95,
action: 'Immediate cleanup or storage expansion required'
});
} else if (storageUsage > 85) {
alerts.push({
severity: 'warning',
component: 'Storage',
message: `Low disk space: ${storageUsage}% used`,
threshold: 85,
action: 'Plan for storage cleanup or expansion'
});
}
// Service health alerts
Object.entries(report.systemHealth).forEach(([service, status]) => {
if (status !== 'up') {
alerts.push({
severity: 'critical',
component: service.toUpperCase(),
message: `Service ${service} is ${status}`,
action: `Investigate ${service} connectivity and restart if necessary`
});
}
});
// DAG network alerts
if (report.dagNetwork.status !== 'up') {
alerts.push({
severity: 'critical',
component: 'DAG Network',
message: `DAG network is ${report.dagNetwork.status}`,
action: 'Check network connectivity and DAG node status'
});
}
if (alerts.length > 0) {
await this.processAlerts(alerts);
}
return alerts;
}
private async processAlerts(alerts: any[]) {
// Process alerts - send notifications, log, etc.
for (const alert of alerts) {
console.warn(`[${alert.severity.toUpperCase()}] ${alert.component}: ${alert.message}`);
console.warn(`Action: ${alert.action}`);
// In production, implement:
// - Send notifications (email, Slack, etc.)
// - Log to monitoring systems
// - Trigger automated responses
// - Update dashboards
}
}
private formatUptime(seconds: number): string {
const days = Math.floor(seconds / 86400);
const hours = Math.floor((seconds % 86400) / 3600);
const minutes = Math.floor((seconds % 3600) / 60);
return `${days}d ${hours}h ${minutes}m`;
}
async getHealthSummary() {
try {
const health = await this.healthService.check();
const metrics = await this.healthService.infos();
return {
status: health.status,
uptime: this.formatUptime(metrics.uptime),
resources: {
cpu: `${metrics.cpu.usage}% (${metrics.cpu.cpus} cores)`,
memory: `${metrics.memory.usedMemPercentage}% used (${metrics.memory.freeMemMb}MB free)`,
storage: `${metrics.drive.usedPercentage}% used (${metrics.drive.freeGb}GB free)`
},
services: Object.keys(health.details).reduce((acc, key) => {
acc[key] = health.details[key].status;
return acc;
}, {}),
lastChecked: new Date().toISOString()
};
} catch (error) {
throw new Error(`Health summary generation failed: ${error.message}`);
}
}
}
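One way to run the comprehensive check on a schedule is with @nestjs/schedule; the sketch below assumes that package is installed and ScheduleModule.forRoot() is registered in your application:
import { Injectable, Logger } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';

@Injectable()
export class ScheduledHealthCheckService {
  private readonly logger = new Logger(ScheduledHealthCheckService.name);

  constructor(private monitoring: SystemHealthMonitoringService) {}

  // Runs the comprehensive check every 5 minutes and logs the overall status.
  @Cron(CronExpression.EVERY_5_MINUTES)
  async runScheduledCheck() {
    try {
      const report = await this.monitoring.performComprehensiveHealthCheck();
      this.logger.log(`System status: ${report.overallStatus}`);
    } catch (error) {
      this.logger.error(`Scheduled health check failed: ${error.message}`);
    }
  }
}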
DAG Network Monitoring Service
import { DagHealthIndicator } from '@hsuite/health';
import { Injectable, OnModuleInit } from '@nestjs/common';
import { EventEmitter2 } from '@nestjs/event-emitter';
@Injectable()
export class DAGNetworkMonitoringService implements OnModuleInit {
private networkStatus: string = 'unknown';
private lastStatusChange: Date = new Date();
private statusHistory: Array<{ status: string; timestamp: Date }> = [];
constructor(
private dagHealth: DagHealthIndicator,
private eventEmitter: EventEmitter2
) {}
onModuleInit() {
// Listen for DAG network events
this.eventEmitter.on('smart_node.monitors.network_threshold_online', this.handleNetworkOnline.bind(this));
this.eventEmitter.on('smart_node.monitors.network_threshold_offline', this.handleNetworkOffline.bind(this));
// Start periodic monitoring
this.startPeriodicMonitoring();
}
async checkDAGNetworkHealth() {
try {
const health = await this.dagHealth.isHealthy();
const networkStatus = {
status: health.dag?.status || 'unknown',
network: health.dag?.network || 'disconnected',
timestamp: new Date().toISOString(),
isHealthy: health.dag?.status === 'up',
details: health.dag
};
// Update internal status tracking
if (networkStatus.status !== this.networkStatus) {
this.handleStatusChange(networkStatus.status);
}
return networkStatus;
} catch (error) {
console.error('DAG network health check failed:', error);
if (error.causes) {
console.error('Health check causes:', error.causes);
}
return {
status: 'error',
network: 'error',
timestamp: new Date().toISOString(),
isHealthy: false,
error: error.message,
causes: error.causes
};
}
}
private handleNetworkOnline(event: any) {
console.log('DAG Network Online Event:', event);
this.handleStatusChange('up');
// Emit custom application event
this.eventEmitter.emit('dag.network.online', {
timestamp: new Date(),
previousStatus: this.networkStatus,
event: event
});
}
private handleNetworkOffline(event: any) {
console.log('DAG Network Offline Event:', event);
this.handleStatusChange('down');
// Emit custom application event
this.eventEmitter.emit('dag.network.offline', {
timestamp: new Date(),
previousStatus: this.networkStatus,
event: event
});
}
private handleStatusChange(newStatus: string) {
const previousStatus = this.networkStatus;
// Remember when the previous status began so recovery downtime can be measured
const previousStatusChange = this.lastStatusChange;
this.networkStatus = newStatus;
this.lastStatusChange = new Date();
// Add to status history
this.statusHistory.push({
status: newStatus,
timestamp: this.lastStatusChange
});
// Keep only last 100 status changes
if (this.statusHistory.length > 100) {
this.statusHistory = this.statusHistory.slice(-100);
}
console.log(`DAG Network status changed: ${previousStatus} -> ${newStatus}`);
// Trigger alerts for status changes
if (newStatus === 'down') {
this.triggerNetworkDownAlert(previousStatus);
} else if (newStatus === 'up' && previousStatus === 'down') {
this.triggerNetworkRecoveryAlert(previousStatusChange);
}
}
private triggerNetworkDownAlert(previousStatus: string) {
const alert = {
severity: 'critical',
component: 'DAG Network',
message: 'DAG network has gone offline',
previousStatus,
currentStatus: this.networkStatus,
timestamp: this.lastStatusChange,
action: 'Investigate network connectivity and DAG node status'
};
console.error('DAG Network Alert:', alert);
// In production, implement:
// - Send immediate alerts
// - Log to monitoring systems
// - Trigger recovery procedures
}
private triggerNetworkRecoveryAlert(previousStatusChange: Date) {
// Measure downtime from the previous status change (when the network went down),
// not from lastStatusChange, which was just updated to the recovery time
const downtime = Date.now() - previousStatusChange.getTime();
const alert = {
severity: 'info',
component: 'DAG Network',
message: 'DAG network has recovered',
currentStatus: this.networkStatus,
timestamp: this.lastStatusChange,
downtime: `${Math.round(downtime / 1000)} seconds`,
action: 'Verify network stability and check for missed transactions'
};
console.log('DAG Network Recovery:', alert);
}
private startPeriodicMonitoring() {
// Check DAG network health every 30 seconds
setInterval(async () => {
try {
await this.checkDAGNetworkHealth();
} catch (error) {
console.error('Periodic DAG health check failed:', error);
}
}, 30000);
}
getNetworkStatusHistory() {
return {
currentStatus: this.networkStatus,
lastStatusChange: this.lastStatusChange,
statusHistory: this.statusHistory,
uptime: this.calculateUptime(),
downtime: this.calculateDowntime()
};
}
private calculateUptime(): { total: number; percentage: number } {
const now = Date.now();
const oneDayAgo = now - (24 * 60 * 60 * 1000);
const recentHistory = this.statusHistory.filter(
entry => entry.timestamp.getTime() > oneDayAgo
);
let upTime = 0;
for (let i = 0; i < recentHistory.length - 1; i++) {
const current = recentHistory[i];
const next = recentHistory[i + 1];
if (current.status === 'up') {
upTime += next.timestamp.getTime() - current.timestamp.getTime();
}
}
// Add current status time if up
if (recentHistory.length > 0 && recentHistory[recentHistory.length - 1].status === 'up') {
upTime += now - recentHistory[recentHistory.length - 1].timestamp.getTime();
}
const totalTime = 24 * 60 * 60 * 1000; // 24 hours in ms
const percentage = (upTime / totalTime) * 100;
return {
total: Math.round(upTime / 1000), // seconds
percentage: Math.round(percentage * 100) / 100
};
}
private calculateDowntime(): { total: number; percentage: number } {
const uptime = this.calculateUptime();
const totalDaySeconds = 24 * 60 * 60;
return {
total: totalDaySeconds - uptime.total,
percentage: Math.round((100 - uptime.percentage) * 100) / 100
};
}
}
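Other parts of the application can subscribe to the custom dag.network.online and dag.network.offline events emitted by the service above, for example with the @OnEvent decorator from @nestjs/event-emitter:
import { Injectable, Logger } from '@nestjs/common';
import { OnEvent } from '@nestjs/event-emitter';

@Injectable()
export class DagNetworkAlertListener {
  private readonly logger = new Logger(DagNetworkAlertListener.name);

  // Payload shape mirrors what DAGNetworkMonitoringService emits above.
  @OnEvent('dag.network.offline')
  handleOffline(payload: { timestamp: Date; previousStatus: string }) {
    this.logger.error(`DAG network went offline (previous status: ${payload.previousStatus})`);
    // e.g. notify an on-call channel here
  }

  @OnEvent('dag.network.online')
  handleOnline(payload: { timestamp: Date; previousStatus: string }) {
    this.logger.log(`DAG network back online at ${payload.timestamp.toISOString()}`);
  }
}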
Advanced Redis Configuration Service
import { HealthModule } from '@hsuite/health';
import { Injectable } from '@nestjs/common';
import { createClient, RedisClientOptions } from 'redis';
@Injectable()
export class HealthModuleConfigurationService {
createProductionRedisConfig(): RedisClientOptions {
return {
socket: {
host: process.env.REDIS_HOST || 'redis-cluster.production.com',
port: parseInt(process.env.REDIS_PORT || '6379'),
tls: process.env.REDIS_TLS === 'true',
connectTimeout: 10000,
// Health check specific socket settings
keepAlive: true,
family: 4,
// Exponential backoff between reconnect attempts, capped at 5 seconds
reconnectStrategy: (retries: number) => Math.min(retries * 100, 5000)
},
password: process.env.REDIS_PASSWORD,
database: parseInt(process.env.REDIS_DATABASE || '0'),
// Fail fast instead of queueing commands while disconnected
disableOfflineQueue: true,
pingInterval: 30000
};
}
createDevelopmentRedisConfig(): RedisClientOptions {
return {
socket: {
host: 'localhost',
port: 6379,
connectTimeout: 5000
},
database: 0
};
}
createHighAvailabilityRedisConfig(): RedisClientOptions {
// Note: the node-redis client options do not include Sentinel discovery; point the
// client at a highly available endpoint (e.g. a Sentinel-aware proxy or managed Redis service).
// REDIS_HA_HOST / REDIS_HA_PORT are illustrative variable names.
return {
socket: {
host: process.env.REDIS_HA_HOST || 'redis-ha.internal',
port: parseInt(process.env.REDIS_HA_PORT || '6379'),
connectTimeout: 5000,
// Linear backoff, giving up with an error after 10 attempts
reconnectStrategy: (retries: number) =>
retries > 10 ? new Error('Redis reconnect attempts exhausted') : retries * 200
},
password: process.env.REDIS_PASSWORD,
disableOfflineQueue: true,
pingInterval: 30000
};
}
async validateRedisConfiguration(config: RedisClientOptions): Promise<boolean> {
try {
const redis = createClient(config);
await redis.connect();
// Test basic operations
await redis.ping();
await redis.set('health:test', 'validation', { EX: 10 });
const result = await redis.get('health:test');
await redis.del('health:test');
await redis.quit();
return result === 'validation';
} catch (error) {
console.error('Redis configuration validation failed:', error);
return false;
}
}
createConfigurationBasedOnEnvironment(): RedisClientOptions {
const environment = process.env.NODE_ENV || 'development';
switch (environment) {
case 'production':
return this.createProductionRedisConfig();
case 'staging':
return this.createHighAvailabilityRedisConfig();
default:
return this.createDevelopmentRedisConfig();
}
}
}
// Usage in app module
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
@Module({
imports: [
HealthModule.forRootAsync({
imports: [ConfigModule],
useFactory: async (
configService: ConfigService,
healthConfig: HealthModuleConfigurationService
) => {
const redisConfig = healthConfig.createConfigurationBasedOnEnvironment();
// Validate configuration before using
const isValid = await healthConfig.validateRedisConfiguration(redisConfig);
if (!isValid) {
console.warn('Redis configuration validation failed, using fallback');
return healthConfig.createDevelopmentRedisConfig();
}
return redisConfig;
},
inject: [ConfigService, HealthModuleConfigurationService]
})
],
providers: [HealthModuleConfigurationService]
})
export class AppModule {}
Health Metrics Analytics Service
import { HealthService } from '@hsuite/health';
import { Injectable } from '@nestjs/common';
@Injectable()
export class HealthMetricsAnalyticsService {
private metricsHistory: Array<{ timestamp: Date; metrics: any }> = [];
private readonly maxHistorySize = 1000;
constructor(private healthService: HealthService) {
// Start collecting metrics every minute
this.startMetricsCollection();
}
async analyzeSystemPerformance(timeRange: { start: Date; end: Date }) {
try {
const relevantMetrics = this.metricsHistory.filter(
entry => entry.timestamp >= timeRange.start && entry.timestamp <= timeRange.end
);
if (relevantMetrics.length === 0) {
throw new Error('No metrics available for the specified time range');
}
const analysis = {
timeRange,
dataPoints: relevantMetrics.length,
cpu: this.analyzeCPUMetrics(relevantMetrics),
memory: this.analyzeMemoryMetrics(relevantMetrics),
storage: this.analyzeStorageMetrics(relevantMetrics),
network: this.analyzeNetworkMetrics(relevantMetrics),
trends: this.calculateTrends(relevantMetrics),
recommendations: [] as string[]
};
// Generate recommendations
analysis.recommendations = this.generateRecommendations(analysis);
return analysis;
} catch (error) {
throw new Error(`Performance analysis failed: ${error.message}`);
}
}
private startMetricsCollection() {
setInterval(async () => {
try {
const metrics = await this.healthService.infos();
this.metricsHistory.push({
timestamp: new Date(),
metrics: metrics
});
// Maintain history size
if (this.metricsHistory.length > this.maxHistorySize) {
this.metricsHistory = this.metricsHistory.slice(-this.maxHistorySize);
}
} catch (error) {
console.error('Metrics collection failed:', error);
}
}, 60000); // Collect every minute
}
private analyzeCPUMetrics(metricsData: any[]) {
const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
return {
average: this.calculateAverage(cpuValues),
min: Math.min(...cpuValues),
max: Math.max(...cpuValues),
median: this.calculateMedian(cpuValues),
percentile95: this.calculatePercentile(cpuValues, 95),
spikes: cpuValues.filter(value => value > 90).length,
trend: this.calculateTrend(cpuValues)
};
}
private analyzeMemoryMetrics(metricsData: any[]) {
const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
return {
average: this.calculateAverage(memoryValues),
min: Math.min(...memoryValues),
max: Math.max(...memoryValues),
median: this.calculateMedian(memoryValues),
percentile95: this.calculatePercentile(memoryValues, 95),
criticalEvents: memoryValues.filter(value => value > 95).length,
trend: this.calculateTrend(memoryValues)
};
}
private analyzeStorageMetrics(metricsData: any[]) {
const storageValues = metricsData.map(entry => parseFloat(entry.metrics.drive.usedPercentage));
return {
average: this.calculateAverage(storageValues),
min: Math.min(...storageValues),
max: Math.max(...storageValues),
median: this.calculateMedian(storageValues),
growth: this.calculateStorageGrowth(metricsData),
trend: this.calculateTrend(storageValues)
};
}
private analyzeNetworkMetrics(metricsData: any[]) {
const inputBytes = metricsData.map(entry => entry.metrics.network.inputBytes);
const outputBytes = metricsData.map(entry => entry.metrics.network.outputBytes);
return {
input: {
average: this.calculateAverage(inputBytes),
peak: Math.max(...inputBytes),
trend: this.calculateTrend(inputBytes)
},
output: {
average: this.calculateAverage(outputBytes),
peak: Math.max(...outputBytes),
trend: this.calculateTrend(outputBytes)
},
totalTraffic: {
average: this.calculateAverage(inputBytes.map((val, i) => val + outputBytes[i])),
peak: Math.max(...inputBytes.map((val, i) => val + outputBytes[i]))
}
};
}
private calculateTrends(metricsData: any[]) {
const timePoints = metricsData.map(entry => entry.timestamp.getTime());
const startTime = Math.min(...timePoints);
const endTime = Math.max(...timePoints);
const duration = endTime - startTime;
return {
analysisWindowMinutes: Math.round(duration / (1000 * 60)),
dataPointsCollected: metricsData.length,
averageInterval: Math.round(duration / metricsData.length / 1000), // seconds
systemStability: this.calculateSystemStability(metricsData)
};
}
private generateRecommendations(analysis: any): string[] {
const recommendations = [];
// CPU recommendations
if (analysis.cpu.average > 80) {
recommendations.push('Consider CPU optimization or scaling - average usage is high');
}
if (analysis.cpu.spikes > analysis.dataPoints * 0.1) {
recommendations.push('Investigate CPU spikes - frequent high usage detected');
}
// Memory recommendations
if (analysis.memory.average > 85) {
recommendations.push('Memory usage is consistently high - consider optimization');
}
if (analysis.memory.criticalEvents > 0) {
recommendations.push('Critical memory events detected - immediate attention required');
}
// Storage recommendations
if (analysis.storage.average > 85) {
recommendations.push('Storage usage is high - plan for cleanup or expansion');
}
if (analysis.storage.trend > 0.1) {
recommendations.push('Storage usage is growing rapidly - monitor closely');
}
// Network recommendations
if (analysis.network.totalTraffic.peak > analysis.network.totalTraffic.average * 10) {
recommendations.push('Network traffic spikes detected - investigate unusual activity');
}
return recommendations;
}
// Utility calculation methods
private calculateAverage(values: number[]): number {
return values.reduce((sum, val) => sum + val, 0) / values.length;
}
private calculateMedian(values: number[]): number {
const sorted = [...values].sort((a, b) => a - b);
const mid = Math.floor(sorted.length / 2);
return sorted.length % 2 === 0
? (sorted[mid - 1] + sorted[mid]) / 2
: sorted[mid];
}
private calculatePercentile(values: number[], percentile: number): number {
const sorted = [...values].sort((a, b) => a - b);
const index = Math.ceil((percentile / 100) * sorted.length) - 1;
return sorted[index];
}
private calculateTrend(values: number[]): number {
if (values.length < 2) return 0;
// Simple linear trend calculation
const firstHalf = values.slice(0, Math.floor(values.length / 2));
const secondHalf = values.slice(Math.floor(values.length / 2));
const firstAvg = this.calculateAverage(firstHalf);
const secondAvg = this.calculateAverage(secondHalf);
return (secondAvg - firstAvg) / firstAvg;
}
private calculateStorageGrowth(metricsData: any[]): number {
if (metricsData.length < 2) return 0;
const first = parseFloat(metricsData[0].metrics.drive.usedGb);
const last = parseFloat(metricsData[metricsData.length - 1].metrics.drive.usedGb);
return last - first; // GB growth
}
private calculateSystemStability(metricsData: any[]): string {
// Calculate coefficient of variation for CPU and memory
const cpuValues = metricsData.map(entry => entry.metrics.cpu.usage);
const memoryValues = metricsData.map(entry => entry.metrics.memory.usedMemPercentage);
const cpuStdDev = this.calculateStandardDeviation(cpuValues);
const memoryStdDev = this.calculateStandardDeviation(memoryValues);
const cpuCV = cpuStdDev / this.calculateAverage(cpuValues);
const memoryCV = memoryStdDev / this.calculateAverage(memoryValues);
const avgCV = (cpuCV + memoryCV) / 2;
if (avgCV < 0.1) return 'Excellent';
if (avgCV < 0.2) return 'Good';
if (avgCV < 0.3) return 'Fair';
return 'Poor';
}
private calculateStandardDeviation(values: number[]): number {
const avg = this.calculateAverage(values);
const squaredDiffs = values.map(val => Math.pow(val - avg, 2));
const avgSquaredDiff = this.calculateAverage(squaredDiffs);
return Math.sqrt(avgSquaredDiff);
}
getMetricsHistory() {
return {
totalDataPoints: this.metricsHistory.length,
oldestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[0].timestamp : null,
newestEntry: this.metricsHistory.length > 0 ? this.metricsHistory[this.metricsHistory.length - 1].timestamp : null,
memoryUsage: `${this.metricsHistory.length} entries (max: ${this.maxHistorySize})`
};
}
}
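A typical call analyzes a recent window, for example the last hour of collected metrics:
// Usage sketch: analyze the last hour of collected metrics.
async function analyzeLastHour(analytics: HealthMetricsAnalyticsService) {
  const end = new Date();
  const start = new Date(end.getTime() - 60 * 60 * 1000);

  const analysis = await analytics.analyzeSystemPerformance({ start, end });
  console.log(`Average CPU usage: ${analysis.cpu.average.toFixed(1)}%`);
  console.log('Recommendations:', analysis.recommendations);
}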
🔗 Integration
Required Dependencies
{
"@nestjs/common": "^10.4.2",
"@nestjs/core": "^10.4.2",
"@hsuite/nestjs-swagger": "^1.0.3",
"@compodoc/compodoc": "^1.1.23"
}
Module Integration
import { Module } from '@nestjs/common';
import { HealthModule, HealthService, DagHealthIndicator } from '@hsuite/health';
@Module({
imports: [
HealthModule.forRoot({
socket: {
host: process.env.REDIS_HOST || 'localhost',
port: parseInt(process.env.REDIS_PORT || '6379')
},
password: process.env.REDIS_PASSWORD,
database: parseInt(process.env.REDIS_DATABASE || '0')
})
],
providers: [
SystemHealthMonitoringService,
DAGNetworkMonitoringService,
HealthModuleConfigurationService,
HealthMetricsAnalyticsService
],
exports: [
// Re-export HealthModule so downstream modules can inject HealthService and DagHealthIndicator
HealthModule,
SystemHealthMonitoringService,
DAGNetworkMonitoringService,
HealthMetricsAnalyticsService
]
})
export class HealthMonitoringModule {}
Documentation Generation
# Generate comprehensive documentation
npm run compodoc
# Generate documentation with coverage report
npm run compodoc:coverage
Environment Configuration
# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your-secure-password
REDIS_DATABASE=0
REDIS_TLS=false
# Health Monitoring Settings
HEALTH_CHECK_INTERVAL=30000
METRICS_COLLECTION_INTERVAL=60000
ALERT_THRESHOLD_CPU=90
ALERT_THRESHOLD_MEMORY=85
ALERT_THRESHOLD_STORAGE=90
# DAG Network Settings
DAG_NETWORK_MONITORING=true
DAG_STATUS_CACHE_TTL=1000
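These variables can be mapped into a typed configuration object with @nestjs/config; a minimal sketch (variable names follow the list above, defaults are assumptions):
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';

@Injectable()
export class MonitoringConfig {
  constructor(private config: ConfigService) {}

  // Alert thresholds as percentages, read from the environment variables above.
  get thresholds() {
    return {
      cpu: parseInt(this.config.get('ALERT_THRESHOLD_CPU', '90'), 10),
      memory: parseInt(this.config.get('ALERT_THRESHOLD_MEMORY', '85'), 10),
      storage: parseInt(this.config.get('ALERT_THRESHOLD_STORAGE', '90'), 10)
    };
  }

  // Polling intervals in milliseconds.
  get intervals() {
    return {
      healthCheckMs: parseInt(this.config.get('HEALTH_CHECK_INTERVAL', '30000'), 10),
      metricsCollectionMs: parseInt(this.config.get('METRICS_COLLECTION_INTERVAL', '60000'), 10)
    };
  }
}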
Integration with HSuite Ecosystem
// Complete integration with other HSuite modules
import { Module, Injectable } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import { HealthModule, HealthService } from '@hsuite/health';
import { AuthModule } from '@hsuite/auth';
import { SmartNetworkModule } from '@hsuite/smart-network';
@Module({
imports: [
AuthModule,
SmartNetworkModule,
HealthModule.forRootAsync({
imports: [ConfigModule],
useFactory: async (configService: ConfigService) => ({
socket: {
host: configService.get('REDIS_HOST', 'localhost'),
port: configService.get('REDIS_PORT', 6379)
},
password: configService.get('REDIS_PASSWORD'),
database: configService.get('REDIS_DATABASE', 0)
}),
inject: [ConfigService]
})
]
})
export class HealthEcosystemModule {}
@Injectable()
export class IntegratedHealthService {
constructor(
private healthService: HealthService,
private authService: AuthService,
private networkService: SmartNetworkService
) {}
async getComprehensiveSystemStatus() {
// 1. Get basic health status
const health = await this.healthService.check();
// 2. Get system metrics
const metrics = await this.healthService.infos();
// 3. Check authentication service health
const authHealth = await this.authService.getHealthStatus();
// 4. Check network service health
const networkHealth = await this.networkService.getNetworkStatus();
return {
timestamp: new Date().toISOString(),
overallStatus: health.status,
components: {
system: health.details,
authentication: authHealth,
network: networkHealth
},
metrics: {
cpu: metrics.cpu,
memory: metrics.memory,
storage: metrics.drive,
network: metrics.network,
uptime: metrics.uptime
},
platform: {
os: metrics.platform,
architecture: metrics.arch,
release: metrics.release
}
};
}
}
Performance Considerations
📊 Caching Strategy
1-Second Caching - Health checks and metrics responses cached for optimal performance (see the sketch after this list)
Memory Management - Efficient resource metrics collection with OS utilities
Event-Driven Updates - DAG network status uses events to reduce polling overhead
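The effect of the 1-second cache can be approximated with a simple in-memory memo around a check function; this is only a sketch of the idea, not the library's internal implementation:
// Wraps an async function with a short-lived cache (default TTL: 1 second).
function withShortCache<T>(fn: () => Promise<T>, ttlMs = 1000): () => Promise<T> {
  let cached: { value: T; expiresAt: number } | null = null;

  return async () => {
    const now = Date.now();
    if (cached && cached.expiresAt > now) {
      return cached.value;                     // serve the cached response within the TTL
    }
    const value = await fn();
    cached = { value, expiresAt: now + ttlMs };
    return value;
  };
}

// const cachedCheck = withShortCache(() => healthService.check());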
🔧 Optimization Features
Multi-core CPU Calculations - Advanced CPU usage calculations for multi-core systems
Efficient Collection - Optimized OS utility integration for resource monitoring
Connection Pooling - Redis connection optimization for health checks
🛡️ Error Handling
Comprehensive Exception Management - Proper error types and detailed messages
Graceful Degradation - Fallback mechanisms for failed health checks
Service Isolation - Individual service failures don't affect overall monitoring
🏥 Enterprise Health Monitoring: Comprehensive system diagnostics with real-time resource tracking and specialized DAG network monitoring.
📊 Advanced Analytics: Performance analysis, trend calculation, and intelligent recommendations for system optimization.
🌐 DAG Network Integration: Event-driven network monitoring with threshold management and automated status updates.
Built with ❤️ by the HbarSuite Team Copyright © 2024 HbarSuite. All rights reserved.