Overall system health management includes:
- Grid level usage metrics
- System monitoring and event reporting
- Application monitoring and event reporting
- Help desk utilization reports

Change Management and Reports:
- If required, we will track changes to the systems and report changes on a regular basis.
- Monthly reports for production systems.
- Reports will include system utilization (CPU, network, storage).
- Reports will flag when utilization is approaching predefined system limits.

SLA-Based Performance Monitoring
Right Servers can actively monitor all systems 24 hours per day, 7 days per week to ensure availability of key system resources. This polling is performed via SNMP (Simple Network Management Protocol) every 5 minutes and covers CPU utilization, disk utilization on each volume, and memory usage.
If the utilization of any key resource crosses an alert threshold, a Help Desk ticket is created for action. Right Servers performs an initial analysis of all such events and does not generate alerts during periods of known high usage, such as nightly backups or scheduled re-indexing jobs.

Otherwise, Right Servers will contact the Client based on the following thresholds:
- 85% Utilization - Warning Alert - Email notification of event
- 90% Utilization - Critical Alert - Escalation of event per customer contact procedures
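
The threshold logic above can be sketched roughly as follows. This is a minimal illustration, not Right Servers' actual implementation: the names `classify_utilization` and `MaintenanceWindow` are hypothetical, and treating a sample exactly at 85% or 90% as an alert is an assumption. Only the two threshold values and the idea of suppressing alerts during known high-usage periods come from the text.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Sequence

WARNING_THRESHOLD = 85.0   # percent, from the thresholds listed above
CRITICAL_THRESHOLD = 90.0  # percent, from the thresholds listed above


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical daily window of known high usage (e.g. nightly backups)."""
    start: time
    end: time

    def contains(self, t: time) -> bool:
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 23:00 to 02:00.
        return t >= self.start or t < self.end


def classify_utilization(
    percent: float,
    sampled_at: time,
    windows: Sequence[MaintenanceWindow] = (),
) -> Optional[str]:
    """Return "critical", "warning", or None for one polled sample."""
    if any(w.contains(sampled_at) for w in windows):
        return None  # known high-usage period: suppress the alert
    if percent >= CRITICAL_THRESHOLD:
        return "critical"  # escalate per customer contact procedures
    if percent >= WARNING_THRESHOLD:
        return "warning"   # email notification
    return None
```

With a backup window of 01:00 to 03:00, a 92% sample taken at 14:30 would be classified as critical, while the same reading at 02:00 would be suppressed.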

SLA-Based Performance Statistics Collection
Right Servers can collect additional statistics via SNMP for purposes of performance analysis. This information is kept for a predetermined amount of time and can be made available to the Client in various ways, in both graphical and tabular form. In addition to CPU and disk utilization (used for availability management), the following information is collected if available:
- User count
- Process count
- RAM usage and Swap usage
- Mount point usage
- Network Interface usage and errors

Resource Monitoring
- CPU Percent Idle
- RAM Percent Unused
- Disk Space Percent Unused
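
The percent-idle and percent-unused figures above are simple ratios. A minimal sketch, assuming the poller already has a total and a used amount in the same units; the function name `percent_unused` is illustrative and not part of any Right Servers tooling:

```python
def percent_unused(total: float, used: float) -> float:
    """Return the unused share of a resource as a percentage (0 to 100)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= used <= total:
        raise ValueError("used must be between 0 and total")
    return 100.0 * (total - used) / total
```

For example, a 500 GB volume with 125 GB in use gives `percent_unused(500, 125)`, which evaluates to 75.0.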

Service Monitoring
We can also monitor the up/down status of the following services by default (as appropriate):
- SNMP service
- SSH (port 22)
- Terminal Services (port 3389)
- HTTP (port 80)
- Network interface status
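
For the TCP-based services in this list, an up/down check can be as simple as attempting a connection to the port. The sketch below is only an illustration under that assumption; the function names and the 3-second timeout are hypothetical and do not describe Right Servers' tooling. The SNMP service (which normally listens on UDP) and network interface status need a different kind of probe and are left out.

```python
import socket

# Default TCP ports named in the list above.
DEFAULT_TCP_SERVICES = {"SSH": 22, "Terminal Services": 3389, "HTTP": 80}


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(host: str, services=DEFAULT_TCP_SERVICES, timeout: float = 3.0):
    """Map each service name to "up" or "down" based on a TCP connect."""
    return {
        name: "up" if is_port_open(host, port, timeout) else "down"
        for name, port in services.items()
    }
```

A successful connect only shows that something is listening on the port; it does not confirm that the application behind it is healthy.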

Most of these services trigger a critical alert if they are down for longer than a few seconds. We can also provide customized monitoring levels per customer.