Hosts must maintain high reliability and schedule maintenance in advance to minimize disruption to users.
Scheduling maintenance
Schedule maintenance at least one week in advance. Users receive email reminders about upcoming maintenance on their active Pods.
Contact Runpod on Discord or Slack before:
- Scheduling maintenance on more than a few machines.
- Performing operations that could affect user data.
Err on the side of caution and overcommunicate.
Maintenance rules
| Rule | Details |
|---|---|
| Advance notice | Schedule at least 1 week in advance. |
| Auto-unlisting | Machines are automatically unlisted 4 days before scheduled maintenance. |
| Active users | You can bring down machines with active users only during a maintenance window. |
| Excessive maintenance | Too much maintenance results in reliability penalties. |
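The timing rules in the table can be checked before you submit a maintenance request. The following is a minimal sketch, not a Runpod API: the function name `check_maintenance_window` and the constants are hypothetical, and only the one-week notice and four-day unlisting lead come from the rules above.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the maintenance rules table; the names are illustrative.
MIN_NOTICE = timedelta(weeks=1)   # schedule at least 1 week in advance
UNLIST_LEAD = timedelta(days=4)   # machines are unlisted 4 days before maintenance


def check_maintenance_window(now, start):
    """Return (meets_notice, unlist_time) for a proposed maintenance start."""
    meets_notice = (start - now) >= MIN_NOTICE
    unlist_time = start - UNLIST_LEAD
    return meets_notice, unlist_time


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
ok, unlist_at = check_maintenance_window(now, now + timedelta(days=10))
print(ok, unlist_at.isoformat())
```

A start time ten days out satisfies the notice rule, and the machine would be expected to leave the listings four days before that start.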
Immediate maintenance
Immediate maintenance is only for quick, necessary repairs on unrented servers.
Even unrented servers may contain user data. Do not perform operations that could cause data loss during immediate maintenance.
Reliability requirements
Runpod requires 99.99% uptime. Reliability is calculated on a 30-day rolling window.
How reliability is calculated
reliability = (total_minutes - downtime_minutes + buffer) / total_minutes
For example, 30 minutes of network downtime in a 30-day window (30 × 24 × 60 = 43,200 minutes):
(43200 - 30 + 10) / 43200 ≈ 99.95%
The 10-minute buffer accounts for brief interruptions from agent upgrades.
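The formula can be reproduced in a few lines of Python to estimate a score from your own downtime logs. This is a sketch under stated assumptions: the function name and parameters are illustrative, and capping the result at 100% is an assumption, since the formula as written would exceed 100% when downtime is below the buffer.

```python
def reliability(downtime_minutes, total_minutes=30 * 24 * 60, buffer_minutes=10):
    """Reliability over a rolling window, following the formula above."""
    score = (total_minutes - downtime_minutes + buffer_minutes) / total_minutes
    # Assumption: scores are capped at 100%; the source does not state this.
    return min(score, 1.0)


print(f"{reliability(30):.2%}")  # 30 minutes of downtime -> 99.95%
```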
Reliability impacts
| Scenario | Impact |
|---|---|
| Scheduled maintenance | No reliability impact. |
| Unplanned downtime | Decreases reliability score. |
| Unlisted machines | Downtime still affects reliability. |
Machines with less than 98% reliability are automatically removed from the available GPU pool. Only users who already have data on the machine can still access it.
Because reliability is measured over a 30-day rolling window, a machine needs a full month with no further downtime to return to 100% reliability.
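Rearranging the reliability formula gives a rough downtime budget for a target score. The sketch below assumes the formula applies exactly as stated and that the full 10-minute buffer is available; `max_downtime_minutes` is a hypothetical helper, not part of any Runpod tooling.

```python
def max_downtime_minutes(target, total_minutes=30 * 24 * 60, buffer_minutes=10):
    """Largest downtime (in minutes) that still meets `target` reliability.

    Solves (total - downtime + buffer) / total >= target for downtime.
    """
    return total_minutes * (1 - target) + buffer_minutes


# Budget for the 99.99% uptime requirement over a 30-day window.
print(round(max_downtime_minutes(0.9999), 2))
# Downtime at which a machine would reach the 98% removal threshold.
print(round(max_downtime_minutes(0.98), 2))
```

Under these assumptions the 99.99% requirement leaves roughly 14 minutes of unplanned downtime per 30-day window, while the 98% removal threshold is reached at roughly 874 minutes.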