Burn testing

Run a GPU burn test

Stop the Runpod agent

Stop the Runpod agent to prevent the machine from accepting jobs during testing:

sudo systemctl stop runpod

Run the burn test

Run a GPU burn test for at least 48 hours to verify stability under load. The final argument is the test duration in seconds (48 × 3600 = 172800):

docker run --gpus all --rm jorghi21/gpu-burn-test 172800
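The duration argument is just the number of hours multiplied by 3600. The sketch below wraps the same docker run command in a small function that computes the duration from hours and saves the output to a log with tee, so errors can be reviewed after a long run. The function names burn_seconds and run_burn and the default log file name gpu-burn.log are illustrative assumptions, not part of Runpod's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Runpod's tooling) for running the burn test.

# Convert a duration in hours to seconds (48 hours -> 172800 seconds).
burn_seconds() {
  local hours="$1"
  echo $(( hours * 3600 ))
}

# Run the burn test for the given number of hours (default 48) and keep a
# copy of all output in a log file (default gpu-burn.log) for later review.
run_burn() {
  local hours="${1:-48}"
  local log="${2:-gpu-burn.log}"
  docker run --gpus all --rm jorghi21/gpu-burn-test "$(burn_seconds "$hours")" 2>&1 | tee "$log"
  # Return docker's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, run_burn 48 runs the same 172800-second test as the command above. Returning PIPESTATUS[0] keeps a failed container run from being masked by tee's successful exit.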

While the test runs, monitor GPU temperatures (for example, with nvidia-smi in a second terminal) and watch the burn test output for errors.
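One way to automate the temperature check is to poll nvidia-smi and flag readings above the 85°C guideline listed under What to watch for. The sketch below assumes bash and the standard nvidia-smi query options; the function names check_temps and monitor_temps and the 60-second polling interval are illustrative choices. A single reading above the limit is a prompt to look closer, not proof of a cooling problem, since the guideline concerns sustained temperatures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag GPUs whose temperature exceeds a limit (default 85 C).
# Reads CSV lines of the form "index, temperature", as produced by:
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits
check_temps() {
  local limit="${1:-85}"
  local idx temp
  while IFS=', ' read -r idx temp; do
    # Skip lines without a numeric temperature (for example "[N/A]").
    [[ "$temp" =~ ^[0-9]+$ ]] || continue
    if (( temp > limit )); then
      echo "WARNING: GPU $idx at ${temp}C (limit ${limit}C)"
    fi
  done
}

# Hypothetical loop: poll every 60 seconds until interrupted with Ctrl+C.
monitor_temps() {
  local limit="${1:-85}"
  while true; do
    nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits \
      | check_temps "$limit"
    sleep 60
  done
}
```

Run monitor_temps in a second terminal while the burn test is running. Keeping the parsing in check_temps separate from the polling loop makes the threshold logic easy to test without a GPU.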

Verify CPU, memory, and storage

Use stress-ng to stress-test the CPU, memory, and storage:

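# Install stress-ng
sudo apt install stress-ng

# Run a 1-hour (3600-second) stress test:
#   --cpu 0  starts one CPU worker per CPU
#   --vm 2   starts two memory workers
#   --hdd 2  starts two disk I/O workers
stress-ng --cpu 0 --vm 2 --hdd 2 --timeout 3600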

Restart the Runpod agent

After all tests pass, start the Runpod agent again:

sudo systemctl start runpod

Self-rent your machine

Open your machine dashboard and self-rent the machine, then run a few popular templates on it to confirm it handles real workloads correctly.

What to watch for

GPU temperature: Sustained temperatures above 85°C may indicate cooling issues.

Memory errors: Any GPU memory errors during the burn test require investigation.

System stability: Crashes, freezes, or unexpected reboots indicate hardware problems.

Performance degradation: Significant performance drops over time may indicate thermal throttling.
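On ECC-capable GPUs, one way to check for memory errors is to compare the uncorrected ECC error count before and after the burn test. The sketch below assumes bash and the nvidia-smi query field ecc.errors.uncorrected.volatile.total; the function name sum_ecc_errors is hypothetical. GPUs without ECC support report a non-numeric value such as "[N/A]", which the function skips, so on those GPUs rely on the burn test's own error output instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum uncorrected ECC errors across all GPUs.
# Reads one value per line, as produced by:
#   nvidia-smi --query-gpu=ecc.errors.uncorrected.volatile.total --format=csv,noheader,nounits
# Non-numeric values (for example "[N/A]" on GPUs without ECC) are skipped.
sum_ecc_errors() {
  local total=0 value
  while read -r value; do
    if [[ "$value" =~ ^[0-9]+$ ]]; then
      total=$(( total + value ))
    fi
  done
  echo "$total"
}
```

Capture the total by piping the nvidia-smi query into sum_ecc_errors before starting the burn test, capture it again when the test finishes, and investigate if the second number is higher. Volatile counters reset when the driver reloads, so take both readings without rebooting in between.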
