OC-201c · Module 3
Preventive Maintenance
The best incident is the one that never happens. Preventive maintenance is the practice of scheduled, proactive work that keeps the system healthy: log rotation, credential renewal, dependency updates, performance baseline reviews. It is not glamorous work. It is the work that prevents 3 AM pages. The operators who complain about maintenance workload are the same ones who get paged on weekends because a full disk crashed the database.
Build a maintenance calendar. Weekly: review logs for warnings that have not yet escalated to errors. Rotate logs if they exceed size thresholds. Check disk space. Monthly: update Node.js and npm dependencies. Review API provider changelogs for deprecated endpoints. Renew any expiring API keys or OAuth tokens. Quarterly: test the full restore procedure. Review performance baselines and recalibrate alert thresholds. Audit the .env file against actual requirements and remove unused credentials. This calendar is not overhead. It is the operational discipline that keeps a system running for years instead of months.
- Weekly: Review logs for new warning patterns. Rotate logs exceeding 100 MB. Check disk space on the OpenClaw machine. Verify all scheduled cron jobs fired within their expected windows.
- Monthly: Update Node.js and npm packages. Check API provider changelogs for breaking changes. Renew any API keys or OAuth tokens expiring in the next 60 days. Review cost metrics and investigate any upward trends.
- Quarterly: Perform a full restore test: rebuild the system from git backup and database backup on a clean machine. Review performance baselines and adjust alert thresholds. Audit environment variables and remove unused credentials. Update the restore documentation if any steps have changed.
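Several of the checks above are mechanical enough to script. The following is a minimal sketch in Python, not part of OpenClaw itself. The function names, the `*.log` naming convention, the 85% disk threshold, and the idea of keeping credential expiry dates in a dictionary are all assumptions made for illustration. Only the 100 MB and 60-day figures come from the checklist.

```python
from __future__ import annotations

import shutil
from datetime import date, timedelta
from pathlib import Path

# From the checklist: "Rotate logs exceeding 100 MB".
# Assumption: MB is read as 1024 * 1024 bytes.
LOG_ROTATE_BYTES = 100 * 1024 * 1024

# From the checklist: renew credentials "expiring in the next 60 days".
RENEWAL_WINDOW_DAYS = 60

# Assumption: the lesson names no disk threshold; 85% is a placeholder.
DISK_WARN_FRACTION = 0.85


def logs_needing_rotation(log_dir: Path, limit: int = LOG_ROTATE_BYTES) -> list[Path]:
    """Return the *.log files directly inside log_dir that are larger than limit bytes."""
    return sorted(
        p for p in log_dir.glob("*.log")
        if p.is_file() and p.stat().st_size > limit
    )


def disk_is_tight(path: Path, warn_fraction: float = DISK_WARN_FRACTION) -> bool:
    """Return True when the filesystem holding path is at or above warn_fraction used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= warn_fraction


def credentials_to_renew(
    expiries: dict[str, date],
    today: date,
    window_days: int = RENEWAL_WINDOW_DAYS,
) -> list[str]:
    """Return names of credentials that expire on or before today + window_days.

    Already-expired credentials are included, since they also need renewal.
    """
    cutoff = today + timedelta(days=window_days)
    return sorted(name for name, expires in expiries.items() if expires <= cutoff)
```

A weekly job could call `logs_needing_rotation` and `disk_is_tight` and report anything they flag, while a monthly job could pass a hand-maintained mapping of credential names to expiry dates into `credentials_to_renew`. The functions only report; rotating, cleaning up, and renewing remain deliberate operator actions.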