Operations¶
Operational guides and runbooks for managing CANFAR platform infrastructure, releases, and deployments.
Overview¶
This section documents how platform operators manage CANFAR deployments. The guides cover releasing new versions, troubleshooting issues, and maintaining infrastructure, with the goal of keeping each of these tasks repeatable and reliable.
Key Responsibilities¶
Release Management¶
- Coordinate releases using Release Please automation
- Review and merge release PRs with proper approvals
- Monitor post-release workflows and verify deployments
- Manage hotfixes and rollback procedures when needed
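Release Please derives the next version from Conventional Commits messages: a `fix` commit implies a patch bump, a `feat` commit a minor bump, and a breaking change (a `!` in the header or a `BREAKING CHANGE:` footer) a major bump. The sketch below illustrates that rule only; it is not Release Please's implementation, and the function names `bump_level` and `next_version` are hypothetical. It assumes a plain `MAJOR.MINOR.PATCH` version and ignores any pre-1.0 or repository-specific configuration.

```python
import re

# Conventional Commits header: type, optional scope, optional "!" marker, then ": ".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")

BREAKING_FOOTERS = ("BREAKING CHANGE:", "BREAKING-CHANGE:")


def bump_level(message: str) -> int:
    """Return 3 for major, 2 for minor, 1 for patch, 0 for no release."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return 0  # not a Conventional Commits message
    # A "!" in the header or a BREAKING CHANGE footer marks a breaking change.
    if match["bang"] or any(line.startswith(BREAKING_FOOTERS) for line in lines[1:]):
        return 3
    return {"feat": 2, "fix": 1}.get(match["type"], 0)


def next_version(current: str, messages: list[str]) -> str:
    """Apply the highest bump found in `messages` to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = max((bump_level(m) for m in messages), default=0)
    if level == 3:
        return f"{major + 1}.0.0"
    if level == 2:
        return f"{major}.{minor + 1}.0"
    if level == 1:
        return f"{major}.{minor}.{patch + 1}"
    return current  # only non-releasing commits (docs, chore, ...)
```

When reviewing a release PR, a check like this can help confirm that the proposed version matches the commits merged since the last release.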
Infrastructure Operations¶
- Deploy Helm charts and configuration overlays
- Manage environment-specific configurations (staging, production)
- Monitor platform health and respond to incidents
- Maintain secrets and access controls
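As a rough illustration of deploying a chart with an environment-specific overlay, the sketch below builds a `helm upgrade --install` command that layers a shared values file with a per-environment one. The directory layout (`values/common.yaml`, `values/<environment>.yaml`), the function name, and the release, chart, and namespace in the usage note are all assumptions made for this example, not the actual CANFAR layout. Helm applies `--values` files in order, so the environment file overrides the shared one.

```python
from pathlib import PurePosixPath

# Environments named in this guide; anything else is rejected.
ENVIRONMENTS = {"staging", "production"}


def helm_upgrade_command(
    release: str,
    chart: str,
    environment: str,
    namespace: str,
    values_dir: str = "values",  # hypothetical overlay directory
) -> list[str]:
    """Build the argv list for a Helm deployment with a layered overlay."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    base = PurePosixPath(values_dir) / "common.yaml"
    overlay = PurePosixPath(values_dir) / f"{environment}.yaml"
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--values", str(base),      # shared defaults
        "--values", str(overlay),   # later file wins on conflicts
        "--wait",                   # block until resources are ready
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. For example, `helm_upgrade_command("demo", "./charts/demo", "staging", "demo-staging")` targets staging, and rejecting unknown environment names guards against a typo deploying with the wrong overlay.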
CI/CD Maintenance¶
- Keep GitHub Actions workflows up to date
- Monitor workflow runs and troubleshoot failures
- Update documentation and configuration files
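One possible way to monitor workflow runs from a script is to ask the GitHub CLI for recent runs as JSON and keep only the unsuccessful ones. The sketch below assumes `gh` is installed and authenticated; the constant `GH_RUN_LIST`, the function name `failed_runs`, and the choice of JSON fields are illustrative. The parsing step is kept separate from the CLI call so it can be exercised without network access.

```python
import json

# Command that lists recent runs as JSON (run it with subprocess to get input).
GH_RUN_LIST = [
    "gh", "run", "list",
    "--limit", "20",
    "--json", "databaseId,workflowName,conclusion,url",
]

# Conclusions treated as needing attention in this sketch.
FAILED = {"failure", "timed_out"}


def failed_runs(gh_json: str) -> list[dict]:
    """Return the runs from `gh run list --json ...` output that did not succeed."""
    runs = json.loads(gh_json)
    # In-progress runs have no conclusion yet and are skipped.
    return [run for run in runs if run.get("conclusion") in FAILED]
```

In practice the input would come from `subprocess.run(GH_RUN_LIST, capture_output=True, text=True, check=True).stdout`, and the logs of a failing run can then be inspected with `gh run view <run-id> --log-failed`.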
Tools & Technologies¶
The CANFAR deployment infrastructure relies on:
- Kubernetes - Container orchestration platform
- Helm - Package manager for Kubernetes applications
- GitHub Actions - CI/CD automation and workflows
- Release Please - Automated release management and changelog generation
- MkDocs Material - Documentation site generation
- uv - Python package and dependency management
Getting Help¶
For operational support or questions:
- Check the relevant runbook in this documentation
- Review recent GitHub Actions workflow runs for error logs
- Contact the CADC operations team
- Consult the main CANFAR documentation
Best Practices¶
- Always follow the release checklist - Complete every step so that releases stay consistent and reliable
- Test in staging first - Validate changes in staging before promoting to production
- Monitor post-deployment - Watch metrics and logs after every deployment
- Document incidents - Capture lessons learned and update runbooks
- Keep secrets secure - Rotate credentials regularly and limit access
- Maintain audit trails - Make every change through a reviewed pull request