Observability & Reliability
Advanced
Build observable systems and practice Site Reliability Engineering. Master Prometheus, Grafana, OpenTelemetry, and incident management.
Courses in this path
Courses are listed in the recommended order, but you can take them in any order you prefer.
Observability Fundamentals
Understand the principles of observability — metrics, logs, and traces — and why they matter for running reliable systems.
Prometheus
Master Prometheus for metrics collection, PromQL for querying, and alerting rules for proactive incident detection.
Grafana
Build beautiful, informative dashboards with Grafana. Connect data sources, design panels, and create alerts.
SRE Principles
Learn Site Reliability Engineering principles — SLOs, error budgets, incident management, and the practices that keep systems reliable.
Capstone Project
Build a complete observability stack with Prometheus, Grafana, OpenTelemetry, and Loki. Create SLOs, error budgets, and practice incident response.
Ready to start Observability & Reliability?
Free and open-source. Start with any course and learn at your own pace.