CrashLoopBackOff: Why Pods Crash and How to Fix Them
CrashLoopBackOff is one of the most common Kubernetes errors, and one of the most frustrating. Your pod starts. It crashes. It restarts. It crashes again. It's a loop. Here's how to break it.
🎯 The Big Picture
Think of CrashLoopBackOff like a car that won't start. You turn the key. The engine tries to start. It fails. You try again. It fails again. The problem isn't that it's trying. The problem is why it's failing.
CrashLoopBackOff means your container is starting, crashing, and Kubernetes is restarting it. The loop continues until you fix the root cause.
What is CrashLoopBackOff?
CrashLoopBackOff is a pod state that means:
- Container started
- Container crashed (exited with error)
- Kubernetes waited (backoff period)
- Kubernetes restarted the container
- Container crashed again
- Repeat (with increasing backoff time)
The backoff time increases: 10s → 20s → 40s → 80s → 160s (max 300s)
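You can watch the loop and the growing backoff happen live:
# Watch the pod's status and restart count update in real time
kubectl get pod <pod-name> -w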
Understanding the CrashLoopBackOff State
Pod states you'll see:
Running ← Container started (briefly)
↓
Error ← Container crashed
↓
CrashLoopBackOff ← Waiting (backoff) before the next restart
↓
Running ← Kubernetes restarts the container, and the cycle repeats
The cycle continues until you fix the problem.
Step-by-Step Debugging Process
Step 1: Identify the Problem Pod
kubectl get pods
Look for:
- Status: CrashLoopBackOff
- Restarts: A high number (10+, 100+)
Example output:
NAME            READY   STATUS             RESTARTS   AGE
my-app-abc123   0/1     CrashLoopBackOff   15         5m
Step 2: Describe the Pod
kubectl describe pod <pod-name>
Look for:
- Events: What happened?
- Last State: Why did it exit?
- Exit Code: What error code?
- Reason: Why did it crash?
Key sections to check:
Events:
Warning BackOff 2m ago kubelet Back-off restarting failed container
Normal Pulled 2m ago kubelet Container image "my-app:v1" already present
Warning Failed 2m ago kubelet Error: container failed to start
Last State: Terminated
  Reason: Error
  Exit Code: 1
  Started: ...
  Finished: ...
Step 3: Check Container Logs
# Current container logs
kubectl logs <pod-name>
# Previous container instance (often more useful)
kubectl logs <pod-name> --previous
# Follow logs in real-time
kubectl logs <pod-name> -f
The logs tell you WHY it crashed.
Common log patterns:
- Application errors
- Configuration issues
- Missing files
- Permission problems
- Connection failures
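If the logs are long, a quick keyword scan of the previous instance often surfaces these patterns. A minimal sketch; the keywords are just examples, adjust them to your app:
# Scan the previous container's logs for common failure keywords
kubectl logs <pod-name> --previous | grep -iE "error|fatal|denied|refused|not found"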
Step 4: Check Specific Container Logs
If you have multiple containers:
kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> -c <container-name> --previous
Step 5: Execute into the Container (If It Stays Up Long Enough)
kubectl exec -it <pod-name> -- /bin/sh
If the container crashes too fast, this won't work. Use logs instead.
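On clusters where ephemeral containers are available (GA in Kubernetes 1.25), kubectl debug can attach a throwaway shell to the pod even while the app container keeps dying. A sketch, assuming a busybox image is acceptable for debugging:
# Attach an ephemeral debug container to the crashing pod
kubectl debug -it <pod-name> --image=busybox --target=<container-name> -- sh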
Common Causes and Solutions
Cause 1: Application Error
Symptoms:
- Exit code: 1 (or non-zero)
- Logs show application errors
- Application code issue
Example logs:
Error: Cannot connect to database
at app.js:25
Solutions:
- Fix the application code
- Check application configuration
- Verify dependencies are installed
- Check environment variables
Debugging:
kubectl logs <pod-name> --previous
# Look for stack traces, error messages
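Missing or wrong environment variables are a frequent culprit here. Two quick ways to inspect them; the Deployment name my-app is hypothetical:
# List env vars defined on the Deployment
kubectl set env deployment/my-app --list
# Or dump the pod spec and check the env section
kubectl get pod <pod-name> -o yaml | grep -A5 "env:"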
Cause 2: Missing Configuration
Symptoms:
- Exit code: 1
- Logs: "Configuration file not found"
- Missing ConfigMap or Secret
Example logs:
Error: Config file /app/config.json not found
Solutions:
- Check if the ConfigMap exists:
  kubectl get configmap
  kubectl describe configmap <configmap-name>
- Check if the Secret exists:
  kubectl get secret
  kubectl describe secret <secret-name>
- Verify volume mounts:
  kubectl describe pod <pod-name>
  # Check the Volumes and Volume Mounts sections
- Fix the deployment (see the sketch below):
  - Add the missing ConfigMap/Secret
  - Fix volume mount paths
  - Verify mount points
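For reference, here's a minimal sketch of mounting a ConfigMap as a config file. The names (my-app:v1, app-config, /app/config.json) are assumptions for illustration, substitute your own:
containers:
- name: app
  image: my-app:v1
  volumeMounts:
  - name: config
    mountPath: /app/config.json
    subPath: config.json          # mount a single file, not the whole directory
volumes:
- name: config
  configMap:
    name: app-config              # must contain a key named config.json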
Cause 3: Wrong Command or Entrypoint
Symptoms:
- Exit code: 127 (command not found)
- Logs: "command not found"
- Wrong command in container spec
Example logs:
/bin/sh: 1: mycommand: not found
Solutions:
- Check the command:
  kubectl get pod <pod-name> -o yaml
  # Look for command and args in the container spec
- Verify the command exists in the image:
  docker run <image> <command>
- Fix the deployment (see the sketch below):
  - Correct the command
  - Use the full path to the executable
  - Check the entrypoint in the Dockerfile
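As a sketch, the fix is often just spelling out the full path. The path /usr/local/bin/mycommand is an assumption for illustration:
containers:
- name: app
  image: my-app:v1
  command: ["/usr/local/bin/mycommand"]   # full path to the executable inside the image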
Cause 4: Resource Limits Too Low
Symptoms:
- Exit code: 137 (SIGKILL, usually OOMKilled)
- Logs: "Out of memory" or just "Killed"
- Pod evicted under node memory pressure
Example logs:
Killed
Solutions:
- Check resource limits:
  kubectl describe pod <pod-name>
  # Look for Limits and Requests
- Check node resources:
  kubectl top nodes
  kubectl describe node
- Increase limits:
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"   # Increase this
      cpu: "500m"
Cause 5: Health Check Failing
Symptoms:
- Exit code: varies
- Liveness/readiness probe failing
- Pod marked as unhealthy
Solutions:
- Check the probe configuration:
  kubectl describe pod <pod-name>
  # Look for Liveness and Readiness
- Test the probe endpoint:
  kubectl exec -it <pod-name> -- curl localhost:<probe-port><probe-path>
- Fix the probe (sketch below):
  - Adjust the timeout
  - Fix the endpoint path
  - Increase the initial delay
  - Check the probe command
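A minimal liveness probe sketch with a generous startup window. The path /healthz and port 8080 are assumptions, match them to your app:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the app time to start before the first check
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # restart only after 3 consecutive failures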
Cause 6: Permission Issues
Symptoms:
- Exit code: 1
- Logs: "Permission denied"
- File system permissions
Example logs:
Permission denied: /app/data/file.txt
Solutions:
- Check file permissions:
  kubectl exec -it <pod-name> -- ls -la /app
- Fix in the Dockerfile:
  RUN chmod +x /app/script.sh
  RUN chown -R appuser:appuser /app
- Use a securityContext:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
Real-World Example: The Database Connection
Problem: Pod in CrashLoopBackOff. Logs show:
Error: connect ECONNREFUSED 127.0.0.1:5432
Debugging:
- Checked logs: database connection refused
- Checked the ConfigMap: the database host was localhost (wrong!)
- Fixed the ConfigMap: changed it to the service name postgres-service
- Restarted the deployment: the pod started successfully
Solution:
# ConfigMap
data:
  DB_HOST: "postgres-service"   # Not localhost!
  DB_PORT: "5432"
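To confirm the fix before redeploying, check that the service name resolves from inside the cluster. This assumes an image with nslookup available, such as busybox:
# Resolve the service name from a throwaway pod
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup postgres-service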
Hands-On Exercise: Debug a CrashLoopBackOff
Create a pod that will crash:
apiVersion: v1
kind: Pod
metadata:
  name: crash-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["/bin/sh", "-c", "exit 1"]   # Will crash!
Apply it:
kubectl apply -f crash-pod.yaml
Debug it:
- Check pod status:
  kubectl get pods
- Describe the pod:
  kubectl describe pod crash-pod
- Check the logs:
  kubectl logs crash-pod
- Fix the issue by changing the command to something that works (see the sketch below)
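One way to fix it, as a sketch: give the container something long-running to do instead of exiting immediately.
command: ["/bin/sh", "-c", "sleep 3600"]   # stays up for an hour instead of crashing
A Pod's command can't be changed in place, so delete the pod and re-apply the edited manifest (kubectl delete pod crash-pod, then kubectl apply -f crash-pod.yaml).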
This is how you learn. Break things. Fix them.
My Take: CrashLoopBackOff Debugging
CrashLoopBackOff used to frustrate me. I'd see it and panic. I'd try random fixes.
Then I learned the systematic approach:
- Describe the pod - See what happened
- Check logs - See why it crashed
- Identify the cause - Application, config, resources?
- Fix the root cause - Not the symptom
Now I fix CrashLoopBackOff in minutes, not hours.
Memory Tip: The Car Analogy
CrashLoopBackOff is like a car that won't start:
- You turn the key (container starts)
- Engine tries (application runs)
- Engine fails (application crashes)
- You try again (Kubernetes restarts)
- Engine fails again (crash loop)
The problem isn't the trying. It's why it's failing. Find the root cause.
Common Mistakes
- Not checking logs: Logs tell you why it crashed
- Not using --previous: Current logs might be empty
- Fixing symptoms, not causes: Address the root issue
- Not checking events: Events show what Kubernetes saw
- Panicking: Stay calm, be systematic
Key Takeaways
- CrashLoopBackOff means container is crashing - Find why
- Check logs first - They tell you the problem
- Use --previous flag - Previous instance logs are often more useful
- Describe the pod - Events show what happened
- Fix the root cause - Not just restart
What's Next?
Now that you understand CrashLoopBackOff, let's tackle another common issue. Next: Pod Troubleshooting: ImagePullBackOff.
Remember: CrashLoopBackOff isn't the problem. It's the symptom. The logs tell you the real problem. Always check the logs.