CrashLoopBackOff: Why Pods Crash and How to Fix Them
CrashLoopBackOff is one of the most common Kubernetes errors, and one of the most frustrating. Your pod starts. It crashes. It restarts. It crashes again. It's a loop. Here's how to break it.
🎯 The Big Picture
Think of CrashLoopBackOff like a car that won't start. You turn the key. The engine tries to start. It fails. You try again. It fails again. The problem isn't that it's trying. The problem is why it's failing.
CrashLoopBackOff means your container is starting, crashing, and Kubernetes is restarting it. The loop continues until you fix the root cause.
What is CrashLoopBackOff?
CrashLoopBackOff is a pod state that means:
- Container started
- Container crashed (exited with error)
- Kubernetes waited (backoff period)
- Kubernetes restarted the container
- Container crashed again
- Repeat (with increasing backoff time)
The backoff time increases: 10s → 20s → 40s → 80s → 160s (max 300s)
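You can watch the loop and the growing backoff happen live:
# Watch the pod's status and restart count update in real time
kubectl get pod <pod-name> -w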
Understanding the CrashLoopBackOff State
Pod states you'll see:
Running ← Container started (briefly)
↓
Error ← Container crashed
↓
CrashLoopBackOff ← Waiting (backoff) before the next restart
↓
Running ← Kubernetes restarts the container, and the cycle repeats
The cycle continues until you fix the problem.
Step-by-Step Debugging Process
Step 1: Identify the Problem Pod
kubectl get pods
Look for:
- Status: CrashLoopBackOff
- Restarts: A high number (10+, 100+)
Example output:
NAME            READY   STATUS             RESTARTS   AGE
my-app-abc123   0/1     CrashLoopBackOff   15         5m
Step 2: Describe the Pod
kubectl describe pod <pod-name>
Look for:
- Events: What happened?
- Last State: Why did it exit?
- Exit Code: What error code?
- Reason: Why did it crash?
Key sections to check:
Events:
Warning BackOff 2m ago kubelet Back-off restarting failed container
Normal Pulled 2m ago kubelet Container image "my-app:v1" already present
Warning Failed 2m ago kubelet Error: container failed to start
Last State: Terminated
  Reason: Error
  Exit Code: 1
  Started: ...
  Finished: ...
Step 3: Check Container Logs
# Current container logs
kubectl logs <pod-name>
# Previous container instance (often more useful)
kubectl logs <pod-name> --previous
# Follow logs in real-time
kubectl logs <pod-name> -f
The logs tell you WHY it crashed.
Common log patterns:
- Application errors
- Configuration issues
- Missing files
- Permission problems
- Connection failures
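If the logs are long, a quick keyword scan of the previous instance often surfaces these patterns. A minimal sketch; the keywords are just examples, adjust them to your app:
# Scan the previous container's logs for common failure keywords
kubectl logs <pod-name> --previous | grep -iE "error|fatal|denied|refused|not found"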
Step 4: Check Specific Container Logs
If you have multiple containers:
kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> -c <container-name> --previous
Step 5: Execute into the Container (If It Stays Up Long Enough)
kubectl exec -it <pod-name> -- /bin/sh
If the container crashes too fast, this won't work. Use logs instead.
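On clusters where ephemeral containers are available (GA in Kubernetes 1.25), kubectl debug can attach a throwaway shell to the pod even while the app container keeps dying. A sketch, assuming a busybox image is acceptable for debugging:
# Attach an ephemeral debug container to the crashing pod
kubectl debug -it <pod-name> --image=busybox --target=<container-name> -- sh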
Common Causes and Solutions
Cause 1: Application Error
Symptoms:
- Exit code: 1 (or non-zero)
- Logs show application errors
- Application code issue
Example logs:
Error: Cannot connect to database
at app.js:25
Solutions:
- Fix the application code
- Check application configuration
- Verify dependencies are installed
- Check environment variables
Debugging:
kubectl logs <pod-name> --previous
# Look for stack traces, error messages
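Missing or wrong environment variables are a frequent culprit here. Two quick ways to inspect them; the Deployment name my-app is hypothetical:
# List env vars defined on the Deployment
kubectl set env deployment/my-app --list
# Or dump the pod spec and check the env section
kubectl get pod <pod-name> -o yaml | grep -A5 "env:"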
Cause 2: Missing Configuration
Symptoms:
- Exit code: 1
- Logs: "Configuration file not found"
- Missing ConfigMap or Secret
Example logs:
Error: Config file /app/config.json not found
Solutions:
- Check if the ConfigMap exists:
  kubectl get configmap
  kubectl describe configmap <configmap-name>
- Check if the Secret exists:
  kubectl get secret
  kubectl describe secret <secret-name>
- Verify volume mounts:
  kubectl describe pod <pod-name>
  # Check the Volumes and Volume Mounts sections
- Fix the deployment (see the sketch below):
  - Add the missing ConfigMap/Secret
  - Fix volume mount paths
  - Verify mount points
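For reference, here's a minimal sketch of mounting a ConfigMap as a config file. The names (my-app:v1, app-config, /app/config.json) are assumptions for illustration, substitute your own:
containers:
- name: app
  image: my-app:v1
  volumeMounts:
  - name: config
    mountPath: /app/config.json
    subPath: config.json          # mount a single file, not the whole directory
volumes:
- name: config
  configMap:
    name: app-config              # must contain a key named config.json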
Cause 3: Wrong Command or Entrypoint
Symptoms:
- Exit code: 127 (command not found)
- Logs: "command not found"
- Wrong command in container spec
Example logs:
/bin/sh: 1: mycommand: not found
Solutions:
- Check the command:
  kubectl get pod <pod-name> -o yaml
  # Look for command and args in the container spec
- Verify the command exists in the image:
  docker run <image> <command>
- Fix the deployment (see the sketch below):
  - Correct the command
  - Use the full path to the executable
  - Check the entrypoint in the Dockerfile
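As a sketch, the fix is often just spelling out the full path. The path /usr/local/bin/mycommand is an assumption for illustration:
containers:
- name: app
  image: my-app:v1
  command: ["/usr/local/bin/mycommand"]   # full path to the executable inside the image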
Cause 4: Resource Limits Too Low
Symptoms:
- Exit code: 137 (SIGKILL, usually OOMKilled)
- Logs: "Out of memory" or just "Killed"
- Pod evicted under node memory pressure
Example logs:
Killed
Solutions:
- Check resource limits:
  kubectl describe pod <pod-name>
  # Look for Limits and Requests
- Check node resources:
  kubectl top nodes
  kubectl describe node
- Increase limits:
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"   # Increase this
      cpu: "500m"
Cause 5: Health Check Failing
Symptoms:
- Exit code: varies
- Liveness/readiness probe failing
- Pod marked as unhealthy
Solutions:
- Check the probe configuration:
  kubectl describe pod <pod-name>
  # Look for Liveness and Readiness
- Test the probe endpoint:
  kubectl exec -it <pod-name> -- curl localhost:<probe-port><probe-path>
- Fix the probe (sketch below):
  - Adjust the timeout
  - Fix the endpoint path
  - Increase the initial delay
  - Check the probe command
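A minimal liveness probe sketch with a generous startup window. The path /healthz and port 8080 are assumptions, match them to your app:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the app time to start before the first check
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # restart only after 3 consecutive failures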
Cause 6: Permission Issues
Symptoms:
- Exit code: 1
- Logs: "Permission denied"
- File system permissions
Example logs:
Permission denied: /app/data/file.txt
Solutions:
- Check file permissions:
  kubectl exec -it <pod-name> -- ls -la /app
- Fix in the Dockerfile:
  RUN chmod +x /app/script.sh
  RUN chown -R appuser:appuser /app
- Use a securityContext:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
Real-World Example: The Database Connection
Problem: Pod in CrashLoopBackOff. Logs show:
Error: connect ECONNREFUSED 127.0.0.1:5432
Debugging:
- Checked logs: database connection refused
- Checked the ConfigMap: the database host was localhost (wrong!)
- Fixed the ConfigMap: changed it to the service name postgres-service
- Restarted the deployment: the pod started successfully
Solution:
# ConfigMap
data:
  DB_HOST: "postgres-service"   # Not localhost!
  DB_PORT: "5432"
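To confirm the fix before redeploying, check that the service name resolves from inside the cluster. This assumes an image with nslookup available, such as busybox:
# Resolve the service name from a throwaway pod
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup postgres-service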
Hands-On Exercise: Debug a CrashLoopBackOff
Create a pod that will crash:
apiVersion: v1
kind: Pod
metadata:
  name: crash-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["/bin/sh", "-c", "exit 1"]   # Will crash!
Apply it:
kubectl apply -f crash-pod.yaml
Debug it:
- Check pod status:
  kubectl get pods
- Describe the pod:
  kubectl describe pod crash-pod
- Check the logs:
  kubectl logs crash-pod
- Fix the issue by changing the command to something that works (see the sketch below)
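One way to fix it, as a sketch: give the container something long-running to do instead of exiting immediately.
command: ["/bin/sh", "-c", "sleep 3600"]   # stays up for an hour instead of crashing
A Pod's command can't be changed in place, so delete the pod and re-apply the edited manifest (kubectl delete pod crash-pod, then kubectl apply -f crash-pod.yaml).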
This is how you learn. Break things. Fix them.
My Take: CrashLoopBackOff Debugging
CrashLoopBackOff used to frustrate me. I'd see it and panic. I'd try random fixes.
Then I learned the systematic approach:
- Describe the pod - See what happened
- Check logs - See why it crashed
- Identify the cause - Application, config, resources?
- Fix the root cause - Not the symptom
Now I fix CrashLoopBackOff in minutes, not hours.
Memory Tip: The Car Analogy
CrashLoopBackOff is like a car that won't start:
- You turn the key (container starts)
- Engine tries (application runs)
- Engine fails (application crashes)
- You try again (Kubernetes restarts)
- Engine fails again (crash loop)
The problem isn't the trying. It's why it's failing. Find the root cause.
Common Mistakes
- Not checking logs: Logs tell you why it crashed
- Not using --previous: Current logs might be empty
- Fixing symptoms, not causes: Address the root issue
- Not checking events: Events show what Kubernetes saw
- Panicking: Stay calm, be systematic
Key Takeaways
- CrashLoopBackOff means container is crashing - Find why
- Check logs first - They tell you the problem
- Use --previous flag - Previous instance logs are often more useful
- Describe the pod - Events show what happened
- Fix the root cause - Not just restart
What's Next?
Now that you understand CrashLoopBackOff, let's tackle another common issue. Next: Pod Troubleshooting: ImagePullBackOff.
Remember: CrashLoopBackOff isn't the problem. It's the symptom. The logs tell you the real problem. Always check the logs.