Pod Troubleshooting: When Hotel Rooms Have Problems

Pods fail. Rooms have problems. That's reality. But we can fix them. That's troubleshooting.

🎯 The Big Picture

Think of pod troubleshooting like fixing hotel room problems. Room won't open (pod won't start). Guest can't check in (container fails). Room service broken (application error). That's pod troubleshooting.

Pod troubleshooting is systematic. Check status. Check logs. Check events. Find the problem. Fix it.

The Hotel Room Problem Analogy

Think of pod problems like hotel room problems:

Pod won't start: Room won't open Container crashes: Guest can't check in Application error: Room service broken Resource issues: Room too small Network issues: Phone doesn't work

Once you see it this way, troubleshooting makes perfect sense.

Systematic Troubleshooting Approach

Step-by-step process:

Step 1: Check Pod Status

kubectl get pods

Step 2: Describe Pod

kubectl describe pod <pod-name>

Step 3: Check Logs

kubectl logs <pod-name>

Step 4: Check Events

kubectl get events --field-selector involvedObject.name=<pod-name>

Step 5: Debug Container

kubectl exec -it <pod-name> -- /bin/sh

Think of it as: Systematic inspection. Check everything. Find problem.

Common Pod Problems

Problem 1: CrashLoopBackOff

What it is:

Pod starts
Container crashes
Restarts
Crashes again
Loop continues

Think of it as: Guest checks in. Immediately leaves. Checks in again. Leaves again. Loop.

Troubleshooting:

# Check pod status
kubectl get pods

# Check logs
kubectl logs <pod-name>

# Check previous container logs
kubectl logs <pod-name> --previous

# Describe pod
kubectl describe pod <pod-name>

Common causes:

Application error
Missing dependencies
Configuration error
Resource limits

Solution:

Fix application code
Add missing dependencies
Fix configuration
Adjust resource limits

Problem 2: ImagePullBackOff

What it is:

Can't pull image
Pod can't start
Image not found or access denied

Think of it as: Guest reservation invalid. Can't find guest. Can't check in.

Troubleshooting:

# Check pod status
kubectl describe pod <pod-name>

# Check image name
kubectl get pod <pod-name> -o yaml | grep image

# Try pulling manually
docker pull <image-name>

Common causes:

Image doesn't exist
Wrong image name
Private registry authentication
Network issues

Solution:

Verify image name
Check image exists
Configure image pull secrets
Fix network

Problem 3: Pending

What it is:

Pod created
Not scheduled
Waiting for resources

Think of it as: Room reserved. But not assigned. Waiting for floor.

Troubleshooting:

# Check pod status
kubectl describe pod <pod-name>

# Check node resources
kubectl describe nodes

# Check for taints
kubectl describe nodes | grep Taint

Common causes:

Insufficient resources
Node taints
No suitable nodes
Resource requests too high

Solution:

Add more nodes
Remove taints
Reduce resource requests
Check node capacity

Problem 4: ErrImagePull

What it is:

Error pulling image
Authentication failed
Network error

Think of it as: Can't get guest information. Access denied. Network problem.

Troubleshooting:

# Check pod events
kubectl describe pod <pod-name>

# Check image pull secrets
kubectl get secrets

# Verify registry access
docker login <registry>

Common causes:

Authentication failed
Network issues
Registry unavailable
Wrong credentials

Solution:

Create image pull secret
Fix network
Verify registry
Update credentials

Real-World Example: Troubleshooting CrashLoopBackOff

Scenario: Pod in CrashLoopBackOff

Step 1: Check status

kubectl get pods

Output:

NAME    READY   STATUS             RESTARTS   AGE
my-pod  0/1     CrashLoopBackOff  5          2m

Step 2: Check logs

kubectl logs my-pod

Output:

Error: Cannot connect to database

Step 3: Check previous logs

kubectl logs my-pod --previous

Step 4: Describe pod

kubectl describe pod my-pod

Output shows:

Events: Container failed
Last state: Terminated
Reason: Error

Step 5: Check configuration

kubectl get pod my-pod -o yaml | grep -A 10 env

Found: Missing DATABASE_URL environment variable

Step 6: Fix

# Add environment variable
env:
- name: DATABASE_URL
  value: "postgres://db:5432/mydb"

Step 7: Verify

kubectl get pods
# Pod now Running

That's troubleshooting. Systematic. Effective.

Troubleshooting Tools

kubectl describe

Best friend for troubleshooting:

kubectl describe pod <pod-name>

Shows:

Pod status
Container status
Events
Conditions
Resource usage

Think of it as: Complete room inspection. Everything visible.

kubectl logs

View container logs:

# Current logs
kubectl logs <pod-name>

# Previous container logs
kubectl logs <pod-name> --previous

# Follow logs
kubectl logs -f <pod-name>

# Specific container
kubectl logs <pod-name> -c <container-name>

Think of it as: Room service log. See what happened.

kubectl exec

Debug inside container:

# Interactive shell
kubectl exec -it <pod-name> -- /bin/sh

# Run command
kubectl exec <pod-name> -- ps aux

# Check environment
kubectl exec <pod-name> -- env

Think of it as: Enter room. See what's inside.

kubectl get events

View cluster events:

# All events
kubectl get events

# Pod-specific events
kubectl get events --field-selector involvedObject.name=<pod-name>

# Recent events
kubectl get events --sort-by=.metadata.creationTimestamp

Think of it as: Hotel event log. See what happened.

My Take: Troubleshooting Strategy

Here's what I do:

When pod fails:

Check status (kubectl get pods)
Describe pod (kubectl describe pod)
Check logs (kubectl logs)
Check events (kubectl get events)
Debug container (kubectl exec)
Fix and verify

The key: Systematic approach. Check everything. Find root cause. Fix it.

Memory Tip: The Hotel Room Problem Analogy

Pod troubleshooting = Fixing hotel room problems

CrashLoopBackOff: Guest can't stay ImagePullBackOff: Can't find guest Pending: Room not assigned ErrImagePull: Can't get guest info

Once you see it this way, troubleshooting makes perfect sense.

Common Mistakes

Not checking logs: Missing obvious errors
Not describing pod: Missing events
Not checking previous logs: Missing crash reason
Giving up too soon: Problems are fixable
Not systematic: Random troubleshooting

Key Takeaways

Troubleshooting is systematic - Follow a process
Check status first - Know the state
Describe pod - See events and conditions
Check logs - See what happened
Debug inside container - See what's wrong
Most problems are fixable - Don't give up

What's Next?

Now that you understand pod troubleshooting, you're ready for more advanced troubleshooting scenarios. Next: Troubleshooting Methodology.

Remember: Pod troubleshooting is like fixing hotel room problems. Systematic. Check status. Check logs. Check events. Find problem. Fix it. Most problems are fixable.

🎯 The Big Picture​

The Hotel Room Problem Analogy​

Systematic Troubleshooting Approach​

Common Pod Problems​

Problem 1: CrashLoopBackOff​

Problem 2: ImagePullBackOff​

Problem 3: Pending​

Problem 4: ErrImagePull​

Real-World Example: Troubleshooting CrashLoopBackOff​

Troubleshooting Tools​

kubectl describe​

kubectl logs​

kubectl exec​

kubectl get events​

My Take: Troubleshooting Strategy​

Memory Tip: The Hotel Room Problem Analogy​

Common Mistakes​

Key Takeaways​

What's Next?​

🎯 The Big Picture

The Hotel Room Problem Analogy

Systematic Troubleshooting Approach

Common Pod Problems

Problem 1: CrashLoopBackOff

Problem 2: ImagePullBackOff

Problem 3: Pending

Problem 4: ErrImagePull

Real-World Example: Troubleshooting CrashLoopBackOff

Troubleshooting Tools

kubectl describe

kubectl logs

kubectl exec

kubectl get events

My Take: Troubleshooting Strategy

Memory Tip: The Hotel Room Problem Analogy

Common Mistakes

Key Takeaways

What's Next?