Deployment Troubleshooting: When Room Management Has Problems
Deployments fail. Room management has problems. That's reality. But we can fix them. That's troubleshooting.
🎯 The Big Picture
Think of deployment troubleshooting like fixing hotel room management problems. Rooms not created (pods not created). Update failed (renovation failed). Rooms crashing (pods crashing). That's deployment troubleshooting.
Deployment troubleshooting is systematic. Check deployment. Check ReplicaSet. Check pods. Find the problem. Fix it.
The Hotel Room Management Problem Analogy
Think of deployment problems like room management problems:
- Pods not created: Rooms not created
- Update stuck: Renovation stuck
- Pods crashing: Rooms have problems
- ReplicaSet issues: Floor manager problems
- Resource issues: Not enough capacity
Once you see it this way, troubleshooting makes perfect sense.
Systematic Troubleshooting Approach
Step-by-step process:
Step 1: Check Deployment Status
kubectl get deployment hotel-management
Step 2: Describe Deployment
kubectl describe deployment hotel-management
Step 3: Check ReplicaSet
kubectl get replicaset -l app=hotel-app
kubectl describe replicaset <replicaset-name>
Step 4: Check Pods
kubectl get pods -l app=hotel-app
kubectl describe pod <pod-name>
Step 5: Check Logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
Step 6: Check Events
kubectl get events --sort-by=.metadata.creationTimestamp
Think of it as: Systematic inspection. Check everything. Find problem.
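The six steps can be sketched as one shell function. This is a minimal sketch, not a definitive tool: the name triage_deployment is made up for illustration, and it assumes kubectl already points at the right cluster and namespace. Step 5 uses a label selector instead of a single pod name, so it prints the last lines from every matching pod.

```shell
# Sketch: run the six inspection steps in order for one Deployment.
# triage_deployment is a hypothetical helper name, not a kubectl feature.
triage_deployment() {
  name=$1       # Deployment name, e.g. hotel-management
  selector=$2   # label selector, e.g. app=hotel-app

  echo "== Step 1: deployment status"
  kubectl get deployment "$name"
  echo "== Step 2: deployment details"
  kubectl describe deployment "$name"
  echo "== Step 3: ReplicaSets"
  kubectl get replicaset -l "$selector"
  echo "== Step 4: pods"
  kubectl get pods -l "$selector"
  echo "== Step 5: recent logs from matching pods"
  kubectl logs -l "$selector" --tail=20
  echo "== Step 6: events, oldest first"
  kubectl get events --sort-by=.metadata.creationTimestamp
}

# Usage:
#   triage_deployment hotel-management app=hotel-app
```

Same order every time. Top down. Deployment first, events last.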
Common Deployment Problems
Problem 1: Deployment Not Creating Pods
Symptoms:
- Deployment created
- No pods running
- Desired > Available
Troubleshooting:
# Check deployment
kubectl describe deployment hotel-management
# Check ReplicaSet
kubectl get replicaset -l app=hotel-app
# Check events
kubectl get events
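Common causes:
- Selector mismatch (selector does not match the pod template labels)
- Resource constraints (quota exceeded, or no node has enough CPU or memory)
- Node taints (pods are created but stay Pending)
- Image pull issues (pods are created but stay in ErrImagePull or ImagePullBackOff)
Solution:
- Fix selector
- Add resources or raise the quota
- Remove taints or add tolerations
- Fix image name, tag, or pull secret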
Think of it as: Room management created. But no rooms. Check why.
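When the event list is long, it helps to see which warnings repeat. Here is a minimal sketch that counts Warning events by reason. The helper name warning_reasons is an assumption for illustration. It reads three whitespace-separated columns (type, reason, object) on stdin, which is what the custom-columns call in the usage comment produces. Reasons such as FailedCreate (the ReplicaSet could not create a pod) and FailedScheduling (no node fits) are the ones to look for here.

```shell
# Sketch: count Warning events by reason, most frequent first.
# Reads lines of "TYPE REASON OBJECT" on stdin.
# warning_reasons is a hypothetical helper name.
warning_reasons() {
  awk '$1 == "Warning" { count[$2]++ }
       END { for (reason in count) print count[reason], reason }' |
    sort -rn
}

# Usage:
#   kubectl get events --no-headers \
#     -o custom-columns=TYPE:.type,REASON:.reason,OBJECT:.involvedObject.name |
#     warning_reasons
```

The most frequent reason is usually the one to chase first.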
Problem 2: Update Stuck
Symptoms:
- Update in progress
- Not completing
- Pods not updating
Troubleshooting:
# Check rollout status
kubectl rollout status deployment/hotel-management
# Check deployment
kubectl describe deployment hotel-management
# Check pods
kubectl get pods -l app=hotel-app
Common causes:
- Health checks failing (new pods never become Ready)
- Resource constraints
- Image pull issues
- Application errors
Solution:
- Fix health checks
- Add resources
- Fix image
- Fix application
Think of it as: Renovation stuck. Not completing. Check why.
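Kubernetes records a stalled rollout on the Deployment itself: the Progressing condition gets the reason ProgressDeadlineExceeded once the progress deadline passes. A minimal sketch of that check follows. The function name rollout_stalled is made up for illustration, and it assumes the Deployment lives in the current namespace.

```shell
# Sketch: report whether a rollout has exceeded its progress deadline.
# rollout_stalled is a hypothetical helper name.
rollout_stalled() {
  # Read the reason of the Deployment's Progressing condition.
  reason=$(kubectl get deployment "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
  if [ "$reason" = "ProgressDeadlineExceeded" ]; then
    echo "$1: stalled ($reason)"
    return 0
  fi
  echo "$1: not stalled (${reason:-no Progressing condition})"
  return 1
}

# Usage:
#   rollout_stalled hotel-management
```

A rollout that is slow but still inside its deadline reports "not stalled". Check the pods anyway.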
Problem 3: Pods Crashing
Symptoms:
- Pods created
- Immediately crash
- CrashLoopBackOff
Troubleshooting:
# Check pods
kubectl get pods -l app=hotel-app
# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
# Describe pod
kubectl describe pod <pod-name>
Common causes:
- Application errors
- Configuration errors
- Missing dependencies
- Resource limits (container OOMKilled when the memory limit is too low)
Solution:
- Fix application
- Fix configuration
- Add dependencies
- Adjust limits
Think of it as: Rooms created. But guests can't stay. Check why.
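With many pods, picking out the crashing ones by eye gets slow. Here is a minimal sketch that filters the kubectl get pods table. The helper name crashing_pods and the default threshold of 3 restarts are assumptions for illustration. It expects the default column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Sketch: list pods that are crash-looping or have restarted at least
# $1 times (default 3). Reads `kubectl get pods` output on stdin.
# crashing_pods is a hypothetical helper name.
crashing_pods() {
  awk -v min="${1:-3}" '
    NR == 1 { next }   # skip the header row
    $3 == "CrashLoopBackOff" || $4 + 0 >= min + 0 { print $1 }
  '
}

# Usage: print the previous container logs of every crashing pod.
#   kubectl get pods -l app=hotel-app | crashing_pods |
#     while read -r pod; do kubectl logs "$pod" --previous --tail=20; done
```

The restart threshold catches pods that look Running right now but keep restarting.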
Problem 4: ReplicaSet Issues
Symptoms:
- ReplicaSet not creating pods
- Pods not matching selector
- ReplicaSet scaling issues
Troubleshooting:
# Check ReplicaSet
kubectl describe replicaset <replicaset-name>
# Check selector
kubectl get replicaset <replicaset-name> -o yaml | grep -A 3 selector
# Check pods
kubectl get pods --show-labels
Common causes:
- Selector mismatch
- Template issues
- Resource constraints
Solution:
- Fix selector
- Fix template
- Add resources
Think of it as: Floor manager problems. Can't manage rooms. Check why.
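A selector matches a pod only when every key=value pair in the selector is among the pod's labels. Here is a minimal sketch of that rule, using the comma-separated label format that kubectl get pods --show-labels prints. The function name labels_match is made up for illustration, and it covers equality-based selectors only, not matchExpressions.

```shell
# Sketch: succeed when every key=value pair in the selector ($1)
# appears in the comma-separated label list ($2).
# labels_match is a hypothetical helper name.
labels_match() {
  selector=$1
  labels=$2
  old_ifs=$IFS
  IFS=,                          # split the selector on commas
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;            # this pair is present, keep checking
      *) IFS=$old_ifs; return 1 ;;
    esac
  done
  IFS=$old_ifs
  return 0
}

# Usage:
#   labels_match "app=hotel-app" "app=hotel-app,pod-template-hash=5d59d67564" \
#     && echo "selector matches"
```

Extra labels on the pod are fine. A missing or different pair is a mismatch.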
Real-World Example: Complete Troubleshooting
Scenario: Deployment update stuck
Step 1: Check deployment:
kubectl get deployment hotel-management
Output:
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
hotel-management   3/5     2            3           10m
Problem: Update stuck. Only 2 of 5 updated.
Step 2: Check rollout status:
kubectl rollout status deployment/hotel-management
Output:
Waiting for deployment "hotel-management" rollout to finish: 2 out of 5 new replicas have been updated...
Step 3: Check new pods:
kubectl get pods -l app=hotel-app
Output:
NAME                           READY   STATUS             RESTARTS   AGE
hotel-management-xxxxx (old)   1/1     Running            0          10m
hotel-management-yyyyy (old)   1/1     Running            0          10m
hotel-management-zzzzz (old)   1/1     Running            0          10m
hotel-management-aaaaa (new)   0/1     CrashLoopBackOff   3          2m
hotel-management-bbbbb (new)   0/1     CrashLoopBackOff   3          2m
Problem: New pods crashing.
Step 4: Check logs:
kubectl logs hotel-management-aaaaa
Output:
Error: Cannot connect to database
Problem: Database connection issue.
Step 5: Check configuration:
kubectl describe pod hotel-management-aaaaa | grep -A 10 Env
Found: Missing DATABASE_URL environment variable.
Step 6: Fix:
# Update deployment
env:
- name: DATABASE_URL
  valueFrom:
    secretKeyRef:
      name: db-secret
      key: url
Step 7: Apply fix:
kubectl apply -f deployment.yaml
Step 8: Verify:
kubectl rollout status deployment/hotel-management
kubectl get pods -l app=hotel-app
# All pods running
That's troubleshooting. Systematic. Effective.
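A fix like the DATABASE_URL one can be confirmed before waiting on the rollout. Here is a minimal sketch that checks whether the Deployment's pod template defines a given environment variable. The function name deployment_has_env is an assumption for illustration. It only checks that the variable is declared, not that the referenced secret exists.

```shell
# Sketch: succeed when the Deployment's pod template declares env var $2.
# deployment_has_env is a hypothetical helper name.
deployment_has_env() {
  # jsonpath prints the env var names of all containers, space-separated.
  names=$(kubectl get deployment "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].env[*].name}')
  for n in $names; do
    [ "$n" = "$2" ] && return 0
  done
  return 1
}

# Usage:
#   deployment_has_env hotel-management DATABASE_URL \
#     || echo "DATABASE_URL is still missing"
```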
Troubleshooting Tools
kubectl describe deployment
Best friend for troubleshooting:
kubectl describe deployment hotel-management
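Shows:
- Deployment status (replica counts: desired, updated, total, available, unavailable)
- ReplicaSet status (old and new ReplicaSet names)
- Pod template (not individual pods; use kubectl get pods for those)
- Events
- Conditions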
Think of it as: Complete management inspection. Everything visible.
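The describe output is long, and often one number is enough: how many replicas are unavailable. Here is a minimal sketch that pulls that number from the Replicas line. The helper name unavailable_replicas is made up for illustration, and it assumes the line has the usual "N desired | N updated | N total | N available | N unavailable" shape.

```shell
# Sketch: print the unavailable replica count from
# `kubectl describe deployment` output read on stdin.
# unavailable_replicas is a hypothetical helper name.
unavailable_replicas() {
  awk '/^Replicas:/ {
    # The count is the field just before the word "unavailable".
    for (i = 2; i <= NF; i++)
      if ($i == "unavailable") print $(i - 1)
  }'
}

# Usage:
#   kubectl describe deployment hotel-management | unavailable_replicas
```

Anything above zero means some rooms are not ready for guests.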
kubectl rollout status
Monitor rollout:
kubectl rollout status deployment/hotel-management
Shows:
- Rollout progress
- Current state
- Completion status
Think of it as: Renovation progress. See status.
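By default kubectl rollout status keeps watching until the rollout finishes. In a script, the --timeout flag gives it a limit. Here is a minimal sketch of a wrapper. The name wait_for_rollout and the 120s default are assumptions for illustration.

```shell
# Sketch: wait for a rollout, but give up after a timeout.
# wait_for_rollout is a hypothetical helper name; 120s is an arbitrary default.
wait_for_rollout() {
  if kubectl rollout status "deployment/$1" --timeout="${2:-120s}"; then
    echo "rollout of $1 finished"
  else
    echo "rollout of $1 did not finish within ${2:-120s}" >&2
    return 1
  fi
}

# Usage:
#   wait_for_rollout hotel-management 60s || kubectl describe deployment hotel-management
```

A non-zero exit is the cue to start the troubleshooting steps.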
kubectl rollout history
View history:
kubectl rollout history deployment/hotel-management
Shows:
- Revision history
- Change causes
- Revisions available for rollback
Think of it as: Renovation history. Previous designs.
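A revision number from the history can be passed to kubectl rollout undo. Here is a minimal sketch of a guarded rollback. The function name rollback_to is made up for illustration. It only checks that the revision looks like a number, not that the revision still exists.

```shell
# Sketch: roll a Deployment back to a specific revision.
# rollback_to is a hypothetical helper name.
rollback_to() {
  # Reject an empty or non-numeric revision before calling kubectl.
  case $2 in
    ''|*[!0-9]*) echo "revision must be a number" >&2; return 2 ;;
  esac
  kubectl rollout undo "deployment/$1" --to-revision="$2"
}

# Usage:
#   kubectl rollout history deployment/hotel-management
#   rollback_to hotel-management 2
```

Rolling back buys time. The broken revision still needs a real fix.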
My Take: Troubleshooting Strategy
Here's what I do:
When deployment fails:
- Check deployment status
- Describe deployment
- Check ReplicaSet
- Check pods
- Check logs
- Check events
- Fix and verify
The key: Systematic approach. Check everything. Find root cause. Fix it.
Memory Tip: The Hotel Room Management Problem Analogy
Deployment troubleshooting = Fixing room management problems
- Pods not created: Rooms not created
- Update stuck: Renovation stuck
- Pods crashing: Rooms have problems
- ReplicaSet issues: Floor manager problems
Once you see it this way, troubleshooting makes perfect sense.
Common Mistakes
- Not checking deployment: Missing deployment-level issues
- Not checking ReplicaSet: Missing ReplicaSet issues
- Not checking pods: Missing pod-level issues
- Not checking logs: Missing application errors
- Not systematic: Random troubleshooting
Key Takeaways
- Troubleshooting is systematic - Follow a process
- Check deployment first - Know the state
- Check ReplicaSet - Foundation level
- Check pods - Execution level
- Check logs - Application level
- Most problems are fixable - Don't give up
What's Next?
Congratulations! You've completed the Deployments & ReplicaSets module. Now let's learn about Services. Next: Understanding Services.
Remember: Deployment troubleshooting is like fixing room management problems. Systematic. Check deployment. Check ReplicaSet. Check pods. Find problem. Fix it. Most problems are fixable.