Deployment Troubleshooting: When Deployments Don't Work

Deployment issues are frustrating. Your deployment exists. But pods aren't starting. Replicas aren't matching. Here's how to fix it.

🎯 The Big Picture

Think of deployment issues like hotel room management. The rooms should be ready. But they're not. The question isn't whether the deployment exists. The question is why its pods aren't working.

Deployment troubleshooting comes down to four checks: replica counts, pod status, rollout status, and resource constraints.

Common Deployment Issues

Symptoms:

- Replicas not matching the desired count
- Pods not starting
- Rollout stuck
- Deployment not updating

Step-by-Step Debugging Process

Step 1: Check Deployment Status

kubectl get deployments
kubectl describe deployment <deployment-name>

Look for:

- Desired vs available replicas
- Replica set status
- Events
- Conditions
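
If you only want the replica numbers, a small helper can pull them out of the Deployment directly. This is a minimal sketch: the function name `replica_summary` and the `default` namespace fallback are made up for illustration, not part of kubectl.

```shell
# Print desired, ready, and available replica counts for one Deployment.
# Usage: replica_summary <deployment-name> [namespace]
# Note: the API omits a status field when its value is zero, so an empty
# value after "ready=" or "available=" means 0.
replica_summary() {
  kubectl get deployment "$1" -n "${2:-default}" \
    -o jsonpath='desired={.spec.replicas} ready={.status.readyReplicas} available={.status.availableReplicas}{"\n"}'
}
```

Calling `replica_summary my-app` prints one line of the form `desired=3 ready=1 available=1`, which makes the gap obvious at a glance.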

Step 2: Check ReplicaSets

kubectl get replicasets
kubectl describe replicaset <rs-name>

Look for:

- Desired vs ready replicas
- Pod status
- Events
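
During a rollout a Deployment owns both the old and the new ReplicaSet, and it helps to know which is which. The sketch below sorts them by creation time. It assumes your pods carry an `app=<app-label>` label as in the commands above; `list_replicasets` is a hypothetical helper name.

```shell
# List the ReplicaSets behind an app label, oldest first, so the ReplicaSet
# created by the latest rollout appears on the last line.
# Usage: list_replicasets <app-label> [namespace]
list_replicasets() {
  kubectl get replicasets -l "app=$1" -n "${2:-default}" \
    --sort-by=.metadata.creationTimestamp
}
```

If the last ReplicaSet shows fewer ready pods than desired, that is the one to describe.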

Step 3: Check Pods

kubectl get pods -l app=<app-label>
kubectl describe pod <pod-name>

Look for:

- Pod status
- Events
- Resource constraints
- Image pull issues
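
The events at the bottom of `kubectl describe pod` can also be fetched on their own, which is handy when the describe output is long. A minimal sketch, with `pod_events` as an illustrative name:

```shell
# Show only the events that reference one pod, oldest first.
# Usage: pod_events <pod-name> [namespace]
pod_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Scheduling failures, image pull errors, and failed probes typically all show up here.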

Common Causes and Solutions

Cause 1: Replicas Not Matching

Symptoms:

- Desired: 3, Available: 1
- Pods not starting
- Replicas not scaling

Solutions:

Check deployment:

kubectl describe deployment <deployment-name>
# Look for Replicas section

Check pod status:

kubectl get pods -l app=<app-label>
# Check why pods aren't starting

Check resource constraints:
```
kubectl top nodes        # Needs metrics-server installed
kubectl describe nodes   # Check "Allocated resources" on each node
```
Fix resource issues:
- Reduce resource requests
- Add more nodes
- Fix pod issues
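
When replicas are missing because of resources, the missing pods usually sit in Pending. The sketch below filters for them. It assumes the `app=<app-label>` convention used above; `pending_pods` is a made-up helper name.

```shell
# List only the pods of an app that the scheduler has not placed yet.
# Usage: pending_pods <app-label> [namespace]
pending_pods() {
  kubectl get pods -l "app=$1" -n "${2:-default}" \
    --field-selector=status.phase=Pending
}
```

Describe any pod this returns and read its events to see what the scheduler complained about.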

Cause 2: Rollout Stuck

Symptoms:

- Deployment stays in an updating state
- Rollout not completing
- New pods not ready

Solutions:

Check rollout status:

kubectl rollout status deployment/<deployment-name>

Check new pods:

kubectl get pods -l app=<app-label>
# Check new replica set pods

Check readiness probe:

kubectl describe pod <pod-name>
# Look for Readiness section

Fix readiness probe:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10  # Increase if needed
  periodSeconds: 5
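
To confirm that a rollout has really stalled rather than just being slow, you can read the Deployment's `Progressing` condition. A minimal sketch, with `rollout_reason` as a hypothetical helper name:

```shell
# Print the reason on the Deployment's Progressing condition.
# ProgressDeadlineExceeded means the controller considers the rollout stalled;
# NewReplicaSetAvailable means it finished.
# Usage: rollout_reason <deployment-name> [namespace]
rollout_reason() {
  kubectl get deployment "$1" -n "${2:-default}" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}{"\n"}'
}
```

If this prints `ProgressDeadlineExceeded`, waiting longer is unlikely to help. Go look at the new pods.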

Cause 3: Image Pull Issues

Symptoms:

- Pods in ImagePullBackOff
- Deployment not updating
- New pods can't start

Solutions:

Check pod status:

kubectl get pods -l app=<app-label>
# Look for ImagePullBackOff

Check image:

kubectl describe pod <pod-name>
# Look for image pull errors

Fix image:
- Correct image name
- Fix authentication
- Check registry access
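
Here is one way to script the check and the fix for a wrong image name. Treat it as a sketch: both function names are invented for this example, and the image you pass in is whatever the corrected reference should be.

```shell
# Show why each container in a pod is waiting (for example ImagePullBackOff
# or ErrImagePull). Usage: waiting_reason <pod-name> [namespace]
waiting_reason() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'
}

# Point one container of a Deployment at a corrected image reference.
# Usage: fix_image <deployment-name> <container-name> <image> [namespace]
fix_image() {
  kubectl set image "deployment/$1" "$2=$3" -n "${4:-default}"
}
```

`kubectl set image` changes the pod template, so it starts a new rollout on its own. Authentication problems need a registry secret instead, which this sketch does not cover.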

Cause 4: Resource Constraints

Symptoms:

- Pods pending
- Deployment not scaling
- Insufficient resources

Solutions:

Check node resources:
```
kubectl top nodes        # Needs metrics-server installed
kubectl describe nodes   # Check "Allocated resources" on each node
```

Check pod requests:

kubectl describe deployment <deployment-name>
# Look for resource requests

Reduce resource requests:

resources:
  requests:
    cpu: "100m"      # Reduce if too high
    memory: "128Mi"  # Reduce if too high

Or add nodes:
- Scale cluster
- Use cluster autoscaler
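
If you want to lower requests without editing YAML, `kubectl set resources` can patch the Deployment in place. A minimal sketch: `lower_requests` is a hypothetical name, and the `100m` / `128Mi` values are simply the ones from the example above, not a recommendation.

```shell
# Lower the requests on one container of a Deployment. This changes the pod
# template, so it triggers a new rollout.
# Usage: lower_requests <deployment-name> <container-name> [namespace]
lower_requests() {
  kubectl set resources deployment "$1" -c "$2" -n "${3:-default}" \
    --requests=cpu=100m,memory=128Mi
}
```

Remember to make the same change in your manifest, or the next apply will put the old values back.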

Cause 5: Health Check Failures

Symptoms:

- Pods restarting
- Deployment not updating
- Liveness/readiness failing

Solutions:

Check probe status:

kubectl describe pod <pod-name>
# Look for Liveness and Readiness

Test probe endpoint:

kubectl exec -it <pod-name> -- curl localhost:8080/health
# Requires curl in the container image

Fix probe:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30  # Increase if app needs time
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
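
When a liveness probe keeps killing the container, the current logs often start from scratch. The previous container's logs are where the failure usually shows. A minimal sketch, with `restart_info` as an illustrative name:

```shell
# Show restart counts per container, then the logs of the previous
# (terminated) container instance.
# Usage: restart_info <pod-name> [namespace]
restart_info() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" restarts="}{.restartCount}{"\n"}{end}'
  kubectl logs "$1" -n "${2:-default}" --previous
}
```

A climbing restart count together with a healthy-looking app log is a hint that the probe, not the app, is the problem.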

Real-World Example: Rollout Stuck

Problem: Deployment rollout stuck. New pods not becoming ready.

Debugging:

Checked rollout:

kubectl rollout status deployment/my-app
# Waiting for rollout to finish

Checked new pods:

kubectl get pods -l app=my-app
# New pods: 0/1 Ready

Checked readiness probe:

kubectl describe pod <new-pod>
# Readiness probe failing

Tested endpoint:

kubectl exec -it <pod> -- curl localhost:8080/health
# Endpoint responds, but only about 20 seconds after the container starts

Fixed probe:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 20  # Increased from 5
  periodSeconds: 5

Rollout completed: Pods became ready

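Solution: The readiness probe's initial delay (5 seconds) was shorter than the time the app needs to start (about 20 seconds). Increasing initialDelaySeconds to 20 let the rollout complete.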

Hands-On Exercise: Debug Deployment

Create deployment with high resource requests:

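apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  replicas: 3
  selector:                   # Required: must match the pod template labels
    matchLabels:
      app: test-deployment
  template:
    metadata:
      labels:
        app: test-deployment  # Lets "-l app=test-deployment" find the pods
    spec:
      containers:
      - name: app
        image: nginx:alpine
        resources:
          requests:
            cpu: "100"        # 100 full cores: too high!
            memory: "1000Gi"  # Too high!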

Debug it:

- Check deployment: kubectl get deployment test-deployment
- Check pods: kubectl get pods -l app=test-deployment
- Describe pods: kubectl describe pod <pod-name>
- Fix the issue (reduce resource requests)

This is how you learn. Break things. Fix them.

My Take: Deployment Troubleshooting

Deployment issues used to confuse me. I'd see deployments but pods weren't working.

Then I learned the systematic approach:

- Check deployment status: desired vs available
- Check replica sets: are they creating pods?
- Check pod status: why aren't pods starting?
- Check resource constraints: enough resources?
- Check health probes: are they configured correctly?

Now I fix deployment issues in minutes, not hours.

Memory Tip: The Hotel Room Management Analogy

Deployment issues are like hotel room management:

- Rooms should be ready (Pods should be ready)
- Wrong room type (Resource constraints)
- Room key doesn't work (Image pull issues)
- Room not ready (Health check failures)
- Can't access room (Rollout stuck)

Check each component. Find the issue.

Common Mistakes

- Not checking replica sets: RS shows pod creation status
- Ignoring pod status: Pods tell you why they're not ready
- Wrong resource requests: Too high = pending
- Health probes too aggressive: App needs time to start
- Not checking events: Events show what happened

Key Takeaways

- Check deployment status: desired vs available replicas
- Check replica sets: are pods being created?
- Check pod status: why aren't pods ready?
- Check resource constraints: enough resources?
- Check health probes: configured correctly?
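
The five checks above can be strung together into one pass. This is a rough sketch, not a polished tool: `triage_deployment` is a made-up name, it assumes pods are labeled `app=<app-label>`, and the node check needs metrics-server.

```shell
# Run the five checks in order for one Deployment.
# Usage: triage_deployment <deployment-name> <app-label> [namespace]
# The function body is a subshell ( ... ) so its variables stay local.
triage_deployment() (
  name="$1"; label="$2"; ns="${3:-default}"

  echo "== 1. Deployment status =="
  kubectl get deployment "$name" -n "$ns"

  echo "== 2. ReplicaSets =="
  kubectl get replicasets -l "app=$label" -n "$ns"

  echo "== 3. Pods =="
  kubectl get pods -l "app=$label" -n "$ns"

  echo "== 4. Node resources (needs metrics-server) =="
  kubectl top nodes

  echo "== 5. Probes =="
  kubectl describe deployment "$name" -n "$ns" | grep -E 'Liveness|Readiness' \
    || echo "no probes configured"
)
```

Run it as `triage_deployment my-app my-app` and read the sections top to bottom. The first one that looks wrong is usually where to dig.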

What's Next?

Now that you understand deployment troubleshooting, let's create a comprehensive troubleshooting guide. Next: Complete Troubleshooting Guide.

Remember: Deployment issues are usually pod issues or resource constraints. Check deployment status. Check pods. Check resources. Fix the root cause.
