Service Troubleshooting: When Services Don't Work
Service issues are frustrating. Your pods are running. Your service exists. But nothing works. Here's how to fix it.
🎯 The Big Picture
Think of service issues like a hotel phone system. The phones exist. The rooms exist. But calls aren't connecting. The problem isn't that things exist. The problem is why they're not connecting.
Service troubleshooting comes down to checking selectors, endpoints, ports, pod readiness, and network policies.
Common Service Issues
Symptoms:
- Service exists but not accessible
- Connection refused
- Timeout errors
- Service not routing traffic
Step-by-Step Debugging Process
Step 1: Check Service Status
kubectl get svc
kubectl describe svc <service-name>
Look for:
- Service type (ClusterIP, NodePort, or LoadBalancer)
- Assigned cluster IP and, where applicable, node port or external IP
- Selectors
- Ports
Step 2: Check Endpoints
kubectl get endpoints <service-name>
kubectl describe endpoints <service-name>
Key check:
- Are endpoints empty? → Selectors don't match any pods, or no matching pod is Ready
- Are endpoints correct? → Verify pod IPs
Example:
NAME         ENDPOINTS                         AGE
my-service   10.244.1.5:8080,10.244.2.3:8080   5m
Step 3: Check Pod Selectors
kubectl get pods --show-labels
kubectl get svc <service-name> -o jsonpath='{.spec.selector}'
Key check:
- Do pod labels match service selectors?
- Are labels correct?
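To make "match" concrete, here is a minimal Python sketch of equality-based selector matching. It is a simplified model for illustration, not how Kubernetes is implemented, and the pod names `pod-a` and `pod-b` are hypothetical. The rule it shows: every key/value pair in the selector must appear in the pod's labels, while extra pod labels are ignored.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True when every selector key/value pair appears in the pod labels."""
    if not selector:
        # A Service without a selector does not pick up pods automatically.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "my-app"}
pods = {
    "pod-a": {"app": "my-app", "tier": "web"},  # extra labels are fine
    "pod-b": {"app": "myapplication"},          # value differs, so no match
}

matching = [
    name for name, labels in pods.items()
    if selector_matches(service_selector, labels)
]
print(matching)  # ['pod-a']
```

Note that the comparison is exact: `my-app` and `myapplication` are different values, so `pod-b` never becomes an endpoint.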
Common Causes and Solutions
Cause 1: Selector Mismatch
Symptoms:
- Service has no endpoints
- Endpoints list is empty
- Pods exist but not connected
Solutions:
- Check service selectors:
  kubectl get svc <service-name> -o yaml
  # Look for selector section
- Check pod labels:
  kubectl get pods --show-labels
- Fix selector or labels:
  # Service
  apiVersion: v1
  kind: Service
  spec:
    selector:
      app: my-app # Must match pod labels
  # Pod
  metadata:
    labels:
      app: my-app # Must match service selector
Cause 2: Wrong Port
Symptoms:
- Service exists
- Endpoints exist
- Connection refused
Solutions:
- Check service port:
  kubectl get svc <service-name>
  # Check PORT(S) column
- Check pod port:
  kubectl describe pod <pod-name>
  # Look for container port
- Fix port mapping:
  apiVersion: v1
  kind: Service
  spec:
    ports:
    - port: 80 # Service port
      targetPort: 8080 # Pod port (must match)
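The port mapping can be reasoned about with a small Python sketch. It assumes a simplified model: a Service port entry and a list of container ports as plain dictionaries, where the port name `http` is a hypothetical example. It illustrates two behaviors worth remembering: `targetPort` falls back to `port` when omitted, and a string `targetPort` refers to a named container port.

```python
from typing import Optional


def resolve_target_port(service_port: dict, container_ports: list) -> Optional[int]:
    """Return the pod port number a Service port entry forwards to, or None."""
    # targetPort falls back to the Service port when it is omitted.
    target = service_port.get("targetPort", service_port["port"])
    if isinstance(target, int):
        return target
    # A string targetPort refers to a named container port.
    for port in container_ports:
        if port.get("name") == target:
            return port["containerPort"]
    return None  # the name does not exist on the pod


container_ports = [{"name": "http", "containerPort": 8080}]

by_number = resolve_target_port({"port": 80, "targetPort": 8080}, container_ports)
by_name = resolve_target_port({"port": 80, "targetPort": "http"}, container_ports)
omitted = resolve_target_port({"port": 80}, container_ports)
bad_name = resolve_target_port({"port": 80, "targetPort": "web"}, container_ports)

print(by_number, by_name, omitted, bad_name)  # 8080 8080 80 None
```

In the `omitted` case traffic goes to port 80 on the pod, where nothing listens in this example, which is the classic "connection refused" scenario.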
Cause 3: Pods Not Ready
Symptoms:
- Pods are running
- Endpoints are empty or list fewer pods than expected
- Some or all pods receive no traffic
Solutions:
- Check pod readiness:
  kubectl get pods
  # Check READY column (should be 1/1)
- Check readiness probe:
  kubectl describe pod <pod-name>
  # Look for Readiness section
- Fix readiness probe:
  readinessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
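Why readiness matters for traffic can be shown with a short Python sketch. It is a simplified model under stated assumptions: pods are plain dictionaries, the names `web-1` and `web-2` are hypothetical, and only pods that are Ready are listed as endpoints.

```python
def ready_endpoints(pods: list, target_port: int) -> list:
    """List ip:port strings for pods that are Ready; others are left out."""
    return [f"{pod['ip']}:{target_port}" for pod in pods if pod["ready"]]


pods = [
    {"name": "web-1", "ip": "10.244.1.5", "ready": True},
    {"name": "web-2", "ip": "10.244.2.3", "ready": False},  # failing readiness probe
]

endpoints = ready_endpoints(pods, 8080)
print(endpoints)  # ['10.244.1.5:8080']
```

A pod can show STATUS Running and still be missing from the endpoints list, so compare the READY column against the endpoints rather than trusting the status alone.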
Cause 4: Network Policies
Symptoms:
- Everything looks correct
- But traffic blocked
- Network policy blocking
Solutions:
- Check network policies:
  kubectl get networkpolicies
  kubectl describe networkpolicy <policy-name>
- Check if policy blocks traffic:
  - Review ingress rules
  - Review egress rules
  - Check pod selectors
- Fix or adjust policy:
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  spec:
    podSelector:
      matchLabels:
        app: my-app
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: client
    egress:
    - {} # Allow all egress
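The way ingress rules combine is easier to see in code. The following Python sketch is a deliberately reduced model, not the real evaluation logic of any CNI plugin. Its assumptions: all pods live in one namespace, peers are matched by `matchLabels` only, ports are ignored, and every policy in the list applies to ingress. The dictionary keys `pod_selector` and `ingress_from` are names made up for this sketch.

```python
def labels_match(match_labels: dict, labels: dict) -> bool:
    """An empty matchLabels selects every pod; otherwise all pairs must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def ingress_allowed(policies: list, target_labels: dict, source_labels: dict) -> bool:
    """Decide whether the source pod may connect to the target pod."""
    selecting = [p for p in policies if labels_match(p["pod_selector"], target_labels)]
    if not selecting:
        return True  # no policy selects the target, so ingress is not restricted
    # Once a policy selects the target, only explicitly allowed peers get through.
    return any(
        labels_match(peer, source_labels)
        for policy in selecting
        for peer in policy["ingress_from"]
    )


policies = [{"pod_selector": {"app": "my-app"}, "ingress_from": [{"app": "client"}]}]

from_client = ingress_allowed(policies, {"app": "my-app"}, {"app": "client"})
from_other = ingress_allowed(policies, {"app": "my-app"}, {"app": "other"})
unselected = ingress_allowed(policies, {"app": "unrelated"}, {"app": "other"})

print(from_client, from_other, unselected)  # True False True
```

The middle case is the one that catches people: as soon as any policy selects a pod, traffic from peers that no rule lists is dropped.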
Cause 5: Service Type Issues
Symptoms:
- Service exists
- Can't access from outside
- Wrong service type
Solutions:
- Check service type:
  kubectl get svc
  # Check TYPE column
- Use correct type:
  # ClusterIP (internal only)
  apiVersion: v1
  kind: Service
  spec:
    type: ClusterIP
  # NodePort (external access)
  spec:
    type: NodePort
    ports:
    - port: 80
      nodePort: 30080
  # LoadBalancer (cloud)
  spec:
    type: LoadBalancer
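As a quick reference, the three types can be summarized in a small Python lookup. This is a sketch under assumptions: it describes default behavior only, the path descriptions are informal labels chosen for this example, and the NodePort range is the default of a stock cluster (30000-32767), which an administrator may have changed.

```python
# Assumption: default NodePort range of a stock cluster (30000-32767).
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)

# Each type builds on the previous one by default.
ACCESS_PATHS = {
    "ClusterIP": ["cluster IP"],
    "NodePort": ["cluster IP", "node IP + nodePort"],
    "LoadBalancer": ["cluster IP", "node IP + nodePort", "load balancer address"],
}


def reachable_from_outside(service_type: str) -> bool:
    """True when the type offers at least one path besides the cluster IP."""
    return len(ACCESS_PATHS[service_type]) > 1


def valid_node_port(port: int) -> bool:
    """Check a nodePort value against the assumed default range."""
    return port in DEFAULT_NODE_PORT_RANGE


print(reachable_from_outside("ClusterIP"))  # False
print(reachable_from_outside("NodePort"))   # True
print(valid_node_port(30080))               # True
print(valid_node_port(8080))                # False
```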
Real-World Example: Selector Mismatch
Problem: Service exists but no endpoints. Can't access application.
Debugging:
- Checked service:
  kubectl get svc my-service
  # Service exists
- Checked endpoints:
  kubectl get endpoints my-service
  # Endpoints: <none>
- Checked selectors:
  kubectl get svc my-service -o jsonpath='{.spec.selector}'
  # Selector: app=my-app
  kubectl get pods --show-labels
  # Labels: app=myapplication (mismatch!)
- Fixed labels:
  # Updated pod labels to match service selector
  metadata:
    labels:
      app: my-app # Changed from myapplication
- Verified:
  kubectl get endpoints my-service
  # Endpoints now populated
Solution: Selector mismatch. Fixed labels. Service working.
Hands-On Exercise: Debug Service
Create service with wrong selector:
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: wrong-label # Won't match pods
  ports:
  - port: 80
    targetPort: 80 # Matches the nginx containerPort below
Create pod with different label:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  labels:
    app: correct-label # Doesn't match service
spec:
  containers:
  - name: app
    image: nginx:alpine
    ports:
    - containerPort: 80
Debug it:
- Check service:
  kubectl get svc test-service
- Check endpoints:
  kubectl get endpoints test-service
- Check selectors: Compare service selector with pod labels
- Fix the issue (match labels or selector)
This is how you learn. Break things. Fix them.
My Take: Service Troubleshooting
Service issues used to confuse me. I'd see services but nothing worked.
Then I learned the systematic approach:
- Check endpoints - Are they populated?
- Check selectors - Do they match pod labels?
- Check ports - Are they correct?
- Check readiness - Are pods ready?
- Check network policies - Are they blocking?
Now I fix service issues in minutes, not hours.
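The checklist above can be sketched as a single triage function. This is an illustrative Python model, not a real tool: the `service` and `pods` dictionaries, their keys, and the returned strings are all invented for this example, and it assumes a non-empty selector and that each pod lists the ports its containers actually listen on.

```python
def diagnose(service: dict, pods: list) -> str:
    """Walk the checklist in order and name the first check that fails."""
    selected = [
        pod for pod in pods
        if all(pod["labels"].get(k) == v for k, v in service["selector"].items())
    ]
    if not selected:
        return "selector mismatch"
    if not any(service["target_port"] in pod["ports"] for pod in selected):
        return "port mismatch"
    if not any(pod["ready"] for pod in selected):
        return "pods not ready"
    # Nothing obvious is wrong with the Service itself.
    return "check network policies"


service = {"selector": {"app": "my-app"}, "target_port": 8080}
healthy = {"labels": {"app": "my-app"}, "ports": [8080], "ready": True}

print(diagnose(service, [{**healthy, "labels": {"app": "myapplication"}}]))  # selector mismatch
print(diagnose(service, [{**healthy, "ports": [80]}]))                       # port mismatch
print(diagnose(service, [{**healthy, "ready": False}]))                      # pods not ready
print(diagnose(service, [healthy]))                                          # check network policies
```

The ordering is the point: each check only makes sense once the one before it has passed.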
Memory Tip: The Hotel Phone System Analogy
Service issues are like a hotel phone system:
- Phones exist (Service exists)
- Rooms exist (Pods exist)
- Wrong room number (Selector mismatch)
- Wrong extension (Port mismatch)
- Phone off hook (Pod not ready)
- Call blocked (Network policy)
Check each component. Find the mismatch.
Common Mistakes
- Not checking endpoints: Empty endpoints usually mean a selector mismatch or no ready pods
- Port mismatch: Service port vs pod port
- Labels don't match: Selector vs pod labels
- Not checking readiness: Pods not ready
- Ignoring network policies: Policies might block
Key Takeaways
- Check endpoints first - Empty usually means a selector or readiness issue
- Verify selectors match labels - Must be exact
- Check ports - Service port vs pod port
- Check readiness - Pods must be ready
- Check network policies - Might be blocking
What's Next?
Now that you understand service troubleshooting, let's tackle storage issues. Next: Storage Troubleshooting.
Remember: Service issues are usually selector mismatches or port issues. Check endpoints first. Verify selectors. Check ports. Fix the mismatch.