Troubleshooting
LogZilla documentation for Troubleshooting
Quick checks
# Pods and services
kubectl get pods
kubectl get svc
# Inspect a pod
kubectl describe pod <pod-name>
# Container logs
kubectl logs <pod-name> -c <container-name>
Common issues
- Probes failing (NotReady / CrashLoopBackOff):
  - Inspect readiness/liveness/startup probe configuration in manifests.
  - Review container logs for stack traces or healthcheck errors.
- Secrets or ConfigMaps missing:
  - Ensure Common Config and Secrets are applied before modules.
  - Verify base64 values and keys match manifest references.
- Storage pending:
  - Replace storageClassName with the cluster’s class or remove the field to use the default StorageClass.
  - Confirm PV/PVC provisioning status.
- Ingress errors:
  - Confirm the correct IngressClass and annotations for the provider.
  - On GKE, verify NEG backends and health checks.
- External ports not reachable:
  - Confirm LoadBalancer Services were assigned external IPs and firewall rules permit inbound TCP/UDP as required.
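The checks listed under Common issues can be run with standard kubectl commands. The following is a minimal sketch, not an official LogZilla tool: the function names are hypothetical, resource names are passed as arguments, and every command runs against the current namespace.

```shell
# Sketch only: each function wraps standard kubectl commands for one issue above.
# Function names are hypothetical; pass resource names as arguments.

# Probes failing: probe events plus logs from the previously crashed container.
check_probes() {
  kubectl describe pod "$1"
  kubectl logs "$1" --previous --tail=50
}

# Secrets or ConfigMaps missing: list what currently exists in the namespace.
check_config() {
  kubectl get secrets,configmaps
}

# Decode one key of a Secret to compare it with the value the manifest expects.
# Keys that contain dots need escaping in the JSONPath expression.
decode_secret_key() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 --decode
}

# Storage pending: available classes, then claim and volume status.
check_storage() {
  kubectl get storageclass
  kubectl get pvc,pv
}

# Ingress errors: classes known to the cluster and the Ingress objects using them.
check_ingress() {
  kubectl get ingressclass
  kubectl get ingress
}

# External ports not reachable: EXTERNAL-IP stays <pending> until an address is assigned.
check_external() {
  kubectl get svc -o wide
}
```

After sourcing the functions, call the one that matches the symptom, for example check_probes with the name of the failing pod, or decode_secret_key with a Secret name and one of its keys.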
Component-specific tips
- Ingest
  - syslogng exposes TCP/UDP 514, JSON 515, RFC5424 601; ensure the syslog Service exists and pods are Ready.
  - httpreceiver serves /incoming on port 80; verify the httpreceiver Service and Ingress/LB route.
- Storage
  - Check storagemodule and InfluxDB logs for disk or memory pressure.
  - Validate PVCs (sm-data, sm-archives, influxdb-data).
- API
  - gunicorn health endpoint: /ping on port 80.
  - tornado health endpoint: /ping on port 8001.
- Query
  - Ensure SM_API_ADDRESSES points to the actual storage-<ordinal> range.
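The component checks above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names and the default local port 18080 are hypothetical, curl is assumed to be installed on the workstation, printenv is assumed to exist in the target container, and the environment variable is assumed to be visible inside that container.

```shell
# Sketch only: function names and the default local port are hypothetical.

# Ingest: confirm a Service exists and has endpoints behind it.
check_service() {
  kubectl get svc "$1"
  kubectl get endpoints "$1"
}

# Storage: show only the claims named in this section.
check_pvcs() {
  kubectl get pvc | grep -E 'sm-data|sm-archives|influxdb-data'
}

# API: forward a local port to a pod or service, request /ping, then stop forwarding.
ping_health() {
  local target="$1" remote_port="$2" local_port="${3:-18080}" pf_pid rc
  kubectl port-forward "$target" "${local_port}:${remote_port}" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # give the tunnel a moment to come up
  rc=0
  curl -fsS "http://127.0.0.1:${local_port}/ping" || rc=$?
  kill "$pf_pid" 2>/dev/null || true
  return "$rc"
}

# Query: print the configured addresses next to the storage pods that actually exist.
check_sm_api_addresses() {
  kubectl exec "$1" -- printenv SM_API_ADDRESSES
  kubectl get pods | grep 'storage-'
}
```

For example, check_service syslog and check_service httpreceiver cover the ingest Services, while ping_health takes a pod or service reference and the remote port (80 for gunicorn, 8001 for tornado). A non-zero return from ping_health means the /ping request failed or the tunnel never came up.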
Reapply and rollouts
# Reapply a manifest after editing
kubectl apply -f <file>.yaml
# Restart a statefulset to pick up changes
kubectl rollout restart statefulset/<name>
# Monitor rollout
kubectl rollout status statefulset/<name>