Troubleshooting

Diagnosing common problems in a LogZilla deployment on Kubernetes.

Quick checks

```bash
# Pods and services
kubectl get pods
kubectl get svc

# Inspect a pod
kubectl describe pod <pod-name>

# Container logs
kubectl logs <pod-name> -c <container-name>
```

Common issues

  • Probes failing (NotReady / CrashLoopBackOff):
    • Inspect readiness/liveness/startup probe configuration in manifests.
    • Review container logs for stack traces or healthcheck errors.
  • Secrets or ConfigMaps missing:
    • Ensure Common Config and Secrets are applied before modules.
    • Verify that Secret values are correctly base64-encoded and that key names match the references in the manifests.
  • Storage pending:
    • Set storageClassName to a StorageClass that exists in the cluster, or remove the field to use the default StorageClass.
    • Confirm that PVs are provisioned and that PVCs reach the Bound phase.
  • Ingress errors:
    • Confirm the correct IngressClass and annotations for the provider.
    • On GKE, verify NEG backends and health checks.
  • External ports not reachable:
    • Confirm LoadBalancer Services were assigned external IPs and firewall rules permit inbound TCP/UDP as required.
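
For the probe case above, a minimal sketch that collects the usual evidence in one call. It assumes kubectl is already pointed at the namespace where LogZilla runs; the function name show_probe_failure is hypothetical and not part of LogZilla. It prints the probe definitions from the live pod spec, the logs of the previous (crashed) container instance, and the pod's events, where failed probes are reported with reason Unhealthy.

```bash
# Hypothetical helper (not shipped with LogZilla): gather probe evidence for one pod.
# Assumes kubectl is configured for the LogZilla namespace.
show_probe_failure() {
  local pod="$1" container="$2"

  # Probe definitions as deployed, per container (empty when a probe is not set).
  kubectl get pod "$pod" -o jsonpath='{range .spec.containers[*]}{.name}{"\n  readiness: "}{.readinessProbe}{"\n  liveness:  "}{.livenessProbe}{"\n  startup:   "}{.startupProbe}{"\n"}{end}'

  # Logs from the previous container instance, i.e. the run that crashed or was killed.
  kubectl logs "$pod" -c "$container" --previous

  # Events for this pod, oldest first; failed probes appear with reason "Unhealthy".
  kubectl get events --field-selector "involvedObject.name=$pod" --sort-by=.metadata.creationTimestamp
}
```

Usage: show_probe_failure <pod-name> <container-name>. The --previous call fails harmlessly if the container has not restarted yet.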
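
To check the base64 point without printing secret material, a sketch assuming GNU coreutils base64; both function names are hypothetical. A frequent cause of wrong values is encoding with `echo value | base64`, which stores a trailing newline inside the secret; check_b64_value reports that case with exit status 2 and invalid base64 with exit status 1.

```bash
# Hypothetical helper: validate one base64-encoded value without printing it.
# Assumes a base64 that supports -d (GNU coreutils).
check_b64_value() {
  # Reject values that do not decode at all.
  if ! printf '%s' "$1" | base64 -d >/dev/null 2>&1; then
    echo "invalid base64" >&2
    return 1
  fi
  # Flag a decoded value whose last byte is a newline (wc -l counts it as one line).
  if [ "$(printf '%s' "$1" | base64 -d | tail -c 1 | wc -l | tr -d ' ')" = "1" ]; then
    echo "decoded value ends with a newline" >&2
    return 2
  fi
  return 0
}

# Hypothetical helper: look up one key of a Secret and check it.
# Keys containing dots must be escaped for jsonpath, e.g. tls\.crt.
check_secret_key() {
  local secret="$1" key="$2" encoded
  encoded="$(kubectl get secret "$secret" -o jsonpath="{.data.$key}")" || return 1
  if [ -z "$encoded" ]; then
    echo "key '$key' not found in secret '$secret'" >&2
    return 1
  fi
  check_b64_value "$encoded"
}
```

Usage: check_secret_key <secret-name> <key>. For ConfigMaps, `kubectl get configmap <name> -o yaml` shows whether the object and the referenced keys exist.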
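
For pending storage, this sketch lists the available StorageClasses (the cluster default is marked "(default)"), then describes every PVC still in phase Pending, whose events usually name the provisioning error. The function name is hypothetical. A PVC whose StorageClass uses volumeBindingMode: WaitForFirstConsumer also stays Pending until a pod that mounts it is scheduled, which is expected rather than an error.

```bash
# Hypothetical helper: explain PVCs that are stuck in Pending.
show_pending_pvcs() {
  local pvc

  # Available StorageClasses; the cluster default is marked "(default)".
  kubectl get storageclass

  # All PVCs with their phase, bound volume, and StorageClass.
  kubectl get pvc

  # Describe only the Pending ones; the Events section names the provisioning error.
  for pvc in $(kubectl get pvc -o jsonpath='{range .items[?(@.status.phase=="Pending")]}{.metadata.name}{"\n"}{end}'); do
    kubectl describe pvc "$pvc"
  done
}
```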
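
For Ingress errors, a sketch that shows the IngressClasses installed in the cluster next to the class the Ingress requests, plus the describe output with annotations and controller events. The GKE part assumes container-native load balancing, where the NEG controller records its state in the cloud.google.com/neg-status Service annotation and in ServiceNetworkEndpointGroup objects (short name svcneg). Function names are hypothetical.

```bash
# Hypothetical helper: compare an Ingress with the classes the cluster offers.
show_ingress() {
  local ing="$1"
  # Installed IngressClasses and their controllers.
  kubectl get ingressclass
  # Class requested by this Ingress; empty means the legacy
  # kubernetes.io/ingress.class annotation or the default IngressClass decides.
  kubectl get ingress "$ing" -o jsonpath='{.spec.ingressClassName}{"\n"}'
  # Annotations, rules, backends, and events reported by the controller.
  kubectl describe ingress "$ing"
}

# GKE only (assumption: NEG-based container-native load balancing is in use).
show_gke_neg() {
  local svc="$1"
  # NEG names and zones recorded by the NEG controller on the Service.
  kubectl get svc "$svc" -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}{"\n"}'
  # NEG objects managed by the controller in this namespace.
  kubectl get svcneg
}
```

Backend health as seen by a global external load balancer can then be checked with `gcloud compute backend-services get-health <backend-service> --global`.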
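
To check external reachability, a bash sketch that prints each LoadBalancer Service with its external address and ports, and tests a TCP port with bash's /dev/tcp redirection under a timeout (assumes GNU coreutils timeout). A TCP check says nothing about UDP: UDP 514 can only be confirmed by sending a message and seeing it arrive in LogZilla. Function names are hypothetical.

```bash
# Hypothetical helper: LoadBalancer Services with external IP/hostname and ports.
# An empty address column means the provider has not assigned an address yet.
list_lb_services() {
  kubectl get svc -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.name}{"\t"}{.status.loadBalancer.ingress[*].ip}{.status.loadBalancer.ingress[*].hostname}{"\t"}{range .spec.ports[*]}{.port}/{.protocol} {end}{"\n"}{end}'
}

# Returns 0 if a TCP connection to host:port succeeds within 3 seconds.
tcp_port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Usage: `tcp_port_open <external-ip> 514 && echo open`. A timeout rather than a refusal usually points at a firewall rule dropping the traffic.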

Component-specific tips

  • Ingest
    • syslogng listens on 514 (TCP/UDP), 515 (JSON), and 601 (RFC 5424); ensure the syslog Service exists and its pods are Ready.
    • httpreceiver serves /incoming on port 80; verify the httpreceiver Service and Ingress/LB route.
  • Storage
    • Check storagemodule and InfluxDB logs for disk or memory pressure.
    • Check that the sm-data, sm-archives, and influxdb-data PVCs are Bound and have the expected capacity.
  • API
    • gunicorn health endpoint: /ping on port 80.
    • tornado health endpoint: /ping on port 8001.
  • Query
    • Ensure SM_API_ADDRESSES lists the storage-<ordinal> pods that actually exist (storage-0 through storage-(N-1) for N storage replicas).
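
To exercise the ingest path end to end, a sketch that first checks the syslog Service has endpoints (an empty list means no Ready pods back it), then sends test messages with util-linux logger over TCP and UDP 514, and finally reports the HTTP status returned for /incoming. The Service name, syslog address, and base URL are placeholders and the function name is hypothetical. The request only tests routing, not the payload format httpreceiver expects: a response from the application typically shows the route works, while a connection error or a 502/503 from the load balancer points at the Service or route.

```bash
# Hypothetical helper: smoke-test syslog and HTTP ingest.
# $1 = syslog Service name, $2 = external syslog address, $3 = httpreceiver base URL.
test_ingest() {
  local syslog_svc="$1" syslog_host="$2" http_url="$3"

  # Ready pod IPs behind the syslog Service; an empty list means no Ready pods.
  kubectl get endpoints "$syslog_svc"

  # One test message over TCP and one over UDP (util-linux logger options).
  logger --server "$syslog_host" --port 514 --tcp "logzilla ingest test (tcp)"
  logger --server "$syslog_host" --port 514 --udp "logzilla ingest test (udp)"

  # HTTP status for /incoming; only the route is tested, not the payload format.
  curl -sS -o /dev/null -w '%{http_code}\n' "$http_url/incoming"
}
```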
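
For the Storage item, a sketch that lists PVCs whose names contain sm-data, sm-archives, or influxdb-data and that are not Bound. Matching by substring is an assumption: StatefulSet volume claim templates produce names of the form <template>-<pod-name>, so exact PVC names depend on the manifests. The filter is a separate, hypothetically named function so it also works on any saved name<TAB>phase listing; kubectl top requires metrics-server.

```bash
# Hypothetical filter: read "name<TAB>phase" lines and print LogZilla storage
# PVCs (sm-data, sm-archives, influxdb-data) whose phase is not Bound.
unbound_storage_pvcs() {
  awk -F'\t' '$1 ~ /sm-data|sm-archives|influxdb-data/ && $2 != "Bound" { print $1 }'
}

# Feed the filter from the live cluster, then show CPU/memory per pod
# (kubectl top requires metrics-server in the cluster).
check_storage() {
  kubectl get pvc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' | unbound_storage_pvcs
  kubectl top pod
}
```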
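
The API health endpoints can be called without exposing them externally by using a temporary port-forward. A sketch; the function name and the default local port 18080 are arbitrary choices, not LogZilla settings.

```bash
# Hypothetical helper: call /ping on a pod through a temporary port-forward.
# $1 = pod name, $2 = container port (80 for gunicorn, 8001 for tornado),
# $3 = optional local port (default 18080, any free port works).
check_ping() {
  local pod="$1" remote_port="$2" local_port="${3:-18080}" pf_pid rc=0

  kubectl port-forward "pod/$pod" "$local_port:$remote_port" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2  # crude wait for the tunnel to come up

  # -f makes curl fail on HTTP errors, so rc reflects the endpoint's health.
  curl -fsS "http://127.0.0.1:$local_port/ping" || rc=$?

  kill "$pf_pid" 2>/dev/null
  return "$rc"
}
```

Usage: check_ping <gunicorn-pod> 80, or check_ping <tornado-pod> 8001.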
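
For the Query item, StatefulSet pods are named <statefulset>-<ordinal> with ordinals 0 through replicas minus 1, so the expected storage pods can be computed and shown next to what the query container actually sees. This sketch assumes the storage StatefulSet is named storage (inferred from the storage-<ordinal> naming) and that the query image provides printenv; function names are hypothetical. SM_API_ADDRESSES may also carry host or port details, so the final comparison is left to the reader.

```bash
# Pure helper: print "<name>-0 ... <name>-(n-1)" for a StatefulSet with n replicas.
expected_sts_pods() {
  local name="$1" n="$2" i=0 out=""
  while [ "$i" -lt "$n" ]; do
    out="${out:+$out }$name-$i"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Show the expected storage pods next to the configured SM_API_ADDRESSES.
# $1 = query pod name, $2 = storage StatefulSet name (assumed "storage" by default).
compare_sm_api_addresses() {
  local query_pod="$1" sts="${2:-storage}" replicas
  replicas="$(kubectl get statefulset "$sts" -o jsonpath='{.spec.replicas}')" || return 1
  echo "expected pods:    $(expected_sts_pods "$sts" "$replicas")"
  echo "SM_API_ADDRESSES: $(kubectl exec "$query_pod" -- printenv SM_API_ADDRESSES)"
}
```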

Reapply and rollouts

```bash
# Reapply a manifest after editing
kubectl apply -f <file>.yaml

# Restart a statefulset to pick up changes
kubectl rollout restart statefulset/<name>

# Monitor rollout
kubectl rollout status statefulset/<name>
```
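
The apply, restart, and status steps can be wrapped so that the change is previewed first. A sketch; the function name and the 10m default timeout are hypothetical. kubectl diff exits with 1 when differences exist and above 1 on error, so only the latter stops the wrapper. The restart matters mainly when only a referenced ConfigMap or Secret changed, because an edit to the StatefulSet's own pod template already starts a rolling update on apply.

```bash
# Hypothetical wrapper: preview, apply, restart, and wait for one StatefulSet.
# $1 = manifest file, $2 = StatefulSet name, $3 = optional timeout (default 10m).
apply_and_roll() {
  local file="$1" sts="$2" wait_for="${3:-10m}" rc=0

  # Preview changes against the live objects; exit code 1 only means "differences found".
  kubectl diff -f "$file" || rc=$?
  if [ "$rc" -gt 1 ]; then
    echo "kubectl diff failed with exit code $rc" >&2
    return "$rc"
  fi

  kubectl apply -f "$file" || return
  kubectl rollout restart "statefulset/$sts" || return
  # Fails if the rollout does not complete within the timeout.
  kubectl rollout status "statefulset/$sts" --timeout="$wait_for"
}
```

If the new pods never become Ready, `kubectl rollout undo statefulset/<name>` reverts the pod template to the previous revision; it does not revert ConfigMaps or Secrets.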