LogZilla

Quick checks

```bash
# Pods and services
kubectl get pods
kubectl get svc

# Inspect a pod
kubectl describe pod <pod-name>

# Container logs
kubectl logs <pod-name> -c <container-name>
```
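
If the quick checks do not show an obvious cause, a few follow-up queries often narrow it down. The sketch below is illustrative only: the function name `lz_quick_checks` and the namespace argument are assumptions, not part of the LogZilla manifests.

```bash
# Hypothetical helper; the name and the namespace argument are illustrative.
lz_quick_checks() {
  local ns="${1:-default}"   # assumed namespace; adjust to your deployment

  # Pods with node placement and restart counts
  kubectl get pods -n "$ns" -o wide

  # Pods whose phase is anything other than Running
  kubectl get pods -n "$ns" --field-selector=status.phase!=Running

  # Events sorted by time, to spot scheduling or image-pull failures
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Logs from the previous (crashed) instance of a container:
#   kubectl logs <pod-name> -c <container-name> --previous
```

Call it as `lz_quick_checks <namespace>`; with no argument it queries the `default` namespace.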

Common issues

Probes failing (NotReady / CrashLoopBackOff):
- Inspect readiness/liveness/startup probe configuration in manifests.
- Review container logs for stack traces or healthcheck errors.
Secrets or ConfigMaps missing:
- Ensure Common Config and Secrets are applied before modules.
- Verify base64 values and keys match manifest references.
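
A frequent cause of bad Secret values is a trailing newline picked up during encoding. The sketch below shows one way to encode and validate a value; the helper names are hypothetical, and `base64 -d` assumes GNU coreutils.

```bash
# Hypothetical helpers; names are illustrative. Assumes GNU coreutils base64.

# Encode without a trailing newline. `echo value | base64` would embed "\n".
encode_secret_value() {
  printf '%s' "$1" | base64
}

# Succeeds only if the argument decodes cleanly as base64.
is_valid_base64() {
  printf '%s' "$1" | base64 -d >/dev/null 2>&1
}

# Show the keys and encoded values a Secret actually holds, to compare
# against the key names referenced in the manifests:
#   kubectl get secret <secret-name> -o jsonpath='{.data}'
# Decode a single key:
#   kubectl get secret <secret-name> -o jsonpath='{.data.<key>}' | base64 -d
```

For example, `encode_secret_value admin` prints `YWRtaW4=`.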
Storage pending:
- Replace storageClassName with the cluster’s class or remove the field to use the default StorageClass.
- Confirm PV/PVC provisioning status.
Ingress errors:
- Confirm the correct IngressClass and annotations for the provider.
- On GKE, verify NEG backends and health checks.
External ports not reachable:
- Confirm that LoadBalancer Services were assigned external IPs and that firewall rules permit the required inbound TCP/UDP ports.
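
The probe, storage, ingress, and load balancer checks above can be sketched as a handful of read-only queries. The function names are hypothetical and the commands run against the current namespace; treat this as a starting point rather than a complete diagnosis.

```bash
# Hypothetical helpers; names are illustrative.

# Probe settings per container, plus events for the pod (probe failures appear here).
probe_report() {   # usage: probe_report <pod-name>
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.readinessProbe}{"\t"}{.livenessProbe}{"\t"}{.startupProbe}{"\n"}{end}'
  kubectl get events --field-selector "involvedObject.name=$1"
}

# Available StorageClasses (the default is marked) and claim/volume binding status.
show_storage() {
  kubectl get storageclass
  kubectl get pvc
  kubectl get pv
}

# IngressClasses known to the cluster and the Ingress resources using them.
show_ingress() {
  kubectl get ingressclass
  kubectl get ingress
}

# Services with their EXTERNAL-IP column; <pending> means no address was assigned.
show_services() {
  kubectl get svc -o wide
}

# For a claim stuck in Pending, the events usually name the reason:
#   kubectl describe pvc <pvc-name>
```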

Component-specific tips

Ingest
- syslogng exposes port 514 (TCP/UDP), 515 (JSON), and 601 (RFC 5424); ensure the syslog Service exists and its pods are Ready.
- httpreceiver serves /incoming on port 80; verify the httpreceiver Service and Ingress/LB route.
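
To confirm the ingest ports are reachable from outside the cluster, a simple TCP connect test is often enough. This sketch assumes 515 and 601 are TCP listeners and uses bash's `/dev/tcp` feature; the helper names are hypothetical, and UDP 514 cannot be verified this way.

```bash
# Hypothetical helpers; names are illustrative.

# Succeeds if a TCP connection to host:port opens within 3 seconds.
probe_tcp() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Probe the syslog ports listed above over TCP (assumes 515 and 601 are TCP).
check_syslog_ports() {
  local host="$1" port
  for port in 514 515 601; do
    if probe_tcp "$host" "$port"; then
      echo "tcp/${port} open"
    else
      echo "tcp/${port} unreachable"
    fi
  done
}

# Send a test message over UDP with util-linux logger:
#   logger -n <external-ip> -P 514 -d "ingest test"
# Check that the httpreceiver route answers at all:
#   curl -i http://<host>/incoming
```

Run `check_syslog_ports <external-ip>` against the address assigned to the syslog Service.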
Storage
- Check storagemodule and InfluxDB logs for disk or memory pressure.
- Validate PVCs (sm-data, sm-archives, influxdb-data).
API
- gunicorn health endpoint: /ping on port 80.
- tornado health endpoint: /ping on port 8001.
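
The health endpoints can be checked without going through the Ingress by port-forwarding to the pod or Service. A minimal sketch, assuming `curl` is installed locally; the function name and the default local port 18080 are arbitrary choices.

```bash
# Hypothetical helper; the name and default local port are illustrative.
# usage: check_ping <pod/name or svc/name> <remote-port> [local-port]
check_ping() {
  local target="$1" remote="$2" local_port="${3:-18080}"

  # Forward a local port to the target in the background
  kubectl port-forward "$target" "${local_port}:${remote}" >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2   # give the tunnel a moment to come up

  # -f makes curl return non-zero on HTTP errors
  curl -fsS "http://127.0.0.1:${local_port}/ping"
  local rc=$?

  # Tear down the tunnel and report curl's result
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  return "$rc"
}
```

For example, `check_ping pod/<pod-name> 80` targets the gunicorn endpoint and `check_ping pod/<pod-name> 8001` the tornado endpoint.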
Query
- Ensure SM_API_ADDRESSES matches the storage-<ordinal> pods that actually exist.
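
One way to compare the configured value with the pods that exist is to print both side by side. This sketch assumes the query container has `printenv` available and that storage pods are named `storage-<ordinal>`; the function name is hypothetical.

```bash
# Hypothetical helper; the name is illustrative.
# usage: show_storage_targets <query-pod-name>
show_storage_targets() {
  local pod="$1"

  echo "SM_API_ADDRESSES in ${pod}:"
  kubectl exec "$pod" -- printenv SM_API_ADDRESSES

  echo "storage pods present:"
  # -o name prints entries such as pod/storage-0
  kubectl get pods -o name | grep -E '/storage-[0-9]+$'
}
```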

Reapply and rollouts

```bash
# Reapply a manifest after editing
kubectl apply -f <file>.yaml

# Restart a statefulset to pick up changes
kubectl rollout restart statefulset/<name>

# Monitor rollout
kubectl rollout status statefulset/<name>
```
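
Without a timeout, `kubectl rollout status` waits indefinitely on a stuck rollout. The sketch below wraps the restart and the wait with a bounded timeout; the function name and the 300s default are assumptions.

```bash
# Hypothetical helper; the name and default timeout are illustrative.
# usage: restart_and_wait <statefulset-name> [timeout]
restart_and_wait() {
  local name="$1" timeout="${2:-300s}"

  kubectl rollout restart "statefulset/${name}" || return 1

  # Fail instead of hanging if the rollout does not complete in time
  if ! kubectl rollout status "statefulset/${name}" --timeout="$timeout"; then
    echo "rollout of ${name} did not finish within ${timeout}" >&2
    kubectl get pods
    return 1
  fi
}

# Preview what an edited manifest would change before applying it:
#   kubectl diff -f <file>.yaml
```

Note that `kubectl rollout status` only tracks StatefulSets that use the RollingUpdate update strategy.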