Overview
LogZilla demonstrated two reference points for throughput:
- Single‑server processing of approximately 10 TB/day of sustained ingest.
- Kubernetes‑based deployments processing ~5 million events per second (~230 TB/day at ~500 bytes/event).
These results stem from minimizing per‑event overhead, maintaining high concurrency, and reducing downstream volume with immediate‑first forwarding and real‑time deduplication. The following sections explain the architecture, methodology, and when this capacity matters.
TL;DR
- Single server: ~10 TB/day under sustained load. Kubernetes: ~5M EPS (~230 TB/day at ~500 bytes/event).
- Immediate‑first and real‑time dedup reduce downstream ingest while preserving investigative fidelity.
- Favor EPD licensing and direct‑search archives to align cost with value.
Decision quickstart
Use the same preprocessed dataset to compare options and verify:
- Dataset and event size (bytes/event), average and peak rates.
- Billable unit and where transforms are applied (EPD vs per‑GB vs workload‑based); see the comparison sketch after this list.
- Archive behavior (direct‑search vs rehydration) and restore limits/time.
- Growth and surge handling (onboarding spikes, incident bursts).
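To make the comparison concrete, the volume items above reduce to a small calculation: derive events/day and GB/day from the same measured rates, then apply each billable unit. The sketch below is illustrative only; the rates and unit prices are placeholder assumptions, not vendor quotes.

```python
# Compare billable units (EPD vs per-GB) on the same preprocessed dataset.
# The rates and prices below are illustrative placeholders, not vendor quotes.

SECONDS_PER_DAY = 86_400

def daily_volumes(avg_eps: float, bytes_per_event: float) -> tuple[float, float]:
    """Return (events_per_day, gb_per_day) for a sustained average rate."""
    events_per_day = avg_eps * SECONDS_PER_DAY
    gb_per_day = events_per_day * bytes_per_event / 1e9
    return events_per_day, gb_per_day

def compare_billing(avg_eps: float, bytes_per_event: float,
                    price_per_million_events: float, price_per_gb: float) -> None:
    events_per_day, gb_per_day = daily_volumes(avg_eps, bytes_per_event)
    epd_cost = events_per_day / 1e6 * price_per_million_events
    gb_cost = gb_per_day * price_per_gb
    print(f"{events_per_day:,.0f} events/day, {gb_per_day:,.1f} GB/day")
    print(f"EPD-style daily cost:    {epd_cost:,.2f}")
    print(f"Per-GB-style daily cost: {gb_cost:,.2f}")

# Example: 50k EPS average at ~500 bytes/event with placeholder unit prices.
compare_billing(avg_eps=50_000, bytes_per_event=500,
                price_per_million_events=0.10, price_per_gb=2.50)
```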
Architecture notes (what enables throughput)
- Concurrency and batching tuned for ingest, normalization, and forwarding.
- Lightweight normalization and optional enrichment at ingest.
- Immediate‑first behavior: first occurrence forwards instantly; duplicates are counted and summarized.
- Deduplication window per source class to capture repeats without delaying detection.
- Searchable archives to keep long‑tail history without rehydration.
Immediate‑first and deduplication reduce paid ingest downstream while preserving fidelity for investigations.
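As an illustration of that behavior (not LogZilla's internal implementation; the forward() call and the keying scheme are assumptions), a minimal sketch of immediate‑first forwarding with a windowed duplicate count:

```python
import time

# Minimal sketch of immediate-first forwarding with windowed duplicate counting.
# Illustrative only: forward() and the (source, message) key are assumptions.

WINDOW_SECONDS = 60  # conservative per-source-class dedup window

# (source, message) -> (window_start_timestamp, duplicate_count)
_state: dict[tuple[str, str], tuple[float, int]] = {}

def forward(source: str, message: str) -> None:
    """Placeholder for sending an event downstream (SIEM, index, etc.)."""
    print(f"forward [{source}] {message}")

def ingest(source: str, message: str, now: float | None = None) -> None:
    now = time.time() if now is None else now
    key = (source, message)
    window_start, dup_count = _state.get(key, (None, 0))
    if window_start is None or now - window_start > WINDOW_SECONDS:
        if dup_count:
            # Summarize duplicates from the expired window before starting anew.
            forward(source, f"{message} (repeated {dup_count}x in prior window)")
        forward(source, message)          # first occurrence forwards immediately
        _state[key] = (now, 0)
    else:
        _state[key] = (window_start, dup_count + 1)  # counted, not forwarded
```

A production pipeline would also flush the summary when a window closes rather than waiting for the next occurrence; the point of the sketch is the ordering: the first event is never delayed, and in‑window repeats are reduced to a count.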
Methodology (how benchmarks were executed)
- Representative event mixes for network and security telemetry.
- Sustained load with back‑pressure and error monitoring enabled (a simplified load‑generation sketch follows this list).
- Capacity determined by stability (no data loss) and meeting latency targets.
- For the ~5M EPS benchmark, a multi‑node Kubernetes deployment was used and sized for sustained ingest.
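The benchmark harness itself is not published here. As a rough sketch of paced, sustained load generation, something like the following can drive a syslog endpoint at a fixed rate; the host, port, and payload are assumptions, and a single Python process will not approach the benchmark rates.

```python
import socket
import time

# Simplified paced load generator: sends UDP syslog-style messages at a target
# events-per-second rate. Host, port, and payload are illustrative assumptions.

def generate(target_eps: int, duration_s: float,
             host: str = "127.0.0.1", port: int = 5514) -> int:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"<134>benchhost app[1]: synthetic benchmark event " + b"x" * 450
    interval = 1.0 / target_eps
    sent = 0
    next_send = time.perf_counter()
    deadline = next_send + duration_s
    while time.perf_counter() < deadline:
        sock.sendto(payload, (host, port))
        sent += 1
        next_send += interval
        delay = next_send - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
    return sent

if __name__ == "__main__":
    print(generate(target_eps=1_000, duration_s=5.0), "events sent")
```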
Results summary
- Single‑server: ~10 TB/day under sustained load.
- Kubernetes: ~5M EPS (~230 TB/day at ~500 bytes/event).
- Stable operation with predictable latency and no data loss in the test window.
Benchmark summary table
| Environment | Nodes/size | Throughput | Bytes/event | Approx TB/day | Notes |
|---|---|---|---|---|---|
| Single server | 1 server (test profile) | ~10 TB/day | ~500 (assumed) | ~10 | Sustained ingest; stable latency |
| Kubernetes | 25 nodes (8 vCPU, 8 GB RAM each) | ~5M EPS | ~500 | ~230 | Sized for sustained ingest |
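The Approx TB/day column is a direct conversion: events per second × bytes per event × 86,400 seconds, expressed in decimal terabytes. A one‑line helper for checking these figures against your own rates; exact results vary with how the rate and event size are rounded.

```python
def tb_per_day(eps: float, bytes_per_event: float) -> float:
    """Decimal TB/day from events/second and average event size in bytes."""
    return eps * bytes_per_event * 86_400 / 1e12

print(tb_per_day(5_000_000, 500))  # ~216 TB/day at exactly 5M EPS and 500 bytes/event
```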
Sizing quick checklist
- Event mix and parsing: define average bytes/event, normalized fields, and any transforms applied at ingest.
- Ingest targets: document average GB/day, peak events per second, and burst behavior during incidents.
- Single‑server profile: confirm CPU cores, memory, fast NVMe storage, and 10/25/40G NICs sized to the target.
- Kubernetes profile: confirm node count, vCPU/RAM per node, storage class, and network throughput.
- Policy settings: keep immediate‑first enabled, choose conservative dedup windows per source class, and define routing destinations.
- Validation: verify no loss under back‑pressure, stable latency, and acceptable error rates during sustained load.
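For the validation item, the checks reduce to a few counters and a latency distribution. A rough pass/fail sketch; the parameter names and thresholds here are assumptions, not LogZilla metrics.

```python
import statistics

# Illustrative pass/fail check for a sustained-load run.
# Counter names and thresholds are assumptions, not a LogZilla API.

def validate_run(events_sent: int, events_indexed: int, error_count: int,
                 latencies_ms: list[float],
                 p99_target_ms: float = 500.0,
                 max_error_rate: float = 0.001) -> bool:
    no_loss = events_indexed >= events_sent                 # "no data loss"
    p99 = statistics.quantiles(latencies_ms, n=100)[98]     # 99th-percentile latency
    stable_latency = p99 <= p99_target_ms
    errors_ok = (error_count / events_sent) <= max_error_rate if events_sent else True
    ok = no_loss and stable_latency and errors_ok
    print(f"loss={events_sent - events_indexed} p99={p99:.1f} ms "
          f"errors={error_count} -> {'PASS' if ok else 'FAIL'}")
    return ok

# Example with synthetic numbers (latency samples would normally be much larger).
validate_run(1_000_000, 1_000_000, 120, [12.0, 15.5, 22.1, 35.2, 480.0])
```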
What this enables
- Smaller clusters for a given ingest target, which lowers operational complexity.
- Lower downstream index volume when immediate‑first and dedup are applied upstream.
- Faster investigations due to cleaner datasets and preserved first occurrences with accurate counts.
When to apply (practical guidance)
- High‑volume infrastructure logs or EDR exports where duplicate bursts occur.
- Workloads that demand lower end‑to‑end latency under sustained ingest.
- Scenarios where rehydration overhead would slow investigations; searchable archives keep history accessible.
Use cases and comparison context
- Cost‑sensitive SIEM back ends. High‑throughput ingest paired with immediate‑first and dedup reduces downstream volume. For platform trade‑offs, see /blogs/cloud-siem-cost-control-patterns/ and /blogs/splunk-alternatives-2025/.
- Large EDR exports and infrastructure chatter. Sustained, bursty sources benefit from conservative windows that suppress repeats without delaying first‑seen.
- Multi‑tenant or multi‑team environments. Enrichment and routing enable clear ownership and targeted forwarding. For budgeting perspectives, see /blogs/total-cost-ownership-cloud-log-management-2025/ and /blogs/logzilla-cloud-vs-splunk-cloud-cost-analysis-2025/.
Considerations and safe tuning
- Start with conservative dedup windows for noisy sources (for example, 30–60 seconds) and immediate‑first enabled.
- Track forwarded volume, duplicate ratios, latency, and error rates weekly; a minimal tracking sketch follows this list.
- Keep rules as code with review/rollback; tune per source class.
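The weekly tracking above reduces to a handful of counters; a minimal sketch, with the counter names as assumptions rather than an actual API:

```python
# Weekly tracking from simple counters; names are illustrative, not an API.

def weekly_report(forwarded_events: int, suppressed_duplicates: int,
                  forwarded_bytes: int, error_count: int) -> dict:
    total_events = forwarded_events + suppressed_duplicates
    return {
        "forwarded_gb": forwarded_bytes / 1e9,
        "duplicate_ratio": suppressed_duplicates / total_events if total_events else 0.0,
        "error_rate": error_count / total_events if total_events else 0.0,
    }

# Example: 40% of this week's events were suppressed as in-window duplicates.
print(weekly_report(forwarded_events=6_000_000, suppressed_duplicates=4_000_000,
                    forwarded_bytes=3_000_000_000, error_count=150))
```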
Related implementation resources
- Upstream preprocessing patterns and decision criteria: see /blogs/cloud-siem-cost-control-patterns/.
- Deduplication strategy and tuning detail: see /blogs/taming-log-storms-advanced-event-deduplication-strategies/.
- Cost modeling and TCO guidance: see /blogs/total-cost-ownership-cloud-log-management-2025/.
Micro-FAQ
What hardware footprint is needed for ~10 TB/day?
The benchmark used a single server sized for the test profile. Actual capacity depends on event mix, parsing, and policy settings.
Does deduplication hide important signals?
No. Immediate‑first preserves the first occurrence in real time while duplicates are counted and summarized. Full history remains searchable.
How does this translate to smaller environments?
The same architectural choices that enable high throughput reduce overhead and contention at lower volumes as well.
What does EPD licensing mean?
Licensing is based on Events Per Day (EPD), not storage volume.
Next Steps
- Validate immediate‑first and conservative windows in a pilot.
- Measure deltas in forwarded volume, search latency, and duplicate ratios.
- Size infrastructure to measured post‑preprocessing volumes, not raw ingest.