TL;DR
- Centralized collection and normalization improved visibility across OT and IT networks, creating consistent audit trails and faster investigations.
- Ingest‑time preprocessing reduced noisy duplicates while preserving first occurrences for real‑time visibility.
- Automation remained policy‑bound and safety‑aware; actions were coordinated from playbooks validated for operational impact.
The problem
VEKS operates power plant networks where safety and reliability are paramount. Operational technology (OT) systems emit high‑volume telemetry with periodic status lines and bursts during incidents. Disparate sources, differing formats, and separate tooling complicated investigations and slowed response. The team needed centralized visibility with clear ownership, consistent retention, and a measured approach to automation that respected OT safety constraints.
Environment and constraints
- OT segments with ICS equipment and strict change‑control practices.
- Safety policies that limit automated actions and require pre‑approved flows.
- Compliance alignment with NERC CIP requirements and NIST SP 800‑82 guidance, plus site‑specific policies for collection and retention.
- Network segmentation and limited maintenance windows for configuration changes.
NIST SP 800‑82 provides guidance on Industrial Control Systems (ICS) security, NERC CIP standards set cybersecurity requirements for Bulk Electric System (BES) cyber systems, and RFC 5424 standardizes the syslog protocol, including its structured‑data format.
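To make the RFC 5424 reference concrete, here is a minimal Python sketch that formats a syslog line carrying enrichment fields as structured data. It is an illustration, not VEKS's collector configuration. The SD‑ID `veks@32473` is hypothetical: RFC 5424 requires custom SD‑IDs of the form `name@<enterprise number>`, and 32473 is the example number the RFC itself uses. The host, site, owner, and role values are also placeholders.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical SD-ID. RFC 5424 requires custom SD-IDs to look like name@<enterprise number>;
# 32473 is the example number used in the RFC, not a real assignment for this site.
SD_ID = "veks@32473"


def escape_param_value(value: str) -> str:
    """Escape the characters RFC 5424 requires inside PARAM-VALUE: backslash, '"' and ']'."""
    # Backslash first, so the escapes added afterwards are not escaped a second time.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_rfc5424(facility: int, severity: int, hostname: str, app_name: str, msg: str,
                   sd_params: dict, procid: str = "-", msgid: str = "-",
                   ts: datetime | None = None) -> str:
    """Build one line: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG."""
    pri = facility * 8 + severity  # PRI value as defined by RFC 5424
    # Normalize to UTC so the timestamp always ends in "Z".
    ts = (ts or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    if sd_params:
        params = " ".join(f'{name}="{escape_param_value(str(value))}"'
                          for name, value in sd_params.items())
        structured_data = f"[{SD_ID} {params}]"
    else:
        structured_data = "-"  # NILVALUE when there is no structured data
    return f"<{pri}>1 {timestamp} {hostname} {app_name} {procid} {msgid} {structured_data} {msg}"


if __name__ == "__main__":
    example = format_rfc5424(
        facility=4,          # security/authorization messages
        severity=4,          # warning
        hostname="hmi-01",   # hypothetical host
        app_name="auth",
        msg="login failed",
        sd_params={"site": "plant-a", "owner": "ot-operations", "role": "hmi"},
        ts=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    )
    print(example)
    # <36>1 2024-01-02T03:04:05.000Z hmi-01 auth - - [veks@32473 site="plant-a" owner="ot-operations" role="hmi"] login failed
```

Carrying owner, site, and role as structured data keeps the context machine‑parseable instead of buried in free‑text MSG.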
Solution
VEKS deployed centralized log collection with secure transport and focused normalization at ingest:
- Secure transport. Enable TLS for sensitive paths and cross‑boundary flows.
- Ingest‑time preprocessing. Apply immediate‑first forwarding so the first occurrence of each event is visible in real time; suppress repeats within a short, conservative window and keep accurate counts of what was suppressed.
- Enrichment. Add ownership, site, and device role to accelerate triage and route events to the right responders.
- Routing. Deliver security‑relevant streams to higher‑cost analytics destinations; for repetitive operational chatter, keep summaries and first‑occurrence pointers instead.
- Auditability. Treat rules as code, with review and rollback; publish KPIs for forwarded volume, duplicate ratios, and latency.
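The immediate‑first behavior in the preprocessing step can be sketched as follows. This assumes events are grouped by a hypothetical `key` string (for example, host plus message template) and that each window is fixed from the first forwarded occurrence rather than sliding. The class and method names are illustrative, not the API of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowState:
    opened_at: float     # time the first occurrence was forwarded
    suppressed: int = 0  # repeats held back inside this window


class ImmediateFirstDeduper:
    """Forward the first occurrence of a key at once; count repeats within a fixed window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.state: dict[str, WindowState] = {}

    def observe(self, key: str, now: float) -> tuple[bool, Optional[tuple[str, int]]]:
        """Return (forward_now, summary); summary is (key, count) for a window that just closed."""
        st = self.state.get(key)
        if st is not None and now - st.opened_at < self.window:
            st.suppressed += 1  # repeat inside the window: count it, do not forward
            return False, None
        summary = (key, st.suppressed) if st is not None and st.suppressed else None
        self.state[key] = WindowState(opened_at=now)  # open a new window
        return True, summary  # first occurrence goes out immediately

    def flush(self, now: float) -> list[tuple[str, int]]:
        """Close expired windows (e.g. on a timer) and return (key, suppressed_count) summaries."""
        summaries = []
        for key in list(self.state):
            st = self.state[key]
            if now - st.opened_at >= self.window:
                if st.suppressed:
                    summaries.append((key, st.suppressed))
                del self.state[key]
        return summaries


if __name__ == "__main__":
    d = ImmediateFirstDeduper(window_seconds=60)
    for t in (0.0, 5.0, 10.0, 70.0):
        print(t, d.observe("hmi-01|link-down", t))
```

A fixed window keeps the count semantics simple: every suppressed repeat is attributed to exactly one summary, emitted either when the next occurrence opens a new window or when `flush` closes an idle one.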
Rollout and operations
The team executed a phased rollout:
- Baseline current ingest and incident patterns in a small pilot.
- Enable immediate‑first behavior and conservative dedup windows for chatty categories (for example, interface flaps and periodic keepalives).
- Add enrichment fields used by responders (owner, site, device role) and verify that dashboards and searches reflect the context.
- Route only security‑relevant streams to downstream analytics; keep summaries for repetitive categories elsewhere.
- Review KPIs weekly with operations and safety stakeholders; adjust windows and routes based on measured outcomes.
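The enrichment and routing steps above can be sketched in a few lines of Python. The asset inventory, category names, and destination labels are hypothetical placeholders; real values would come from the site's asset register and from routing decisions agreed with stakeholders. Sending unlisted categories to an archive is also an assumption.

```python
from __future__ import annotations

# Hypothetical asset inventory keyed by hostname (owner, site, device role).
ASSET_CONTEXT: dict[str, dict[str, str]] = {
    "hmi-01": {"owner": "ot-operations", "site": "plant-a", "device_role": "hmi"},
    "ews-02": {"owner": "ot-engineering", "site": "plant-a", "device_role": "engineering-workstation"},
}

# Fallback so unknown devices stay visible and are flagged for ownership follow-up.
UNKNOWN_CONTEXT = {"owner": "unassigned", "site": "unknown", "device_role": "unknown"}

# Hypothetical category split; real lists would be tuned per site.
SECURITY_CATEGORIES = {"authentication", "access", "controller-change", "perimeter"}
REPETITIVE_CATEGORIES = {"interface-flap", "keepalive"}


def enrich(event: dict[str, str]) -> dict[str, str]:
    """Return a copy of the event with owner, site, and device role added.

    Inventory values overwrite same-named event fields (the inventory is treated as authoritative).
    """
    context = ASSET_CONTEXT.get(event.get("host", ""), UNKNOWN_CONTEXT)
    return {**event, **context}


def route(event: dict[str, str]) -> str:
    """Pick a destination label for an event based on its category."""
    category = event.get("category", "")
    if category in SECURITY_CATEGORIES:
        return "security-analytics"  # downstream analytics
    if category in REPETITIVE_CATEGORIES:
        return "summary-store"       # summaries and first-occurrence pointers
    return "archive"                 # assumption: everything else stays searchable in the archive


if __name__ == "__main__":
    event = enrich({"host": "hmi-01", "category": "authentication", "msg": "login failed"})
    print(event["owner"], event["site"], route(event))
    # ot-operations plant-a security-analytics
```

Keeping the inventory lookup separate from routing lets the team review and roll back each rule set independently, in line with treating rules as code.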
Results
- Lower noise levels in dashboards and queues during incident bursts, while the first occurrence remained visible immediately.
- Faster investigations due to consistent enrichment and ownership fields.
- Clearer audit trails from centralized collection, consistent retention, and change‑controlled preprocessing rules.
Lessons learned and guidance
- Keep automation policy‑bound and safety‑aware; require corroboration before actions that alter network state.
- Start with a narrow set of high‑volume categories; expand only after KPIs demonstrate stable improvements.
- Maintain direct‑searchable archives to avoid rehydration delays during review and audit.
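The first lesson can be expressed as a small policy gate. This is a sketch under stated assumptions, not the team's playbook engine: the playbook IDs, the allow‑list, and the two‑source corroboration threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allow-list of playbooks that passed safety review.
APPROVED_PLAYBOOKS = {"notify-soc", "open-ticket", "isolate-noncritical-segment"}

# Assumed threshold: independent sources required before changing network state.
MIN_CORROBORATING_SOURCES = 2


@dataclass(frozen=True)
class ProposedAction:
    playbook_id: str                  # which pre-approved playbook the action comes from
    alters_network_state: bool        # e.g. isolating a segment or disabling a pathway
    evidence_sources: frozenset[str]  # independent sources reporting the triggering condition


def decide(action: ProposedAction) -> str:
    """Return "reject", "hold", or "execute" for a proposed automated action."""
    if action.playbook_id not in APPROVED_PLAYBOOKS:
        return "reject"  # only pre-approved flows may run
    if action.alters_network_state and len(action.evidence_sources) < MIN_CORROBORATING_SOURCES:
        return "hold"    # wait for corroboration before touching network state
    return "execute"


if __name__ == "__main__":
    single = ProposedAction("isolate-noncritical-segment", True, frozenset({"ids"}))
    print(decide(single))  # hold
```

Read‑only actions such as opening a ticket pass on a single source; anything that changes network state waits for a second, independent source.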
Related resources
- Upstream cost and fidelity patterns: /blogs/cloud-siem-cost-control-patterns/
- Secure transport in OT and IT networks: /blogs/syslog-ng-tls-secure-logging/
- Foundational collection and triage practices: /blogs/syslog-essentials/
- Reducing downstream costs without losing visibility: /blogs/reduce-siem-costs-intelligent-preprocessing/
Next steps
Organizations planning similar programs should begin with centralized collection, normalization, and clear retention policies. Define response playbooks with safety reviews, then measure outcomes such as investigation latency, duplicate ratios, and forwarded volume. Expand coverage incrementally as KPIs improve and stakeholders confirm the operational impact.
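The outcome measures named above can be computed from simple ingest counters. The definitions here are assumptions for illustration: duplicate ratio as suppressed events over received events, forwarded volume as a share of received events, and investigation latency as the median of per‑investigation durations in minutes.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class IngestCounters:
    received: int    # events arriving at the preprocessing tier
    forwarded: int   # events delivered to downstream analytics
    suppressed: int  # repeats held back by deduplication


def kpi_report(counters: IngestCounters, investigation_minutes: list[float]) -> dict[str, float]:
    """Compute duplicate ratio, forwarded share, and median investigation latency."""
    total = counters.received
    return {
        "duplicate_ratio": counters.suppressed / total if total else 0.0,
        "forwarded_share": counters.forwarded / total if total else 0.0,
        "median_investigation_minutes": (
            float(median(investigation_minutes)) if investigation_minutes else float("nan")
        ),
    }


if __name__ == "__main__":
    week = IngestCounters(received=1000, forwarded=300, suppressed=700)  # illustrative numbers, not VEKS data
    print(kpi_report(week, [10, 30, 20]))
    # {'duplicate_ratio': 0.7, 'forwarded_share': 0.3, 'median_investigation_minutes': 20.0}
```

A median is used for latency so that a single long investigation does not mask a general improvement.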
Micro-FAQ
Which logs are most important in power plant OT networks?
Controller and HMI events, authentication and access logs, historian and engineering workstation activity, and perimeter traffic.
Should response actions be automated in OT?
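Only within defined limits. Automation can coordinate predefined, pre‑approved actions such as isolating segments or disabling non‑essential pathways, provided they are aligned with safety policy and corroborated before they alter network state.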
Which standards inform OT security programs?
NIST SP 800‑82 for ICS guidance and NERC CIP for BES cyber systems, plus site policies and vendor hardening guides.