The Power of IT Knowledge

We all know that knowledge is power. Conversely, a lack of knowledge leaves you powerless to manage your infrastructure. Andrew Lerner at Gartner posted an article titled “Simplicity Should Break Ties” and noted that “automation is probably one of the best kept secrets in networking in terms of improving availability and reducing operational expense.” I agree 100%.

Logs and Unstructured Data: The Key to Technology Optimization

Logs and other unstructured data are a vital component of technology optimization. Event metrics provide information on protocol changes, administrative activity, faults, security, and so much more. Furthermore, analyzing and correlating the data is a key part of root cause analysis. When data is collected over a period of time, it can provide important visibility into trends, recurring problems, or just the overall health of an infrastructure. Metrics of events are just as important as the contents of those events.
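
As a rough illustration of treating event metrics as data in their own right, the sketch below tallies event types from a handful of Cisco-style syslog lines. The sample lines and the count_event_types helper are assumptions made for this example; they are not LogZilla functionality.

```python
import re
from collections import Counter

# Hypothetical sample of Cisco-style syslog lines (illustrative only).
SAMPLE_LOGS = [
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet1/0/1",
    "%LINK-3-UPDOWN: Interface GigabitEthernet1/0/2, changed state to down",
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet1/0/7",
    "%SYS-5-CONFIG_I: Configured from console by admin on vty0",
]

# Matches the %FACILITY-SEVERITY-MNEMONIC header at the start of each message.
HEADER = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+):"
)


def count_event_types(lines):
    """Count how often each FACILITY-MNEMONIC pair appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            counts[f"{match['facility']}-{match['mnemonic']}"] += 1
    return counts


counts = count_event_types(SAMPLE_LOGS)
for event_type, total in counts.most_common():
    print(event_type, total)
```

Collected over days or weeks, counts like these are what make trends and recurring problems visible, independent of what any single message says.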

LogZilla: A NetOps Solution Beyond Just a Log Tool

LogZilla goes beyond “just a log tool”. Anyone can make a dashboard, but everyone else is missing the point about Network Management. You need a solution that solves your pain, not a “tool” that points it out and emails you every 3 seconds. This is NetOps. This is LogZilla.

Reactive, Proactive, and Preemptive Network Management

Networks have millions (and for some, billions or even trillions) of logs per day flowing in. Quickly understanding which of them matter most is normally a difficult and time-consuming task. LogZilla saves time and quickly identifies the most actionable problems, including those rare problems which would otherwise have been overlooked by legacy tools.

Reactive Analysis and Known Error Databases

85% of the largest companies in the world still use event analysis in a reactive manner. This is partly due to the fear of “too much to look at” and the lack of a viable solution for removing the non-actionable events that clutter the user’s view. Reactive analysis only provides post-mortem information on why something went wrong. Ideally, this information should be recorded so that the lesson learned can be applied the next time the problem occurs. Sadly, many companies fail to record it and turn that knowledge into actionable information by storing it somewhere such as a “Known Error Database”, or KEDB.

Put plainly, a KEDB records lessons learned so that the management software can apply automated actions the next time the same event occurs. That event can then become a “Proactive” trigger for future occurrences.

Proactive Management: Applying Lessons Learned

In a proactive environment, the lessons learned from past mistakes (and recorded in a KEDB, or at least somewhere) are applied to events as they occur. This enables companies to avoid those past problems.

Let’s take a “Known Event” as an example. Almost every network engineer has seen this in some form or another (depending on the vendor hardware):

%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet1/0/1 (not full duplex), with foo.example.com GigabitEthernet0/1 (full duplex).

The message indicates that the duplex setting on an Ethernet port differs from the setting on the neighboring port at the other end of the link. In practice, the half-duplex side sees collisions and the full-duplex side sees frame errors, so users, servers, or whatever is connected to that port get far less throughput than they should.
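
To act on this message, its fields first have to be pulled out of the raw text. The following is a minimal sketch of one way to do that with a regular expression written against the exact message shown above; the pattern and the parse_duplex_mismatch helper are assumptions for illustration, and other vendors or software versions may word the message differently.

```python
import re

# Pattern written against the sample message in this article; other wordings
# of the same event would need their own pattern.
DUPLEX_MISMATCH = re.compile(
    r"%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on "
    r"(?P<local_interface>\S+) \((?P<local_duplex>[^)]+)\), "
    r"with (?P<neighbor>\S+) (?P<neighbor_interface>\S+) "
    r"\((?P<neighbor_duplex>[^)]+)\)"
)


def parse_duplex_mismatch(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = DUPLEX_MISMATCH.search(line)
    return match.groupdict() if match else None


message = (
    "%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on "
    "GigabitEthernet1/0/1 (not full duplex), with foo.example.com "
    "GigabitEthernet0/1 (full duplex)."
)
fields = parse_duplex_mismatch(message)
print(fields)
```

With the local interface, neighbor, and both duplex states available as named fields, the event can be counted, looked up, or passed to an automated action.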

This is a good example of something that is quite simple to fix, but tends to go unchecked in many environments. This results in an increase of user complaints about slow network access. Why ignore it? Why not simply fix it? The answer lies in the sheer volume of these messages reported daily in large company networks. Each fix requires a configuration change on the device, and some companies are (correctly) a bit apprehensive of change. The right answer is to follow change control procedures and actually fix it; don’t ignore it.
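
One way to picture how this lesson might be recorded is a small KEDB entry keyed by the event mnemonic. The sketch below is purely illustrative: the KnownError fields, the in-memory KEDB dictionary, and the lookup_known_error helper are hypothetical and do not describe how any particular product stores its known errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnownError:
    """One KEDB record: what the event means and what was learned from it."""
    mnemonic: str
    meaning: str
    lesson_learned: str
    requires_change_control: bool


# Hypothetical in-memory KEDB keyed by event mnemonic (illustrative only).
KEDB = {
    "DUPLEX_MISMATCH": KnownError(
        mnemonic="DUPLEX_MISMATCH",
        meaning="Duplex settings differ between the two ends of a link.",
        lesson_learned="Align the duplex setting on both ports; do not ignore it.",
        requires_change_control=True,
    ),
}


def lookup_known_error(mnemonic: str) -> Optional[KnownError]:
    """Return the KEDB record for a mnemonic, or None if it is not yet known."""
    return KEDB.get(mnemonic)


entry = lookup_known_error("DUPLEX_MISMATCH")
if entry is not None:
    print(f"{entry.mnemonic}: {entry.lesson_learned}")
```

Returning None for an unknown mnemonic keeps the distinction between "known error with a recorded lesson" and "new event that still needs a human" explicit.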

The next step in the proactive process is to notify… tell someone or something that it happened. But is this really being proactive? Alerting is only part of the story. And in large networks, it doesn’t help much to just alert on bad things happening.

Preemptive Management: Automated Remediation and Enrichment

In a preemptive environment, we combine proactive knowledge with external knowledge to make informed decisions about how to automatically remediate known errors. Gathering intel on the affected entities allows events to be enriched with intelligence from multiple sources such as KEDBs, SLAs, device locations, device importance, Configuration and Compliance/Change Management (NCCM) systems, Performance Management, Security, Network and Infrastructure Diagrams, and even external sources such as local weather for that location, power outages, etc.
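
Event enrichment of this kind can be sketched as merging fields from several lookups into the event record. In the example below, the two tables stand in for an inventory system and an SLA catalog; the device name, field names, and values are all made up for illustration.

```python
# Hypothetical lookup tables standing in for external sources of knowledge.
DEVICE_INVENTORY = {
    "switch01.example.com": {"location": "Building A", "importance": "high"},
}
SLA_BY_IMPORTANCE = {"high": "4h", "low": "next business day"}


def enrich_event(event):
    """Return a copy of the event with inventory and SLA fields added."""
    enriched = dict(event)  # copy so the original event is left untouched
    device = DEVICE_INVENTORY.get(event["host"], {})
    enriched["location"] = device.get("location", "unknown")
    enriched["importance"] = device.get("importance", "unknown")
    enriched["sla"] = SLA_BY_IMPORTANCE.get(enriched["importance"], "unknown")
    return enriched


event = {"host": "switch01.example.com", "mnemonic": "DUPLEX_MISMATCH"}
enriched = enrich_event(event)
print(enriched)
```

Falling back to "unknown" instead of raising an error means an event from an uninventoried device still flows through, and the missing data is itself visible.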

Let’s take that same “Known Event” used in the proactive model and extend it to an actionable, preemptive process.

%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet1/0/1 (not full duplex), with foo.example.com GigabitEthernet0/1 (full duplex).

Cisco Duplex Trigger
Cisco Duplex Mismatch Auto Repair

Now we know about the event, and now we know what it means (since we have it in our KEDB). But what do we do with it?

  • Where did it come from?
  • Who owns it?
  • Was this an authorized change?
  • What were the last 5 logins to that device?
  • What should that device’s configuration be?
  • Do we have a “Gold Standard” configuration database?

I’ll use the last two as an example, but throw in a bonus.

In LogZilla, we have a simple “trigger” that looks for this event.

  • LogZilla detects the event within 1 second
  • LogZilla scans our configuration DB and looks the device up along with that interface configuration.
  • 1 second later, LogZilla has logged into the device and it’s fixed.
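
The steps above can be sketched roughly as follows. This is not LogZilla's implementation: the GOLD_CONFIG table, the device name, and both helper functions are hypothetical, and the function that actually delivers commands to the device is passed in as a parameter so the example runs without touching a network.

```python
# Hypothetical "Gold Standard" configuration DB keyed by (device, interface).
GOLD_CONFIG = {
    ("switch01.example.com", "GigabitEthernet1/0/1"): {"duplex": "full"},
}


def build_repair_commands(device, interface, gold_config=GOLD_CONFIG):
    """Return the CLI commands that restore the recorded duplex setting."""
    desired = gold_config.get((device, interface))
    if desired is None:
        return []  # unknown interface: do nothing rather than guess
    return [
        "configure terminal",
        f"interface {interface}",
        f"duplex {desired['duplex']}",
        "end",
    ]


def handle_duplex_mismatch(device, interface, send_commands):
    """Look up the desired config and, if one exists, send the repair commands."""
    commands = build_repair_commands(device, interface)
    if commands:
        send_commands(device, commands)
    return commands


# A stand-in sender that records what would have been sent to the device.
sent = []


def fake_sender(device, commands):
    sent.append((device, commands))


result = handle_duplex_mismatch(
    "switch01.example.com", "GigabitEthernet1/0/1", fake_sender
)
print(result)
```

Keeping command generation separate from delivery makes it easy to review or log the exact change before it is applied, which fits the change control concerns raised earlier.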

And, for a bonus, let’s tell someone what LogZilla did.

  • Bonus: What did you do?
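
A notification like this could be as simple as a short chat message. The sketch below only builds the message body, in the JSON shape that Slack incoming webhooks accept (a "text" field); the choice of Slack, the wording, and the build_notification helper are assumptions for illustration, and actually posting the body to a webhook URL is left out.

```python
import json


def build_notification(device, interface, commands):
    """Build a JSON message body with a single "text" field describing the repair."""
    summary = (
        f"Duplex mismatch on {device} {interface} was repaired automatically. "
        f"Commands sent: {'; '.join(commands)}"
    )
    return json.dumps({"text": summary})


body = build_notification(
    "switch01.example.com",
    "GigabitEthernet1/0/1",
    ["configure terminal", "interface GigabitEthernet1/0/1", "duplex full", "end"],
)
print(body)
```

Including the exact commands in the message gives whoever reads it an audit trail of what was changed and where.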

Our CEO did a short demo of this process a little while back which you can see here.

The LogZilla NetOps Platform delivers data enrichment, automation and simplicity, enables faster response, and preemptively identifies and resolves network problems before they become outages. LogZilla is built By NetOps, For NetOps.

Real-World Use Cases

  1. Insurance: An insurance company implemented LogZilla to proactively monitor and manage their network infrastructure, reducing downtime and improving customer satisfaction.
  2. Banking: A large bank used LogZilla to preemptively identify and resolve network issues, ensuring continuous uptime for their online banking services.
  3. Federal Government: A government agency deployed LogZilla to enhance network security and automate incident response, protecting sensitive data and improving overall cybersecurity posture.
  4. Healthcare: A hospital utilized LogZilla to optimize their network performance, ensuring reliable access to critical patient data and medical systems.
  5. Retail: A global retail chain implemented LogZilla to monitor and manage their network infrastructure across multiple locations, improving overall operational efficiency.
  6. Energy: An energy company used LogZilla to proactively identify and resolve network issues, ensuring a stable and reliable infrastructure for their power distribution systems.

The transition from reactive to proactive and preemptive network management is crucial for businesses to optimize their infrastructure, reduce downtime, and improve overall operational efficiency. LogZilla's NetOps Platform offers a comprehensive solution that goes beyond traditional log management tools to deliver data enrichment, automation, and simplicity, enabling businesses to quickly identify, prioritize, and resolve network issues before they escalate into outages. By leveraging LogZilla's powerful capabilities, businesses can transform their network management processes, ensuring a stable, secure, and high-performance network infrastructure.

Posted August 11, 2021 in the LogZilla University category
