# Application Performance Monitoring Correlation

Application performance correlation detects degradation patterns, error clustering, and service dependency issues that span multiple events over time. LogZilla's pre-processing extracts performance metrics, while SEC handles complex threshold analysis and trend correlation.

**Prerequisites:** Ensure Event Correlation is enabled and that the forwarder can be reloaded, as described in the Event Correlation Overview.

## Response Time Degradation Detection

### LogZilla Forwarder Configuration

```yaml
# /etc/logzilla/forwarder.d/app-performance.yaml
type: sec
sec_name: app-performance
rules:
  - match:
      - field: program
        op: "eq"
        value: ["nginx", "apache", "webapp", "api-gateway"]
      - field: message
        op: "=~"
        value: "response_time|duration|elapsed"
    rewrite:
      message: "APP_RESPONSE host=$HOST service=$PROGRAM"
```
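
To confirm the forwarder match before wiring up SEC, you can emit a synthetic event with `logger`. The program name and message below are placeholders chosen to satisfy both match conditions above:

```bash
# Hypothetical test event: program "nginx" plus a message containing
# "response_time" (and "slow", which the SEC context later looks for)
logger -t nginx "GET /api/users response_time=2300ms upstream slow"
```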

### SEC Rule: Response Time Degradation Detection

```text
# Detect application performance issues using message patterns.
# The =( ) construct is SEC's Perl code operand for context expressions.
# Threshold: 10 performance issues within 300 seconds (5 minutes).
type=SingleWithThreshold
ptype=SubStr
pattern=APP_RESPONSE
context==( $ENV{EVENT_MESSAGE} =~ /(slow|timeout|degraded|high.*latency)/ )
desc=Application performance degradation detected on $ENV{EVENT_HOST}
action=eval %service ($ENV{EVENT_PROGRAM}); \
       eval %host ($ENV{EVENT_HOST}); \
       shellcmd (logger -t SEC-PERFORMANCE -p local0.warning \
       "PERFORMANCE_DEGRADATION service=%service host=%host"); \
       shellcmd (logger -t PERF-ALERT "Performance degradation: host=%host service=%service")
thresh=10
window=300
```

### LogZilla Trigger: Performance Degradation Response

```yaml
name: "Application Performance Degradation"
filter:
  - field: program
    op: eq
    value: SEC-PERFORMANCE
  - field: message
    op: "=~"
    value: "PERFORMANCE_DEGRADATION"
actions:
  exec_script: true
  script_path: "/usr/local/bin/performance-degradation-response.sh"
  send_webhook: true
  send_webhook_template: |
    {
      "alert_type": "performance_degradation",
      "service": "{{event:ut:service}}",
      "endpoint": "{{event:ut:endpoint}}",
      "host": "{{event:ut:host}}",
      "avg_response_time": "{{event:ut:avg_response}}",
      "severity": "warning"
    }
```
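
To exercise the trigger end to end without waiting for a real degradation, you can hand-craft the alert line the SEC rule produces. This assumes local syslog events are ingested by LogZilla; the service and host values are placeholders:

```bash
# Emit the same message the SEC rule's shellcmd produces; the trigger
# matches on program=SEC-PERFORMANCE plus the PERFORMANCE_DEGRADATION text
logger -t SEC-PERFORMANCE -p local0.warning \
    "PERFORMANCE_DEGRADATION service=nginx host=web-01"
```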

### Intelligent Performance Response Script

```bash
#!/bin/bash
# /usr/local/bin/performance-degradation-response.sh

SERVICE="$EVENT_USER_TAGS_SERVICE"
ENDPOINT="$EVENT_USER_TAGS_ENDPOINT"
HOST="$EVENT_USER_TAGS_HOST"
AVG_RESPONSE="$EVENT_USER_TAGS_AVG_RESPONSE"

# Query application metrics for additional context; default to 0 so the
# numeric comparisons below do not fail if the metrics endpoint is unreachable
METRICS=$(curl -s --max-time 5 "http://$HOST:9090/metrics")
CPU_USAGE=$(echo "$METRICS" | grep -m1 "cpu_usage" | awk '{print $2}')
MEMORY_USAGE=$(echo "$METRICS" | grep -m1 "memory_usage" | awk '{print $2}')
ACTIVE_CONNECTIONS=$(echo "$METRICS" | grep -m1 "active_connections" | awk '{print $2}')
CPU_USAGE=${CPU_USAGE:-0}
MEMORY_USAGE=${MEMORY_USAGE:-0}
ACTIVE_CONNECTIONS=${ACTIVE_CONNECTIONS:-0}

# Determine the likely root cause and the appropriate response
if [[ $(echo "$CPU_USAGE > 80" | bc) -eq 1 ]]; then
    # High CPU - scale horizontally if possible
    logger -t PERF-RESPONSE "High CPU detected on $HOST - attempting auto-scale"
    curl -X POST "https://orchestrator.company.com/api/scale-service" \
         -d "service=$SERVICE&action=scale_out&reason=high_cpu"

elif [[ $(echo "$MEMORY_USAGE > 90" | bc) -eq 1 ]]; then
    # Memory pressure - restart service
    logger -t PERF-RESPONSE "Memory pressure on $HOST - scheduling service restart"
    curl -X POST "https://orchestrator.company.com/api/restart-service" \
         -d "service=$SERVICE&host=$HOST&reason=memory_pressure"

elif [[ $(echo "$ACTIVE_CONNECTIONS > 1000" | bc) -eq 1 ]]; then
    # Connection overload - enable rate limiting
    logger -t PERF-RESPONSE "Connection overload on $HOST - enabling rate limiting"
    curl -X POST "https://loadbalancer.company.com/api/rate-limit" \
         -d "service=$SERVICE&limit=100&duration=300"

else
    # Unknown cause - create an investigation ticket
    logger -t PERF-RESPONSE "Unknown performance issue on $HOST - creating ticket"
    curl -X POST "https://servicedesk.company.com/api/tickets" \
         -d "priority=medium&service=$SERVICE&host=$HOST" \
         --data-urlencode "subject=Performance Investigation"
fi

# Update the performance dashboard
curl -X POST "https://dashboard.company.com/api/performance-events" \
     -d "service=$SERVICE&endpoint=$ENDPOINT&host=$HOST&response_time=$AVG_RESPONSE&status=degraded"
```

## Error Rate Correlation

### HTTP Error Clustering Detection

Correlate HTTP error patterns to detect application issues.

### SEC Rule: Error Rate Spike Detection

```text
# Detect HTTP error rate spikes and classify the dominant error type.
# Threshold: 50 error events within 300 seconds (5 minutes).
type=SingleWithThreshold
ptype=SubStr
pattern=APP_RESPONSE
context==( $ENV{EVENT_USER_TAGS_HTTP_STATUS} =~ /^[45]/ )
desc=HTTP error rate spike detected
action=eval %service ($ENV{EVENT_PROGRAM}); \
       eval %host ($ENV{EVENT_HOST}); \
       eval %error_4xx (my $n = qx(grep 'APP_RESPONSE.*status=4' /var/log/sec/performance.log | tail -100 | wc -l); chomp($n); $n); \
       eval %error_5xx (my $n = qx(grep 'APP_RESPONSE.*status=5' /var/log/sec/performance.log | tail -100 | wc -l); chomp($n); $n); \
       eval %error_type (%error_5xx > %error_4xx ? "server_error" : "client_error"); \
       shellcmd (logger -t SEC-PERFORMANCE -p local0.alert \
       "ERROR_RATE_SPIKE service=%service host=%host error_type=%error_type 4xx_count=%error_4xx 5xx_count=%error_5xx")
thresh=50
window=300
```
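
The rule's `eval` code shells out to count recent 4xx and 5xx lines, and the same classification can be checked by hand. This assumes APP_RESPONSE events are written to `/var/log/sec/performance.log` by your syslog configuration, which is not shown here:

```bash
# Reproduce the rule's error-type classification from the shell
err4=$(grep 'APP_RESPONSE.*status=4' /var/log/sec/performance.log | tail -100 | wc -l)
err5=$(grep 'APP_RESPONSE.*status=5' /var/log/sec/performance.log | tail -100 | wc -l)
if [ "$err5" -gt "$err4" ]; then echo server_error; else echo client_error; fi
```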

## Database Performance Correlation

### Generic Application Error Correlation

**Note:** This example relies on message pattern matching because LogZilla ships no database-specific parsing apps by default. To correlate on dedicated database fields, create custom parsing rules first.

### LogZilla Forwarder Configuration

```yaml
# /etc/logzilla/forwarder.d/app-error-correlation.yaml
type: sec
sec_name: app-errors
rules:
  - match:
      - field: program
        op: "eq"
        value: ["webapp", "api-server", "microservice"]
      - field: message
        op: "=~"
        value: "(ERROR|FATAL|Exception|timeout|slow.*query)"
    rewrite:
      message: "APP_ERROR host=$HOST service=$PROGRAM severity=$SEVERITY"
```

### SEC Rule: Application Error Correlation

```text
# Correlate application errors without high-cardinality user tags.
# Severity <= 3 covers syslog levels emerg through err.
# Threshold: 10 errors within 300 seconds (5 minutes).
type=SingleWithThreshold
ptype=SubStr
pattern=APP_ERROR
context==( $ENV{EVENT_SEVERITY} <= 3 )
desc=Application error spike detected on $ENV{EVENT_HOST}
action=eval %host ($ENV{EVENT_HOST}); \
       eval %service ($ENV{EVENT_PROGRAM}); \
       shellcmd (logger -t SEC-ALERT -p local0.alert \
       "ERROR_SPIKE_DETECTED host=%host service=%service"); \
       shellcmd (/usr/local/bin/app-error-response.sh "%host" "%service")
thresh=10
window=300
```
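
The rule calls `/usr/local/bin/app-error-response.sh`, which is not shown above. A minimal sketch follows, assuming a systemd host where `journalctl` can pull recent service logs; adapt the diagnostics to your environment:

```bash
#!/bin/bash
# /usr/local/bin/app-error-response.sh (sketch)
# Called by SEC as: app-error-response.sh <host> <service>

HOST="$1"
SERVICE="$2"

logger -t APP-ERROR-RESPONSE "Error spike on $HOST service=$SERVICE - collecting diagnostics"

# Capture recent log lines for the affected service (assumes systemd)
journalctl -u "$SERVICE" -n 50 --no-pager \
    > "/tmp/${SERVICE}_${HOST}_error_spike.log" 2>/dev/null || true
```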

## Service Dependency Correlation

### Cascading Service Failure Detection

Detect when upstream service failures cause downstream performance issues.

### SEC Rules: Service Dependency Correlation

```text
# Create a context when an upstream service degrades; the context
# lives for 1800 seconds (30 minutes)
type=Single
ptype=SubStr
pattern=PERFORMANCE_DEGRADATION
context==( $ENV{EVENT_USER_TAGS_SERVICE} =~ /^(auth|payment|inventory)/ )
desc=Upstream service degradation detected
action=eval %upstream_service ($ENV{EVENT_USER_TAGS_SERVICE}); \
       create UPSTREAM_DEGRADED_%upstream_service 1800; \
       shellcmd (logger -t SEC-DEPENDENCY "UPSTREAM_ISSUE service=\"%upstream_service\"")

# Detect downstream impact while an upstream context is active
type=Single
ptype=SubStr
pattern=PERFORMANCE_DEGRADATION
context=(UPSTREAM_DEGRADED_auth || UPSTREAM_DEGRADED_payment || UPSTREAM_DEGRADED_inventory) && \
        =( $ENV{EVENT_USER_TAGS_SERVICE} =~ /^(web|api|mobile)/ )
desc=Downstream service impact from upstream degradation
action=eval %downstream_service ($ENV{EVENT_USER_TAGS_SERVICE}); \
       exists %auth_down UPSTREAM_DEGRADED_auth; \
       exists %payment_down UPSTREAM_DEGRADED_payment; \
       if %auth_down ( eval %upstream_cause ("auth") ) \
       else ( if %payment_down ( eval %upstream_cause ("payment") ) \
              else ( eval %upstream_cause ("inventory") ) ); \
       shellcmd (logger -t SEC-DEPENDENCY -p local0.warning \
       "CASCADING_FAILURE downstream=%downstream_service upstream_cause=%upstream_cause")
```

## Application Resource Correlation

### Memory Leak Detection

Correlate memory usage patterns with application performance over time.
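
The rule below matches `MEMORY_USAGE` events, which assumes a forwarder rewrite rule (analogous to the earlier `app-performance.yaml`, not shown here) that normalizes memory metric events into that shape. A synthetic event for testing might look like:

```bash
# Hypothetical memory metric event in the shape the rule expects
logger -t webapp "MEMORY_USAGE host=web-01 service=webapp memory_usage=2147483648"
```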

### SEC Rule: Memory Leak Detection

```text
# Track memory usage trends over time; the script decides whether the
# trend looks like a leak (exit code 0) or normal behavior (non-zero).
# The script field is executed via the shell, so shell-style variable
# references are used, assuming the EVENT_* variables are present in
# SEC's environment as in the context expressions above.
type=SingleWithScript
ptype=SubStr
pattern=MEMORY_USAGE
script=/usr/local/bin/check-memory-trend.sh "$EVENT_HOST" "$EVENT_USER_TAGS_MEMORY_USAGE"
desc=Memory leak detected on $ENV{EVENT_HOST}
action=eval %host ($ENV{EVENT_HOST}); \
       eval %memory_usage ($ENV{EVENT_USER_TAGS_MEMORY_USAGE}); \
       eval %service ($ENV{EVENT_USER_TAGS_SERVICE}); \
       shellcmd (logger -t SEC-PERFORMANCE -p local0.alert \
       "MEMORY_LEAK_DETECTED host=%host service=%service memory_usage=%memory_usage")
# Non-zero script exit codes indicate normal memory usage
action2=logonly
```

### Memory Trend Analysis Script

```bash
#!/bin/bash
# /usr/local/bin/check-memory-trend.sh
# Exits 0 when memory growth looks like a leak, 1 otherwise.

HOST="$1"
CURRENT_MEMORY="$2"

# Append the current sample (epoch seconds, usage in bytes) to the history
MEMORY_HISTORY="/tmp/memory_history_$HOST"
echo "$(date +%s) $CURRENT_MEMORY" >> "$MEMORY_HISTORY"

# Keep only the last hour of data
awk -v cutoff=$(($(date +%s) - 3600)) '$1 >= cutoff' "$MEMORY_HISTORY" > "${MEMORY_HISTORY}.tmp"
mv "${MEMORY_HISTORY}.tmp" "$MEMORY_HISTORY"

MEMORY_SAMPLES=$(wc -l < "$MEMORY_HISTORY")

if [[ $MEMORY_SAMPLES -ge 10 ]]; then
    # Least-squares slope of usage (bytes) over time (seconds); timestamps
    # are rebased to the first sample to avoid floating-point cancellation
    SLOPE=$(awk 'NR == 1 { x0 = $1 }
        { x = $1 - x0; sum_x += x; sum_y += $2; sum_xy += x * $2; sum_x2 += x * x; n++ }
        END {
            denom = n * sum_x2 - sum_x * sum_x
            if (denom == 0) { print 0 }
            else { printf "%.2f", (n * sum_xy - sum_x * sum_y) / denom }
        }' "$MEMORY_HISTORY")

    # Sustained growth above ~1 MB/minute (1048576 / 60 ~= 17476 bytes/second)
    if [[ $(echo "$SLOPE > 17476" | bc) -eq 1 ]]; then
        exit 0  # Memory leak detected
    fi
fi

exit 1  # Normal memory usage
```
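
The trend logic can be sanity-checked with synthetic samples: write twelve readings that grow by 2 MB per minute (well above the ~1 MB/minute threshold), then invoke the script. The host name and values are placeholders:

```bash
# Seed steadily growing samples (one per minute) for a fake host
for i in $(seq 1 12); do
    echo "$(( $(date +%s) - (12 - i) * 60 )) $(( i * 2000000 ))" \
        >> /tmp/memory_history_test-host
done

# Exit code 0 means the script classified the trend as a leak
/usr/local/bin/check-memory-trend.sh test-host 26000000; echo "exit=$?"
```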
