Data Archiving And Retention

LogZilla provides automated data archiving to manage storage space while maintaining access to historical log data. The archiving system moves older data from active storage to compressed archive storage, where it remains searchable but with slower performance.

Important: Archive settings directly affect data retention and storage usage. Incorrect configuration can result in premature data loss. Always verify settings before making changes in production environments.

How Archiving Works

LogZilla manages data in two states:

  • Online Data: Recent data stored in active indexes for fast searching
  • Archived Data: Older data moved to compressed storage, still searchable but with reduced performance

Data is organized into hourly chunks and automatically transitioned based on configured retention policies.
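
As an illustration of how hourly chunking works, an event logged at 14:37 UTC belongs to the chunk covering 14:00-15:00 UTC. The following is a minimal sketch using only GNU date (no LogZilla commands) to show how a timestamp maps to its chunk boundary:

bash
# Truncate an example event timestamp to its hourly chunk boundary (GNU date)
EVENT_TS="2024-01-15 14:37:22"
date -u -d "$EVENT_TS" +"%Y-%m-%d %H:00:00"   # prints 2024-01-15 14:00:00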

Archive Configuration

Archive settings are managed through the storage configuration:

bash
# View current archive settings
logzilla settings list storage

# Configure archive settings
logzilla settings update AUTO_ARCHIVE_ENABLED=true
logzilla settings update ARCHIVE_FLUSH_DAYS=365
logzilla settings update ARCHIVE_EXPIRE_DAYS=30
logzilla settings update AUTOARCHIVE_CRON_HOUR=5
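
After changing values, it is worth confirming they took effect. A minimal sketch, assuming the archive-related settings appear by name in the storage settings listing shown above:

bash
# Confirm archive-related values after an update
# (assumes the setting names appear verbatim in the listing output)
logzilla settings list storage | grep -i archive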

Archive Settings

  • AUTO_ARCHIVE_ENABLED: Enables automatic archiving. Default: true. Disabling stops all automatic archiving.
  • ARCHIVE_FLUSH_DAYS: Days before online data is archived. Default: 365. Shorter values reduce online storage usage.
  • ARCHIVE_EXPIRE_DAYS: Days before archived data is permanently deleted. Default: 5. Critical: data is permanently lost after this period.
  • AUTOARCHIVE_CRON_HOUR: Hour of day (24-hour format) when the daily archive job runs. Default: 5. Schedule during low-usage periods.

Critical Warning: ARCHIVE_EXPIRE_DAYS determines when archived data is permanently deleted. Total data retention is ARCHIVE_FLUSH_DAYS + ARCHIVE_EXPIRE_DAYS. Verify this meets your retention requirements.
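
For example, with the values used above (ARCHIVE_FLUSH_DAYS=365, ARCHIVE_EXPIRE_DAYS=30), total retention is 395 days. A minimal sketch of the arithmetic, using example values rather than values read from LogZilla:

bash
# Total retention = days online + days in archive (example values)
FLUSH_DAYS=365
EXPIRE_DAYS=30
echo "Total retention: $((FLUSH_DAYS + EXPIRE_DAYS)) days"   # 395 days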

Automatic Archiving

LogZilla automatically manages the data lifecycle:

  1. Daily Processing: Archive job runs at configured hour
  2. Data Archiving: Online data older than ARCHIVE_FLUSH_DAYS is archived
  3. Data Expiration: Archived data older than ARCHIVE_EXPIRE_DAYS is deleted
  4. No Downtime: Process runs in background without service interruption

Manual Archive Management

Use the logzilla archives command for manual operations:

Archive Specific Date Ranges

bash
# Archive data for specific period
logzilla archives archive --ts-from 2024-01-01 --ts-to 2024-02-01

# Archive data older than specified days
logzilla archives archive --expire-days 30

# Round timestamps to nearest hour (aligns with LogZilla's hourly chunks)
logzilla archives archive --ts-from 2024-01-01 --ts-to 2024-02-01 --round
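
As a usage sketch, the same flags can be driven from the current date, for example to archive one full day from roughly a month ago. This assumes GNU date and the YYYY-MM-DD timestamp format shown above:

bash
# Archive one full day from about 30 days ago, aligned to hourly chunks
TS_FROM=$(date -d '31 days ago' +%F)
TS_TO=$(date -d '30 days ago' +%F)
logzilla archives archive --ts-from "$TS_FROM" --ts-to "$TS_TO" --round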

Remove Archived Data

bash
# Permanently remove archived data for specific period
logzilla archives remove --ts-from 2023-01-01 --ts-to 2023-02-01

Warning: The remove command permanently deletes data. This action cannot be undone. Ensure you have backups if the data might be needed later.

Automatic Archive Searching

LogZilla automatically determines whether to search archived data based on the date range specified:

  1. Recent Data: Searches within the online data retention period use only active indexes for fast results
  2. Historical Data: Searches with date ranges extending beyond online data automatically include archived data
  3. Performance Impact: Archive search performance depends on the storage hardware used for archived data
  4. Seamless Experience: Users do not need to manually "restore" data; LogZilla handles archive access transparently

No Rehydration Required

Unlike traditional log management systems that require users to manually rehydrate archived data back to active storage, LogZilla provides direct access to archived logs without any manual intervention.

Traditional Rehydration Process (Not Required with LogZilla)

Most log management platforms require administrators to:

  1. Identify archived data needed for analysis
  2. Request rehydration of specific time ranges or data sets
  3. Wait for rehydration to complete (often hours or days)
  4. Pay additional costs for rehydration processing and temporary storage
  5. Manage rehydrated data lifecycle and cleanup

LogZilla's Seamless Archive Access

LogZilla eliminates the rehydration bottleneck entirely:

  • No rehydration delays: Archived data is immediately accessible
  • No rehydration costs: No additional charges for accessing historical data
  • No rehydration management: No need to track or clean up rehydrated datasets
  • No rehydration planning: No advance planning required for historical analysis
  • No rehydration limits: Access any archived timeframe without restrictions

Benefits Over Rehydration-Based Systems

Immediate Access: Start analyzing historical data instantly without waiting for rehydration processes to complete.

Cost Efficiency: Eliminate rehydration fees and temporary storage costs associated with bringing archived data back online.

Operational Simplicity: Remove the complexity of managing rehydration workflows, scheduling, and cleanup processes.

Forensic Readiness: Respond to security incidents immediately without rehydration delays that could impact investigation timelines.

Compliance Reporting: Generate compliance reports spanning any timeframe without advance rehydration planning.

Competitive Advantage: LogZilla's architecture provides direct archive access capabilities that eliminate the operational overhead and costs associated with traditional rehydration workflows found in other log management platforms.

API Queries

Authentication Required: All LogZilla API requests require authentication. See Getting Started for API token creation and usage instructions.

LogZilla automatically determines whether to search archived data based on the date range specified in your query. When your time range extends beyond the online data retention period, archived data is automatically included.

Example: Recent Data Search (Online Only)

bash
# List existing API tokens (requires admin access)
sudo logzilla authtoken list

# Use an existing USER token (replace with your actual token)
TOKEN="6160ce50a098067f39d1acc72396b31c5518a5ca7b178538"

# Or create a new USER token if needed (creates user-prefixed token)
# TOKEN=$(sudo logzilla authtoken create | tail -1)

# Search last hour (online data only - fast)
curl -X POST \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Search",
    "params": {
      "time_range": {"preset": "last_hour"},
      "filter": [{"field": "message", "value": "error"}],
      "page_size": 100
    }
  }' \
  "http://your-logzilla-server/api/query"

Example: Historical Data Search (Includes Archives)

bash
# Search last 30 days (includes archived data - slower)
curl -X POST \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Search",
    "params": {
      "time_range": {"preset": "last_30_days"},
      "filter": [{"field": "message", "value": "error"}],
      "page_size": 100
    }
  }' \
  "http://your-logzilla-server/api/query"

Example: Specific Date Range Search

bash
# Search specific date range (may include archives)
curl -X POST \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Search",
    "params": {
      "time_range": {
        "ts_from": 1609459200,
        "ts_to": 1609545600
      },
      "filter": [
        {"field": "host", "value": "web-server-01"},
        {"field": "severity", "op": "le", "value": 3}
      ],
      "sort": ["first_occurrence"],
      "page_size": 50
    }
  }' \
  "http://your-logzilla-server/api/query"

Performance Note: Queries that span archived data take longer to complete and may return HTTP status 202 ACCEPTED for asynchronous processing. Use the returned query_id to retrieve results, or subscribe over WebSocket for real-time updates.
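
A minimal sketch of capturing the response so the query_id is available when the server answers asynchronously. The jq field name .query_id mirrors the note above; the endpoint for retrieving results by id is not shown here and should be taken from the LogZilla API reference:

bash
# Capture the query response and extract the query_id (requires jq)
RESPONSE=$(curl -s -X POST \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Search",
    "params": {
      "time_range": {"preset": "last_30_days"},
      "filter": [{"field": "message", "value": "error"}],
      "page_size": 100
    }
  }' \
  "http://your-logzilla-server/api/query")

# Field name assumed from the note above; adjust if the response schema differs
QUERY_ID=$(echo "$RESPONSE" | jq -r '.query_id')
echo "Asynchronous query id: $QUERY_ID"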

Monitoring and Logs

Archive Activity Logs

Monitor archive operations through LogZilla's system logs:

bash
# View archive-related log entries
sudo grep -i archive /var/log/logzilla/logzilla.log

# Monitor real-time archive activity
sudo tail -f /var/log/logzilla/logzilla.log | grep -i archive

# Check archive job status
logzilla status | grep -i archive

API-Based Archive Monitoring

bash
# Use existing API token (replace with your actual token)
TOKEN="6160ce50a098067f39d1acc72396b31c5518a5ca7b178538"

# Or create new USER token if needed (creates user-prefixed token)
# TOKEN=$(sudo logzilla authtoken create | tail -1)

# Query archive-related system events
curl -X POST \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Search",
    "params": {
      "time_range": {"preset": "last_24_hours"},
      "filter": [
        {"field": "program", "value": "logzilla"},
        {"field": "message", "value": "archive"}
      ],
      "sort": ["-first_occurrence"],
      "page_size": 50
    }
  }' \
  "http://your-logzilla-server/api/query"

Archive Status

Monitor current archive status:

bash
# View archived data chunks (authentication required)
curl -H "Authorization: token $TOKEN" \
  "http://your-logzilla-server/api/archives"

Configuration Examples

Common Retention Scenarios

90 Days Online, 1 Year Total Retention:

bash
logzilla settings update ARCHIVE_FLUSH_DAYS=90
logzilla settings update ARCHIVE_EXPIRE_DAYS=275
# Total retention: 90 + 275 = 365 days

30 Days Online, 3 Years Total Retention:

bash
logzilla settings update ARCHIVE_FLUSH_DAYS=30
logzilla settings update ARCHIVE_EXPIRE_DAYS=1065
# Total retention: 30 + 1065 = 1095 days (3 years)

7 Days Online, 6 Months Total Retention:

bash
logzilla settings update ARCHIVE_FLUSH_DAYS=7
logzilla settings update ARCHIVE_EXPIRE_DAYS=173
# Total retention: 7 + 173 = 180 days (6 months)
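
To adapt these examples to other targets, ARCHIVE_EXPIRE_DAYS is simply the desired total retention minus the online period. A minimal sketch with example numbers (not values read from LogZilla):

bash
# Derive ARCHIVE_EXPIRE_DAYS from a retention target (example values)
TOTAL_DAYS=730      # e.g. a two-year retention requirement
FLUSH_DAYS=60       # desired online (fast-search) period
echo "logzilla settings update ARCHIVE_FLUSH_DAYS=$FLUSH_DAYS"
echo "logzilla settings update ARCHIVE_EXPIRE_DAYS=$((TOTAL_DAYS - FLUSH_DAYS))"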

Best Practices

Storage Planning

  • Calculate Total Retention: ARCHIVE_FLUSH_DAYS + ARCHIVE_EXPIRE_DAYS
  • Monitor Disk Usage: Archive storage is located at /var/lib/logzilla-archive/
  • Plan for Growth: Consider data volume increases over time
  • Use --round Flag: When manually archiving, use --round to align with hourly chunks

Performance Optimization

  • Schedule Wisely: Run archive jobs during low-usage periods
  • Limit Archive Searches: Use archived data searches sparingly
  • Monitor Impact: Archive operations can affect system performance

Data Safety

  • Verify Settings: Confirm retention periods meet compliance requirements
  • Test Procedures: Validate archive and search functionality in test environments
  • Document Policies: Maintain clear data retention documentation

Relocating Archive Storage

Use this procedure to move the archive volume to a new host directory (bind mount) when reallocating disk space.

Check Current Archive Size

bash
# Inspect size of the archive volume (no host path assumptions)
docker run --rm -v lz_archive:/archive \
  logzilla/runtime sh -lc 'du -csh /archive'

Move the Archive to a New Host Directory

bash
logzilla stop

# Ensure /new_archive_dir exists and has sufficient space
docker run --rm \
  -v /new_archive_dir:/new_archive_dir \
  -v lz_archive:/temp_archive \
  logzilla/runtime sh -lc 'mv /temp_archive/* /new_archive_dir/'

# Recreate the volume as a bind mount to the new directory
docker rm lz_watcher
docker volume rm lz_archive
docker volume create --opt type=none --opt o=bind \
  --opt device=/new_archive_dir lz_archive

logzilla start

Notes:

  • Replace /new_archive_dir with the destination directory on the host. Create it before running the commands.
  • The archive path inside the container remains /var/lib/logzilla-archive/. The lz_archive volume binds that path to the specified host directory.
  • Ensure no containers are using the archive during the move; logzilla stop handles this.
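
To confirm the relocation took effect, inspect the recreated volume and compare the size at the new location with the figure recorded earlier (a minimal sketch; /new_archive_dir is the same placeholder used above):

bash
# Confirm the volume now binds to the new host directory
docker volume inspect lz_archive

# Compare on-disk size at the new location with the earlier measurement
du -csh /new_archive_dir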

Troubleshooting

Common Issues

Archive Job Failures: Check system logs and available disk space

Slow Archive Searches: Verify system resources and consider limiting search scope

Missing Archived Data: Confirm data hasn't exceeded ARCHIVE_EXPIRE_DAYS

Storage Space Issues: Monitor both online and archive storage locations

Recovery Procedures

For archive-related issues:

  1. Check system logs for error messages
  2. Verify storage configuration and available space
  3. Contact LogZilla support for data recovery assistance
  4. Maintain external backups for critical data protection