Potential Spike in Web Server Error Logs

This rule detects unusual spikes in error logs from web servers, which may indicate reconnaissance activities such as vulnerability scanning or fuzzing attempts by adversaries. These activities often generate a high volume of error responses as they probe for weaknesses in web applications. Error response codes may also indicate server-side issues that could be exploited.

Elastic rule (View on GitHub)

[metadata]
creation_date = "2025/11/19"
integration = ["nginx", "apache", "apache_tomcat", "iis"]
maturity = "production"
updated_date = "2025/12/01"

[rule]
author = ["Elastic"]
description = """
This rule detects unusual spikes in error logs from web servers, which may indicate reconnaissance activities such
as vulnerability scanning or fuzzing attempts by adversaries. These activities often generate a high volume of error
responses as they probe for weaknesses in web applications. Error response codes may potentially indicate server-side
issues that could be exploited.
"""
from = "now-9m"
interval = "10m"
language = "esql"
license = "Elastic License v2"
name = "Potential Spike in Web Server Error Logs"
note = """ ## Triage and analysis

> **Disclaimer**:
> This investigation guide was created using generative AI technology and has been reviewed to improve its accuracy and relevance. While every effort has been made to ensure its quality, we recommend validating the content and adapting it to suit your specific environment and operational needs.

### Investigating Potential Spike in Web Server Error Logs

This detection flags spikes of web server error responses across HTTP/TLS and common server platforms, signaling active scanning or fuzzing that can expose misconfigurations or exploitable paths. A typical pattern is an automated scanner sweeping endpoints like /admin/, /debug/, /.env, /.git, and backup archives while mutating query parameters, producing repeated 404/403 and occasional 500 responses across multiple applications within minutes.

### Possible investigation steps

- Pivot on the noisy client IP(s) to build a minute-by-minute timeline across affected hosts showing request rate, status codes, methods, and top paths to distinguish automated scanning from a localized application failure (a minimal ES|QL sketch follows this list).
- Enrich the client with ASN, geolocation, hosting/Tor/proxy reputation, historical sightings, and maintenance windows to quickly decide if it matches a known external scanner or an internal scheduled test.
- Aggregate the most requested URIs and verbs and look for telltale patterns such as /.env, /.git, backup archives, admin consoles, or unusual verbs like PROPFIND/TRACE, then correlate any 5xx bursts with application and server error logs and recent deploys or config changes (see the second sketch below).
- Hunt for follow-on success from the same client by checking for subsequent 200/302s to sensitive paths, authentication events and session creation, or evidence of file writes and suspicious child processes on the web hosts (see the third sketch below).
- If traffic traverses a CDN/WAF/load balancer, pivot to those logs to recover true client IPs, review rule matches and throttling, and determine whether similar patterns occurred across multiple edges or regions.
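
The timeline pivot in the first step can be expressed directly in ES|QL. A minimal sketch, assuming the companion access-log datasets for these integrations (logs-nginx.access-*, logs-apache.access-*, logs-iis.access-*) are ingested with ECS HTTP fields, and using 203.0.113.10 as a stand-in for the noisy client:

```esql
FROM logs-nginx.access-*, logs-apache.access-*, logs-iis.access-*
| WHERE source.ip == "203.0.113.10"               // hypothetical noisy client IP
| EVAL minute = DATE_TRUNC(1 minute, @timestamp)  // bucket requests into one-minute slices
| STATS
    requests = COUNT(*),
    statuses = VALUES(http.response.status_code),
    methods = VALUES(http.request.method),
    top_paths = VALUES(url.path)
    BY minute, host.name
| SORT minute ASC
```

A steady, machine-like request rate with constantly rotating paths points to scanning; a flat path set with sudden 5xx growth points to an application failure.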
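
The second sketch aggregates error responses by path and verb for the same client, surfacing wordlist-style probing of /.env, /.git, admin consoles, or odd methods (same assumed datasets and placeholder IP as above):

```esql
FROM logs-nginx.access-*, logs-apache.access-*, logs-iis.access-*
| WHERE source.ip == "203.0.113.10" AND http.response.status_code >= 400
| STATS hits = COUNT(*) BY url.path, http.request.method, http.response.status_code
| SORT hits DESC
| LIMIT 50
```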
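
The third sketch hunts for follow-on success from the same client; the path patterns are illustrative and should be adapted to your own sensitive endpoints:

```esql
FROM logs-nginx.access-*, logs-apache.access-*, logs-iis.access-*
| WHERE source.ip == "203.0.113.10"
    AND http.response.status_code IN (200, 302)
    AND (url.path LIKE "*admin*" OR url.path LIKE "*.env*" OR url.path LIKE "*.git*" OR url.path LIKE "*login*")
| KEEP @timestamp, host.name, http.request.method, url.path, http.response.status_code
| SORT @timestamp ASC
```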

### False positive analysis

- Internal QA or integration tests that systematically crawl application routes after a deployment can generate bursts of 404/403 and occasional 500s from a single client IP, closely resembling active scanning.
- A transient backend outage or misconfiguration (broken asset paths or auth flows) can cause legitimate traffic to return many errors aggregated under a shared egress IP (NAT), pushing per-IP counts above the threshold without adversary activity (a tuning sketch follows this list).
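
Both cases can be tuned out by excluding known internal scanner or NAT egress ranges from the rule query. A hedged variant of the query below, where 10.20.0.0/16 is a hypothetical QA/NAT range to replace with your own:

```esql
FROM logs-nginx.error-*, logs-apache_tomcat.error-*, logs-apache.error-*, logs-iis.error-*
| WHERE source.ip IS NOT NULL
    AND NOT CIDR_MATCH(source.ip, "10.20.0.0/16")  // hypothetical internal QA/NAT egress range
| STATS event_count = COUNT(*) BY source.ip, agent.id
| WHERE event_count > 50
```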

### Response and remediation

- Immediately block or throttle the noisy client IPs at the WAF/CDN and load balancer by enabling per-IP rate limits and signatures for scanner patterns such as repeated hits to /.env, /.git, /admin, backup archives, or unusual verbs like PROPFIND/TRACE.
- If errors include concentrated 5xx responses from one web host, drain that node from service behind the load balancer, capture its web and application error logs, and roll back the most recent deploy or config change until error rates normalize.
- Remove risky exposures uncovered by the scan by denying access to environment files and VCS directories (.env, .git), disabling directory listing, locking down admin consoles, and rejecting unsupported HTTP methods at the web server.
- Escalate to Incident Response if the same client shifts from errors to successful access on sensitive endpoints (200/302 to /admin, /login, or API keys), if you observe file writes under the webroot or suspicious child processes, or if multiple unrelated clients show the same pattern across regions (a sketch for the multi-client check follows this list).
- Recover service by redeploying known-good builds, re-enabling health checks, running smoke tests against top routes, and restoring normal WAF/CDN policies while keeping a temporary blocklist for the offending IPs.
- Harden long term by tuning the WAF/CDN to auto-throttle bursty 404/403/500 patterns, disabling TRACE/OPTIONS where unused, minimizing verbose error pages, and ensuring logs capture the true client IP via X-Forwarded-For or True-Client-IP headers.
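
The multi-client escalation criterion can be checked with a quick ES|QL sketch over the same assumed access-log datasets; the distinct-client threshold of 5 is illustrative:

```esql
FROM logs-nginx.access-*, logs-apache.access-*, logs-iis.access-*
| WHERE http.response.status_code >= 400
| STATS clients = COUNT_DISTINCT(source.ip), hits = COUNT(*) BY url.path
| WHERE clients > 5  // hypothetical threshold for "multiple unrelated clients"
| SORT clients DESC
```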
"""
risk_score = 21
rule_id = "6631a759-4559-4c33-a392-13f146c8bcc4"
severity = "low"
tags = [
    "Domain: Web",
    "Use Case: Threat Detection",
    "Tactic: Reconnaissance",
    "Data Source: Nginx",
    "Data Source: Apache",
    "Data Source: Apache Tomcat",
    "Data Source: IIS",
    "Resources: Investigation Guide",
]
timestamp_override = "event.ingested"
type = "esql"
query = '''
from logs-nginx.error-*, logs-apache_tomcat.error-*, logs-apache.error-*, logs-iis.error-*
| keep
    @timestamp,
    event.type,
    event.dataset,
    source.ip,
    agent.id,
    host.name
| where source.ip is not null
| stats
    Esql.event_count = count(),
    Esql.host_name_values = values(host.name),
    Esql.agent_id_values = values(agent.id),
    Esql.event_dataset_values = values(event.dataset)
    by source.ip, agent.id
| where
    Esql.event_count > 50
'''

[[rule.threat]]
framework = "MITRE ATT&CK"

[[rule.threat.technique]]
id = "T1595"
name = "Active Scanning"
reference = "https://attack.mitre.org/techniques/T1595/"

[[rule.threat.technique.subtechnique]]
id = "T1595.002"
name = "Vulnerability Scanning"
reference = "https://attack.mitre.org/techniques/T1595/002/"

[[rule.threat.technique.subtechnique]]
id = "T1595.003"
name = "Wordlist Scanning"
reference = "https://attack.mitre.org/techniques/T1595/003/"

[rule.threat.tactic]
id = "TA0043"
name = "Reconnaissance"
reference = "https://attack.mitre.org/tactics/TA0043/"
