Unusual High Confidence Content Filter Blocks Detected

Nov 11, 2025 · Domain: LLM · Data Source: AWS Bedrock · Data Source: AWS S3 · Use Case: Policy Violation · Mitre Atlas: T0051 · Mitre Atlas: T0054 · Resources: Investigation Guide

Detects repeated high-confidence 'BLOCKED' actions coupled with 'Content Filter' policy violation codes such as 'MISCONDUCT', 'HATE', 'SEXUAL', 'INSULTS', 'PROMPT_ATTACK', and 'VIOLENCE', indicating persistent misuse or attempts to probe the model's ethical boundaries.

Elastic rule (View on GitHub)

[metadata]
creation_date = "2024/05/05"
integration = ["aws_bedrock"]
maturity = "production"
updated_date = "2025/11/10"

[rule]
author = ["Elastic"]
description = """
Detects repeated high-confidence 'BLOCKED' actions coupled with 'Content Filter' policy violation codes such as
'MISCONDUCT', 'HATE', 'SEXUAL', 'INSULTS', 'PROMPT_ATTACK', and 'VIOLENCE', indicating persistent misuse or attempts
to probe the model's ethical boundaries.
"""
false_positives = ["New model deployments.", "Testing updates to compliance policies."]
from = "now-60m"
interval = "10m"
language = "esql"
license = "Elastic License v2"
name = "Unusual High Confidence Content Filter Blocks Detected"
note = """## Triage and analysis

### Investigating Unusual High Confidence Content Filter Blocks Detected

Amazon Bedrock Guardrails is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.

It enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.

Through Guardrails, organizations can enable content filters for Hate, Insults, Sexual, Violence, and Misconduct, along with Prompt Attack filters,
to prevent the model from generating content on specific, undesired subjects, and they can establish thresholds for harmful content categories.

#### Possible investigation steps

- Identify the user account whose prompts caused high confidence content filter blocks and whether it should perform this kind of action.
- Investigate other alerts associated with the user account during the past 48 hours.
- Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal time of day?
- Examine the account's prompts and responses in the last 24 hours.
- If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.

### False positive analysis

- Verify that the user account that triggered the content filter blocks is not testing new model deployments or updated compliance policies in Amazon Bedrock Guardrails.

### Response and remediation

- Initiate the incident response process based on the outcome of the triage.
- Disable or limit the account during the investigation and response.
- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    - Identify the account role in the cloud environment.
    - Identify if the attacker is moving laterally and compromising other Amazon Bedrock services.
    - Identify any regulatory or legal ramifications related to this activity.
- Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access Amazon Bedrock, and that the least privilege principle is being followed.
- Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).
"""
references = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html",
    "https://atlas.mitre.org/techniques/AML.T0051",
    "https://atlas.mitre.org/techniques/AML.T0054",
    "https://www.elastic.co/security-labs/elastic-advances-llm-security",
]
risk_score = 47
rule_id = "4f855297-c8e0-4097-9d97-d653f7e471c4"
setup = """## Setup

This rule requires that guardrails are configured in AWS Bedrock. For more information, see the AWS Bedrock documentation:

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html
"""
severity = "medium"
tags = [
    "Domain: LLM",
    "Data Source: AWS Bedrock",
    "Data Source: AWS S3",
    "Use Case: Policy Violation",
    "Mitre Atlas: T0051",
    "Mitre Atlas: T0054",
    "Resources: Investigation Guide",
]
timestamp_override = "event.ingested"
type = "esql"

query = '''
from logs-aws_bedrock.invocation-*

// Expand multi-value fields
| mv_expand gen_ai.compliance.violation_code
| mv_expand gen_ai.policy.confidence
| mv_expand gen_ai.policy.name
| mv_expand gen_ai.policy.action

// Filter for high-confidence content policy blocks with targeted violations
| where
  gen_ai.policy.action == "BLOCKED"
  and gen_ai.policy.name == "content_policy"
  and gen_ai.policy.confidence like "HIGH"
  and gen_ai.compliance.violation_code in ("HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE")

// keep ECS + compliance fields
| keep
  user.id,
  gen_ai.compliance.violation_code

// count blocked violations per user per violation type
| stats
    Esql.ml_policy_blocked_violation_count = count()
  by
    user.id,
    gen_ai.compliance.violation_code

// Aggregate all violation types per user
| stats
    Esql.ml_policy_blocked_violation_total_count = sum(Esql.ml_policy_blocked_violation_count)
  by
    user.id

// Filter for users with more than 5 total violations
| where Esql.ml_policy_blocked_violation_total_count > 5

// sort by violation volume
| sort Esql.ml_policy_blocked_violation_total_count desc
'''
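
To make the query's logic easier to follow, here is a rough Python sketch of the same pipeline: expand the multi-value fields, keep high-confidence content policy blocks, total them per user, and flag users above the threshold. It is an illustration only, not how Elastic evaluates the rule. The flattened keys (`user_id`, `violation_code`, `confidence`, `policy_name`, `action`) and the function names are hypothetical stand-ins for the fields in the query.

```python
from collections import Counter
from itertools import product

TARGET_CODES = {"HATE", "MISCONDUCT", "SEXUAL", "INSULTS", "PROMPT_ATTACK", "VIOLENCE"}
THRESHOLD = 5  # the rule keeps users whose total is strictly greater than 5

# Hypothetical flattened names for the four fields the query expands.
EXPANDED_FIELDS = ("violation_code", "confidence", "policy_name", "action")


def as_list(value):
    """Treat a scalar as a one-element list. A missing field yields no rows,
    which matches the rule's outcome because the filter would drop nulls anyway."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def expand(event):
    """Yield one row per combination of the expanded fields (a cross product)."""
    values = (as_list(event.get(name)) for name in EXPANDED_FIELDS)
    for combo in product(*values):
        yield dict(zip(EXPANDED_FIELDS, combo), user_id=event["user_id"])


def flag_users(events, threshold=THRESHOLD):
    """Return (user_id, total) pairs above the threshold, highest total first."""
    totals = Counter()
    for event in events:
        for row in expand(event):
            if (
                row["action"] == "BLOCKED"
                and row["policy_name"] == "content_policy"
                and row["confidence"] == "HIGH"
                and row["violation_code"] in TARGET_CODES
            ):
                totals[row["user_id"]] += 1
    flagged = [(user, count) for user, count in totals.items() if count > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Because each multi-value field is expanded independently, a single invocation that carries two violation codes contributes two rows to the user's total. The threshold therefore counts expanded rows rather than distinct requests.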

Triage and analysis

Investigating Unusual High Confidence Content Filter Blocks Detected

Amazon Bedrock Guardrails is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.

It enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.

Through Guardrails, organizations can enable content filters for Hate, Insults, Sexual, Violence, and Misconduct, along with Prompt Attack filters, to prevent the model from generating content on specific, undesired subjects, and they can establish thresholds for harmful content categories.
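
As a rough illustration of what enabling these content filters might look like, the sketch below builds a filter configuration as a plain Python dictionary. The dictionary shape is an assumption modeled on the `contentPolicyConfig` argument of the AWS SDK's guardrail creation call, and the helper name `build_content_policy` is hypothetical. Check the AWS Bedrock documentation before relying on any field name or value.

```python
# Filter categories named in this rule's description.
CONTENT_FILTER_TYPES = ("HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT", "PROMPT_ATTACK")


def build_content_policy(strength="HIGH"):
    """Build a content-filter configuration with one entry per category.

    Assumption: the result mirrors the shape a guardrail creation request
    expects; this sketch makes no API call.
    """
    filters = []
    for filter_type in CONTENT_FILTER_TYPES:
        filters.append({
            "type": filter_type,
            "inputStrength": strength,
            # Assumption: prompt-attack filtering applies to inputs only,
            # so its output strength is set to NONE.
            "outputStrength": "NONE" if filter_type == "PROMPT_ATTACK" else strength,
        })
    return {"filtersConfig": filters}


policy = build_content_policy("HIGH")
```

The threshold mentioned above corresponds to the per-category strength. This sketch uses one strength for every category, but each entry could be tuned separately.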

Possible investigation steps

- Identify the user account whose prompts caused high confidence content filter blocks and whether it should perform this kind of action.
- Investigate other alerts associated with the user account during the past 48 hours.
- Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal time of day?
- Examine the account's prompts and responses in the last 24 hours.
- If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.
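
The time-window and time-of-day checks above can be sketched as a small helper. This is a minimal illustration under stated assumptions: events are dictionaries with a timezone-aware `timestamp` key (a hypothetical shape), and "normal" hours are 08:00 to 18:00 in the timestamps' own timezone. Adjust both to your environment.

```python
from datetime import datetime, timedelta, timezone


def recent_events_with_off_hours(events, now, lookback_hours=24, workday=(8, 18)):
    """Return events inside the lookback window, each tagged with an off_hours flag.

    Assumption: workday is a (start_hour, end_hour) pair evaluated in the
    timezone carried by each event timestamp.
    """
    cutoff = now - timedelta(hours=lookback_hours)
    start_hour, end_hour = workday
    results = []
    for event in events:
        ts = event["timestamp"]
        if ts < cutoff or ts > now:
            continue  # outside the investigation window
        off_hours = not (start_hour <= ts.hour < end_hour)
        results.append({**event, "off_hours": off_hours})
    return results


now = datetime(2025, 11, 11, 12, 0, tzinfo=timezone.utc)
sample = [
    {"user_id": "alice", "timestamp": datetime(2025, 11, 11, 3, 0, tzinfo=timezone.utc)},
    {"user_id": "alice", "timestamp": datetime(2025, 11, 11, 10, 0, tzinfo=timezone.utc)},
    {"user_id": "alice", "timestamp": datetime(2025, 11, 9, 10, 0, tzinfo=timezone.utc)},
]
window = recent_events_with_off_hours(sample, now)
```

For a human user, a cluster of off-hours blocks is worth a closer look. For a service account the flag carries less weight, since scripts run at any hour.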

False positive analysis

Verify that the user account that triggered the content filter blocks is not testing new model deployments or updated compliance policies in Amazon Bedrock Guardrails.

Response and remediation

- Initiate the incident response process based on the outcome of the triage.
- Disable or limit the account during the investigation and response.
- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    - Identify the account role in the cloud environment.
    - Identify if the attacker is moving laterally and compromising other Amazon Bedrock services.
    - Identify any regulatory or legal ramifications related to this activity.
- Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access Amazon Bedrock, and that the least privilege principle is being followed.
- Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).
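
To track whether MTTD and MTTR actually improve, a team could compute them from incident timestamps. The sketch below is one possible definition, not a standard. It assumes MTTD is the average gap from occurrence to detection and MTTR is the average gap from detection to resolution. The record keys (`occurred`, `detected`, `resolved`) are hypothetical.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(pairs):
    """Average gap in minutes between (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)


def mttd_mttr(incidents):
    """Return (MTTD, MTTR) in minutes under the definitions assumed above."""
    mttd = mean_minutes((i["occurred"], i["detected"]) for i in incidents)
    mttr = mean_minutes((i["detected"], i["resolved"]) for i in incidents)
    return mttd, mttr


incidents = [
    {
        "occurred": datetime(2025, 11, 1, 10, 0),
        "detected": datetime(2025, 11, 1, 10, 30),
        "resolved": datetime(2025, 11, 1, 12, 30),
    },
    {
        "occurred": datetime(2025, 11, 2, 9, 0),
        "detected": datetime(2025, 11, 2, 10, 30),
        "resolved": datetime(2025, 11, 2, 11, 30),
    },
]
mttd, mttr = mttd_mttr(incidents)  # 60.0 and 90.0 minutes for this sample
```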

References

- Create your guardrail - Amazon Bedrock: Learn about the different filters and blockers of a guardrail in Amazon Bedrock. https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html
- https://atlas.mitre.org/techniques/AML.T0051
- https://atlas.mitre.org/techniques/AML.T0054
- Elastic Advances LLM Security with Standardized Fields and Integrations (Elastic Security Labs): Discover Elastic’s latest advancements in LLM security, focusing on standardized field integrations and enhanced detection capabilities. https://www.elastic.co/security-labs/elastic-advances-llm-security
