Unusual High Word Policy Blocks Detected

Detects repeated compliance violation 'BLOCKED' actions coupled with a specific policy name such as 'word_policy', indicating persistent misuse or attempts to probe the model's denied topics.

Elastic rule (View on GitHub)

[metadata]
creation_date = "2024/11/20"
maturity = "production"
updated_date = "2024/11/20"
min_stack_comments = "ES|QL rule type is still in technical preview as of 8.13, however this rule was tested successfully; integration in tech preview"
min_stack_version = "8.13.0"

[rule]
author = ["Elastic"]
description = """
Detects repeated compliance violation 'BLOCKED' actions coupled with a specific policy name such as 'word_policy',
indicating persistent misuse or attempts to probe the model's denied topics.
"""
false_positives = ["New model deployments.", "Testing updates to compliance policies."]
from = "now-60m"
interval = "10m"
language = "esql"
license = "Elastic License v2"
name = "Unusual High Word Policy Blocks Detected"
references = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html",
    "https://atlas.mitre.org/techniques/AML.T0051",
    "https://atlas.mitre.org/techniques/AML.T0054",
    "https://www.elastic.co/security-labs/elastic-advances-llm-security"
]
risk_score = 47
rule_id = "3216949c-9300-4c53-b57a-221e364c6457"
note = """## Triage and analysis

### Investigating Amazon Bedrock Guardrail High Word Policy Blocks

Amazon Bedrock Guardrail is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.

It enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.

Through Guardrail, organizations can define "word filters" to prevent the model from generating content involving profanity or undesired subjects,
and they can establish thresholds for harmful content categories, including hate speech, violence, or offensive language.

#### Possible investigation steps

- Identify the user account whose prompts contained profanity and whether it should perform this kind of action.
- Investigate other alerts associated with the user account during the past 48 hours.
- Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal time of day?
- Examine the account's prompts and responses in the last 24 hours.
- If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.

### False positive analysis

- Verify that the user account that queried denied topics is not testing any new model deployments or updated compliance policies in Amazon Bedrock guardrails.

### Response and remediation

- Initiate the incident response process based on the outcome of the triage.
- Disable or limit the account during the investigation and response.
- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    - Identify the account role in the cloud environment.
    - Identify if the attacker is moving laterally and compromising other Amazon Bedrock services.
    - Identify any regulatory or legal ramifications related to this activity.
- Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access Amazon Bedrock, and ensure that the least privilege principle is being followed.
- Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).
"""
setup = """## Setup

This rule requires that guardrails are configured in AWS Bedrock. For more information, see the AWS Bedrock documentation:

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html
"""
severity = "medium"
tags = [
    "Domain: LLM",
    "Data Source: AWS Bedrock",
    "Data Source: AWS S3",
    "Use Case: Policy Violation",
    "Mitre Atlas: T0051",
    "Mitre Atlas: T0054",
]
timestamp_override = "event.ingested"
type = "esql"

query = '''
from logs-aws_bedrock.invocation-*
| MV_EXPAND gen_ai.policy.name
| where gen_ai.policy.action == "BLOCKED" and gen_ai.compliance.violation_detected == "true" and gen_ai.policy.name == "word_policy"
| keep user.id
| stats profanity_words= count() by user.id
| where profanity_words > 5
| sort profanity_words desc
'''
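
To make the query's logic easier to follow, here is a minimal Python sketch of the same pipeline: expand multi-valued policy names, keep blocked word-policy violations, count them per user, and keep users above the threshold. It is an illustration only, not how Elastic evaluates the rule. It assumes events are flat dictionaries keyed by the dotted field names used in the query, and the helper names `expand_policy_names` and `users_over_threshold` are hypothetical.

```python
from collections import Counter

THRESHOLD = 5  # mirrors `where profanity_words > 5` in the rule query


def expand_policy_names(event):
    """Yield one row per policy name, roughly what MV_EXPAND does.

    Assumption: a multi-valued field is represented as a Python list.
    """
    names = event.get("gen_ai.policy.name")
    if isinstance(names, list):
        for name in names:
            yield {**event, "gen_ai.policy.name": name}
    else:
        yield event


def users_over_threshold(events, threshold=THRESHOLD):
    """Return (user_id, block_count) pairs above the threshold, highest first."""
    counts = Counter()
    for event in events:
        for row in expand_policy_names(event):
            # Same three conditions as the `where` clause in the query.
            # The flag is compared to the string "true", as the query does.
            if (
                row.get("gen_ai.policy.action") == "BLOCKED"
                and row.get("gen_ai.compliance.violation_detected") == "true"
                and row.get("gen_ai.policy.name") == "word_policy"
            ):
                counts[row.get("user.id")] += 1
    flagged = [(user, n) for user, n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Because the comparison is strictly greater than 5, a user needs at least six blocked word-policy events inside the rule's 60-minute lookback window (`from = "now-60m"`) to appear in the results.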

Triage and analysis

Investigating Amazon Bedrock Guardrail High Word Policy Blocks

Amazon Bedrock Guardrail is a set of features within Amazon Bedrock designed to help businesses apply robust safety and privacy controls to their generative AI applications.

It enables users to set guidelines and filters that manage content quality, relevancy, and adherence to responsible AI practices.

Through Guardrail, organizations can define "word filters" to prevent the model from generating content involving profanity or undesired subjects, and they can establish thresholds for harmful content categories, including hate speech, violence, or offensive language.

Possible investigation steps

  • Identify the user account whose prompts contained profanity and whether it should perform this kind of action.
  • Investigate other alerts associated with the user account during the past 48 hours.
  • Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal time of day?
  • Examine the account's prompts and responses in the last 24 hours.
  • If you suspect the account has been compromised, scope potentially compromised assets by tracking Amazon Bedrock model access, prompts generated, and responses to the prompts by the account in the last 24 hours.
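
The 24-hour scoping and time-of-day checks above can be sketched in Python against exported events. This is a hedged illustration rather than an Elastic API. It assumes each event is a dictionary with a `user.id` field and an ISO 8601 `@timestamp` that carries an explicit offset such as `+00:00`. The function names and the 08:00 to 18:00 working-hours window are hypothetical and should be adapted to your organization.

```python
from datetime import datetime, timedelta, timezone


def recent_events_for_user(events, user_id, now, lookback=timedelta(hours=24)):
    """Return (timestamp, event) pairs for one user in the lookback window, oldest first."""
    cutoff = now - lookback
    selected = []
    for event in events:
        ts = datetime.fromisoformat(event["@timestamp"])
        if event.get("user.id") == user_id and cutoff <= ts <= now:
            selected.append((ts, event))
    return sorted(selected, key=lambda pair: pair[0])


def is_off_hours(ts, start_hour=8, end_hour=18):
    """True when the timestamp falls outside the assumed working hours.

    The hour is read in the timestamp's own offset; convert to the user's
    local time zone first if that differs.
    """
    return not (start_hour <= ts.hour < end_hour)
```

For example, calling `recent_events_for_user(events, "some-user", datetime.now(timezone.utc))` and then filtering the result with `is_off_hours` gives a quick list of that account's recent off-hours activity to review alongside its prompts and responses.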

False positive analysis

  • Verify that the user account that queried denied topics is not testing any new model deployments or updated compliance policies in Amazon Bedrock guardrails.

Response and remediation

  • Initiate the incident response process based on the outcome of the triage.
  • Disable or limit the account during the investigation and response.
  • Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    • Identify the account role in the cloud environment.
    • Identify if the attacker is moving laterally and compromising other Amazon Bedrock services.
    • Identify any regulatory or legal ramifications related to this activity.
  • Review the permissions assigned to the implicated user group or role behind these requests to ensure they are authorized and expected to access Amazon Bedrock, and ensure that the least privilege principle is being followed.
  • Determine the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
  • Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the mean time to respond (MTTR).
