Unusual High Confidence Misconduct Blocks Detected

Detects repeated high-confidence 'BLOCKED' actions coupled with specific violation codes such as 'MISCONDUCT', indicating persistent misuse or attempts to probe the model's ethical boundaries.

Elastic rule (View on GitHub)

[metadata]
creation_date = "2024/05/05"
maturity = "production"
updated_date = "2024/05/05"
min_stack_comments = "ES|QL rule type is still in technical preview as of 8.13, however this rule was tested successfully; integration in tech preview"
min_stack_version = "8.13.0"

[rule]
author = ["Elastic"]
description = """
Detects repeated high-confidence 'BLOCKED' actions coupled with specific violation codes such as 'MISCONDUCT',
indicating persistent misuse or attempts to probe the model's ethical boundaries.
"""
false_positives = ["New model deployments.", "Testing updates to compliance policies."]
from = "now-60m"
interval = "10m"
language = "esql"
license = "Elastic License v2"
name = "Unusual High Confidence Misconduct Blocks Detected"
references = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-components.html",
    "https://atlas.mitre.org/techniques/AML.T0051",
    "https://atlas.mitre.org/techniques/AML.T0054",
    "https://www.elastic.co/security-labs/elastic-advances-llm-security"
]
risk_score = 73
rule_id = "4f855297-c8e0-4097-9d97-d653f7e471c4"
setup = """## Setup

This rule requires that guardrails are configured in AWS Bedrock. For more information, see the AWS Bedrock documentation:

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html
"""
severity = "high"
tags = [
    "Domain: LLM",
    "Data Source: AWS Bedrock",
    "Data Source: AWS S3",
    "Use Case: Policy Violation",
    "Mitre Atlas: T0051",
    "Mitre Atlas: T0054",
]
timestamp_override = "event.ingested"
type = "esql"

query = '''
from logs-aws_bedrock.invocation-*
| where gen_ai.policy.confidence == "HIGH" and gen_ai.policy.action == "BLOCKED" and gen_ai.compliance.violation_code == "MISCONDUCT"
| stats high_confidence_blocks = count() by user.id
| where high_confidence_blocks > 5
| sort high_confidence_blocks desc
'''
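
The query's logic (filter, count per user, threshold, sort) can be illustrated with a rough Python sketch. This is not how Elasticsearch evaluates ES|QL; it only mirrors the four pipeline stages on in-memory records. The `flag_users` and `make_event` helpers and the sample users are hypothetical, introduced here for illustration. The field names and the threshold of 5 come from the rule itself.

```python
from collections import Counter

THRESHOLD = 5  # mirrors `where high_confidence_blocks > 5` in the rule


def flag_users(events, threshold=THRESHOLD):
    """Return (user_id, block_count) pairs for users over the threshold.

    `events` is assumed to be a list of flat dicts keyed by the same
    dotted field names the rule's query uses.
    """
    # Stages 1 and 2: keep matching events and count them per user.id
    counts = Counter(
        e.get("user.id")
        for e in events
        if e.get("gen_ai.policy.confidence") == "HIGH"
        and e.get("gen_ai.policy.action") == "BLOCKED"
        and e.get("gen_ai.compliance.violation_code") == "MISCONDUCT"
    )
    # Stage 3: strictly more than `threshold` blocks
    flagged = [(user, n) for user, n in counts.items() if n > threshold]
    # Stage 4: highest counts first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def make_event(user, confidence="HIGH"):
    """Build a hypothetical event record for the example below."""
    return {
        "user.id": user,
        "gen_ai.policy.confidence": confidence,
        "gen_ai.policy.action": "BLOCKED",
        "gen_ai.compliance.violation_code": "MISCONDUCT",
    }


# Hypothetical sample data, not real Bedrock logs
events = (
    [make_event("alice")] * 6                       # 6 matches: flagged
    + [make_event("bob")] * 5                       # exactly 5: not flagged
    + [make_event("carol", confidence="MEDIUM")] * 9  # filtered out
    + [make_event("dave")] * 8                      # 8 matches: flagged
)

print(flag_users(events))
```

Because the comparison is strict, a user with exactly five matching blocks is not reported. In the rule itself, the counted events are those within the `from = "now-60m"` lookback, re-evaluated at each `interval = "10m"` run.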
