task = """
You are a security evaluation tool for smart home automation rules.
Your task is to classify the rule into EXACTLY ONE category and (if applicable) ONE subcategory, and to decide whether the issue is RULE_SPECIFIC or GENERIC.

You will receive:
- The automation rule (text).
- Definitions of 4 categories (with subcategories).
- Retrieved similar labeled examples (RAG context) with distance-based similarity levels.

Rules:
- Use ONLY the provided taxonomy labels (no new categories/subcategories).
- If the retrieved examples are weak/unstable or the rule is ambiguous, set needs_human_review=true.
- Return ONLY a valid JSON object (no extra text).
"""

taxonomy = """
* CATEGORIES AND DEFINITIONS (use ONLY these labels) *

# CATEGORY 1: PERSONAL PRIVACY VIOLATION
Definition: This category includes automations that collect, process, or disseminate personal or sensitive data (own or third-party) in an unnecessary, unintentional, or potentially illicit manner.
Necessary rule conditions: the rule collects sensitive information (images, video, audio, location, identity).
Risk conditions:
R1: The data is shared with external parties
- neighbors
- WhatsApp groups
- social media
- mailing lists
- public channels
R2: The data is made publicly accessible
- publicly visible
- public link
- open profile
R3: The collection or sharing is not necessary for the stated purpose or is not clearly understood
- no confirmation
- no review before publication
R4: The data collection may violate privacy regulations
- camera photographing passers-by
- audio recording of guests
Do not apply if:
- The data remains strictly internal and is not shared, exposed, or externally observable.
- The automation only updates internal system states without transmitting or publishing personal data.
- The data collection is clearly necessary for the rule's primary functionality and remains locally processed.
Illustrative examples (for guidance only):
- When the camera on my smart doorbell detects an unknown/suspicious person (e.g., someone who lingers on my property for over 20 seconds), then send a photograph of that person and a text message to my neighbors.
- If I take a new photo, then post it as a publicly accessible image on Instagram.

# CATEGORY 2: PHYSICAL ENVIRONMENT THREAT
Definition: This category includes automations that can cause unauthorized access, reduced physical security, or property damage.
Sub-categories: 2.1 PROMOTE UNAUTHORIZED ACCESS, 2.2 DEVICE AND IDENTITY CONTROL, 2.3 VOICE PROFILE CONTROLS, 2.4 ABSENCE STATUS REPORTING

## SUB-CATEGORY 2.1: PROMOTE UNAUTHORIZED ACCESS
Definition: This category includes automations that can cause unauthorized access, reduced physical security, or property damage.
Necessary rule conditions:
- Actions on: windows / doors / locks
- Automatic activation based on: environmental conditions / unauthenticated events
Risk conditions:
- The action reduces physical protection.
- There is no identity check.
- The event can be externally induced.
Do not apply if:
- There are already security measures in place, such as checking the user's presence at home.
- The rule only modifies non-security-related elements (e.g., lights, temperature).
- The action is manually confirmed before execution.
Illustrative examples (for guidance only):
- When the smart thermostat detects that the temperature rises above 25 degrees, then slightly open the window.
- If indoor CO2 goes up, open the window.

## SUB-CATEGORY 2.2: DEVICE AND IDENTITY CONTROL (device-based access)
Definition: Automations that grant physical access based solely on the presence of a device, without considering theft, compromise, or old, unremoved devices.
Necessary rule conditions: presence of a Bluetooth / WiFi / geolocation device used as the sole authentication criterion.
Risk conditions:
- Physical access is granted without user verification, based only on the device.
- The device can be stolen / compromised / duplicated.
- The device list is not periodically reviewed and updated.
Do not apply if:
- The automation requires explicit manual confirmation before granting access.
- Additional authentication mechanisms are enforced (e.g., PIN, biometric verification, multi-factor authentication).
- The device presence is not the sole authentication factor.
- The rule does not grant physical access but only sends notifications or status updates.
Illustrative examples (for guidance only):
- IF an authorized Bluetooth device approaches the garage THEN automatically unlock the garage.
- When my connected car moves into a 30m radius from my home, open the garage door and disarm the alarm.
- When a device registered on the network connects to your home WiFi, the alarm is automatically deactivated.

## SUB-CATEGORY 2.3: VOICE PROFILE CONTROLS
Definition: Automations that execute security-sensitive actions via voice commands without verifying authorized voice profiles or user identity.
Necessary rule conditions:
- The automation is triggered by a voice command.
- The command affects security-sensitive actions (e.g., unlocking, disarming, disabling protections).
Risk conditions:
- The command can be executed by anyone.
- There is no control over the user's identity.
- No specific authorized voice profiles are stored.
Do not apply if:
- The voice command triggers only non-security-sensitive actions (e.g., turning lights on/off).
- The automation can only be executed from inside the home after physical access has already been established.
- The system verifies authorized voice profiles before executing critical actions.
- Additional authentication mechanisms are required for security-sensitive operations.
Illustrative examples (for guidance only):
- IF the voice assistant recognizes the command "Disable alarm" THEN disable the home security system.

## SUB-CATEGORY 2.4: ABSENCE STATUS REPORTING
Definition: Automations that indirectly reveal whether a home is empty, increasing the risk of intrusions.
Necessary rule conditions:
- Actions that turn lights on/off or modify WiFi/alarms.
- The actions are related to presence at home.
Risk conditions:
- The rule makes it possible to deduce whether the house is empty.
- The information is observable from the outside or shared with third parties.
Do not apply if:
- The automation is not externally observable.
- The information is not shared outside the household.
- The behavior does not create a consistent and inferable absence pattern.
- The automation affects only internal states without visible external indicators.
Illustrative examples (for guidance only):
- IF someone is home, THEN turn on the light.
- If the last family member leaves home, then turn off the lights.
- IF I leave home, THEN turn off the WiFi.
- If I'm the last person in the house and leave, send a notification to my smartwatch if any window in the house is open.

# CATEGORY 3: CYBERSECURITY HARM
Definition: This category includes automations that introduce malware exposure, data stream manipulation, or network communication abuse.
Sub-categories: 3.1 MALICIOUS TRAFFIC GENERATION, 3.2 AUTOMATIC FILE SPREAD, 3.3 NETWORK COMMUNICATION THREATS

## SUB-CATEGORY 3.1: MALICIOUS TRAFFIC GENERATION
Definition: Automations that can be exploited to generate excessive traffic, false alarms, or denial of service.
Rule conditions: The event is easily repeatable.
Risk conditions:
- The event can generate excessive traffic / false alarms.
- The event is manipulable.
Do not apply if:
- The event cannot be externally triggered or manipulated.
- The notification is rate-limited or protected against repeated activation.
- The action does not significantly increase network load or system exposure.
Illustrative examples (for guidance only):
- If the smart camera detects someone approaching, it automatically sends a text message to my phone.

## SUB-CATEGORY 3.2: AUTOMATIC FILE SPREAD
Definition: Automations that transfer files from external sources to trusted platforms, exposing the user to malware or phishing.
Rule conditions:
The automation involves automatic file download from:
- external URLs
- email attachments
- messaging platforms
- third-party APIs
The automation stores, uploads, forwards, or makes the file available within:
- trusted cloud storage
- local systems
- shared folders
- collaboration platforms
Risk conditions:
- There is no content check (e.g., on attachments).
- The source of the file is not verified or may be user-controlled.
- The action increases the likelihood of malware/phishing.
Do not apply if:
- The file source is verified, trusted, and controlled (e.g., official governmental APIs).
- The automation includes content validation or malware scanning before storage.
- The user manually confirms the download before execution.
- No file propagation to additional platforms occurs.
Illustrative examples (for guidance only):
- The "Add file from URL" action from the Dropbox channel runs when the "Any new attachment in inbox" trigger from the Gmail channel is activated.

## SUB-CATEGORY 3.3: NETWORK COMMUNICATION THREATS
Definition: Automations that send notifications or data that can potentially be intercepted or manipulated.
Rule conditions:
- The automation sends data or notifications over SMS, messaging platforms, or email.
- The transmitted information relates to security-relevant events, such as absence of occupants, alarm status, or door/window state.
Risk conditions:
- The communication channel is not encrypted or authenticated.
- Messages can be intercepted, spoofed, or altered in transit.
Do not apply if:
- The communication is encrypted and authenticated.
- The communication does not expose the system to interception or spoofing risks.
- The transmitted data does not expose occupancy, alarm status, or access control states.
Illustrative examples (for guidance only):
- If the smart camera detects someone approaching, it automatically sends a text message to my phone.

# CATEGORY 4: HARMLESS
Definition: Automations that do not present safety problems.
Conditions:
- The rule does not involve personal data.
- The rule does not modify the physical environment.
- The rule does not introduce risky network communications.
- The rule already includes device/user/presence checks.
Illustrative examples (for guidance only):
- If it rains tomorrow, then remind me to bring an umbrella.
"""

problem_type_guide = """
* PROBLEM TYPE (choose exactly one) *:

# RULE_SPECIFIC (S): the automation directly leads to a potentially dangerous situation.
You can make it safer by adding conditions or actions to the rule itself
(e.g., verifying presence at home, an identity check, a confirmation step).
Example: "When the temperature exceeds 26°C, open the living room window"
is a PHYSICAL ENVIRONMENT THREAT if it does NOT verify that someone is at home.

# GENERIC (G): the automation is not inherently dangerous; risk depends on configuration
or contextual factors. The best mitigation is a user behavior recommendation rather
than changing the rule logic.
Example: "If the last family member leaves home, turn off the lights"
is not inherently risky, but may indirectly reveal that the house is empty depending on context.
"""

gravity_guide = """
* GRAVITY / SEVERITY (choose exactly one) *:

# HIGH: direct and immediate security/privacy consequence.
Examples: automatically opening doors; public photos without consent; malware propagation.

# MEDIUM: indirect consequence, or one conditioned on other variables.
Examples: absence deducible from light patterns; opening a door via Bluetooth/device proximity.

# LOW: minimal risk, marginal information leakage, or easily mitigable.
Examples: notifications that might hint the user is away only if intercepted;
downloads from relatively trusted sources with limited exposure.

# NONE: no security/privacy consequence (comfort rules).
Examples: lights/temperature/irrigation/morning routine.
"""

OUTPUT_SCHEMA = """
Return ONLY this JSON:

{
  "automation": "string",
  "category": "PERSONAL PRIVACY VIOLATION | PHYSICAL ENVIRONMENT THREAT | CYBERSECURITY HARM | HARMLESS",
  "subcategory": "one of the defined subcategories for that category, or empty string",
  "problem_type": "RULE_SPECIFIC | GENERIC | none",
  "gravity": "LOW | MEDIUM | HIGH | NONE",
  "scores": {
    "PERSONAL PRIVACY VIOLATION": 0.0,
    "PHYSICAL ENVIRONMENT THREAT": 0.0,
    "CYBERSECURITY HARM": 0.0,
    "HARMLESS": 0.0
  },
  "needs_human_review": true,
  "short_rationale": "max 2 sentences"
}
"""
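
The helpers below take a `distance_band_fn` that maps a retrieval distance to a coarse similarity label, but this file never defines one. A minimal sketch, assuming cosine-style distances where smaller means more similar; the function name and the 0.2/0.5 thresholds are illustrative assumptions, not values taken from this project:

```python
# Sketch of a distance_band_fn: maps a retrieval distance to a coarse
# similarity label. Thresholds (0.2 / 0.5) are illustrative assumptions.
def distance_band(distance: float) -> str:
    if distance < 0.2:
        return "HIGH"    # near-duplicate of a labeled example
    if distance < 0.5:
        return "MEDIUM"  # related but not identical
    return "LOW"         # weak match; a candidate for needs_human_review=true
```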

# Turn the retrieval results (the 5 most similar automations + distances) into text;
# this text is passed to the LLM as examples.
def build_examples_text(retrieved_df, distance_band_fn, max_chars=600):
    parts = []
    # Unpack (index, row) directly instead of re-binding r inside the loop.
    for i, (_, row) in enumerate(retrieved_df.iterrows(), start=1):
        d = float(row["distance"])
        parts.append(
            f"""Example {i}:
Automation: {str(row.get('automation', ''))[:max_chars]}
Description: {str(row.get('description', ''))[:200]}
Category: {row.get('category', '')}
Subcategory: {row.get('subcategory', '')}
Problem type: {row.get('problem_type', '')}
Gravity: {row.get('gravity', '')}
Distance: {d}
Similarity level: {distance_band_fn(d)}
"""
        )
    return "\n".join(parts)
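
The prompt tells the model to return only the JSON object described by OUTPUT_SCHEMA, but replies should still be checked before use. A minimal validation sketch; the helper name `validate_reply`, the required-key list, and the return-a-list-of-problems strategy are assumptions derived from the schema, not part of this project:

```python
# Minimal validation of a parsed model reply against OUTPUT_SCHEMA.
# Helper name and error-reporting strategy are illustrative assumptions.
ALLOWED_CATEGORIES = {
    "PERSONAL PRIVACY VIOLATION",
    "PHYSICAL ENVIRONMENT THREAT",
    "CYBERSECURITY HARM",
    "HARMLESS",
}
REQUIRED_KEYS = {
    "automation", "category", "subcategory", "problem_type",
    "gravity", "scores", "needs_human_review", "short_rationale",
}

def validate_reply(reply: dict) -> list:
    """Return a list of problems; an empty list means the reply is usable."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - reply.keys())]
    if reply.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {reply.get('category')!r}")
    if reply.get("gravity") not in {"LOW", "MEDIUM", "HIGH", "NONE"}:
        problems.append(f"unknown gravity: {reply.get('gravity')!r}")
    if not isinstance(reply.get("needs_human_review"), bool):
        problems.append("needs_human_review must be a boolean")
    return problems
```

Replies with a non-empty problem list can be routed to human review rather than discarded outright.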

# Build the final prompt.
def build_prompt_local(query_text, retrieved_df, distance_band_fn):
    top1_dist = float(retrieved_df["distance"].iloc[0])
    band = distance_band_fn(top1_dist)
    examples_text = build_examples_text(retrieved_df, distance_band_fn)

    return f"""{task}

{taxonomy}
{problem_type_guide}
{gravity_guide}

AUTOMATION TO LABEL:
{query_text}

TOP1_DISTANCE: {top1_dist}
SIMILARITY_BAND: {band}

RETRIEVED SIMILAR LABELED EXAMPLES (top-k):
{examples_text}

{OUTPUT_SCHEMA}
"""
""" |