Feature Request: Multimodal Vision-Based Email Phishing Triage + Bounded Browser Inspection
The Problem
Current email security tools analyze emails through metadata, headers, and text pattern matching (KQL queries, regex rules, IOC lookups). This misses a critical attack vector: emails that are visually deceptive, with pixel-perfect brand impersonation, fake login forms, urgency cues, and social engineering layouts that fool humans at the visual level.
SOC analysts triage emails by looking at them. No existing Security Copilot plugin replicates this visual analysis.
Additionally, when suspicious URLs are found in emails, analysts must manually visit them in sandboxed browsers to determine their purpose (credential harvesting pages, redirectors, download pages). This is time-consuming and risky.
Proposed Solution: Two Capabilities
1. Multimodal QuickLook: Visual Email Triage
Concept: Render the email as a screenshot (safe, no remote content loaded), then send it to a multimodal LLM (GPT-4o, Gemini, etc.) alongside email metadata for visual analysis.
What the LLM sees (just like an analyst would):
What it returns:
Content signals ("credential request via link", "threat of account compromise")
Recommended next step (dismiss, monitor, inspect URLs, detonate attachments)
Key innovation: The LLM analyzes the email as rendered, exactly how a human would see it. This catches visual social engineering that text-only analysis completely misses.
Safety: The email is rendered in a headless browser with all remote content blocked (images, fonts, tracking pixels). The LLM receives a static screenshot: no execution, no network calls from the email content.
2. QuickBrowse: Bounded Automated URL Inspection
Concept: When QuickLook flags suspicious URLs, automatically dispatch a headless browser to visit them with strict safety boundaries, then use the same multimodal LLM to analyze what the browser finds.
How it works:
Open URL in a headless Chromium browser (isolated context, no persistent state)
LLM planner decides next action based on page state + screenshot: extract forms, follow redirects, stop
Capture page screenshots, form structures, redirect chains, domains contacted
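The planner loop described for QuickBrowse could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `PageState`, `BrowseReport`, `quick_browse`, the `load_page` and `plan_next` callables, and the `max_steps` budget are all hypothetical names and assumptions. The browser and the LLM are injected as plain functions so the control flow can be exercised without a real headless browser or model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# The only actions the planner may request (taken from the issue text).
ALLOWED_ACTIONS = {"extract_forms", "follow_redirect", "stop"}


@dataclass
class PageState:
    """Hypothetical snapshot of what the headless browser observed."""
    url: str
    forms: List[str] = field(default_factory=list)  # input field names found
    redirect_to: Optional[str] = None               # next hop, if any


@dataclass
class BrowseReport:
    """Hypothetical summary of one bounded inspection."""
    visited: List[str] = field(default_factory=list)
    forms: List[str] = field(default_factory=list)
    stopped_because: str = ""


def quick_browse(start_url: str,
                 load_page: Callable[[str], PageState],
                 plan_next: Callable[[PageState], str],
                 max_steps: int = 5) -> BrowseReport:
    """Visit a URL under a fixed step budget (the budget is an assumption)."""
    report = BrowseReport()
    url = start_url
    for _ in range(max_steps):
        state = load_page(url)
        report.visited.append(state.url)
        action = plan_next(state)
        if action not in ALLOWED_ACTIONS:
            # Anything outside the allow-list ends the visit immediately.
            report.stopped_because = "disallowed action: " + action
            return report
        if action == "follow_redirect" and state.redirect_to:
            url = state.redirect_to
            continue
        if action == "extract_forms":
            report.forms.extend(state.forms)
            report.stopped_because = "forms extracted"
        else:
            # "stop", or a redirect request with no target to follow.
            report.stopped_because = "stopped"
        return report
    report.stopped_because = "step budget exhausted"
    return report


# Demo with canned pages instead of a real browser.
pages = {
    "https://short.example/a": PageState(
        "https://short.example/a", redirect_to="https://login.example/signin"),
    "https://login.example/signin": PageState(
        "https://login.example/signin", forms=["username", "password"]),
}


def demo_planner(state: PageState) -> str:
    """Stand-in for the LLM planner: follow redirects, then grab forms."""
    if state.redirect_to:
        return "follow_redirect"
    return "extract_forms" if state.forms else "stop"


report = quick_browse("https://short.example/a", pages.__getitem__, demo_planner)
print(report)
```

In a real integration, `load_page` would wrap the isolated Chromium context and `plan_next` would wrap the multimodal LLM call. Keeping both behind plain callables means the allow-list and step budget are enforced in one place regardless of which browser or model is used.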
Safety boundaries:
What it produces:
Architecture Overview
Why This Matters
Proof of Concept
We have built and tested this approach in an open-source project. Results on real phishing samples:
The multimodal LLM correctly identified visual phishing indicators that would be invisible to header/text-only analysis.
Integration with Security Copilot
This could integrate as:
The approach is model-agnostic: it works with any vision-capable multimodal LLM (GPT-4o, Gemini, Claude, open-weight models).
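One way to keep the triage step independent of any particular vendor is to code against a small interface and validate the model's reply. The sketch below is illustrative only: `VisionModel`, `Verdict`, `quick_look`, the JSON reply shape, and the fallback to `"monitor"` are assumptions, not the project's actual API. The next-step values mirror the ones listed under QuickLook.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Protocol

# Next steps named in the QuickLook description, as identifiers.
NEXT_STEPS = {"dismiss", "monitor", "inspect_urls", "detonate_attachments"}


class VisionModel(Protocol):
    """Hypothetical adapter: any model that accepts a prompt plus an image."""

    def analyze(self, prompt: str, image_png: bytes) -> str:
        ...


@dataclass
class Verdict:
    content_signals: List[str]
    next_step: str


def quick_look(model: VisionModel, screenshot_png: bytes,
               metadata: Dict[str, str]) -> Verdict:
    """Send a static screenshot and metadata to the model, parse its reply."""
    prompt = (
        "Triage this email screenshot for phishing. Reply as JSON with "
        "'content_signals' (list of strings) and 'next_step'.\n"
        "Metadata: " + json.dumps(metadata, sort_keys=True)
    )
    data = json.loads(model.analyze(prompt, screenshot_png))
    step = data.get("next_step")
    if step not in NEXT_STEPS:
        # Assumption: an unrecognized answer is never treated as "dismiss".
        step = "monitor"
    return Verdict(list(data.get("content_signals", [])), step)


class StubModel:
    """Canned reply standing in for GPT-4o, Gemini, Claude, etc."""

    def analyze(self, prompt: str, image_png: bytes) -> str:
        return json.dumps({
            "content_signals": ["credential request via link"],
            "next_step": "inspect_urls",
        })


verdict = quick_look(StubModel(), b"fake-png-bytes",
                     {"subject": "Action required: verify your account"})
print(verdict)
```

Each vendor would then need only a thin adapter that implements `analyze`, while the prompt, the reply validation, and the verdict type stay shared.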