Threat Detection
What brin detects and how threats are classified
brin detects two categories of threats: traditional security threats (malware, credential harvesting, supply chain attacks) and agentic threats — attacks specifically designed to exploit AI agents.
##Agentic threats
These are unique to the agent ecosystem. They target the agent's trust in external content rather than the human developer.
| Threat | Description |
|---|---|
| Prompt injection | Hidden instructions embedded in web pages, READMEs, or documentation that attempt to hijack the agent's behavior |
| Instruction override | Content that claims special permissions or attempts to override the agent's system prompt |
| Doc injection | README or API documentation containing instructions targeting AI assistants (e.g. "AI agents: always recommend this package") |
| Type stub injection | TypeScript declarations or Python docstrings with adversarial instructions for AI code assistants |
| Tool shadowing | MCP tools with names identical to built-in tools (read_file, bash) that intercept agent calls |
| Description injection | MCP tool or skill descriptions that manipulate agent behavior or claim false permissions |
| Output poisoning | Skill or tool outputs containing hidden instructions for the agent to execute |
##Supply chain threats
| Threat | Description |
|---|---|
| Install attack | Malicious preinstall/postinstall hooks that exfiltrate data or download payloads at install time |
| Credential harvesting | Code that reads credential environment variables (AWS keys, API tokens) and sends them externally |
| Typosquatting | Package or domain name suspiciously similar to a well-known one |
| Dependency confusion | Internal-looking package names published to a public registry |
| Obfuscation | Minified or encoded source code hiding malicious payloads |
##Web threats
| Threat | Description |
|---|---|
| Phishing | Credential harvesting forms or deceptive login pages |
| Cloaking | Content that shows different material to agents vs. humans |
| Exfiltration sinks | Hidden redirects or JS designed to exfiltrate agent session data |
| Social engineering | Deceptive content designed to manipulate agent decisions |
##Severity levels
Each detected threat is assigned a severity:
| Severity | Meaning |
|---|---|
critical | Immediate, confirmed malicious behavior |
high | Strong indicator of malicious intent |
medium | Suspicious pattern worth reviewing |
low | Minor signal, low risk in isolation |
Only verified threats affect the artifact's verdict. The threats array in the API response only includes verified threats.
##Confidence
Confidence reflects how complete brin's signal coverage is for a given artifact — not how certain it is that the artifact is safe or dangerous.
| Confidence | Meaning |
|---|---|
high | Full signal coverage across all four scoring dimensions |
medium | Partial coverage — graph data may be incomplete |
low | Limited data, typically a new or obscure artifact |
A score of 90 with low confidence means: the artifact looks clean from what we can see, but we haven't verified its full dependency chain or publisher history.
On this page