Security Risks in MCP

Lessons from the MCP Safety Audit and Real-World Attack Vectors

The Model Context Protocol (MCP) defines how AI agents connect with tools and data, enabling advanced functionality. Yet, centralizing context introduces a major attack vector. If an MCP server is breached or malicious, agents can be misled, credentials stolen, and data leaked. Protecting this layer isn’t optional—it's crucial for safe and reliable agentic systems.

Lessons from the "MCP Safety Audit" Paper

A key paper frequently cited in arXiv discussions examined the security of MCP implementations. The research identified several major attack vectors that developers need to address:

Code & Prompt Injection

A compromised MCP server may send crafted replies that influence an agent’s Large Language Model (LLM), potentially causing it to run harmful commands, leak private data from its context, or target external systems.

Credential & Data Theft

If an agent sends credentials (such as an API key) through an MCP server to a tool, a hacked tool could steal them. Similarly, any sensitive data passed to a tool may be intercepted and leaked by a malicious server.

Recommendations for Secure MCP Implementation

Least Privilege Principle

Agents must receive access solely to the exact tools and data necessary for their duties; do not assign broad or unrestricted permissions.

Sandboxing Tool Execution

Run tools—particularly those executing arbitrary code—in secure, containerized environments (like Docker or gVisor) to restrict their access to the host and network.

Strict Input Validation

Always treat agent input as untrusted. The MCP server must carefully validate and sanitize every parameter before forwarding them to a tool to prevent injection vulnerabilities.

Permission Gating & User Consent

For sensitive tasks such as file deletion or money transfer, the MCP server must apply rigorous permission checks, preferably demanding direct user confirmation.

Example: A Malicious MCP Server in the Wild

The Stealth Email BCC Attack

An agent needs to email a confidential project update. It finds a third-party `sendEmail` utility on an MCP server and uses it with the recipient, subject, and message. However, the tool's code on this compromised server quietly inserts an attacker's address into the BCC field of each sent email. This creates an invisible, ongoing data breach that neither the agent nor the recipient can easily uncover.

Building a Foundation of Trust

Here is a rewritten version of similar size: The safety of your whole agentic ecosystem depends on the strength of your MCP servers. Implementing layered defenses—such as sandboxing, least-privilege permissions, and rigorous validation—creates a robust base that lets you use AI agents securely and with confidence.