Critical Vulnerabilities Discovered in Major AI Inference Frameworks

By Tina Reynolds

Researchers from Oligo Security have discovered critical vulnerabilities affecting most major AI inference frameworks, including NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang. Described as a top security threat, the flaws would enable attackers to run arbitrary code, escalate privileges, or even exfiltrate models. The findings underscore how important it is for developers to address hazardous coding practices and thereby safeguard the integrity of their systems.

One of the identified vulnerabilities, CVE-2025-30165, affects vLLM and has received a Common Vulnerability Scoring System (CVSS) score of 8.0. A second, more severe issue, CVE-2025-23254, affects NVIDIA TensorRT-LLM and carries a CVSS score of 8.8; fortunately, it has been patched in TensorRT-LLM version 0.18.2. Other frameworks, such as Sarathi-Serve, remain unpatched, leaving users at risk of attack.

Details on Vulnerabilities and Exploitation Risks

These vulnerabilities stem directly from code reuse: libraries and architectural components copied across different projects. Avi Lumelsky, the researcher who identified the flaws, wants the findings to underscore the dangers of borrowing architectural elements without rigorous scrutiny.

“All contained nearly identical unsafe patterns: pickle deserialization over unauthenticated ZMQ TCP sockets.” – Avi Lumelsky
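
For readers unfamiliar with the pattern Lumelsky describes, the sketch below shows roughly what it looks like: a worker binds a ZMQ TCP socket with no authentication and passes whatever bytes arrive straight to pickle.loads. The address, names, and handler are illustrative only and are not taken from any of the affected frameworks.

```python
# Minimal sketch of the unsafe pattern described above: a worker that listens
# on an unauthenticated ZMQ TCP socket and unpickles whatever arrives.
# Illustrative only -- not code from any of the affected frameworks.
import pickle
import zmq

def run_worker(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.bind(bind_addr)          # reachable by anyone who can hit this port
    while True:
        raw = sock.recv()         # no authentication, no integrity check
        task = pickle.loads(raw)  # unsafe: unpickling untrusted bytes can execute code
        handle(task)              # hypothetical task handler

def handle(task) -> None:
    print("received task:", task)

if __name__ == "__main__":
    run_worker()
```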

Successful exploitation of these vulnerabilities would allow attackers to execute arbitrary code on vulnerable clusters, opening the door to further malicious behavior such as privilege escalation or model theft. Attackers could also deploy payloads such as cryptocurrency miners for direct financial profit.
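
To see why unpickling untrusted data amounts to arbitrary code execution, note that pickle lets a serialized object specify a callable that runs at load time. The deliberately harmless demonstration below is generic Python, not anything from the affected projects; it triggers a print call the moment the bytes are deserialized.

```python
# Benign demonstration of why pickle.loads on attacker-controlled data is
# arbitrary code execution: __reduce__ lets the payload choose a callable
# that runs at deserialization time.
import pickle

class Payload:
    def __reduce__(self):
        # A real attacker would return something like (os.system, ("<command>",)).
        return (print, ("code ran during unpickling",))

malicious_bytes = pickle.dumps(Payload())

# The receiving side never calls anything explicitly -- loading is enough.
pickle.loads(malicious_bytes)   # prints: code ran during unpickling
```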

The software development community moves at incredible speed, and that pace often encourages developers to copy and paste code from peer projects, a practice that raises broader questions. Lumelsky noted the implications of this trend:

“Projects are moving at incredible speed, and it’s common to borrow architectural components from peers.” – Avi Lumelsky

He cautioned that “when code reuse includes unsafe patterns, the consequences ripple outward fast.”

Affected Frameworks and Current Status

These vulnerabilities are not limited to a single inference framework; they span several key platforms used to build and deploy AI. Among the affected platforms, Modular Max Server's vulnerability (CVE-2025-60455) has been patched, while SGLang has rolled out only partial mitigations that still leave users exposed to attack.

In many cases, it was this direct copy-pasting of code that introduced the vulnerabilities in the first place. Developers need to understand that even safe-looking code can carry threats if it is not properly vetted.
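
By way of contrast, the sketch below shows one way such a channel could be screened: messages are plain JSON (a data-only format) and each one is authenticated with an HMAC over a pre-shared key. This is an illustrative hardening idea under assumed key management, not the fix adopted by any of the projects mentioned here.

```python
# One way to screen this pattern: send plain JSON instead of pickles and
# authenticate each message with an HMAC over a secret shared out of band.
# Illustrative sketch only; key distribution and error handling are elided.
import hashlib
import hmac
import json
import zmq

SHARED_KEY = b"replace-with-a-key-from-your-secret-store"  # placeholder

def sign(body: bytes) -> bytes:
    return hmac.new(SHARED_KEY, body, hashlib.sha256).digest()

def send_task(sock: zmq.Socket, task: dict) -> None:
    body = json.dumps(task).encode()
    sock.send_multipart([sign(body), body])

def recv_task(sock: zmq.Socket) -> dict:
    mac, body = sock.recv_multipart()
    if not hmac.compare_digest(mac, sign(body)):
        raise ValueError("rejecting message with bad MAC")
    return json.loads(body)      # data-only format: no code runs on load
```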

Beyond merely enabling exploitation, these vulnerabilities … From there, an attacker could take over the entire integrated development environment (IDE). Knostic highlighted the risks posed by JavaScript running inside Node.js:

“JavaScript running inside the Node.js interpreter, whether introduced by an extension, an MCP server, or a poisoned prompt or rule, immediately inherits the IDE’s privileges: full file-system access, the ability to modify or replace IDE functions (including installed extensions), and the ability to persist code that reattaches after a restart.” – Knostic

Once an attacker achieves interpreter-level execution, they can manipulate the IDE into becoming a platform for malware distribution and exfiltration.