Security Flaw Discovered in LangChain Core Raises Concerns

By Tina Reynolds

A serious security vulnerability has been found in LangChain Core, a core Python package essential to the LangChain ecosystem. The flaw can be exploited remotely by bad actors to obtain sensitive information. The attack relies on prompt injection, a technique that gained prominence with the emergence of large language models (LLMs) like ChatGPT. Developers and organizations using the framework now have more than enough reason for concern.

The crux of the problem lies in the “lc” marker, which denotes LangChain objects in the framework’s internal serialization format. When deserialization is run with the option “secrets_from_env=True,” attacker-supplied data carrying an “lc” marker can be exploited during object reconstruction, including pulling secrets from the environment. In response, LangChain has changed the default to “False,” so secrets are no longer loaded from the environment automatically.
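Conceptually, the round trip looks something like the sketch below. It assumes the dumps() and loads() helpers from langchain_core.load and the secrets_from_env option described above; exact signatures may vary between releases.

```python
# Minimal sketch of the serialization round trip, assuming the
# langchain_core.load helpers and the secrets_from_env option named in the
# report; parameter availability may vary between releases.
from langchain_core.load import dumps, loads
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("human", "Summarize: {text}")])

# Serialized LangChain objects are JSON envelopes tagged with the "lc" marker.
serialized = dumps(prompt)

# Safer deserialization: do not resolve secrets from environment variables.
restored = loads(serialized, secrets_from_env=False)
```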

Details of the Vulnerability

The recently discovered flaw allowed attackers to instantiate classes outside the whitelisted, trusted namespaces, opening the door to arbitrary code execution via Jinja2 templates; commonly abused Jinja2 templates are now blocked by default to further minimize risk. Compounding the problem, an escaping bug lets attackers inject LangChain object structures via user-controlled fields.
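To make that concrete, the sketch below shows the general shape of an “lc” constructor envelope an attacker might try to smuggle into a user-controlled string field. The class path and arguments are purely hypothetical placeholders, not a working exploit.

```python
# Illustrative shape only (hypothetical placeholders, not a working exploit).
# If user-controlled text can slip a structure like this past escaping, an
# unpatched deserializer may try to instantiate the referenced class with
# attacker-chosen constructor arguments.
injected_structure = {
    "lc": 1,
    "type": "constructor",
    "id": ["some", "module", "SomeClass"],    # hypothetical class path
    "kwargs": {"template_format": "jinja2"},  # hypothetical attacker kwargs
}
```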

Porat noted that an attacker can exploit this by getting a LangChain orchestration loop to serialize and then deserialize content containing an ‘lc’ key, essentially creating an unsafe arbitrary object. That would leave the door wide open to arguably dozens of attacker-favorable paths, and it paints a clear picture of how bad actors might exploit the vulnerability to coordinate malicious attacks.

The preferred attack vector targets specific LLM response fields such as additional_kwargs or response_metadata. Attackers can influence these fields through prompt injection and then count on the application serializing and deserializing them when operating on streams.
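A hedged sketch of that flow is shown below, using additional_kwargs on an AIMessage as the carrier. The field names come from the report; whether a given pipeline actually round-trips them this way depends on the application.

```python
# Sketch of the stream round trip described above: a model response whose
# additional_kwargs were steered by prompt injection is serialized by the
# application and later deserialized, at which point an embedded "lc"
# structure could be honored by an unpatched loader.
from langchain_core.load import dumps, loads
from langchain_core.messages import AIMessage

msg = AIMessage(
    content="model output influenced by prompt injection",
    additional_kwargs={"note": "attacker-influenced structure could end up here"},
)

blob = dumps(msg)       # e.g. persisted or streamed between services
restored = loads(blob)  # risky if untrusted "lc" envelopes are honored
```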

Steps Taken by LangChain

LangChain acted quickly to fix the vulnerability by issuing a patch. The release introduces newer, more restrictive defaults in the load() and loads() functions and gives users more control: by providing an “allowed_objects” parameter, they can specify exactly which classes can be serialized and deserialized. This change is intended to improve overall security and reduce the risk of future exploits.
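In practice, the stricter API might be used along the lines of the sketch below. The allowed_objects parameter name is taken from the report, and the exact form it accepts may differ between releases, so treat this as an illustration rather than the definitive signature.

```python
# Hedged sketch: only deserialize explicitly allowed classes and never pull
# secrets from the environment implicitly. Parameter names follow the report;
# check the release notes for the exact signature in your version.
from langchain_core.load import dumps, loads
from langchain_core.prompts import ChatPromptTemplate

payload = dumps(ChatPromptTemplate.from_messages([("human", "{question}")]))

restored = loads(
    payload,
    allowed_objects=[ChatPromptTemplate],  # whitelist of classes to reconstruct
    secrets_from_env=False,                # no implicit secret loading
)
```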

“Organizations need to remain vigilant,” Porat stated. This is precisely the kind of AI-versus-traditional-security scenario that organizations drift into without realizing it: LLM output is untrusted input and must be treated as such. His warning is a reminder of the need to maintain strong security and hacking-prevention measures when developing with AI technologies.

Recommendations for Users

The vulnerability affects a number of packages across the LangChain ecosystem, including @langchain/core and langchain, among others. All users are urged to update to the patched versions as soon as possible to protect against potential attacks. By taking these proactive steps, organizations can better protect themselves and their customers and limit the risk posed by this dangerous vulnerability.