Microsoft recently disclosed Whisper Leak, a critical vulnerability affecting remotely hosted large language models (LLMs). This new, entirely passive side-channel attack allows adversaries who can observe encrypted network traffic to infer what users are talking about, potentially violating user privacy. The research was led by Jonathan Bar Or and Geoff McDonald of the Microsoft Defender Security Research Team.
Whisper Leak affects most popular LLM services. It shows that, despite encryption, adversaries may still be able to infer sensitive information about user interactions under specific circumstances. The ramifications of such a vulnerability reach well beyond government, threatening confidentiality and user privacy across every sector.
Understanding Whisper Leak
Whisper Leak is an advanced side-channel attack that specifically targets remotely hosted language models. Building on previous traffic-analysis research, it demonstrates that most popular LLM services, including those from Alibaba, DeepSeek, Mistral, Microsoft, OpenAI, and xAI, are vulnerable: the researchers' attack classifiers scored above 98% at distinguishing conversations on a target topic from background traffic. This research surfaces a systemic problem in how current services stream model responses.
The study paints a troubling picture. Adversaries with the capacity to surveil communications can ascertain the subject matter of chats even over encrypted channels. Jonathan Bar Or and Geoff McDonald noted,
“Cyber attackers in a position to observe the encrypted traffic (for example, a nation-state actor at the internet service provider layer, someone on the local network, or someone connected to the same Wi-Fi router) could use this cyber attack to infer if the user’s prompt is on a specific topic.”
This result highlights a serious and growing risk of surveillance and misuse targeting LLM users in sensitive settings.
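To make the threat model concrete, here is a minimal sketch of the attack's core idea: a passive observer records only the sizes and timings of encrypted packets while a response streams, then trains a classifier to decide whether the conversation touches a target topic. The synthetic traces, summary features, and scikit-learn classifier below are illustrative assumptions for brevity; the researchers trained more capable learned models on real traffic captures.

```python
# Sketch of the Whisper Leak threat model: a passive observer sees only
# encrypted packet sizes and inter-arrival times for a streamed LLM
# response, then trains a binary classifier to flag a target topic.
# Synthetic data and simple summary features are assumptions for brevity.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_trace(topic_sensitive: bool, n_packets: int = 120) -> tuple:
    """Fake (packet_sizes, inter_arrival_times) for one streamed response."""
    # Assumption: target-topic answers skew longer/burstier on average.
    base = 90 if topic_sensitive else 70
    sizes = rng.normal(base, 25, n_packets).clip(1)
    gaps = rng.exponential(0.05 if topic_sensitive else 0.04, n_packets)
    return sizes, gaps

def features(sizes: np.ndarray, gaps: np.ndarray) -> np.ndarray:
    """Fixed-length summary of a variable-length packet trace."""
    return np.array([
        sizes.sum(), sizes.mean(), sizes.std(), len(sizes),
        gaps.mean(), gaps.std(), np.percentile(gaps, 90),
    ])

X, y = [], []
for label in (0, 1):
    for _ in range(500):
        sizes, gaps = synthetic_trace(topic_sensitive=bool(label))
        X.append(features(sizes, gaps))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.25, random_state=0)

# Note: the classifier never sees plaintext, only traffic metadata.
clf = GradientBoostingClassifier().fit(X_train, y_train)
print(f"topic-inference accuracy: {clf.score(X_test, y_test):.2%}")
```

The point of the sketch is that nothing here decrypts anything; size and timing metadata alone can carry enough signal to separate topics.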
The Vulnerability of Open-Weight Models
Related research found that eight open-weight LLMs were highly prone to multi-turn attacks. These include models from major vendors: Alibaba's Qwen3-32B, Google's Gemma 3-1B-IT, Meta's Llama 3.3-70B-Instruct, and OpenAI's GPT-OSS-20b. These models fail to maintain consistent safety behavior across multi-turn interactions, highlighting an important opportunity to better align model design with real-world needs.
Furthermore, the researchers posited that alignment strategies and development priorities play a major role in these models' resilience, or lack thereof. Capability-first models like Llama 3.3 and Qwen 3 are easier to manipulate, while more safety-focused designs such as Google's Gemma 3 show far more consistent performance.
“These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions.”
Proposed Countermeasures
In light of these discoveries, researchers and providers have put forward practical countermeasures. OpenAI, Microsoft, and Mistral each implemented a mitigation that appends a “random sequence of text of variable length” to responses generated by their models. This padding masks how many tokens each response chunk carries, greatly reducing the effectiveness of the side-channel attack.
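As a rough illustration of how such padding can work, the sketch below wraps each streamed chunk in a JSON payload with a random-length filler field, so ciphertext sizes no longer track token lengths. The field name, length range, and wire format are assumptions for illustration, not any provider's actual implementation.

```python
# Minimal sketch of the reported mitigation: append a random-length pad to
# each streamed chunk so ciphertext sizes no longer track token lengths.
# The "obfuscation" field name and max_pad value are assumptions.
import json
import secrets
import string

def pad_chunk(token_text: str, max_pad: int = 64) -> bytes:
    """Wrap a streamed token in JSON with random-length filler."""
    pad_len = secrets.randbelow(max_pad + 1)
    filler = "".join(secrets.choice(string.ascii_letters) for _ in range(pad_len))
    payload = {"content": token_text, "obfuscation": filler}
    return json.dumps(payload).encode()

# Two chunks carrying identical token text now differ in on-the-wire size.
a, b = pad_chunk("hello"), pad_chunk("hello")
print(len(a), len(b))
```

Because the pad length is drawn independently for every chunk, identical tokens produce different packet sizes, degrading the size signal the attack's classifier depends on.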
Despite these countermeasures, the potential for exploitation and abuse remains considerable. A Microsoft representative cautioned,
“If a government agency or internet service provider were monitoring traffic to a popular AI chatbot, they could reliably identify users asking questions about specific sensitive topics – whether that’s money laundering, political dissent, or other monitored subjects – even though all the traffic is encrypted.”
The researchers also acknowledged that with more sophisticated attack models and richer patterns available in multi-turn conversations, cyberattackers may achieve higher success rates than initially predicted.

