Over the past few months, the cybersecurity industry has sounded the alarm on a flood of bad submissions to bug bounty programs, a problem it has dubbed AI slop. Large language models (LLMs), paired with a little automation, can now produce reports that falsely assert a vulnerability where none exists. This poses major problems for security platforms such as HackerOne and Bugcrowd.
HackerOne, the world’s largest platform for bug bounty programs, has felt this tidal wave of garbage submissions firsthand. Michiel Prins, co-founder and senior director of product management at HackerOne, cautioned that AI-generated findings can severely erode the effectiveness of security programs. The company has seen an uptick in false positives: vulnerabilities that appear legitimate but are fabricated by LLMs and have no real-world impact.
Vlad Ionescu, co-founder and CTO of RunSybil, a startup building AI-powered tools to make bug hunting easier, said the wave of reports can be intimidating. The problem, he explained, is that if you prompt an LLM to generate a report, it will generate a report, and people then copy these reports and paste them into bug bounty platforms, inundating them directly. This tsunami of AI-generated content makes the job harder for security professionals, who have to wade through the submissions to reach the real vulnerabilities.
Findings generated by LLMs cost only a few dollars to produce, look just like expertly written reports, and are hard to tell apart from real findings. Ionescu remarked on their deceptive nature: “People are receiving reports that sound reasonable, they look technically correct. And then you find yourself having to really deep dive on them, looking into, ‘oh shoot, where’s this vulnerability hidden?’ But it turns out it was all a hallucination after all.” All the technical detail was fictional; the LLM simply invented it.
Security researchers are drowning in AI slop. Their time is stretched thin reviewing reports that usually yield no actionable findings. Harry Sintonen, an independent security researcher, recently described a fake bug report submitted to the open-source project Curl. As he put it, “Curl can smell AI slop from a mile away.” His sentiment captures the broader frustration of security professionals trying to navigate the pitfalls these false reports have created.
Bugcrowd, another dominant force in the bug bounty world, deliberately leans on proven playbooks, rich workflows and machine learning to dig deep on incoming reports. Even with these measures, its filtering process is still criticized as ineffective, and whether that skepticism is fair may only become clear as the volume of AI-generated submissions keeps climbing.
While these challenges persist, cybersecurity experts are starting to take steps to improve the quality of submissions. Ionescu urges the industry to invest in AI-powered systems that filter submissions for accuracy and reliability before a human ever reviews them. Such a proactive approach could go a long way toward mitigating the impact of AI slop on bug bounty programs.
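To make the idea concrete, here is a minimal, purely illustrative sketch of what such a pre-triage filter might look like. The Submission fields, the boilerplate markers, the score weights and the threshold are all assumptions invented for this example; they are not HackerOne's, Bugcrowd's, or RunSybil's actual logic.

```python
# Illustrative pre-triage filter for bug bounty submissions.
# Every field name, marker, weight, and threshold below is a made-up
# assumption for the sake of the example, not any platform's real system.

from dataclasses import dataclass


@dataclass
class Submission:
    title: str
    body: str
    has_proof_of_concept: bool       # reporter attached a working PoC
    has_reproduction_steps: bool     # concrete steps, URLs, parameters
    reporter_valid_reports: int = 0  # reporter's past accepted findings


# Phrases that often show up in templated, machine-generated reports.
BOILERPLATE_MARKERS = (
    "as an ai language model",
    "could potentially allow an attacker",
    "it is recommended to sanitize all user input",
)


def slop_score(sub: Submission) -> float:
    """Return a score from 0.0 to 1.0; higher means more likely AI slop."""
    text = f"{sub.title} {sub.body}".lower()
    score = 0.0
    if not sub.has_proof_of_concept:
        score += 0.35   # no working exploit attached
    if not sub.has_reproduction_steps:
        score += 0.35   # nothing concrete for a triager to reproduce
    if any(marker in text for marker in BOILERPLATE_MARKERS):
        score += 0.20   # templated language typical of generated reports
    if sub.reporter_valid_reports == 0:
        score += 0.10   # no track record of accepted findings
    return min(score, 1.0)


def deprioritize(sub: Submission, threshold: float = 0.6) -> bool:
    """True if the report should wait for extra scrutiny instead of fast triage."""
    return slop_score(sub) >= threshold


if __name__ == "__main__":
    report = Submission(
        title="Critical RCE in payment service",
        body="This issue could potentially allow an attacker to execute code...",
        has_proof_of_concept=False,
        has_reproduction_steps=False,
    )
    print(f"slop score: {slop_score(report):.2f}")        # 1.00 for this example
    print("deprioritized" if deprioritize(report) else "fast-tracked")
```

A production system would presumably rely on an LLM or a trained classifier rather than hand-written heuristics, but the routing idea is the same: score submissions first, and send human triagers only the reports that look plausible.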
The development has also raised important conversations about how researchers themselves use AI. Casey Ellis, founder of Bugcrowd, said it is no surprise that AI is a key component in a significant portion of submissions, but he has not seen this lead to a significant uptick in shoddy reports. He added that researchers are using AI tools in productive ways, such as identifying bugs and generating documentation. This line of thinking implies that the technology is ethically neutral: it can be used for good or harm.
Mozilla’s experience further highlights the dilemma. The organization rejects a steady trickle of false reports: on average, five or six submissions per month are flagged as invalid, under 10% of the reports it receives each month. The figures underline a concern the industry already knows well: AI-generated content cannot simply be trusted to be reliable.
That’s why HackerOne has built Hai Triage to help fight the growing tide of AI slop. The system speeds up triage by pairing human expertise with AI technology. It aims to tackle the problem of poor-quality submissions head on while protecting the integrity of bug bounty programs.
The cybersecurity landscape was already complicated and is changing quickly with the emergence of new AI tools. Researchers and social media platforms are already fighting harmful disinformation produced by LLMs. How well the industry responds together will determine its capacity to push back against this growing trend, and that response will be an important step toward maintaining the quality and reliability of bug bounty submissions going forward.