Declining Performance of AI Coding Assistants Sparks Concern Among Developers


By Tina Reynolds


AI coding assistants are radically reshaping the world of software development. Many developers now rely on these tools to generate code quickly. Recent tests, however, reveal a troubling trend: the performance of these tools, particularly ChatGPT models such as GPT-4 and GPT-5, appears to be declining. The decline has caused a stir among developers and technologists, prompting legal challenges and renewed scrutiny of how effective these assistants really are in the wild.

The CEO of Carrington Labs has gone on record touting ChatGPT, praising its code-generation abilities and its usefulness across a wide range of programming tasks. The prior iteration, GPT-4, reliably produced useful responses across several attempts. The newest version, GPT-5, by contrast, has been inconsistent enough to call its reliability into question. That shift in underlying performance has users asking what the future holds for AI coding assistants once held to the highest standard. Instead of steady improvement, they are getting mixed results, deepening their doubts.

Testing the Models

The author’s full evaluation of ChatGPT, across its various iterations, can be found here. When they asked GPT-5 the same question, it produced an error message. They then fed that error message through nine different versions of ChatGPT to see how each would respond. The results were illuminating. While GPT-4 kept up its stellar run, producing helpful answers on every attempt, GPT-5 was a different story.

To its credit, GPT-5 did find a workable solution that resolved the problem consistently on each run. Its approach was to work from the raw index of each row, simply incrementing each index by one, which fixed the indexing issue straightforwardly. The approach was not without drawbacks, however. In a number of cases GPT-5 went off the rails: instead of returning only the code, it padded its answer with commentary that many users will find superfluous.
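The article does not include the generated code, but the fix it describes, shifting each row's raw index up by one, might look something like this minimal sketch. The data and the function name here are hypothetical, not taken from the author's test:

```python
def renumber_rows(rows):
    """Re-key rows by their raw position, shifted up by one.

    Hypothetical reconstruction of the fix described above: the
    raw zero-based index of each row is incremented so the first
    row is numbered 1 rather than 0.
    """
    return {index + 1: row for index, row in enumerate(rows)}

rows = ["alpha", "beta", "gamma"]
print(renumber_rows(rows))  # {1: 'alpha', 2: 'beta', 3: 'gamma'}
```

The point is less the one-line fix than that GPT-5 reportedly wrapped such a fix in extra commentary rather than returning the code alone.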

The performance gap between the two models points to real issues with this new technology. GPT-4 set a high bar by reliably providing actionable advice. GPT-5, by contrast, tends to miss specific instructions and occasionally produces unrelated responses, an indication that performance may have regressed.

The Evolution of AI Models

Historically, AI models have delivered impressive improvements quarter over quarter. An interesting observation by the author is that core models appear to have reached a quality plateau after a two-year run of consistent gains. This stagnation is worrying for developers who rely on a constantly improving set of tools to raise coding practices and standards.

Previous versions of Claude, by comparison, tended to address unsolvable issues with hand-wavy fixes, as though throwing up their hands in resignation at the hardest questions. More recent models such as GPT-5 are more likely to propose productive alternatives, yet too often they still fall short of actually solving these issues. This erratic performance has frustrated developers who want consistent, dependable behavior from their AI coding assistants.

Furthermore, even when GPT-5 generated successful solutions, the derived values often lacked semantic meaning or appropriate context. In some cases the output felt like throwing darts at a wall of numbers rather than a sound answer to a programming challenge. This ambiguity erodes user trust in the technology and further indicates the need for additional training.

Implications for Developers

As developers increasingly integrate AI coding assistants into their workflows, they must navigate the challenges posed by fluctuating performance. Reliability, the confidence that the same prompt will yield an equally trustworthy output every time, is key to automating developer productivity while assuring code quality.
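One practical response to that inconsistency is to check a model's answers for agreement across repeated runs before trusting them. The sketch below is illustrative only; `generate` stands in for whatever assistant API a team actually uses, and the canned answers are invented:

```python
from collections import Counter

def most_consistent_answer(generate, prompt, runs=5):
    """Call a text generator several times on the same prompt and
    return the most frequent answer plus its agreement rate.

    `generate` is a placeholder for a real assistant call; here it
    is any function mapping a prompt string to an answer string.
    """
    answers = [generate(prompt) for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / runs

# Toy stand-in for a model that answers inconsistently.
canned = iter(["fix A", "fix A", "fix B", "fix A", "fix A"])
answer, agreement = most_consistent_answer(lambda p: next(canned), "patch this bug")
print(answer, agreement)  # fix A 0.8
```

A low agreement rate does not prove an answer wrong, but it flags exactly the kind of run-to-run variability the article attributes to GPT-5.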

Andrew L. Therrien, CEO of Carrington Labs, emphasized the substantial improvements AI tools have brought to code generation, while highlighting the importance of regularly evaluating and adapting them to get the most out of them. As developers encounter issues with newer models like GPT-5, they may need to consider alternative tools or revert to more reliable versions until performance stabilizes.

This rapidly evolving landscape makes it imperative for developers to critically assess AI coding assistants and how far they can rely on them. The decline in performance observed over 2025 may prompt a reevaluation of how these tools are integrated into development processes.