AI Coding Assistants Face Decline in Efficiency and Quality

By Tina Reynolds

Jamie Twiss, the former CEO of Carrington Labs, has been an enthusiastic user of AI coding assistants, which have helped cement his organization's position as a rising force in predictive analytics for mortgage lenders. Recent testing of these tools, however, reveals a concerning trend: the quality and efficiency of AI-generated coding assistance may be diminishing. This apparent decline raises some critical questions. Are AI language models like ChatGPT trustworthy, and just how much do they actually boost coding productivity?

Twiss recently conducted a round of tests with multiple AI models, including GPT-4 and GPT-5, to measure how effectively they could generate code solutions. The results paint a complex picture of evolving strengths and weaknesses in these tools. GPT-4's responses were reliably the most useful; GPT-5, by contrast, was inconsistent, demonstrating a major regression from GPT-4's performance.

Performance Discrepancies Among AI Models

Through his experiments, Twiss found that GPT-4 produced satisfactory responses in each of its ten attempts. That level of reliability is one of the reasons it was such a prized tool for extensive coding work at Carrington Labs. GPT-5's responses, by contrast, varied widely in quality. It solved many problems well, but it fell short on others, sometimes overcorrecting while failing to recognize the issues it had left unaddressed.

Twiss then sent GPT-5 a prompt containing an error message, telling it to simply return the fixed code. In nine of ten test cases, the model instead replied with code to print out the names of the columns in the test-cases DataFrame, a diagnostic step rather than a fix. However helpful that may have been, the redirection did not match the unambiguous instruction to just show the code. In three cases, GPT-5 declined to complete the request at all, indicating instead that a column was likely not included in the dataset. This behavior suggests gaps in both comprehension and instruction-following.
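To illustrate, here is a minimal sketch of the kind of diagnostic snippet described, written in Python with pandas. The DataFrame and column names are hypothetical, since the article does not publish Twiss's actual code.

```python
import pandas as pd

# Hypothetical stand-in for the test-cases DataFrame from Twiss's project;
# the real data and column names are not given in the article.
test_cases_df = pd.DataFrame({"case_id": [1, 2, 3], "expected": [10, 20, 30]})

# The kind of diagnostic GPT-5 reportedly returned instead of the fixed code:
# print the available columns to see which ones actually exist.
print(test_cases_df.columns.tolist())

# Check for the column the failing code assumed was present
# ("actual" is an invented name for this example).
missing = "actual"
if missing not in test_cases_df.columns:
    print(f"Column '{missing}' is likely not included in the dataset.")
```

Returning a probe like this instead of the requested fix is precisely the kind of redirection Twiss flagged.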

Earlier Claude models underachieved even more. Faced with difficult challenges, they often seemed to simply give up, delivering very little. This glaring discrepancy raises a larger question about the shifting capabilities of AI coding assistants.

Insights from Jamie Twiss’s Testing

Twiss's assessment came from running ten trials per model and categorizing each output as helpful, useless, or harmful. To his surprise, he found that, despite GPT-5's impressive abilities, GPT-4 was the more reliable model, with GPT-5 faltering on several tasks. In six of the attempts, GPT-5's code raised exceptions that produced error messages or caused columns to be filled in incorrectly.

Despite some successes (GPT-5 did find effective solutions by taking the actual index of each row and adding 1 to it, as sketched below), the overall inconsistency in performance raised eyebrows. One of the primary aims of AI tools is to enhance productivity, yet coding has in some cases become less efficient with the onslaught of new AI technology. With AI's help, what used to take about five hours can now take seven or eight; without AI's boost, the work takes longer still.
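For context, here is a minimal sketch of that row-index approach in Python with pandas. The DataFrame and column names are hypothetical; the article does not show the actual code GPT-5 produced.

```python
import pandas as pd

# Hypothetical DataFrame standing in for the data in Twiss's tests; the
# article does not specify the real dataset or column names.
df = pd.DataFrame({"value": [5.0, 7.5, 9.0]})

# The fix the article attributes to GPT-5's successful runs: derive a
# 1-based row number from each row's positional index rather than relying
# on an existing (and possibly missing) column.
df = df.reset_index(drop=True)  # ensure a clean 0..n-1 integer index
df["row_number"] = df.index + 1

print(df)
#    value  row_number
# 0    5.0           1
# 1    7.5           2
# 2    9.0           3
```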

This erosion of efficiency is deeply concerning for practitioners who rely on these models to make their work faster and better. As an innovator in delivering predictive-analytics risk models to lenders, Carrington Labs knows how great the demand for dependable, mission-critical coding support is.

The Future of AI Coding Assistants

Twiss reflects on the changing world of AI-powered coding assistants. He observes that improvements in the quality of GPT models began to plateau by 2025, and that this stagnation seems to have marked the beginning of a recent decline in their effectiveness. The early excitement around these tools as a means of improving coding productivity is fading as their mixed results become apparent.

The obstacles encountered by users such as Twiss underscore an important point: developers need to keep honing these models. Depending on AI for coding tasks may be all the rage, but there is a catch. Users need to be aware of these tools' shortcomings and willing to pivot when faced with unexpected output.