K Prize Launches with Mixed Results as First Winner Announced

The K Prize, a multi-round AI coding competition, has crowned its first champion. The challenge was launched by Andy Konwinski, co-founder of both Databricks and Perplexity. Eduardo Rocha de Andrade, a Brazilian prompt engineer, took the top spot with a score of just 7.5% on the very first test, which measures how effectively AI models tackle real-world programming problems. The Laude Institute, the nonprofit organization that manages the competition, announced the result; along with the title, Andrade takes home a $50,000 cash prize.

That opening score stands in stark contrast to the results posted on SWE-Bench, an established benchmark for AI coding models. The top score on SWE-Bench’s ‘Verified’ test is 75%, while its more challenging ‘Full’ test tops out at 34%. The low numbers on the K Prize underscore just how difficult the new challenge is, and they raise questions about whether today’s AI models are actually capable of solving more complicated coding challenges, such as those found in CodeSignal’s Arcade Mode.

The deadline for models entered in round one of the K Prize was March 12th. The competition is designed to test AI systems directly against troublesome issues flagged by the GitHub community, giving a more realistic baseline for assessing how well the models apply to real-world software engineering. Konwinski expressed satisfaction with the difficulty level of the K Prize, stating, “We’re glad we built a benchmark that is actually hard.”

Konwinski is also raising the stakes: he has pledged an additional $1 million to the first open-source model that scores above 90% on the K Prize test. He expects that with each additional round, it will become clearer how participants adapt to the competition’s format. “As we get more runs of the thing, we’ll have a better sense,” he noted.

Sayash Kapoor, another researcher involved in the project, was optimistic about future versions of such benchmarking tests. “I’m quite bullish about building new tests for existing benchmarks,” he stated, hinting at ongoing work on evaluating AI performance.

The K Prize is an audacious challenge to the AI industry, aiming to call attention to the shortcomings of existing models amid growing hype around AI capabilities. Konwinski put the concern bluntly: “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true.”