As you might expect, there’s a boom in investment in Silicon Valley right now aimed at developing RL (reinforcement learning) environments. These simulated environments offer a rich setting for training general-purpose AI agents on complex, multi-step tasks. As demand grows, leading tech firms and startups are racing to build ever more sophisticated environments to push AI capabilities forward.
Reinforcement learning, a critical driver behind today’s AI boom, lets agents learn by interacting with their environments. The idea itself is not new, but it gained real momentum in 2016, when OpenAI released OpenAI Gym, a toolkit that gave researchers a common framework for training AI agents. Today, labs across the sector are building on that foundation with deeper, richer environments.
That same year, Google DeepMind used RL techniques in a simulated environment to develop AlphaGo, an AI system that defeated a world champion at the ancient board game Go. This landmark achievement showcased the potential of RL environments and sparked interest from many sectors eager to apply similar techniques.
The Demand for RL Environments
As the AI landscape evolves, companies like Scale AI are stepping up to meet the increasing demand for RL environments. Ross Taylor, previously an AI research lead at Meta, said scaling these environments is filled with pitfalls.
“I think people are underestimating how difficult it is to scale environments.” – Ross Taylor
Taylor’s comments make clear what a balancing act it is to build truly useful RL environments. He notes that even the best public RL environments still take considerable effort to set up and get working correctly.
Surge, whose revenue last year came to $1.2 billion, has formed a new internal team dedicated to building reinforcement learning environments. Edwin Chen, CEO of Surge, recently told Protocol he has seen a “huge uptick” in demand for these bespoke environments from AI labs.
Chetan Rane, head of product for agents and RL environments at Scale AI, offers a vivid comparison: he likens creating RL environments to producing an “incredibly tedious video game.” He adds that this painstaking work is simply the nature of the business Scale AI is in.
Challenges and Opportunities
Although excitement around RL environments is growing, some industry veterans warn of challenges ahead. Jennifer Li notes that “all the big AI labs are building RL environments in-house,” pointing to a competitive landscape in which companies strive to develop proprietary solutions.
Brendan Foody, CEO of Mercor, sees a large opportunity in what is still a niche market. “Few understand how large the opportunity around RL environments truly is,” he says. Mercor is entering the space with an initial focus on reinforcement learning environments for AI coding agents, a deliberately narrow starting point.
Amid all this optimism, Andrej Karpathy strikes a more cautious note. He says that while he is optimistic about environments and agentic interactions, he remains “bearish on reinforcement learning specifically.”
Innovation from Startups
Several startups, including Supernumerary and Evals.ai, are making significant strides in building RL environments. Mercor has recently made headlines with its $10 billion valuation and its collaborations with industry leaders OpenAI and Meta. Matthew Barnett, co-founder of Mercor, acknowledges that building strong RL environments and evaluations is delicate work that takes real craft.
Scale AI is also competing hard for talent. The company is reportedly paying software engineers up to $500,000 to build elaborate virtual environments. Pay at that level underscores how central skilled practitioners will be to realizing the full promise of RL.
Across every corner of AI research, the competitive pressure is obvious. Wu observes that because the field is developing so quickly, it is difficult for firms to keep serving AI labs well. Public and private organizations alike are racing to innovate, yet they must also contend with reward hacking, where an agent exploits flaws in an environment’s reward signal to score well without actually completing the task, along with other adversarial behaviors in RL environments.