Hugging Face Unveils SmolVLA, a Groundbreaking Robotics Model for Enhanced AI Capabilities

By Lisa Wong

Hugging Face has released SmolVLA, an open AI model built for robotics applications. At 450 million parameters, the model is compact for its class, and it represents a notable technical achievement with implications for the future of AI and robotics. SmolVLA was trained on the LeRobot Community Datasets, a collection of curated robotics datasets hosted on Hugging Face's AI development platform.

SmolVLA runs on an asynchronous inference stack. This design decouples executing the robot's actions from processing its sensory input, such as camera images, so the robot does not have to stop and wait while the model works out what to do next. As a result, robots can react more quickly to changing surroundings, which improves their overall performance.
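The decoupling described above can be illustrated with a generic producer/consumer sketch in Python. This is not SmolVLA's or LeRobot's actual implementation: it assumes the model returns a short chunk of actions for each observation, and `fake_policy`, `run_async_control`, `chunk_size`, and `refill_threshold` are hypothetical names. A background thread computes the next chunk from a newer observation while the control loop keeps executing actions already in the queue, so the loop only waits on the model if the queue runs completely dry.

```python
import queue
import threading
from typing import Callable, List

# Hypothetical policy type: maps (observation, chunk_size) to a list of actions.
Policy = Callable[[int, int], List[str]]


def fake_policy(observation: int, chunk_size: int) -> List[str]:
    """Stand-in for a VLA model (hypothetical). Observations are step indices here;
    a real robot would pass camera frames, joint state, and an instruction."""
    return [f"obs{observation}-a{i}" for i in range(chunk_size)]


def run_async_control(policy: Policy, num_steps: int,
                      chunk_size: int = 4, refill_threshold: int = 2) -> List[str]:
    """Execute num_steps actions while inference runs in a background thread."""
    action_queue: queue.Queue = queue.Queue()
    workers: List[threading.Thread] = []
    executed: List[str] = []

    def infer(observation: int) -> None:
        # Model inference: enqueue a whole chunk of actions for later execution.
        for action in policy(observation, chunk_size):
            action_queue.put(action)

    infer(0)  # Prime the queue synchronously before the control loop starts.
    for step in range(num_steps):
        busy = any(w.is_alive() for w in workers)
        # Queue running low and no inference in flight: request a new chunk
        # from the latest observation without stopping the control loop.
        if action_queue.qsize() <= refill_threshold and not busy:
            worker = threading.Thread(target=infer, args=(step,))
            worker.start()
            workers.append(worker)
        # Execution: take the next action (blocks only if the queue is empty).
        executed.append(action_queue.get())
    for worker in workers:
        worker.join()
    return executed


actions = run_async_control(fake_policy, num_steps=10)
print(actions[:4])  # ['obs0-a0', 'obs0-a1', 'obs0-a2', 'obs0-a3']
```

A fully synchronous loop would stall at every inference call; here the remaining actions of the previous chunk keep the robot moving while the next chunk is computed. In practice the refill threshold would presumably be chosen relative to the model's inference latency, which is an assumption rather than a documented SmolVLA setting.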

Despite its lightweight design, SmolVLA is highly capable, making it a promising tool for training and evaluating generalist robotics systems. Its performance has drawn considerable attention. In one real-world trial, a user on X used SmolVLA to control the Koch Arm, a third-party robotic arm, running the model on an RTX 2050 graphics card with only 4 GB of memory. The model was further fine-tuned on just 31 demonstrations. Surprisingly, these results matched or exceeded state-of-the-art single-task baselines.
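A back-of-the-envelope calculation suggests why a 450-million-parameter model fits on a 4 GB card. The sketch below counts only the memory needed to store the weights, assuming 4 bytes per parameter in float32 and 2 bytes in float16/bfloat16; activations, fine-tuning optimizer state, and framework overhead add more, and the precision used in the trial is not stated. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory to hold the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 450_000_000  # SmolVLA's reported parameter count

# Assumed storage precisions; the article does not say which one was used.
for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")
# float32: 1.80 GB
# float16/bfloat16: 0.90 GB
```

Even at full float32 precision, the weights take up less than half of the card's 4 GB, leaving headroom for activations at inference time.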

Hugging Face laid out its vision for SmolVLA in a blog post announcing the model, describing how it is meant to broaden access to vision-language-action (VLA) technology and speed up research toward generalist robotic agents.

“SmolVLA aims to democratize access to vision-language-action [VLA] models and accelerate research toward generalist robotic agents.” – Hugging Face

SmolVLA's release marks a notable moment for open robotics and reflects Hugging Face's commitment to advancing AI technology and making it more accessible. By pairing a compact model with a responsive inference stack, it could help drive further advances in robotics and broaden collaboration across the AI research community.

Hugging Face also summarized the benefit of its asynchronous inference stack:

“Because of this separation, robots can respond more quickly in fast-changing environments.” – Hugging Face