Mastodon recently made headlines with updates to its terms of service. The changes are intended to improve user privacy and prevent unexpected uses of user data. The new terms take effect on July 1. They prohibit using data collected from the platform to train models, particularly large language models (LLMs). The changes apply only to the Mastodon.social server, one of the larger instances in the decentralized fediverse.
The new terms underline Mastodon’s focus on protecting user information. One notable provision prohibits scraping user data for unauthorized purposes, such as archiving or training AI models. The move is a meaningful response to growing concern about the unethical and irresponsible use of data, and in particular about AI models being trained on user-generated content.
Until now, Mastodon let users in the United States join at 13 years old. The updated terms raise the minimum age to 16 worldwide. The change follows a broader trend toward making the internet safer for younger users.
The amended terms are written in dense legalese, but they unambiguously ban data scraping and the use of automated collection tools. Specifically, they prohibit users to “Use, launch, develop, or distribute any automated system, including without limitation, any spider, robot, cheat utility, scraper, offline reader, or any data mining or similar data gathering extraction tools to access the Instance, except in each case as may be the result of standard search engine or Internet browser and local caching or for human review and interaction with Content on the Instance.”
“We explicitly prohibit the scraping of user data for unauthorized purposes, e.g., archival or large language model (LLM) training. We want to make it clear that training LLMs on the data of Mastodon users on our instances is not permitted,” Mastodon said of the new terms.
The initiative should go a long way toward strengthening trust among Mastodon users and, more broadly, across the fediverse. By acting proactively, Mastodon is positioning itself to help set a standard for data ethics and to protect its users’ personal information from exploitation.