Today’s tech companies may have already scraped your web content to train their chatbots. But even so, Google has a new tool to help publishers opt out of the company’s own AI training.
"Today we’re announcing Google-Extended, a new control that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs," Google says.
The control operates as a token that can be placed in a website’s robots.txt file, the file that tells web crawlers which parts of a domain they may not access. Website publishers can already add a similar token, GPTBot, to block OpenAI’s web crawler from taking their content. But the issue is more complicated with Google, since the company runs the world’s dominant search engine.
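In practice, opting out of both uses standard robots.txt syntax. A minimal example, per each company’s documentation (Google-Extended is the new token; GPTBot is OpenAI’s crawler):

    # Tell Google not to use this site's content to improve
    # Bard and Vertex AI generative APIs
    User-agent: Google-Extended
    Disallow: /

    # Block OpenAI's web crawler
    User-agent: GPTBot
    Disallow: /

    # Googlebot is a separate user agent, so search indexing
    # continues unless it is disallowed explicitly.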
The benefit of Google’s new control is that website publishers can continue to be indexed by Google’s regular web crawlers, so their search rankings won’t tank, while their content is (in theory) no longer used for AI training.
Widespread opt-outs, however, could water down Google’s AI training efforts. But the company was likely pushed to offer the control amid growing concerns and lawsuits over tech companies training their AI systems on user-generated content without permission. In July, Google began holding a “public discussion” with website publishers on developing new standards for using public data to train AI.
It isn’t hard to imagine many website publishers adopting the new control to avoid freely giving up their content to the search giant. Nevertheless, Google is hopeful that the rise of AI will benefit the entire web ecosystem. “By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time,” VP of Trust Danielle Romain wrote in a blog post.
Google adds that it’s also exploring “additional machine-readable approaches to choice and control for web publishers” to address the growing reach of AI applications.