Google is building chatbots and AI models using all the data it has mined from the Internet. It has access to more data than anyone else on the web, and we gladly handed it over because we wanted better SEO rankings.
Google took all your content and trained its AI on it, and now that AI can use the information to serve customers who would otherwise have come to you.
Those customers can go to Google’s website, ask a question and get the information right there. They won’t be coming to your site. They won’t see your ads. They won’t see your sign-up links.
Let me give you an example. I am a developer, and I used to spend a lot of time on programming sites reading articles or browsing Stack Overflow. These days I start by asking ChatGPT or Bard, and I get the answer I need about 70% of the time.
This is happening across many different segments and niches, and it’s going to get worse.
The data that has already been collected is lost, but people are waking up and talking about it.
So at some point soon, you may get a way to specify in your robots.txt that you don’t want your content crawled by an AI robot and used to train a bot.
This is still only a proposal, one of several that Google is making, and there may be more later. Google has started a public discussion on what to do about AI data collection.
Right now there’s no way to block the robots, and Google and many other companies already have your data.
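To make the robots.txt idea concrete, here is a minimal sketch of what such an opt-out might look like and how a well-behaved crawler could check it. Nothing about the proposal is final, so treat everything here as an assumption: the user-agent name `ExampleAIBot`, the example.com URL and the `is_allowed` helper are all made up for illustration. The only real piece is Python's standard `urllib.robotparser` module, which reads ordinary robots.txt rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleAIBot" is an invented user-agent token,
# not a crawler name taken from any published proposal.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The invented AI crawler is blocked from the whole site.
print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))   # False
# Any other crawler falls through to the "*" rule and is allowed.
print(is_allowed("SomeSearchBot", "https://example.com/articles/1"))  # True
```

Keep in mind that robots.txt is a request, not an enforcement mechanism. A rule like this only helps if the crawler chooses to read the file and respect it.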
Here’s the most important thing. Google needs you. Most of their revenue is from ads and they need you to keep making fresh content so that they can show the ads.
If it’s no longer viable for website owners to create content, there’s nothing left to crawl.
How do we strike a balance? What do you see in the future for Large Language Models and AIs?
Do you think companies can keep crawling data and training AI on it without permission?