Google's new model is made by mining the Internet. Is that okay?

    Cyril Gupta
    8 replies
    Google is creating chat bots and AI models built on all the data it has mined from the Internet. It has access to more data than anything else on the web, and we gladly handed it over because we wanted better SEO rankings. Google took all your content, trained their AI on it, and now that AI can use the information to serve customers who would otherwise have come to you. They can go to Google's website, ask a question, and get the information right there. They won't be coming to yours. They won't see your ads. They won't see your sign-up links.

    Let me give you an example. I am a developer, and traditionally I spent a lot of time on programming sites reading articles or browsing Stack Overflow. These days I start by asking ChatGPT or Bard my questions, and I get the answers about 70% of the time. This is happening across many different segments and niches, and it's going to get worse.

    What's done is already lost, but people are waking up and talking. Soon you will be able to specify in your robots.txt that you don't want your content crawled by an AI robot and used to train a bot. This is still a proposal, one of the proposals that Google is making, but there may be more later. Google has started a public discussion on what to do about AI data collection. Right now there is no way to block these robots, and Google and many other companies already have your data.

    Here's the most important thing: Google needs you. Most of their revenue comes from ads, and they need you to keep making fresh content so that they can show those ads. If it's no longer viable for website owners to create content, there's nothing left to crawl.

    How do we strike a balance? What do you see in the future for large language models and AIs? Do you think companies can keep crawling data and training AIs on it without permission?
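    To make that concrete, a robots.txt rule under such a proposal might look something like the minimal sketch below. The "AI-Crawler" user-agent token is purely a placeholder, since no standard token for AI training bots has been agreed on; the directive syntax itself is ordinary robots.txt.

        # Hypothetical robots.txt entry: opt out of a (placeholder) AI training crawler
        # while leaving ordinary search indexing untouched.
        User-agent: AI-Crawler
        Disallow: /

        # Normal search crawling stays allowed.
        User-agent: Googlebot
        Allow: /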

    Replies

    Akanksha Gaur
    Creating a new model by mining the Internet raises ethical issues. Integrating vast amounts of data from the web makes powerful models possible, but it also creates problems around data privacy, consent, and potential bias. To address these concerns and uphold ethical standards, companies like Google must ensure responsible data usage, transparency, and accountability. Building a trustworthy AI ecosystem requires striking a balance between innovation and protecting users' rights.
    Austin Nguyen | Afforai
    I do see the concern, and perhaps something could be done about it. But I wouldn't worry too much about AI replacing search engines just yet, for two reasons. One is that AI is still not accurate enough. For something like writing a blog or drafting an email, sure, some errors and hallucinations are fine. But for more critical tasks like coding and decision-making, 70% just isn't good enough. The second reason AI won't replace search engines soon is that there is no accountability. If there is fraudulent information online, you have someone to sue, someone to blame. You can't do that with AI (yet). So for important decisions, I'd like to believe people will come to their senses and do the due diligence to verify whether the AI is giving accurate information. That being said (shameless plug coming up), I believe AI can be used for low-risk tasks, and that it's important to show where the AI gets its information from. That is the philosophy of Afforai, our soon-to-launch product.
    Cyril Gupta
    @hungnguyenkhac7 Hmm... These are some interesting viewpoints. Frankly, I did not think of these angles at all, especially accountability as a factor. Thanks for your feedback, Austin. Keep me posted about Afforai, mate.
    erwin smith
    To be honest, I'm always on the lookout for ways to optimize my business and deliver better results for my customers, so your idea sounds okay to me. Anyway, one area I've been focusing on lately is my Internet connection. To improve my customer support system, I found a service that provides business support by connecting my supply chains, internal material flows, warehouses, and construction sites. By automating repetitive tasks such as answering frequently asked questions, I can free up time for my customer service team to focus on more complex issues. I think it's reasonable to use something like that.
    Akbar Said
    The scenario you've described raises important concerns about data usage, AI models, and the future of content creation. Let's address your questions and the broader implications.

    Balancing Data Usage and Content Creation: Striking a balance between data usage and content creation is a complex challenge. On one hand, large language models and AI can provide immense value by offering instant answers and information to users. This can enhance the user experience and save time for developers (as in your example) and for other users seeking information. On the other hand, website owners invest time, effort, and resources into creating valuable content. If AI models like ChatGPT can access and serve that content without users ever visiting the original websites, it could impact website traffic, ad revenue, and user engagement for content creators.

    AI Data Collection and Consent: Data collection by AI models without explicit permission is a contentious issue. Many argue that companies should seek consent from content creators before using their data to train AI models. Consent and data ownership are important principles that need to be respected, and companies should be transparent about their data collection practices.

    The Role of Robots.txt: The proposal to let website owners specify in robots.txt that their content shouldn't be crawled by an AI robot could be a step in the right direction. However, this approach has limitations: it may be difficult to distinguish AI bots from other web crawlers, and some AI models might still find ways to access the data even with such restrictions in place.

    The Future of Large Language Models and AIs: The future of large language models and AI will likely involve ongoing discussion and debate about data usage, ethics, and consent. Companies should be responsible for how they handle data and mindful of the potential impact of their AI models on content creators and the broader web ecosystem.

    Data Crawling and Training without Permission: Ideally, companies should seek permission from content creators before using their data to train AI models. Respect for intellectual property and data ownership is crucial to fostering a fair and sustainable digital environment.
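    On the point about distinguishing AI bots from other crawlers, here is a minimal Python sketch of why user-agent filtering alone is weak. The token names are hypothetical placeholders, not real bot names; the key limitation is that the User-Agent header is self-reported, so a crawler that omits or spoofs its token slips through.

        # Minimal sketch: deny-listing crawlers by their self-reported User-Agent.
        # The token names below are hypothetical placeholders, not real bot names.
        AI_CRAWLER_TOKENS = {"ai-crawler", "example-llm-bot"}

        def is_blocked_ai_crawler(user_agent: str) -> bool:
            """Return True if the User-Agent matches a known AI-crawler token.

            Because the User-Agent header is self-reported, a crawler that omits
            or spoofs its token passes this check, which is why robots.txt and
            user-agent filtering alone cannot reliably separate AI bots from
            ordinary web crawlers.
            """
            ua = user_agent.lower()
            return any(token in ua for token in AI_CRAWLER_TOKENS)

        # Example usage:
        print(is_blocked_ai_crawler("Mozilla/5.0 (compatible; AI-Crawler/1.0)"))  # True
        print(is_blocked_ai_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))   # False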