I'm a big fan of ChatGPT, Stable Diffusion, etc., but I must admit that all their training is based on crawling "public" text and images published on the Internet.
But when we say "public", we mean content that has been published by its authors with the hope of monetizing it somehow, usually through visits to their sites.
Is it fair that all this content is used to train these AIs without even citing their authors? I don't think so.
Are these crawlers honouring any robots.txt directives?
Can we block AI crawlers?
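For context, the usual way to try is by adding user-agent rules to the site's robots.txt. Below is a minimal sketch; the tokens shown (OpenAI's GPTBot, Common Crawl's CCBot, Google's Google-Extended) are ones those operators have documented, but whether any given crawler actually respects them is exactly the question above.

```
# robots.txt — a minimal sketch for opting out of some known AI crawlers.
# The user-agent tokens are the ones documented by their operators;
# honouring these rules is voluntary on the crawler's side.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```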