Piero Bosio Personal Social Web Site (Fediverso)

A social forum federated with the rest of the world. Instances don't matter; people do.
Jaycosm🔆
@jay@mastodon.gamedev.place

Posts: 2 · Topics: 0 · Shares: 0 · Groups: 0 · Followers: 0 · Following: 0

Posts

Recent

  • The web has a memory — and we’ve saved 1 trillion pages of it!
    Jaycosm🔆

    @evan @internetarchive Thanks for the info, Evan. My main concern is that the Wayback Machine has copies of my websites from 2025, as recent as July, even though I denied all bots for all of my domains back in 2023 or 2024. I don't believe many companies adhere to "robots.txt" directives, though Google Search currently does. Unfortunately, the only way to really protect a website today is to require user-account login (a secure session) to wall off public content from web scraping.

    https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit

    Uncategorized wayback1t livestream

  • The web has a memory — and we’ve saved 1 trillion pages of it!
    Jaycosm🔆

    @internetarchive I notice my websites are still being scraped by your bots even though I manually denied all bots in my domains' robots.txt files a year or two ago. The Wayback Machine was great before the age of modern AI, but not now. I don't want companies using your APIs to scrape my archived content to train their AI models. I thought I had read that if I denied bots in robots.txt, you would automatically pull the related website content from your archives... but my content remains on your servers.

    Uncategorized wayback1t livestream
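The blanket robots.txt denial described in the second post can be sketched and checked with Python's standard-library parser. Note that robots.txt is advisory, not enforcement: a crawler chooses whether to honor it, which is exactly the complaint above. The `ia_archiver` agent shown here is the Internet Archive's historical crawler name, used for illustration.

```python
# A minimal sketch of a site-wide robots.txt denial, verified with the
# stdlib parser. robots.txt is purely advisory: only compliant crawlers
# will respect this.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if this robots.txt permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# Every compliant crawler, including the Internet Archive's historical
# ia_archiver agent, is denied the entire site.
print(is_allowed("ia_archiver", "https://example.com/page"))  # False
print(is_allowed("Googlebot", "https://example.com/"))        # False
```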
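The login-wall approach mentioned in the first post (requiring a secure session to see content) can be sketched as a tiny WSGI middleware. All names here are hypothetical; this is an illustration of the idea, not the poster's actual setup.

```python
# Hypothetical sketch: a WSGI middleware that walls off content behind a
# session check, returning 401 to anonymous clients -- including scrapers
# that hold no valid session cookie.
from http.cookies import SimpleCookie

def require_session(app, session_cookie="session_id", valid_sessions=frozenset()):
    """Wrap a WSGI app so only requests carrying a known session cookie pass."""
    def middleware(environ, start_response):
        cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        token = cookies[session_cookie].value if session_cookie in cookies else None
        if token not in valid_sessions:
            # Anonymous or unknown session: refuse, exposing nothing.
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Login required"]
        return app(environ, start_response)
    return middleware

# Usage: wrap any WSGI app; only holders of a listed session token get through.
def secret_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"secret"]

walled = require_session(secret_app, valid_sessions={"abc"})
```

Unlike robots.txt, this is actual enforcement: a scraper without credentials simply never receives the content.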