Piero Bosio Personal Social Web Site (Fediverso)

A social forum federated with the rest of the world. Instances don't matter; people do.
Jaycosm🔆
@jay@mastodon.gamedev.place

Posts: 2 · Topics: 0 · Shares: 0 · Groups: 0 · Followers: 0 · Following: 0

Posts

Recent

  • The web has a memory — and we’ve saved 1 trillion pages of it!
    Jaycosm🔆

    @evan @internetarchive Thanks for the info, Evan. My main concern is that the Wayback Machine has copies of my websites from 2025, as recent as July, even though I denied all bots for all of my domains back in 2023 or 2024. I don't believe many companies adhere to "robots.txt" directives, though Google Search currently does. Unfortunately, the only way to really protect a website today is to require user-account login (a secure session) to wall off public content from web scraping.

    https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit

    Uncategorized wayback1t livestream

  • The web has a memory — and we’ve saved 1 trillion pages of it!
    Jaycosm🔆

    @internetarchive I notice my websites are still being scraped by your bots even though I manually denied all bots in my domains' robots.txt files a year or two ago. The Wayback Machine was great before the age of modern AI, but not now. I don't want companies using your APIs to scrape my archived content to train their AI models. I thought I had read that if I denied bots in robots.txt, you would automatically pull the related website content from your archives... but my content remains on your servers.

    Uncategorized wayback1t livestream
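The blanket robots.txt denial described in the second post can be sketched and checked with Python's standard-library parser. Note that robots.txt is advisory, not enforcement: a crawler chooses whether to honor it, which is exactly the complaint above. The `ia_archiver` agent shown here is the Internet Archive's historical crawler name, used for illustration.

```python
# A minimal sketch of a site-wide robots.txt denial, verified with the
# stdlib parser. robots.txt is purely advisory: only compliant crawlers
# will respect this.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if this robots.txt permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# Every compliant crawler, including the Internet Archive's historical
# ia_archiver agent, is denied the entire site.
print(is_allowed("ia_archiver", "https://example.com/page"))  # False
print(is_allowed("Googlebot", "https://example.com/"))        # False
```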
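The login-wall approach mentioned in the first post (requiring a secure session to see content) can be sketched as a tiny WSGI middleware. All names here are hypothetical; this is an illustration of the idea, not the poster's actual setup.

```python
# Hypothetical sketch: a WSGI middleware that walls off content behind a
# session check, returning 401 to anonymous clients -- including scrapers
# that hold no valid session cookie.
from http.cookies import SimpleCookie

def require_session(app, session_cookie="session_id", valid_sessions=frozenset()):
    """Wrap a WSGI app so only requests carrying a known session cookie pass."""
    def middleware(environ, start_response):
        cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        token = cookies[session_cookie].value if session_cookie in cookies else None
        if token not in valid_sessions:
            # Anonymous or unknown session: refuse, exposing nothing.
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Login required"]
        return app(environ, start_response)
    return middleware

# Usage: wrap any WSGI app; only holders of a listed session token get through.
def secret_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"secret"]

walled = require_session(secret_app, valid_sessions={"abc"})
```

Unlike robots.txt, this is actual enforcement: a scraper without credentials simply never receives the content.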