The #BSDCafe #Forgejo instance ("brew") is constantly under "attack" by scrapers.
-
@announcements FWIW, you can get rid of a lot of those scrapers by catching all the self-identifying ones listed in ai.robots.txt, and on top of that, anything that says Firefox/ or Chrome/ in the user agent but does not also send a sec-fetch-mode header. Catch those, serve them an empty 200, or a 403, or 418, or heck, a 429, and a lot of the bots will be gone. (Mind you, this also catches Googlebot and Bingbot, which I personally consider a win, but you might want to make an exception for them if you want to appear in search results.)
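
For anyone wanting to try this, here is a rough sketch of the check in Python. It is only an illustration under some assumptions: the function name `should_block`, the `allow_search_engines` switch, and the short bot list are made up for the example (the real list in ai.robots.txt is much longer), and in practice this logic would more likely live in the reverse proxy config than in application code.

```python
from typing import Mapping

# Illustrative subset only (an assumption for this sketch); the full list of
# self-identifying crawlers is maintained in the ai.robots.txt project.
SELF_IDENTIFYING_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

# Optional exceptions, for sites that still want to appear in search results.
# Matching on the user agent alone is trivially spoofable.
SEARCH_ENGINE_EXCEPTIONS = ("Googlebot", "bingbot")

# User agent tokens that claim to be a mainstream browser.
BROWSER_TOKENS = ("Firefox/", "Chrome/")


def should_block(headers: Mapping[str, str], allow_search_engines: bool = False) -> bool:
    """Return True if the request looks like a scraper by the two rules above."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    ua_lower = user_agent.lower()

    if allow_search_engines and any(s.lower() in ua_lower for s in SEARCH_ENGINE_EXCEPTIONS):
        return False

    # Rule 1: the client identifies itself as a known crawler.
    if any(bot.lower() in ua_lower for bot in SELF_IDENTIFYING_BOTS):
        return True

    # Rule 2: claims to be Firefox or Chrome but sends no sec-fetch-mode header.
    claims_browser = any(token in user_agent for token in BROWSER_TOKENS)
    return claims_browser and "sec-fetch-mode" not in lowered
```

Whatever calls this would then answer blocked requests with the status of its choice (an empty 200, 403, 418, or 429). Clients that claim to be neither browser nor known bot, such as curl, pass through untouched.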
-
@algernon @announcements thank you!