Admins: Set up Anubis ASAP!
- 
cross-posted from: https://poptalk.scrubbles.tech/post/3263324

Sorry for the alarming title, but admins, for real: go set up Anubis. For context, Anubis is essentially a gatekeeper/rate limiter for small services. From their docs: "(Anubis) is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them."

It puts forward a challenge that must be solved in order to gain access, and judges how trustworthy a connection is. The vast majority of real users will never notice it, or will notice a small delay the first time they access your site. Even smaller scrapers may get by relatively easily. Big scrapers though, the AI crawlers and trainers, get hit with computational problems that waste their compute before they're let in. (Trust me, I worked for a company that did "scrape the internet", and compute is expensive and a constant worry for them, so it's a win-win for us!)

Anubis took maybe 10 minutes to set up. For Lemmy hosters, you literally just point your UI proxy at Anubis and point Anubis at lemmy-ui. Very easy, slots right in, minimal setup.

These graphs are from since I turned it on less than an hour ago. I have a small instance, only a few people, and immediately my CPU usage and requests per minute have gone down. I've already had thousands of requests challenged; I had no idea I was being scraped this much! You can see them backing off in the charts.

(FYI, this only covers the web requests, so it does nothing to the API or federation. Those are proxied elsewhere, so it really does only target web scrapers.)
- 
Thank you for sharing.
- 
I'm probably going to save a good chunk of my hosting costs because of this, so I'm happy to share it. And happy to make Big Tech pay a bit more to access our content.
- 
I tried twice, for hours each time, to set this up in a way that doesn't break federation, the API, or the web UI. It's not as easy as the OP thinks, and problems aren't always obvious immediately.
Any PieFed instance owners out there who are getting flooded with requests from bots - send me a PM and I'll let you know how I solved it for piefed.social. I'd rather not post it publicly because it could be circumvented if 'they' knew how I'm doing it. 
- 
To be honest, I'm not really a fan of fighting the AI scraper war. It's ultimately the same dynamic that turned the open internet into what it is today. Ten years ago I could just look stuff up: google whether my ThinkPad supports more or faster RAM than the specs said, and I'd find some nice Reddit thread. Now it's all walled off, information isn't shared freely anymore, and I'm having a hard time. And we're doing the same thing. But... we do need to fight. I just wish there were a solution that doesn't add to the enshittification of the internet.

I've been hit by them as well, and the database load made the entire server grind to a halt. I've added some firewall rules and deny lists to my reverse proxy to specifically block the AI companies, and so far it's looking good. I'll try to postpone solutions like Anubis until they're unavoidable. And I guess it won't work for everything; we need machine-readable information, APIs, servers talking to each other... But that's just my 2 cents, and I'll change my opinion and countermeasures once it's necessary.
- 
This still lets scrapers through, but it acts as an artificial throttle. There are several knobs and dials I can apparently turn to make it more or less permissive. But I've got to say, over 90% of my traffic was coming from bot farms, and I, a small instance admin, was paying for all of that in database queries and compute. I think this is a good middle ground: you can access it if you're willing to really try. From what I see, most scrapers give up immediately rather than spend the compute.
- 
It isn't supposed to sit in front of everything, only the web UI. The API and federation endpoints should pass through your proxy as they always have. If you're following the standard Lemmy setup, it should be a one-line change in your nginx conf.
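That "one line" is the `proxy_pass` target of the UI location. A minimal sketch against a standard Lemmy nginx config, assuming Anubis is listening on 127.0.0.1:8923 (its default port, which is configurable) and is itself configured to forward to lemmy-ui; the backend names and ports below are placeholders for whatever your existing config uses:

```nginx
# Browser traffic: hand the UI location to Anubis instead of lemmy-ui.
location / {
    proxy_pass http://127.0.0.1:8923;  # was e.g. http://lemmy-ui:1234
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

# API and federation keep going straight to the Lemmy backend, so
# apps and ActivityPub traffic never see a challenge.
location /api {
    proxy_pass http://lemmy:8536;
}
```

The key point is that only the UI `location` block changes; every other route is left exactly as it was.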
- 
Sir, this is a Wendy's.
- 
For PieFed admins out there, a simpler way to basically kill scrapers dead is to enable the "Private instance" option in the Misc tab of the admin settings.
This prevents non-logged-in users from browsing your instance, without blocking the /inbox route for federation or the API routes.
- 
If it helps, I think the slrpnk Lemmy community did set up Anubis and they still federate. Not sure if there are Lemmy-specific problems, though.
- 
I'm pro setting up Anubis, but only if it doesn't break some "intended" behaviour (federation and RSS, for example).
- 
Earlier, scrapers didn't harm a website much. Maybe one or two search-engine hits a day, and with just a handful of search engines they wouldn't consume many resources. They would scrape your data, but only so that search would work (otherwise search could only match on your website's title). This didn't harm the websites' business model (if they had any). It went downhill when Google started giving direct "answers" to search queries, mostly based on Wikipedia, but occasionally other stuff like Reddit. These were often just the relevant bit from a "high match" website, and while that reduced some traffic to those websites, it wasn't as bad as it is today, where almost all traffic is from scraping bots and many people have just stopped visiting sites directly.
- 
Yeah, I guess I shouldn't have bothered showing people things that worked well for me.
- 
I have it in front of Lemmy, but if PieFed is at all similar, you can proxy the web requests through Anubis, which covers all of the scraping traffic. The API and federation would then not be affected; that's how I set mine up.
- 
I'm confused why you're talking about Lemmy containers in a PieFed-focused community. PieFed doesn't separate the web UI into its own container by design, so the solutions, and the difficulty of implementing them, are very different.
- 
Not terribly. I posted this because I think it would help any fediverse site. It's just a proxy that sits in front of whatever traffic you choose to send to it. So whatever routes go to the web client (probably /), you forward to Anubis, which forwards on to PieFed. Lemmy, PieFed, Mastodon: for any web-based app, that's how you would do it. You can be granular and go route by route, or do all of it; it's not hard-coded for any site.

My puny site was getting hundreds of heavy bot requests per minute before I set this up. I can't imagine what the big fediverse sites are dealing with. I wanted to let fediverse admins know because I'm going to see a noticeable reduction in the bills I pay to host my instance, and I believe that would help other admins, which in turn makes the fediverse stronger.
- 
Yes, we know. This is a blog post about Anubis I wrote a few months back, when I thought I had it working: https://join.piefed.social/2025/07/09/an-anubis-config-for-piefed/

After writing that post I took another approach and moved the logic into nginx instead, as the Anubis configuration language was impossible to debug. That seemed fine, but then a different weird Anubis bug that had been languishing in their issue queue for months with no solution hit me, and I gave up.

It's just not ready. It's bad software. Their documentation is very misleading. I hope no one else loses as many days as I did.
- 
It's not really a Wendy's.
- 
Hmmm. I mean, my instance is small; almost all of my traffic comes from federation. There will be like 50 requests in the log to forward some upvotes, and then one or two from a user or crawler. Most of them seem to behave; it was just Alibaba and Tencent who did proper DDoS attacks on me. I've blocked most of their ASNs, and so far that does 100% of what I was trying to do. I suppose the minor stream of requests from South America and other places in the world isn't humans either, but they mostly read articles, which is a fairly cheap request on PieFed, and it's not a lot.

I'll keep an eye on it; maybe that's an alternative solution. Though currently it's a manual process to look up the address ranges and write the firewall rules. It should probably be automated in some way before random people adopt it.
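The manual lookup step can be scripted. A minimal sketch, assuming the RADb whois server's usual output format (lines like `route: 203.0.113.0/24`); the ASN in the usage comment is a documentation placeholder, not a vetted blocklist, and the nftables table/chain names are whatever your firewall already uses:

```shell
#!/bin/sh
# prefixes_to_rules: read whois/RADb route objects on stdin and print
# one nftables drop rule per announced IPv4 prefix ("route:" lines).
prefixes_to_rules() {
  awk '/^route:/ {print "nft add rule inet filter input ip saddr " $2 " drop"}'
}

# Live usage (needs a whois client and network access), e.g. for a
# placeholder ASN AS64496:
#   whois -h whois.radb.net -- '-i origin AS64496' | prefixes_to_rules
```

Review the generated rules before applying them, and note that announced prefixes change over time, so something like this would need to run periodically (and ideally flush a dedicated set rather than append rules).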
- 
"It's just not ready. It's bad software. Their documentation is very misleading. I hope no one else loses as many days as I did."

That's unfortunate.