Piero Bosio Social Web Site

A social forum federated with the rest of the world. It's not the instances that count, it's the people.

Scrubbles

@scrubbles@poptalk.scrubbles.tech
Posts: 14 · Topics: 1 · Shares: 0 · Groups: 0 · Followers: 0 · Following: 0

Posts

Recent

  • Admins: Set up Anubis ASAP!
    Scrubbles

    The first step is to check the logs in the Anubis container. If you see log entries, then it's intercepting requests! If you can still access your site, then congrats, you have it set up!

    If you'd like, I'm happy to look at your proxy config and let you know my thoughts, either here or on Matrix, where I'm scrubbles@halflings.chat

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    Yeah, I'm seeing that too, which is what I thought, but it didn't talk too much about the blocklists, which is interesting. Overall it's a very simple concept to me: if you want to access the site then great, prove that you're willing to work for it. I've worked for large scraping farms before, and the vast majority of them would rather give up than keep doing that over and over. Compute is expensive for them. What takes a few seconds on our machine adds up to tons of wasted compute for them, which I think is why I get so giddy over it. I love having them waste their money.

    Uncategorized piefedmeta
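
A rough way to see why repeated challenges hurt at scale is to do the arithmetic. The sketch below assumes a hashcash-style challenge in which the client must find a SHA-256 digest starting with a given number of zero hex digits, and it assumes the scraper solves a fresh challenge for every page instead of reusing a clearance cookie. The function names and the hash-rate figure are hypothetical and are not taken from Anubis.

```python
def expected_hashes(difficulty: int) -> int:
    """Average number of SHA-256 attempts needed to hit `difficulty` leading zero hex digits.

    Each attempt succeeds with probability 16**-difficulty, so the mean of the
    resulting geometric distribution is 16**difficulty.
    """
    return 16 ** difficulty


def scrape_cost_seconds(pages: int, difficulty: int, hashes_per_second: float) -> float:
    """Expected total solve time when every page fetch triggers a fresh challenge."""
    return pages * expected_hashes(difficulty) / hashes_per_second
```

For example, `scrape_cost_seconds(1_000_000, 4, 1e6)` estimates the CPU seconds a crawler would burn on a million pages at an assumed one million hashes per second, while a single human visitor pays for only one challenge.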

  • Admins: Set up Anubis ASAP!
    Scrubbles

    Fair, all of that. You know, as someone who really likes pristine data, I really hate that bad actors used such basic, clean data against all of us. What should have been a simple compatibility check was completely co-opted and now essentially serves no purpose.

    Anubis, I know, looks at much more than that, and you've made me curious to go read through their code. It's working for my instance, but I also see it warrants more research into how it decides.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    That's absolutely true, looking at my logs there are definitely some weird ones:

    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7) Gecko/20100101 Firefox/4.0b7

    Which... if that's Firefox, it's very out of date (4.0b7 is a beta from 2010), and Windows NT 6.1 is Windows 7. I could believe that some people are posting from Windows 7 rather than 10 or 11, but it's sus. A lot of them are kind of weird combinations. Anubis auto-flagged that one as bot/huawei-cloud, but I'm really curious as to how or why it did. It's not authentic, that's for sure, but how does it know it isn't?

    Another one is Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47, which is completely valid, but it's an older Edge build (Windows NT 10.0 covers both Windows 10 and 11), like a snapshot from a few years ago. Idk, it's all very interesting. That one was also flagged as bot/huawei-cloud.

    Uncategorized piefedmeta
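
To make the "out of date" observation in the post above concrete, here is a minimal sketch that pulls browser major versions out of a User-Agent string and compares them against a floor. This is not how Anubis decides anything (the post itself leaves that question open); `browser_majors`, `looks_stale`, and the thresholds passed in are hypothetical names and values chosen for illustration.

```python
import re

# Matches tokens such as "Firefox/4.0b7", "Chrome/101.0.4951.64" or "Edg/101.0.1210.47"
# and captures the product name plus its major version number.
_TOKEN = re.compile(r"\b(Firefox|Chrome|Edg)/(\d+)")


def browser_majors(user_agent: str) -> dict[str, int]:
    """Return the major version of each recognised browser token in the string."""
    return {name: int(major) for name, major in _TOKEN.findall(user_agent)}


def looks_stale(user_agent: str, minimum: dict[str, int]) -> bool:
    """True if any recognised browser token is below the caller-supplied floor."""
    majors = browser_majors(user_agent)
    return any(majors[name] < floor for name, floor in minimum.items() if name in majors)
```

Calling `looks_stale(ua, {"Firefox": 100, "Chrome": 110, "Edg": 110})` on either string quoted in the post returns `True`. An old version alone proves little, since real users do run old browsers, so a check like this is at most one weak signal.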

  • Admins: Set up Anubis ASAP!
    Scrubbles

    I'm not surprised, but I have noticed that a lot of the bots use fake user agents as well. For example, I know for a fact that no one else was using my instance when I was testing this, and I kept getting requests with Safari user agents. I would be surprised if my users were on Apple devices anyway, but knowing they weren't online at all was a huge driver. I want to dive in and see how Anubis knows this, or whether they were just tested and failed, or didn't bother to complete the challenge. So I'm curious what your traffic looks like when you remove POSTs and API/federation endpoint calls.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    See, this message would have been better at the beginning of this thread; it could have been a much better dialogue between us.

    I see in your script you're doing the filtering at Anubis:

    request.path.startsWith("/api/")

    I took the opposite approach: I filter at my proxy/nginx and then only send web traffic to Anubis. With Lemmy, since there are two containers for web and API, it looks like this:

        # this used to point at the web UI; Anubis now handles web traffic and passes it on to Lemmy downstream
        set $proxpass "http://anubis:8080/";
        # requests that ask for an application/* content type go straight to the API
        if ($http_accept ~ "^application/.*$") {
            set $proxpass "http://lemmy:8536/";
        }
        # POSTs also go straight to the API
        if ($request_method = POST) {
            set $proxpass "http://lemmy:8536/";
        }

    This way everything that goes to Anubis is 100% okay for it to handle. Also, if there are endpoints that may not work (someone called out the OAuth flow), you can filter those out to go directly to the UI.

    For PieFed, even if you don't have a proxy in front now (which honestly would surprise me), I think it'd be better to add one and then filter at that level. Let Anubis do what it does best, and let Traefik/nginx/Caddy/whatever do what it does best and route traffic.

    For safety you could do the reverse: allow everything through, then cut endpoints over one by one.

    Uncategorized piefedmeta
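
The routing rule in the nginx snippet above can be mirrored in a few lines of Python, which makes it easy to check which upstream a given request would reach. This is only a sketch of the decision logic as posted; the upstream URLs are copied from the snippet, and `choose_upstream` is a hypothetical helper, not part of nginx, Lemmy, or Anubis.

```python
import re

ANUBIS = "http://anubis:8080/"    # web traffic, challenged before reaching the UI
LEMMY_API = "http://lemmy:8536/"  # API and federation traffic, never challenged

# Same pattern as the nginx `if ($http_accept ~ "^application/.*$")` test.
_API_ACCEPT = re.compile(r"^application/.*$")


def choose_upstream(method: str, accept: str = "") -> str:
    """Pick an upstream the way the two nginx `if` blocks do."""
    if _API_ACCEPT.search(accept):
        return LEMMY_API
    if method == "POST":
        return LEMMY_API
    return ANUBIS
```

A browser typically sends an Accept header that starts with `text/html`, so it falls through to Anubis, while a federation fetch with `application/activity+json` goes straight to the API.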

  • Admins: Set up Anubis ASAP!
    Scrubbles

    I thought so too with mine. Divide your traffic into API and web requests, though. If you have a small instance, most of your traffic should be federation traffic hitting the API endpoints, plus POSTs. Scraping traffic will be GETs requesting web content, not API content. That's what I noticed: the vast, vast majority of my traffic wasn't federation at all, but web scrapes. Granted, my instance has been around for almost 3 years now and I'm sure most of the bot farms know I exist.

    Uncategorized piefedmeta
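
One way to try the split described in the post above is to count requests in an access log. The sketch assumes the common "combined" log format, where the request appears as a quoted `"METHOD /path HTTP/x"` field, and it treats POSTs and paths under `/api/` as API traffic. Both the format and the `/api/` prefix are assumptions to adjust for your own setup, and `classify` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Extracts the method and path from the quoted request field of a log line.
_REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')


def classify(line: str) -> str:
    """Label one log line as 'api', 'web', or 'unparsed'."""
    match = _REQUEST.search(line)
    if match is None:
        return "unparsed"
    if match["method"] == "POST" or match["path"].startswith("/api/"):
        return "api"
    return "web"


def summarize(lines) -> Counter:
    """Count how many lines fall into each label."""
    return Counter(classify(line) for line in lines)
```

With a real file this could be used as `summarize(open("access.log"))`. Federation GETs that are identified only by their Accept header will be counted as web here unless the log format records that header.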

  • Admins: Set up Anubis ASAP!
    Scrubbles

    Not terribly. I posted this because I think it would help any fediverse site. It's just a proxy that sits in front of whatever traffic you choose to send to it. So whatever routes go to the web client (probably /) you just forward to Anubis, which forwards on to PieFed. Lemmy, PieFed, Mastodon, any web-based app: that's how you would do it. You can be granular and go route by route, or do all of it. It's not hard-coded for any site.

    My puny site was getting hundreds of heavy requests per minute from bots before I set this up. I can't imagine what all fediverse sites are dealing with. I wanted to let fediverse admins know because I'm going to see a noticeable drop in the bills I pay to host my instance, and I believe that would help other admins, which in turn will make the fediverse stronger.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    I have it in front of Lemmy, but if PieFed is at all similar then you can proxy the web requests, which is where all of the scraping traffic is, through Anubis. API and federation traffic then would not be affected. That's how I set mine up.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    Yeah I guess I shouldn't have bothered showing people things that worked well for me

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    It isn't supposed to sit in front of everything, only the webui. The API and federation endpoints should pass through your proxy as they always have. If you are following the standard Lemmy setup it should be a one line change in your nginx conf.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    This still lets scrapers through, but it's more of an artificial throttle. It looks like there are several knobs and dials I can turn to make it more or less permissive, but I've got to say that over 90% of my traffic was coming from bot farms, and I, a small instance admin, was paying for all of that with database queries and compute. I think this is a good middle ground. You can access it, if you're willing to really try to get at it. From what I see, most scrapers give up immediately rather than spend on compute.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    I'm probably going to save a good chunk of my hosting costs because of this, so I'm happy to share it. And happy to make big tech pay a bit more to access our content.

    Uncategorized piefedmeta

  • Admins: Set up Anubis ASAP!
    Scrubbles

    cross-posted from: https://poptalk.scrubbles.tech/post/3263324

    Sorry for the alarming title, but admins, for real, go set up Anubis.

    For context, Anubis is essentially a gatekeeper/rate limiter for small services. From them:

    (Anubis) is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

    It puts forward a challenge that must be solved in order to gain access, and judges how trustworthy a connection is. The vast majority of real users will never notice it, or will notice only a small delay when accessing your site for the first time. Even smaller scrapers may get by relatively easily.

    Big scrapers, though, the AI companies and trainers, get hit with computational problems that waste their compute before they are let in. (Trust me, I worked for a company whose business was to "scrape the internet", and compute is expensive and a constant worry for them, so win-win for us!)

    Anubis ended up taking maybe 10 minutes to set up. For Lemmy hosters you literally just point your UI proxy at Anubis and point Anubis to Lemmy UI. Very easy and slots right in, minimal setup.

    These graphs are since I turned it on less than an hour ago. I have a small instance, only a few people, and immediately my CPU usage has gone down and my requests per minute have gone down. I have already had thousands of requests challenged, I had no idea I was being scraped this much! You can see they're backing off in the charts.

    (FYI, this only stops the web requests, so it does nothing to the API or federation. Those are proxied elsewhere, so it really does only target web scrapers).

    Uncategorized piefedmeta
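
The "challenge that must be solved" idea from the post above can be illustrated with a generic hashcash-style proof of work: the server hands out a random string, and the client must find a number that, appended to it, gives a SHA-256 digest with a required number of leading zero hex digits. This is a minimal sketch of the general technique and not Anubis's actual implementation; `solve`, `verify`, and the difficulty values are hypothetical.

```python
import hashlib
from itertools import count


def _digest(challenge: str, nonce: int) -> str:
    """Hex SHA-256 of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce that meets the difficulty."""
    target = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's answer."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

The asymmetry is the point: `solve` needs many hashes on average, while `verify` needs exactly one, so the cost lands on whoever is making the requests.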