Admins: Set up Anubis ASAP!

  • I'm not surprised, but I have noticed that a lot of the bots send fake user agents as well. For example, I know for a fact that no one else was using my instance when I was testing this, and I kept getting requests with a Safari user agent. I would be surprised if my users were on Apple devices anyway, but knowing they weren't online was a huge driver. I want to dive in and see how Anubis knows this, or whether they were just tested and failed, or didn't bother to complete the challenge. So I'm curious what your traffic looks like once you remove POSTs and API/federation endpoint calls.

  • I don't know how you're reading this, but in case you didn't know, a lot of browsers have the word "Safari" in their user agent string.

    My Vanadium browser identifies as: "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Mobile Safari/537.36"
    So there's Safari in it, but it's a Chromium-based browser on Android. And hardly any of the information is correct. It's Android 16, not 10, and neither Vanadium nor Chromium is even in it. It also doesn't use Mozilla's engine, and its Blink engine is a fork of WebKit rather than AppleWebKit itself. It is version 142.something, however.

    My Firefox (LibreWolf) identifies as: "Mozilla/5.0 (X11; Linux x86_64; rv:144.0) Gecko/20100101 Firefox/144.0" and that doesn't include anything else.

    But the desktop Chromium does the same thing again: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36"
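To make the point about "Safari" concrete, here is a minimal Python sketch that splits a user agent string into its product tokens and only then guesses a browser family. The function names (`product_tokens`, `guess_family`) and the ordering of the checks are illustrative assumptions, not how any particular server or Anubis parses these strings.

```python
import re

# A product token looks like "Name/version", e.g. "Chrome/142.0.0.0".
PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")


def product_tokens(user_agent: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs in order, skipping "(...)" comments."""
    without_comments = re.sub(r"\([^)]*\)", " ", user_agent)
    return PRODUCT_RE.findall(without_comments)


def guess_family(user_agent: str) -> str:
    """Rough guess that checks the most specific product token first."""
    names = {name for name, _ in product_tokens(user_agent)}
    for marker, family in (
        ("Edg", "Edge"),
        ("Firefox", "Firefox"),
        ("Chrome", "Chromium-based"),
        ("Safari", "Safari"),
    ):
        if marker in names:
            return family
    return "unknown"


# The two strings quoted in the post above.
VANADIUM = ("Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/142.0.0.0 Mobile Safari/537.36")
LIBREWOLF = "Mozilla/5.0 (X11; Linux x86_64; rv:144.0) Gecko/20100101 Firefox/144.0"

if __name__ == "__main__":
    for ua in (VANADIUM, LIBREWOLF):
        # Compare a naive substring check with the token-based guess.
        print("Safari" in ua, guess_family(ua))
```

A plain substring check reports "Safari" for the Vanadium string, while the token-based guess reports a Chromium-based browser. Neither can tell whether the string is honest, which is the point made above.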

  • That's absolutely true. Looking at my logs, there are definitely some weird ones:

    Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7) Gecko/20100101 Firefox/4.0b7

    Which... if that's Firefox, it's very out of date, and Windows NT 6.1 is Windows 7. I could believe that people are posting from Windows 7 rather than 10 or 11, but it's sus. A lot of them are kind of weird combinations. Anubis auto-flagged that one as bot/huawei-cloud, but I'm really curious as to how or why it did. It's not authentic, that's for sure, but how does it know it isn't?

    Another one is Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47, which is completely valid, but from an older Edge version, like a snapshot from a few years ago. Idk, it's all very interesting. That one was also flagged as bot/huawei-cloud.
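For anyone reading their own logs the same way, a small Python sketch like the following decodes the "Windows NT" kernel version and the browser major version from a user agent. The `describe` helper is a hypothetical name, and the mapping only covers the common NT versions; it says what the string claims, not whether the claim is true.

```python
import re

# Marketing names for the Windows NT versions that show up in user agents.
# Windows 11 still reports "Windows NT 10.0".
NT_VERSIONS = {
    "5.1": "Windows XP",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
    "6.3": "Windows 8.1",
    "10.0": "Windows 10 or 11",
}


def describe(user_agent: str) -> tuple[str, str, int]:
    """Return (os_name, browser_token, major_version); "unknown"/0 as fallbacks."""
    nt = re.search(r"Windows NT (\d+\.\d+)", user_agent)
    os_name = NT_VERSIONS.get(nt.group(1), "unknown") if nt else "unknown"
    # Check Edge before Chrome, because Edge strings also contain "Chrome/".
    for token in ("Edg", "Firefox", "Chrome"):
        found = re.search(rf"\b{token}/(\d+)", user_agent)
        if found:
            return os_name, token, int(found.group(1))
    return os_name, "unknown", 0


# The two log entries quoted in the post above.
OLD_FIREFOX = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7) Gecko/20100101 Firefox/4.0b7"
OLD_EDGE = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47")

if __name__ == "__main__":
    print(describe(OLD_FIREFOX))
    print(describe(OLD_EDGE))
```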

  • Well, pretty much all browsers have been faking their user agent strings for a long time now. That's for privacy reasons, because malicious actors can single you out with all that information: the combination of IP, operating system and version, the exact version of the browser and the libraries used. They also don't want to advertise that you skimped on updates and are vulnerable to exploits. And they fake more because, via JavaScript, a website can get your screen size, resolution and all kinds of details. That's business as usual. These pieces of information still serve some purpose, occasionally... But they're more a relic from the distant past, when the internet was an entirely different place without an advertisement and surveillance economy.

    And sure, shady bots fake them as well. It's rare to see correct information from anything. The server-to-server communication in the Fediverse, for example, advertises the correct name and version number. Maybe some apps do as well, if they're not concerned with servers being malicious.

  • Fair, all of that. You know, as someone who really likes pristine data, I really hate that bad actors used such basic, clean data against all of us. What should have been a simple compatibility check was completely co-opted and now serves essentially no purpose.

    I know Anubis looks at much more than that, and you've made me curious to go read through their code. It's working for my instance, but I also see it warrants more research into how it decides.

  • I think the main point is to load some JavaScript and make the client perform a proof-of-work. That's also the mechanism they advertise.

    The additional (default) heuristics seem to be here. It looks like they match some known user agents and header strings, and there are some IP address ranges in there. And some exceptions to allow-list important stuff.
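The kind of heuristics described above (user agent patterns, IP ranges, allow-list exceptions) can be sketched roughly as follows in Python. Everything here is an assumption for illustration: the `Rule` class, the rule names, the action labels and the first-match-wins order are made up, and the addresses come from the reserved documentation ranges. It is not Anubis's actual rule format or code.

```python
import ipaddress
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    name: str    # label reported when the rule matches
    action: str  # "ALLOW", "DENY" or "CHALLENGE" (illustrative labels)
    user_agent: Optional[re.Pattern] = None
    networks: list = field(default_factory=list)

    def matches(self, user_agent: str, remote_ip: str) -> bool:
        """A rule matches on its user agent pattern or on any of its networks."""
        if self.user_agent is not None and self.user_agent.search(user_agent):
            return True
        address = ipaddress.ip_address(remote_ip)
        return any(address in network for network in self.networks)


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    Rule("allow/example-server", "ALLOW",
         user_agent=re.compile(r"^ExampleServer/")),
    Rule("bot/example-cloud", "DENY",
         networks=[ipaddress.ip_network("203.0.113.0/24")]),
    Rule("bot/example-scraper", "DENY",
         user_agent=re.compile(r"ExampleScraper")),
]


def decide(user_agent: str, remote_ip: str) -> tuple[str, str]:
    """Return (action, rule_name); unmatched requests get the challenge."""
    for rule in RULES:
        if rule.matches(user_agent, remote_ip):
            return rule.action, rule.name
    return "CHALLENGE", "default"
```

With a rule keyed on an IP range, a perfectly valid-looking browser string coming from that range would still get the range's label, whatever its user agent says. That might explain labels like the one mentioned earlier in the thread, but it is a guess and not something confirmed here.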

  • Yeah, I'm seeing that too, which is what I thought, but it didn't talk too much about the blocklists, which is interesting. Overall it's a very simple concept to me: if you want to access the site then great, prove that you're willing to work for it. I've worked for large scraping farms before, and the vast majority would rather give up than keep doing that over and over. Compute is expensive for them. What takes a few seconds on our machines is tons of wasted compute at their scale, which I think is why I get so giddy over it. I love having them waste their money.
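The "prove that you're willing to work for it" idea can be illustrated with a generic hashcash-style proof-of-work in Python. This is a sketch under assumptions: the challenge format, the leading-zero-hex-digit difficulty and the names `solve` and `verify` are chosen for illustration and are not taken from Anubis's code.

```python
import hashlib
import itertools
import secrets
import time


def _digest(challenge: str, nonce: int) -> str:
    """SHA-256 of the challenge concatenated with the nonce, as hex."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's answer."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = secrets.token_hex(16)  # random per-visitor challenge
    started = time.perf_counter()
    nonce = solve(challenge, difficulty=4)  # about 16**4 hashes on average
    elapsed = time.perf_counter() - started
    print(f"nonce={nonce} solved in {elapsed:.3f}s, valid={verify(challenge, nonce, 4)}")
```

The asymmetry is the design choice: the client needs roughly 16 to the power of the difficulty hashes on average, the server needs one. A single visitor pays that once, while a scraper hitting many pages or many sites pays it over and over.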

  • Sure. It's a clever idea. I was mainly concerned with the side effects, like removing all information from Google as well, so the modern dynamics of the internet that Cory Doctorow calls "enshittification". And I'm having a hard time. I do development work and occasionally archive content, download videos or transform the stupid website with local events into an RSS feed. I'm getting rate-limited, excluded and blocked left and right. Doing automatic things with websites has turned from 10 lines of Python into an entire ordeal: loading a headless Chromium into a gigabyte or more of RAM, having some auto-clickers dismiss the cookie banners and overlays, doing the proof-of-work...

    I think it turns the internet from an open market of information into something where information isn't indexed, is walled off and is generally unavailable. It's also short-lived and can't be archived or innovated upon any more. Or at least only with lots of limitations. I think that's the flipside.

    But it has two sides. It's a clever approach. And it works well for what it's intended to do. And it's a welcome alternative to everyone doing the same thing with Cloudflare as the central service provider. And I guess it depends on what exactly people do. Limiting a Fediverse instance isn't the same as doing it with other platforms and websites. They have another channel to spread information, at least amongst themselves. So I guess lots of my criticism doesn't apply as harshly as I've worded it. But I'm generally a bit sad about the general trend.

  • cross-posted from: https://poptalk.scrubbles.tech/post/3263324

    Sorry for the alarming title but, Admins for real, go set up Anubis.

    For context, Anubis is essentially a gatekeeper/rate limiter for small services. From them:

    (Anubis) is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

    It puts forward a challenge that must be solved in order to gain access, and judges how trustworthy a connection is. The vast majority of real users will never notice it, or will notice only a small delay when accessing your site the first time. Even smaller scrapers may get by relatively easily.

    For the big scrapers though, the AI companies and trainers, it hits them with computational problems that waste their compute before they are let in. (Trust me, I worked for a company that did "scrape the internet", and compute is expensive and a constant worry for them, so win-win for us!)

    Anubis ended up taking maybe 10 minutes to set up. For Lemmy hosters you literally just point your UI proxy at Anubis and point Anubis to Lemmy UI. Very easy and slots right in, minimal setup.

    These graphs cover the time since I turned it on, less than an hour ago. I have a small instance, only a few people, and immediately my CPU usage has gone down and my requests per minute have gone down. I have already had thousands of requests challenged. I had no idea I was being scraped this much! You can see them backing off in the charts.

    (FYI, this only stops the web requests, so it does nothing to the API or federation. Those are proxied elsewhere, so it really does only target web scrapers).

  • Mm, this pushed me toward installing Anubis for activitypub.space.

    I set it up, and reloaded the nginx config... And I have no idea whether it's working or not lol

    I'm not being challenged 🤔

  • First step is to check the logs in the Anubis container. If you see logs, then it's intercepting requests! If you're still able to access your site, then congrats, you have it set up!

    If you'd like, I'm happy to look at your proxy config and let you know my thoughts. Either here or on Matrix, I'm scrubbles@halflings.chat

  • I installed Anubis via the .deb file, so it's not in a container. I just don't know where the logs are being saved.

    journalctl maybe...

