Piero Bosio Social Web Site Personale Logo Fediverso

Federated social forum connected with the rest of the world. It's not the instances that count, it's the people.

Great, my home server is being hammered and my connection has dropped dramatically in performance.

  • @oblomov Right, if one assumes it's 146.174.128.0/17 and 202.76.128.0/17 - they are both Huawei cloud.

    Which I note I was already banning and I wonder why.

    @Uilebheist OK maybe I exaggerated for this 8-D
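For anyone who wants to test addresses against those two ranges before committing to a firewall rule, here is a minimal sketch using Python's standard `ipaddress` module. The function name `is_banned` and the list name are made up for illustration; only the two /17 networks come from the thread.

```python
import ipaddress

# The two /17 ranges mentioned above; the list name is an illustrative choice.
BANNED_NETWORKS = [
    ipaddress.ip_network("146.174.128.0/17"),
    ipaddress.ip_network("202.76.128.0/17"),
]


def is_banned(ip: str) -> bool:
    """Return True if the address falls inside any banned network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED_NETWORKS)
```

Matching on whole networks rather than single addresses is what makes this usable against clients that hop between IPs in the same block.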

  • I've given the fail2ban conf doc a quick read, but this doesn't seem to be easy to detect, and it would be of limited use probably, unless the detection is done at a wider subnet level.

    @oblomov my approach involves a nodejs service that applies a chain of fairly complicated rules to categorize each request.

    Depending on how different requests are classified it then writes offending IPs to a different log which fail2ban follows. I don't think I could accomplish the same with fail2ban alone, or at least if I could it would be much less readable.

    Still, the write-to-a-log-to-ban is a nice API and I appreciate that fail2ban handles the rest of the details with so little attention.
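The classify-then-log idea described above can be sketched roughly as follows. This is not the actual service (which is written for nodejs and whose rules are not shown in the thread): the two rules, the log line format, and all function names here are hypothetical, and a fail2ban filter matching this line format would still have to be written separately.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical rule chain: (label, predicate) pairs, checked in order.
RULES = [
    ("probe", lambda r: r["path"].startswith("/wp-login")),
    ("empty-ua", lambda r: not r.get("user_agent")),
]


def classify(request: dict) -> Optional[str]:
    """Return the label of the first matching rule, or None if none match."""
    for label, matches in RULES:
        if matches(request):
            return label
    return None


def ban_line(ip: str, label: str, now: Optional[datetime] = None) -> str:
    """Format one log line (assumed format) for fail2ban to pick up."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat()} ban {ip} reason={label}"


def record(request: dict, log_path: str) -> bool:
    """Append a ban line for offending requests; return True if one was written."""
    label = classify(request)
    if label is None:
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(ban_line(request["ip"], label) + "\n")
    return True
```

Keeping the rules in ordinary code and leaving only the banning to fail2ban is the split described above: the rules stay readable, and the log file is the whole interface.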

  • @ansuz that's very useful information, thanks.

  • Gone for a manual ban for the time being, but I'm going to have to look into something more sophisticated. I'll probably take some ideas from @ansuz
    https://cryptography.dog/blog/AI-scrapers-request-commented-scripts/
    but with all the IP hopping they do I wonder how effective it could be.

    The pattern between a real user and a bot is actually pretty easy to detect “in principle”: UAs for actual users after fetching a web page also fetch the associated auxiliary files (CSS, possibly JS). These bots don't even do that.

    @oblomov @ansuz It's even easier than that, and most bots can be caught on the first request: if the user-agent contains Firefox/ or Chrome/, and you're serving on HTTPS, the request will[1] contain a sec-fetch-mode header too, when coming from a real browser. Bots don't send it.

    Pair it with blocking agents listed in ai.robots.txt, and ~90% of your bot traffic is gone. If you can afford to block Huawei's and Alibaba's ASNs, you've pretty much gotten rid of all of them.

    Many of the bots do download CSS, and some even fetch the JS too, by the way. And images? Some of them love 'em.

    [1] Exceptions apply: if you put a page in Reader Mode in Firefox, and reload while in reader mode, no sec-fetch-mode is sent. There are also some applications, like gnome-podcasts, that use a Firefox user-agent but don't send sec-fetch-mode. While there will be false positives, most of them can be worked around, and the gain of catching all the lame bots far outweighs the cons, imo.
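The first-request check described above is small enough to sketch in a few lines. This is an illustration under assumptions, not anyone's production rule: the function names are invented, headers are assumed to arrive as a plain dict, and the exceptions from the footnote (Reader Mode reloads, gnome-podcasts) are not handled.

```python
def claims_browser(user_agent: str) -> bool:
    """True if the user-agent string presents itself as Firefox or Chrome."""
    return "Firefox/" in user_agent or "Chrome/" in user_agent


def looks_like_fake_browser(headers: dict, is_https: bool = True) -> bool:
    """Flag requests that claim to be a browser but lack sec-fetch-mode.

    Header names are compared case-insensitively. Only HTTPS requests are
    judged, following the condition stated in the post above.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    user_agent = lowered.get("user-agent", "")
    if not is_https or not claims_browser(user_agent):
        return False
    return "sec-fetch-mode" not in lowered
```

Clients that send their own user agent (most feed readers, curl, and so on) are never flagged by this check, since it only looks at requests that claim to be Firefox or Chrome.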

  • @algernon @ansuz that's useful information too, thanks. I'm actually considering collecting more information about the request headers in general to see if there are other subtle hints about them. Is there a way to tell Apache to log all request headers for every request? At least while debugging it'd come in handy.

    @oblomov @ansuz I'm not an Apache person, but this module might do the trick.

    I also have about a week's worth of logs from mid-April this year, iirc, with full headers, but I'll have to double check. The bots haven't changed much since. If that'd be useful for you, I'll go and figure out where I put them... they're somewhere on my storage server, just gotta find which bucket.

  • @algernon @ansuz thanks, that looks exactly like what I needed. I think I have enough scrapers attacking me these days that I hopefully won't need other people's logs ;-)

  • @algernon @oblomov this aligns closely with my experience

    So far I don't block on the absence of those specific headers because I want RSS readers to be able to get through. For the most part they should mainly be fetching the feed URL and maybe the site's favicon, but there are exceptions as you noted.

    Some reader software will fetch arbitrary pages (at the user's request) and check for the existence of a <link rel="alternate"> tag. Since I strongly encourage readers to follow via RSS I'd hate to ban them when they try to do so 😅

  • @oblomov I have created a /llm and the bots love it.

    Behind that /llm is quixotic and link-maze
    https://marcusb.org/hacks/quixotic.html

    DM if you want me to share my setup and how I poison the well.

  • @ansuz @oblomov Looking at my logs, most RSS readers are unaffected: they either use their own user agent and don't try to pretend to be Firefox or Chrome, or they are running within a real browser, in which case the expected headers will be there.

    Quick look at my logs from yesterday:

    • 2582 total requests against atom.xml on my blog.

    • 105 unique user agents

    • Only 24 of those user agents had Chrome/ or Firefox/ in their user agent

    • These 24 made 165 requests total.

    • Out of that 165, 54 did not have sec-fetch-mode.

    • Out of those 54, the majority came from either Cloudflare or Amazon, or another cloud provider.

    • Still out of those 54, 21 pretended to be Firefox, but the user agent wasn't what a real browser sends: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0 - in real browsers, rv and the Firefox/ version match. (All 21 were from Cloudflare IPs too)

    • Still out of the 54, 27 pretended to be Chrome, but did not send a sec-ch-ua header, nor sec-fetch-mode, and they said they're Chrome/84.0.4147.105 from 2020 - coming from Amazon AWS. I don't believe for a second these would be real browsers.

    This leaves us with 6 requests that may have come from legit browsers. Five of those were Chrome on Android, coming from a DigitalOcean IP, without sec-fetch-mode or sec-ch-ua. I don't think those were legit.

    There was one Firefox/, coming from an American residential IP, without sec-fetch-mode... that might have been legit, maybe?

    But out of 2.5k requests, 1 false positive[1] is, imo, acceptable.

    Of course, what's acceptable varies a lot, and the people who visit (or rather, subscribe to) my blog are likely a bit atypical.

    What I'm trying to convey here is that the majority of RSS readers don't pretend to be Firefox or Chrome, or - because they're running in one - send the appropriate headers anyway.

    [1] It is likely a false positive: that IP made a single request the entire day.
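The funnel in the breakdown above (total requests, unique user agents, browser-claiming agents, requests missing sec-fetch-mode) can be reproduced from parsed logs with a short tally. This is a sketch only: the record shape (`user_agent` plus a `headers` dict with lowercased names) is an assumption about how the logs were parsed, not the format actually used.

```python
from typing import Dict, List


def claims_browser(user_agent: str) -> bool:
    """True if the user-agent string presents itself as Firefox or Chrome."""
    return "Firefox/" in user_agent or "Chrome/" in user_agent


def summarize(records: List[dict]) -> Dict[str, int]:
    """Count each stage of the funnel for a list of parsed log records.

    Each record is assumed to look like:
    {"user_agent": "...", "headers": {"sec-fetch-mode": "navigate", ...}}
    """
    agents = {r["user_agent"] for r in records}
    browser_agents = {ua for ua in agents if claims_browser(ua)}
    browser_requests = [r for r in records if r["user_agent"] in browser_agents]
    missing = [r for r in browser_requests if "sec-fetch-mode" not in r["headers"]]
    return {
        "total_requests": len(records),
        "unique_agents": len(agents),
        "browser_agents": len(browser_agents),
        "browser_requests": len(browser_requests),
        "missing_sec_fetch_mode": len(missing),
    }
```

Running this over a day of feed requests gives the same kind of numbers as the list above, which makes it easier to judge the false-positive rate on your own traffic before blocking anything.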

  • @mxfraud very interesting, thanks. I am interested in these kinds of setups. Have you also considered throttling those connections, as in only having quixotic send at a rate of like 60 bytes per second or so?
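The throttling idea can be sketched independently of any particular tool as a generator that releases a payload in small chunks with a pause after each one. Nothing here is a quixotic feature; the function name, the chunk size, and the injectable `sleep` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterator


def drip(
    payload: bytes,
    bytes_per_second: int = 60,
    chunk_size: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield the payload in small chunks, pausing so the average rate
    stays at roughly bytes_per_second."""
    delay = chunk_size / bytes_per_second
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
        sleep(delay)
```

In a real server each yielded chunk would be written to the response stream. Note that a blocking sleep ties up a worker per connection, so an async framework or a reverse-proxy rate limit is probably a better home for this than a threaded server.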
