Piero Bosio Social Web Site Personale

Federated social forum, connected with the rest of the world. Instances don't count, people do.
algernon in a ChatGPT costume (it's pure garbage)

@algernon@come-from.mad-scientist.club

Posts

Recent

  • Great, my home server is being hammered and my connection has dropped dramatically in performance.
    algernon in a ChatGPT costume (it's pure garbage)

    @ansuz @oblomov Looking at my logs, most RSS readers are unaffected: they either use their own user agent, and don't try to pretend to be Firefox or Chrome, or they are running within a real browser, in which case the expected headers will be there.

    Quick look at my logs from yesterday:

    • 2582 total requests against atom.xml on my blog.

    • 105 unique user agents

    • Only 24 of those user agents had Chrome/ or Firefox/ in their user agent

    • These 24 made 165 requests total.

    • Out of that 165, 54 did not have sec-fetch-mode.

    • Out of those 54, the majority came from either Cloudflare or Amazon, or another cloud provider.

    • Still out of those 54, 21 pretended to be Firefox, but the user agent wasn't what a real browser sends: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0 - in real browsers, rv and the Firefox/ version match. (All 21 were from Cloudflare IPs too)

    • Still out of the 54, 27 pretended to be Chrome, but did not send a sec-ch-ua header, nor sec-fetch-mode, and they said they're Chrome/84.0.4147.105 from 2020 - coming from Amazon AWS. I don't believe for a second these would be real browsers.

    This leaves us with 6 requests that may have come from legit browsers. Five of those were Chrome on Android, coming from a DigitalOcean IP, without sec-fetch-mode or sec-ch-ua. I don't think those were legit.

    There was one Firefox/, coming from an American residential IP, without sec-fetch-mode... that might have been legit, maybe?

    But out of 2.5k requests, 1 false positive[1] is, imo, acceptable.

    Of course, what's acceptable varies a lot, and the people who visit (or rather, subscribe to) my blog are likely a bit atypical.

    What I'm trying to convey here is that the majority of RSS readers don't pretend to be Firefox or Chrome, or - because they're running in one - send the appropriate headers anyway.
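The funnel in the bullet list above can be reproduced with a short tally over parsed log entries. This is a minimal sketch, not the script used for the numbers in this post: it assumes each request has already been parsed into a dict of lowercased header names, and the function name `summarize` and the sample entries are hypothetical.

```python
from typing import Iterable, Mapping


def claims_browser(user_agent: str) -> bool:
    """True when the user agent string contains Chrome/ or Firefox/."""
    return "Chrome/" in user_agent or "Firefox/" in user_agent


def summarize(requests: Iterable[Mapping[str, str]]) -> dict:
    """Count requests at each step of the funnel described in the post.

    Assumption: every request is a mapping of lowercased header names
    to values, e.g. {"user-agent": "...", "sec-fetch-mode": "navigate"}.
    """
    total = 0
    agents = set()
    browser_agents = set()
    browser_requests = 0
    missing_sec_fetch_mode = 0

    for headers in requests:
        total += 1
        ua = headers.get("user-agent", "")
        agents.add(ua)
        if claims_browser(ua):
            browser_agents.add(ua)
            browser_requests += 1
            if "sec-fetch-mode" not in headers:
                missing_sec_fetch_mode += 1

    return {
        "total": total,
        "unique_agents": len(agents),
        "browser_agents": len(browser_agents),
        "browser_requests": browser_requests,
        "missing_sec_fetch_mode": missing_sec_fetch_mode,
    }


# Hypothetical sample data, not taken from the real logs.
sample = [
    {"user-agent": "ExampleFeedReader/1.0"},
    {"user-agent": "ExampleFeedReader/1.0"},
    {"user-agent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/84.0.4147.105"},
    {
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) "
                      "Gecko/20100101 Firefox/128.0",
        "sec-fetch-mode": "navigate",
    },
]
stats = summarize(sample)
```

The requests counted under `missing_sec_fetch_mode` are the ones worth inspecting by hand (source network, version plausibility), as done above for the 54.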


    1. It is likely a false positive: that IP made a single request the entire day.

    Uncategorized

  • Great, my home server is being hammered and my connection has dropped dramatically in performance.
    algernon in a ChatGPT costume (it's pure garbage)

    @oblomov @ansuz I'm not an Apache person, but this module might do the trick.

    I also have about a week's worth of logs from mid-April this year, iirc, with full headers, but I'll have to double check. The bots haven't changed much since. If that'd be useful for you, I'll go and figure out where I put them... they're somewhere on my storage server, just gotta find which bucket.

    Uncategorized

  • Great, my home server is being hammered and my connection has dropped dramatically in performance.
    algernon in a ChatGPT costume (it's pure garbage)

    @oblomov @ansuz It's even easier than that, and most bots can be caught on the first request: if the user-agent contains Firefox/ or Chrome/, and you're serving on HTTPS, the request will[1] contain a sec-fetch-mode header too, when coming from a real browser. Bots don't send it.

    Pair it with blocking the agents listed in ai.robots.txt, and ~90% of your bot traffic is gone. If you can afford to block Huawei's and Alibaba's ASNs, you've pretty much gotten rid of all of them.
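As a rough illustration of the two checks described so far, here is a minimal sketch of the decision logic, independent of any particular web server. The function names are hypothetical, and `BLOCKED_AGENT_SUBSTRINGS` is a small placeholder sample; the real list should be taken from the ai.robots.txt project. ASN blocking is not covered here.

```python
from typing import Mapping

# Placeholder sample only (an assumption for illustration); use the
# agent list maintained by the ai.robots.txt project in practice.
BLOCKED_AGENT_SUBSTRINGS = ("GPTBot", "CCBot")


def looks_like_fake_browser(headers: Mapping[str, str],
                            is_https: bool = True) -> bool:
    """True when the user agent claims Firefox or Chrome over HTTPS
    but the request carries no sec-fetch-mode header."""
    lowered = {name.lower(): value for name, value in headers.items()}
    ua = lowered.get("user-agent", "")
    claims_browser = "Firefox/" in ua or "Chrome/" in ua
    return is_https and claims_browser and "sec-fetch-mode" not in lowered


def should_block(headers: Mapping[str, str], is_https: bool = True) -> bool:
    """Combine the agent list with the sec-fetch-mode heuristic."""
    lowered = {name.lower(): value for name, value in headers.items()}
    ua = lowered.get("user-agent", "").lower()
    if any(agent.lower() in ua for agent in BLOCKED_AGENT_SUBSTRINGS):
        return True
    return looks_like_fake_browser(headers, is_https)
```

Clients with their own user agent, such as most feed readers, pass through untouched, because neither check matches them. The exceptions from the footnote (Firefox Reader Mode reloads, applications that borrow a Firefox user agent) would still need an allowlist on top of this.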

    Many of the bots do download CSS, and some even fetch the JS too, by the way. And images? Some of them love 'em.


    1. Exceptions apply: if you put a page in Reader Mode in Firefox, and reload while in reader mode, no sec-fetch-mode is sent. There are also some applications, like gnome-podcasts, that use a Firefox user-agent but don't send sec-fetch-mode. While there will be false positives, most of them can be worked around, and the gain of catching all the lame bots far outweighs the cons, imo.

    Uncategorized