Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall Don't forget the oh-so-lovely "select all the squares with cars in them" with a picture of one single car, where you're not quite sure if you should select the squares in the bottom middle or not.
I'm given to understand that those actually do jack squat for stopping bots... They just use tokens to straight up bypass or something.
I personally don't mind the delay so much, but I do hate having to deal with that crap — especially when it fails and declares that I'm supposedly not human. (How the F was I supposed to know that the blurry red glob in the bottom right was supposed to be a lion? It wasn't even the right color!) The one with the catgirl and a loading bar is fine I guess. But all that other crap can take a flying leap.
-
@autiomaa So the bots have an option to bypass the captchas meant to catch bots but the humans don't. That tracks. 😩 @mark @david_chisnall
@internic That's not a bug, that's a feature!
I guess... -
@mark @david_chisnall I don't think that's actually the case, at least not entirely. The main issue is that the Internet is currently being inundated with LLM content crawlers, to the point that they overwhelm websites or scrape content that sites don't want sucked into AI training data. It has caused a massive number of sites to serve those bot-detection pages to everyone. So it's not quite an issue of too many visitors but actually "too many non-human visitors".
@danherbert @mark @david_chisnall Sadly, that is our reality. One siteʼs traffic was 75–80 per cent scraper (even back in 2023) so up went the Cloudflare blocks and challenges. (Before anyone @s me about this, Iʼm not a computer whiz so this is the only thing I know how to use.) And itʼs finally worked after figuring out which ASNs and IP addresses are the worst, with traffic on that site back to pre-2023 levels (which I know means an overall drop in ranking).
-
@david_chisnall I remember optimizing thumbnail-images to within kilobytes of their lives...
...and now apparently nobody thinks twice about requiring many MB of JS code per page-load.
(TLDR: this current nonsense is nonsense.)
@woozle @david_chisnall I still do! Old habits.
-
@hex0x93 I know nothing about Cloudflare's data practices. But I do know a lot of sites have been forced to go with Cloudflare because so many AI bots are incessantly scraping their site that the site goes down and humans can't access it - essentially AI is doing a DDoS, and when that's sustained for weeks/months/more then the Cloudflare-type system seems to be the only way to have the site actually available to humans.
I hate it but those f---ing AI bots, seriously, they are ruining the net.
@zeborah @hex0x93 @david_chisnall This pretty much describes us. Scrapers as well as brute-force hackers multiple times per hour (even literally per second). One siteʼs traffic was 75–80 per cent scraper.
-
@david_chisnall "Please wait while we check that your Browser is safe" while my laptop goes for a minute or two into full load and screaming hot
Perhaps ending in "We are sorry but we could not verify you are an actual human, your machine shows suspect behaviour, sent an e-mail to admin to get access"
@Laberpferd @david_chisnall proof of work is such a bad CAPTCHA. Like, who thought bots couldn't evaluate JS
-
@jackyan @zeborah @david_chisnall and it is totally understandable to protect yourself against that. It is just super annoying for ppl like me, who value and protect their privacy.
And I am no web scraper, nor am I a hacker...
-
@hex0x93 @zeborah @david_chisnall I hear you as I get annoyed, too. I believe ours is the one with the tick box, so no stupid 'Choose the bicycles' or rejection because you use a VPN.
@jackyan @zeborah @david_chisnall I love that!❤️❤️
-
@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try to not block anything except for the worst, known offenders. Reddit, Yelp & others are blocking me entirely when I use my ad-blocking VPN on the phone — just stupid...
-
@alexskunz @jackyan @zeborah @david_chisnall that's cool, and those do work sometimes. What you say about Reddit and stuff not working is my everyday online life. I chose it; still annoying, but I guess it is like in life... the few bad people ruin it for everyone 😜😜
Sometimes I think I am just paranoid...can't help it😅 -
@alexskunz @hex0x93 @zeborah @david_chisnall Yes, thatʼs the one I use.
-
@david_chisnall and the same for all software. Layers and layers of crap
-
@david_chisnall
But I LOVE finding which of 12 images has a zebra crossing in... 😳😱🤣 -
The thing is, you don't need a CAPTCHA. Just three if statements on the server will do it:
1. If the user agent is Chrome, but it didn't send a "Sec-Ch-Ua" header: Send garbage.
2. If the user agent is a known scraper ("GPTBot", etc): Send garbage.
3. If the URL is one we generated: Send garbage.
4. Otherwise, serve the page.
The trick is that instead of blocking them, you serve them randomly generated garbage pages.
Each of these pages includes links that will always return garbage. Once these get into the bot's crawler queue, they will be identifiable regardless of how well they hide themselves.
I use this on my site: after a few months, it's 100% effective. Every single scraper request is being blocked. At this point, I could rate-limit the generated URLs, but I enjoy sending them unhinged junk. (... and it's actually cheaper than serving static files!)
This won't do anything about vuln scanners and other non-crawler bots, but those are easy enough to filter out anyway. (URL starts with /wp/?)
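For illustration, a minimal sketch of that gating as Flask-style middleware (not the poster's actual code: the /trap/ URL prefix, the bot list, and garbage_page() are all made up for this example):

```python
# Hypothetical sketch of the checks above, as a Flask before_request hook.
import random
import string

from flask import Flask, request

app = Flask(__name__)
KNOWN_SCRAPERS = ("GPTBot", "CCBot", "Bytespider")  # illustrative UA substrings

def garbage_page() -> str:
    # Random nonsense text plus links that lead only to more generated pages.
    words = " ".join("".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(200))
    links = "".join(f'<a href="/trap/{random.randrange(1_000_000)}">more</a> ' for _ in range(5))
    return f"<html><body><p>{words}</p>{links}</body></html>"

@app.before_request
def gate():
    ua = request.headers.get("User-Agent", "")
    # 1. Claims to be Chrome but didn't send the Sec-Ch-Ua client-hint header.
    if "Chrome" in ua and "Sec-Ch-Ua" not in request.headers:
        return garbage_page()
    # 2. Self-identified scraper.
    if any(bot in ua for bot in KNOWN_SCRAPERS):
        return garbage_page()
    # 3. A URL we generated ourselves: only the garbage pages link here.
    if request.path.startswith("/trap/"):
        return garbage_page()
    # 4. Otherwise fall through and let the real view handle the request.
    return None

@app.route("/")
def index():
    return "<html><body>Real content</body></html>"
```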
@nothacking
Wdyt of this approach?
> Connections are dropped (status code 444), rather than sending a 4xx HTTP response.
> Why waste our precious CPU cycles and bandwidth? Instead, let the robot keep a connection open waiting for a reply from us.
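A rough sketch of the "hang up without replying" part at the socket level (purely illustrative asyncio code, and the UA list is an assumption; nginx gets the same effect with its built-in return 444):

```python
# Hypothetical sketch: drop scraper connections without sending any HTTP
# response at all, instead of a polite 4xx page.
import asyncio

SCRAPER_UAS = (b"GPTBot", b"CCBot", b"Bytespider")  # illustrative UA substrings

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    head = await reader.readuntil(b"\r\n\r\n")  # request line + headers only
    if any(ua in head for ua in SCRAPER_UAS):
        # Optionally sleep here first to make the bot wait (the tarpit variant),
        # then close without writing a single byte back.
        writer.close()
        await writer.wait_closed()
        return
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```
-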
@david_chisnall yep 💯 frustrating 😞
-
@david_chisnall crying emoji
-
@david_chisnall This was when the tech bros realized that it is all in comparison to everything else.
If you just make EVERYTHING worse then it doesn't matter that you're bad.
The real story of computing (and perhaps all consumer goods)
@hp @david_chisnall Sounds like finding a candidate to vote for, to be honest...
-
@vendelan
The idea is not that they can't, it's that they won't.
If you're a human visiting a website, evaluating some JS at worst costs you a few seconds. If you're a scraper bot trying to get millions of sites a second, it slows you down.
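To make the trade-off concrete, here is a toy hashcash-style proof-of-work check (purely illustrative; not what Cloudflare or any particular interstitial actually runs): the server issues a random challenge, the visitor's browser grinds nonces until a hash clears a difficulty bar, and the server verifies the claim with a single hash.

```python
# Toy proof-of-work sketch (hashcash-style), purely illustrative.
import hashlib
import os
from itertools import count

DIFFICULTY_BITS = 20  # ~1 million hash attempts on average per challenge

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: bytes) -> int:
    # Client side: a human pays this once per page load; a mass scraper pays
    # it again for every single request.
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce

def verify(challenge: bytes, nonce: int) -> bool:
    # Server side: one hash is enough to check the claimed work.
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS

challenge = os.urandom(16)   # issued by the server per visit
nonce = solve(challenge)     # the work the visitor's browser JS would do
assert verify(challenge, nonce)
```
-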
@david_chisnall and then webpages that load a dummy front end, because the real front end takes 15s to load. So then you click the search box and start typing, and the characters end up in a random order when the real search box loads