Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
-
@autiomaa So the bots have an option to bypass the captchas meant to catch bots but the humans don't. That tracks. @mark @david_chisnall
@internic That's not a bug, that's a feature!
I guess... -
@mark @david_chisnall I don't think that's actually the case, at least not entirely. The main issue is that the Internet is currently being inundated with LLM content crawlers to the point that they overwhelm websites or scrape content some sites don't want sucked into AI training data. It has caused a massive number of sites to serve those bot-detection pages to everyone. So it's not quite an issue of too many visitors but actually "too many non-human visitors".
@danherbert @mark @david_chisnall Sadly, that is our reality. One site's traffic was 75–80 per cent scraper (even back in 2023) so up went the Cloudflare blocks and challenges. (Before anyone @s me about this, I'm not a computer whiz so this is the only thing I know how to use.) And it's finally worked after figuring out which ASNs and IP addresses are the worst, with traffic on that site back to pre-2023 levels (which I know means an overall drop in ranking).
-
@david_chisnall I remember optimizing thumbnail-images to within kilobytes of their lives...
...and now apparently nobody thinks twice about requiring many MB of JS code per page-load.
(TLDR: this current nonsense is nonsense.)
@woozle @david_chisnall I still do! Old habits.
-
@hex0x93 I know nothing about Cloudflare's data practices. But I do know a lot of sites have been forced to go with Cloudflare because so many AI bots are incessantly scraping their site that the site goes down and humans can't access it - essentially AI is doing a DDOS, and when that's sustained for weeks/months/more then the Cloudflare-type system seems to be the only way to have the site actually available to humans.
I hate it but those f---ing AI bots, seriously, they are ruining the net.
@zeborah @hex0x93 @david_chisnall This pretty much describes us. Scrapers as well as brute-force hackers multiple times per hour (even literally per second). One site's traffic was 75–80 per cent scraper.
-
@david_chisnall "Please wait while we check that your Browser is safe" while my laptop goes for a minute or two into full load and screaming hot
Perhaps ending in "We are sorry but we could not verify you are an actual human, your machine shows suspect behaviour, sent an e-mail to admin to get access"
@Laberpferd @david_chisnall proof of work is such a bad CAPTCHA. Like, who thought bots couldn't evaluate JS
-
@zeborah @hex0x93 @david_chisnall This pretty much describes us. Scrapers as well as brute-force hackers multiple times per hour (even literally per second). One site's traffic was 75–80 per cent scraper.
@jackyan @zeborah @david_chisnall and it is totally understandable to protect yourself against that. It is just super annoying for ppl like me, who value and protect their privacy.
And I am no webscraper, nor am I a hacker.... -
@jackyan @zeborah @david_chisnall and it is totally understandable to protect yourself against that. It is just super annoying for ppl like me, who value and protect their privacy.
And I am no webscraper, nor am I a hacker....
@hex0x93 @zeborah @david_chisnall I hear you as I get annoyed, too. I believe ours is the one with the tick box, so no stupid 'Choose the bicycles' or rejection because you use a VPN.
-
@hex0x93 @zeborah @david_chisnall I hear you as I get annoyed, too. I believe ours is the one with the tick box, so no stupid 'Choose the bicycles' or rejection because you use a VPN.
@jackyan @zeborah @david_chisnall I love that! ❤️❤️
-
@jackyan @zeborah @david_chisnall I love that! ❤️❤️
@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try to not block anything except for the worst, known offenders. Reddit, Yelp & others are blocking me entirely when I use my ad-blocking VPN on the phone – just stupid...
-
@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try to not block anything except for the worst, known offenders. Reddit, Yelp & others are blocking me entirely when I use my ad-blocking VPN on the phone – just stupid...
@alexskunz @jackyan @zeborah @david_chisnall that's cool, and those do work sometimes. What you say about reddit and stuff not working is my everyday online life. I chose it, still annoying, but I guess it is like in life... the few bad people ruin it for everyone.
Sometimes I think I am just paranoid... can't help it -
@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try to not block anything except for the worst, known offenders. Reddit, Yelp & others are blocking me entirely when I use my ad-blocking VPN on the phone – just stupid...
@alexskunz @hex0x93 @zeborah @david_chisnall Yes, that's the one I use.
-
Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall and the same for all software. Layers and layers of crap
-
Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall
But I LOVE finding which of 12 images has a zebra crossing in... 😳😱🤣 -
The thing is, you don't need a CAPTCHA. Just three if statements on the server will do it:
1. If the user agent is chrome, but it didn't send a "Sec-Ch-Ua" header: Send garbage.
2. If the user agent is a known scraper ("GPTBot", etc): Send garbage.
3. If the URL is one we generated: Send garbage.
4. Otherwise, serve the page.
The trick is that instead of blocking them, serve them randomly generated garbage pages.
Each of these pages includes links that will always return garbage. Once these get into the bot's crawler queue, they will be identifiable regardless of how well they hide themselves.
I use this on my site: after a few months, it's 100% effective. Every single scraper request is being blocked. At this point, I could rate-limit the generated URLs, but I enjoy sending them unhinged junk. (... and it's actually cheaper than serving static files!)
This won't do anything about vuln scanners and other non-crawler bots, but those are easy enough to filter out anyway. (URL starts with /wp/?)
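A minimal sketch of the rule set described above, using only Python's standard library. The trap-URL prefix, the scraper list, and the garbage generator are illustrative assumptions, not the poster's actual code:

```python
# Sketch only: TRAP_PREFIX, KNOWN_SCRAPERS and garbage_page() are illustrative
# assumptions, not the implementation described in the post above.
import random
import string
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_SCRAPERS = ("GPTBot", "CCBot", "Bytespider")  # partial list, for rule 2
TRAP_PREFIX = "/trap/"                              # links only our garbage pages emit, rule 3

def garbage_page() -> bytes:
    """Randomly generated junk whose links lead to more junk."""
    words = " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 10)))
        for _ in range(200)
    )
    links = "".join(
        f'<a href="{TRAP_PREFIX}{random.randrange(10**9)}">more</a> ' for _ in range(5)
    )
    return f"<html><body><p>{words}</p>{links}</body></html>".encode()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        # 1. Claims to be Chrome but didn't send the client-hint header real Chrome sends.
        fake_chrome = "Chrome" in ua and "Sec-Ch-Ua" not in self.headers
        # 2. Self-identified scraper.
        known_bot = any(bot in ua for bot in KNOWN_SCRAPERS)
        # 3. URL is one of the trap links we generated ourselves.
        trapped = self.path.startswith(TRAP_PREFIX)
        if fake_chrome or known_bot or trapped:
            body = garbage_page()
        else:
            body = b"<html><body>Real content goes here.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```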
@nothacking
Wdyt of this approach?
> Connections are dropped (status code 444), rather than sending a 4xx HTTP response.
> Why waste our precious CPU cycles and bandwidth? Instead, let the robot keep a connection open waiting for a reply from us. -
Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall yep 💯 frustrating
-
Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall crying emoji
-
@david_chisnall This was when the tech bros realized that it is all in comparison to everything else.
If you just make EVERYTHING worse then it doesn't matter that you're bad.
The real story of computing (and perhaps all consumer goods)
@hp @david_chisnall Sounds like finding a candidate to vote for, to be honest...
-
@Laberpferd @david_chisnall proof of work is such a bad CAPTCHA. Like, who thought bots couldn't evaluate JS
@vendelan
The idea is not that they can't, it's that they won't.
If you're a human visiting a website, evaluating some JS at worst costs you a few seconds. If you're a scraper bot trying to get millions of sites a second, it slows you down. -
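To make that asymmetry concrete, here is a toy hashcash-style proof of work in Python. The challenge string and difficulty are arbitrary assumptions, and real proof-of-work challenges run in the visitor's browser as JS rather than on the server side like this:

```python
# Toy proof of work: find a nonce whose SHA-256 hash has `difficulty_bits` leading zero bits.
# Difficulty and challenge are arbitrary assumptions for the example.
import hashlib
import time

def solve(challenge: str, difficulty_bits: int = 18) -> int:
    target = 1 << (256 - difficulty_bits)   # accept any hash below this threshold
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

start = time.perf_counter()
solve("challenge-string-from-server")
elapsed = time.perf_counter() - start
print(f"one visitor pays ~{elapsed:.2f}s, once")
print(f"a crawler fetching 1,000,000 pages pays ~{elapsed * 1_000_000 / 3600:.0f} CPU-hours")
```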
Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
@david_chisnall and then webpages that load a dummy front end, because the real front end takes 15s to load. So then you click the search box and start typing, and the characters end up in a random order when the real search box loads
-
@nothacking
Wdyt of this approach?
> Connections are dropped (status code 444), rather than sending a 4xx HTTP response.
> Why waste our precious CPU cycles and bandwidth? Instead, let the robot keep a connection open waiting for a reply from us.
@bertkoor Well, the advantage of sending junk is it makes crawlers trivially identifiable. That avoids the need for tricks like these:
> Other user-agents (hopefully all human!) get a cookie-check. e.g. Chrome, Safari, Firefox.
That still increases loading time. Even if the "CAPTCHA" is small, it'll still take several round trips to deliver.
... of course, once they've been fed poisoned URLs, then you can start blocking.
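A sketch of that last step, under the assumption of a hypothetical /trap/ URL prefix and an in-memory flag list (neither is from the thread): any client that ever requests a poisoned URL gets remembered, and from then on you can block or rate-limit it however you like.

```python
# Assumption-laden sketch: remember clients that follow poisoned links, then block them.
import time

TRAP_PREFIX = "/trap/"            # illustrative: only garbage pages link here
BLOCK_SECONDS = 24 * 3600
flagged: dict[str, float] = {}    # client IP -> time it first hit a trap URL

def decide(client_ip: str, path: str) -> str:
    """Return 'block', 'garbage', or 'serve' for one request; plug into any server loop."""
    now = time.time()
    if client_ip in flagged and now - flagged[client_ip] < BLOCK_SECONDS:
        return "block"            # or keep feeding junk / rate-limit instead
    if path.startswith(TRAP_PREFIX):
        flagged[client_ip] = now  # the only way here is by following a poisoned link
        return "garbage"
    return "serve"

# Example: a crawler that followed a poisoned link is blocked on its next request.
print(decide("203.0.113.7", "/trap/12345"))   # garbage
print(decide("203.0.113.7", "/articles/1"))   # block
print(decide("198.51.100.2", "/articles/1"))  # serve
```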