Web design in the early 2000s: Every 100ms of latency on page load costs visitors.
Web design in the late 2020s: Let's add a 10-second delay while Cloudflare checks that you are capable of ticking a checkbox in front of every page load.
-
@hex0x93 @zeborah @david_chisnall I hear you as I get annoyed, too. I believe ours is the one with the tick box, so no stupid 'Choose the bicycles' or rejection because you use a VPN.
@jackyan @zeborah @david_chisnall I love that!❤️❤️
-
@hex0x93 I try to use the "Managed Challenge" on CF which tests the browser and often "solves itself" within a second or so (wiggling the mouse might help with that, I'm not sure). The checkbox only appears when that fails. I try not to block anything except for the worst, known offenders. Reddit, Yelp & others block me entirely when I use my ad-blocking VPN on the phone — just stupid...
-
@alexskunz @jackyan @zeborah @david_chisnall that's cool, and those do work sometimes. What you say about Reddit and co. not working is my everyday online life. I chose it, and it's still annoying, but I guess it is like in life... the few bad people ruin it for everyone 😜😜
Sometimes I think I am just paranoid... can't help it 😅
-
@alexskunz @hex0x93 @zeborah @david_chisnall Yes, thatʼs the one I use.
-
@david_chisnall and the same for all software. Layers and layers of crap
-
@david_chisnall
But I LOVE finding which of 12 images has a zebra crossing in... 😳😱🤣
-
The thing is, you don't need a CAPTCHA. Just three if statements on the server will do it:
1. If the user agent is chrome, but it didn't send a "Sec-Ch-Ua" header: Send garbage.
2. If the user agent is a known scraper ("GPTBot", etc): Send garbage.
3. If the URL is one we generated: Send garbage.
4. Otherwise, serve the page.
The trick is that instead of blocking them, serve them randomly generated garbage pages.
Each of these pages includes links that will always return garbage. Once these get into the bot's crawler queue, they will be identifiable regardless of how well they hide themselves.
I use this on my site: after a few months, it's 100% effective. Every single scraper request is being caught. At this point, I could rate-limit the generated URLs, but I enjoy sending them unhinged junk. (... and it's actually cheaper than serving static files!)
This won't do anything about vuln scanners and other non-crawler bots, but those are easy enough to filter out anyway. (URL starts with /wp/?)
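A minimal sketch of the filter the post above describes, written as a self-contained Python WSGI app. The header checks mirror the three if statements; the scraper list, word list, /trap/ URL prefix and page markup are made-up illustrations, not the poster's actual implementation.
```python
# Sketch only: route suspected scrapers to generated garbage, per the post above.
# The scraper list, wordlist, markup and /trap/ prefix are made up for illustration.
import random
from wsgiref.simple_server import make_server

SCRAPER_UAS = ("GPTBot", "CCBot", "Bytespider")  # self-identified scrapers (example list)
TRAP_PREFIX = "/trap/"                           # URLs only ever emitted by garbage pages
WORDS = ["lorem", "ipsum", "quantum", "ferret", "obelisk", "parsnip"]

def garbage_page() -> bytes:
    """Randomly generated junk whose links lead only to more junk."""
    text = " ".join(random.choices(WORDS, k=200))
    links = "".join(
        f'<a href="{TRAP_PREFIX}{random.randrange(1_000_000)}">more</a> ' for _ in range(5)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>".encode()

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    path = environ.get("PATH_INFO", "/")
    is_scraper = (
        ("Chrome" in ua and "HTTP_SEC_CH_UA" not in environ)  # 1. claims Chrome, no Sec-Ch-Ua header
        or any(bot in ua for bot in SCRAPER_UAS)              # 2. known scraper user agent
        or path.startswith(TRAP_PREFIX)                       # 3. a URL we generated ourselves
    )
    start_response("200 OK", [("Content-Type", "text/html")])
    if is_scraper:
        return [garbage_page()]
    return [b"<html><body>Real content goes here</body></html>"]  # 4. otherwise, serve the page

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()
```
Check 3 is what makes the scheme self-reinforcing: only the garbage pages ever emit /trap/ links, so a real visitor never trips it, while a crawler that followed one identifies itself forever after.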
@nothacking
Wdyt of this approach?
> Connections are dropped (status code 444), rather than sending a 4xx HTTP response.
> Why waste our precious CPU cycles and bandwidth? Instead, let the robot keep a connection open waiting for a reply from us.
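(For reference, 444 is nginx's non-standard convention for "close the connection without sending any response", as produced by a "return 444;" directive.) A rough sketch of the "let the robot wait" variant, assuming a toy asyncio server; the bot markers and the one-hour hold are arbitrary choices for illustration:
```python
# Sketch of the "make them wait" option: detected bots get a connection that is
# never answered. The bot markers and the one-hour hold are arbitrary examples.
import asyncio

BOT_MARKERS = (b"GPTBot", b"CCBot", b"python-requests")

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    headers = await reader.readuntil(b"\r\n\r\n")   # request line + headers
    if any(marker in headers for marker in BOT_MARKERS):
        await asyncio.sleep(3600)                   # hold the socket open, send nothing
    else:
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main() -> None:
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```
-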
@david_chisnall yep 💯 frustrating 😞
-
@david_chisnall crying emoji
-
@david_chisnall This was when the tech bros realized that it is all in comparison to everything else.
If you just make EVERYTHING worse, then it doesn't matter that you're bad.
The real story of computing (and perhaps all consumer goods)
@hp @david_chisnall Sounds like finding a candidate to vote for, to be honest...
-
@Laberpferd @david_chisnall proof of work is such a bad CAPTCHA. Like, who thought bots couldn't evaluate JS
@vendelan
The idea is not that they can't, it's that they won't.
If you're a human visiting a website, evaluating some JS at worst costs you a few seconds. If you're a scraper bot trying to get millions of sites a second, it slows you down.
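To make that asymmetry concrete, here is a toy hash-based proof-of-work round of the kind Anubis-style challenges rely on: the visitor must find a nonce whose SHA-256 hash has a given number of leading zero bits, and the server verifies it with a single hash. The 16-bit difficulty and token format are arbitrary illustration, not any particular product's parameters.
```python
# Toy proof-of-work: find a nonce such that sha256(challenge + nonce) starts
# with `difficulty` zero bits. Cheap once per human visit, costly at crawler scale.
import hashlib
import itertools

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: bytes, difficulty: int = 16) -> int:
    """What the visitor's browser would do (normally in JS)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int = 16) -> bool:
    """What the server does: one hash, essentially free."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty

if __name__ == "__main__":
    challenge = b"per-visitor-random-token"
    nonce = solve(challenge)            # ~2**16 hashes on average at difficulty 16
    print(nonce, verify(challenge, nonce))
```
One solve per visit is a fraction of a second in a person's browser; a crawler fetching millions of pages pays that cost millions of times over.
-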
@david_chisnall and then webpages that load a dummy front end, because the real front end takes 15s to load. So then you click the search box and start typing, and the characters end up in a random order when the real search box loads
-
@bertkoor Well, the advantage of sending junk is it makes crawlers trivially identifiable. That avoids the need for tricks like these:
> Other user-agents (hopefully all human!) get a cookie-check. e.g. Chrome, Safari, Firefox.
That still increases loading time. Even if the "CAPTCHA" is small, it'll still take several round trips to deliver.
... of course, once they've been fed poisoned URLs, then you can start blocking.
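Continuing the earlier sketch: once a client has fetched a poisoned URL, you can flag it and refuse (or rate-limit) whatever else it sends. The in-memory set and the /trap/ prefix are the same illustrative assumptions as before; a real setup would persist the flags and key on more than the IP address.
```python
# Sketch: any client that requests a trap URL gets flagged; later requests from it
# are refused. Uses the same made-up /trap/ prefix as the earlier sketch.
flagged_ips: set[str] = set()
TRAP_PREFIX = "/trap/"

def should_block(client_ip: str, path: str) -> bool:
    """Flag clients the moment they follow a poisoned link, then block them."""
    if path.startswith(TRAP_PREFIX):
        flagged_ips.add(client_ip)  # only garbage pages ever link here
    return client_ip in flagged_ips
```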
-
@david_chisnall The horrible delays were there way before Cloudflare. I use a lot of big-company web services at work daily, and most of them take 10+ seconds to load even with gigabit Internet and a fast computer. They're totally miserable on a mobile connection. Every time I look at the page source, I just get sad and angry at how relatively simple web GUIs have been implemented by pouring in all kinds of libraries and frameworks, making the browser tab suck up a gigabyte just to show me a couple of dropdowns and entry fields.
-
@david_chisnall And another 10 seconds because somebody had the great idea that it would be smart to load something like 500 MB of JavaScript for a page with just text.
-
@david_chisnall we notice you are using an ad blocker
-
@the_wub They check user-agent and challenge anything that claims to be Mozilla (because that's what the majority of bots masquerade as).
Also, weird that SeaMonkey can't pass it – I just tried with Servo, and it had no problems.
@jernej__s 1/n SeaMonkey is still based on an ancient version of the Firefox codebase.
I love the email client, and the browser has the tabs in the right place, but the browser fails on features in modern web sites that do not fall back gracefully.
I presume that this is what causes Anubis challenges to fail when using SeaMonkey.
I can get into Mozillazine without any Anubis challenge appearing using NetSurf, which has a limited implementation of javascript.
-
@jernej__s 2/n So I installed NoScript in SeaMonkey to see if it is a javascript issue.
With javascript turned off I get this message.
"Sadly, you must enable JavaScript to get past this challenge. This is required because AI companies have changed the social contract around how website hosting works. A no-JS solution is a work-in-progress."
So I'm now being blocked for using an add-on to protect myself from malicious scripts on websites.
OK so I will now whitelist Mozillazine.
-
@jernej__s 3/n
Aha! A message I did not get the last time I got stuck trying to get into Mozillazine using SeaMonkey:
"Your browser is configured to disable cookies. Anubis requires cookies for the legitimate interest of making sure you are a valid client. Please enable cookies for this domain."
(But SM is set to accept all cookies.)
So in order for websites to protect themselves from AI scraping, users have to reduce the level of security they are prepared to accept as safe when browsing.
-
@jernej__s n/n
Or to go through the process of whitelisting all of the relevant sites that you wish to visit as safe, so that Anubis can validate your browser, whilst otherwise disabling cookies and javascript for other sites. Or just go find other sites to visit that do not assume you are a bot and block you from viewing content.
As regards SM being set to accept cookies and Anubis not recognising that: maybe my Pi-hole is blocking something that Anubis expects to find in a valid client?