jonny (good kind)
Posts
-
@TodePond what have you done
https://github.com/TodePond/GulfOfMexico
-
sysadmins/webmasters of fedi:
@ollibaba @ansuz
Yeah some of the bots get stuck in nepenthes and never come out. There have been some tencent bots rattling around in there for months last I checked. We get very little bot traffic on sciop: when I watch the request logs it's mostly requests for RSS feeds from torrent clients, and I very rarely see the kind of scraping activity I see on my other sites. I think because:
- the bots hit the domain root first, which only has the hidden crawler link and top-level nav links in it
- each of the content-bearing index pages they would find is a two-step lazy load, where htmx triggers the load of more links after an initial page load, and only a subset of the crawlers seem to always have/start with full-browser emulation
- many of the crawlers seem to do a "second pass" with a browser emulator if they complete the domain quickly or hit some block; I can't tell if they always do this or if it's only below some threshold page count or something
- however, since they are still crawling the sweet sweet tarpit, which is served under the same domain from a different machine so it just looks like normal pages, they seem perfectly content to just chow on that and don't seem to try to come back to the main site, at least for a while
The tarpit is quite soothing, we have it trained on a combination of WWE announcer transcripts and Kropotkin's Mutual Aid, among some other texts: https://sciop.net/crawlers/
Anecdotally (I haven't tested this in a serious way), having any kind of block seems to make it worse, since active countermeasures are a decent signal that you have some juicy human text in there you're trying to protect. When I put user agent blocks on my forgejo instance I noticed a substantial increase in traffic.
Also, p much all of the Anubis stuff was done by @ashley, I just watch the logs on sciop
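For readers who want to picture the tarpit idea described above, here is a minimal Python sketch of a Markov-chain babble page that links only to deeper babble pages. It is an illustration under stated assumptions, not how nepenthes or the sciop tarpit is actually implemented: `build_chain`, `tarpit_page`, the word-level chain, the path-seeded randomness, and the sample corpus are all hypothetical.

```python
import hashlib
import random
from collections import defaultdict


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def tarpit_page(path: str, chain: dict[str, list[str]],
                n_words: int = 30, n_links: int = 3) -> dict:
    """Return babble text plus links to deeper (equally fake) pages.

    Hypothetical helper: the page is derived only from the URL path,
    so a crawler revisiting a URL sees the same "content" each time.
    """
    # Seed the generator from the path so output is deterministic per URL.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    starts = sorted(chain)  # sorted for a stable choice order
    word = rng.choice(starts)
    words = [word]
    for _ in range(n_words - 1):
        followers = chain.get(word)
        # On a dead end (word with no recorded follower), restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(starts)
        words.append(word)

    # Every link points further into the tarpit, never back to real pages.
    base = path.rstrip("/")
    links = [f"{base}/{rng.getrandbits(32):08x}/" for _ in range(n_links)]
    return {"text": " ".join(words), "links": links}


# Made-up sample corpus, standing in for whatever texts a real tarpit uses.
chain = build_chain(
    "mutual aid is a factor of evolution and mutual aid is a law of nature"
)
page = tarpit_page("/crawlers/", chain)
print(page["text"])
print(page["links"])
```

Seeding from the path is a design choice made for this sketch: it keeps the generator stateless, so nothing needs to be stored no matter how many pages a crawler requests.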
-
sysadmins/webmasters of fedi:
@ansuz
We found that blocking them just leads them to return with another IP but not follow the bait; redirecting to a tarpit seems to work, however.
-
where is Hunter S. Thompson when you need a "The chatGPT girlboss dinner is decadent and depraved"
the ghoulish hollowness of soul to even formulate the idea "i want to be the kind of person who has written a book about how to date me, but i don't want to write it, but i do want to sell it for $4.99"
-
RE: https://neuromatch.social/@jonny/115409736697961808
where is Hunter S. Thompson when you need a "The chatGPT girlboss dinner is decadent and depraved"
-
a late #Rootpost for #monsterdon
you know this room smells dusty as fuck, vampires literally never clean their houses because dust and cobwebs and old lanterns and shit are part of their whole culture #monsterdon
-
a late #Rootpost for #monsterdon
some people just live for 5,000 years, that is not a crime #monsterdon
-
a late #Rootpost for #monsterdon
people say that vampires can't have friends because friends would notice you not aging and eventually be like "what gives" but honestly if one of my friends kept looking young forever i would just be like "nice, none of my business." you never know what someone's situation is and it's not my place to judge. #monsterdon
-
a late #Rootpost for #monsterdon
the question this movie poses is whether or not it is worth living forever trapped in a coffin in exchange for 200 years with Catherine Deneuve #monsterdon
-
a late #Rootpost for #monsterdon
just quietly watching this like it's a movie, i got nothing to add there is just good vampire stuff happening #monsterdon
-
a late #Rootpost for #monsterdon
oh my fuck i love the hunger #monsterdon
-
Wow this sucks so bad. a person posted on /r/datahoarder that they have created an archive of the Epstein files with added metadata like mentioned people and etc. But everything is LLM-generated....
y'all I am done for
-
Wow this sucks so bad. a person posted on /r/datahoarder that they have created an archive of the Epstein files with added metadata like mentioned people and etc. But everything is LLM-generated....
"AI" is just so convenient. Finally we are no longer beholden to traditional OCR which requires a few clicks or commands to yield highly accurate text with predictable failure modes.
Now all you have to do is explain the entire nature of what OCR is, how "reading" works, and how "documents" as a representation of language work. If you remember to insist repeatedly that the position in which text is laid out in two dimensions impacts its representation as a string, you may yield ?????
-
Wow this sucks so bad. a person posted on /r/datahoarder that they have created an archive of the Epstein files with added metadata like mentioned people and etc. But everything is LLM-generated....
@mschfr
Ah is that the new tesseract? Can't find benchmarks (thanks ai search results), but in your experience it's more accurate?
-
Wow this sucks so bad. a person posted on /r/datahoarder that they have created an archive of the Epstein files with added metadata like mentioned people and etc. But everything is LLM-generated....
It's so cool how LLMs are always finding new ways to do misinformation even when you think you are creating very meticulous archive-grade information
-
Wow this sucks so bad.
a person posted on /r/datahoarder that they have created an archive of the Epstein files with added metadata like mentioned people and etc. But everything is LLM-generated.... Including the "full text" of the documents.
Rather than OCRing them, they were fed to chatGPT with a system prompt that told it that it was an expert at OCR.
-
since the masto dev team is shipping highly-requested features lately...
-
University campuses are so cool, there are so many cool lookin things and for some reason the doors are unlocked and you just get to go inside of them. Like you can see a castle mf like this and get in it