I'm trying to work on the fluconf website, but I keep getting nerd-sniped by the behaviour of the scrapers that are already hitting the not-yet-officially-public site.
In my follow-up to the AI-scrapers article I mentioned that there are actors who monitor certificate transparency logs for new domains to crawl or probe, but I've never really paid close attention to which resources get targeted first.
It's hard to know for sure, because they spoof their user-agent strings and do their best to look like legitimate traffic, but OpenAI, Nomic.ai, and a few smaller AI companies appear to be using this technique.