Wow. For a few months, I was wondering why I suddenly have bandwidth issues when activating my camera in MS Teams meetings, so others can't understand me any more.
A look into my #nginx logs seems to clarify. Bots are eagerly fetching my (partially pretty large) #poudriere build logs. 🧐 (#AI "watching shit scroll by"?)
I see GPTBot at least occassionally requests robots.txt, which I don't have so far. Other bots don't seem to be interested. Especially PetalBot is hammering my server. And there are others (bytedance, google, ...)
Now what? Robots.txt would actually *help* well-behaved bots here (I assume build logs aren't valuable for anything). The most pragmatic thing here would be to add some http basic auth in the reverse proxy for all poudriere stuff. It's currently only public because there's no reason to keep it private....
Have to admit I feel inclined to try one of the tarpitting/poisoning approaches, too. 😏