How to (try to) block #AI scrapers with a tailored `robots.txt`:
You may want to use:
[Dark Visitors](darkvisitors.com/)
And:
[ai.robots.txt](github.com/ai-robots-txt/ai.ro…)
Adding to the list, the `robots.txt` used by #VLC:
[VLC robots.txt](videolan.org/robots.txt)
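As a concrete starting point, here is a minimal `robots.txt` sketch in that spirit. The user-agent tokens below (GPTBot, CCBot, ClaudeBot) are documented by OpenAI, Common Crawl, and Anthropic respectively, but the crawler landscape shifts quickly, so cross-check against the lists above:

```
# Block some well-known GenAI training crawlers.
# A sketch, not an exhaustive set; verify tokens against
# the Dark Visitors / ai.robots.txt lists above.
User-agent: GPTBot
User-agent: CCBot
User-agent: ClaudeBot
Disallow: /

# Everyone else may crawl normally.
User-agent: *
Allow: /
```

Per the robots exclusion standard (RFC 9309), several `User-agent` lines can share one group of rules, which keeps the file short as the list of bots grows.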
Source: **Bloquer les gaveurs d'IA** ("Blocking the AI force-feeders") by @lord (in French)
lord.re/fast-posts/76-bloquer-…
Seirdy (in reply to AGR Risk Intelligence):

robots.txt references are unfortunately quite prone to cargo-culting. I find ai.robots.txt to be generally misleading: lots of the UAs it lists aren't used to train generative AI.
AdsBot-Google, for instance, doesn't do anything if you don't use Google Ads; Google uses Google-Extended to train its GenAI offerings.
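So to opt out of Google's GenAI training without affecting Search or Ads crawling, the rule would look something like this (Google-Extended is a documented control token in Google's crawler docs, not a separate crawler):

```
# Google-Extended controls use of content for Google's GenAI
# training; it does not affect Googlebot (Search) or AdsBot-Google.
User-agent: Google-Extended
Disallow: /
```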
For img2dataset, it's better to use a `noai` meta robots directive and allow it to crawl to properly opt out of indexing, should it stumble upon a cached copy of your page.
See also: *Blocking certain bots* (Seirdy's Home)
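A minimal sketch of that opt-out, assuming the non-standard `noai`/`noimageai` directives that img2dataset's documentation describes honoring. Note that a scraper fetching image files directly will only see the HTTP-header form (`X-Robots-Tag: noai, noimageai`), not the HTML tag:

```
<!-- Page-level opt-out: still crawlable and indexable, but flagged
     as off-limits for AI training. Directives beyond the standard
     noindex/nofollow set are honored only by cooperating scrapers. -->
<meta name="robots" content="noai, noimageai">
```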