Yesterday I deleted the whole #FediSearch index and started crawling the #fediverse from scratch.

So many new accounts should be discoverable now.

fedisearch.skorpil.cz

This entry was edited (2 years ago)
in reply to Štěpán Škorpil

The index was polluted by a huge number of fake domains leading to a badly configured Mastodon instance. This combination of problems overloaded the crawler, so after 14 days new accounts still had not been discovered.
in reply to Štěpán Škorpil

I need to improve the crawler so it can handle this situation, but I don't have time right now.
For now I have added the badly configured instance to the blacklist.
in reply to Štěpán Škorpil

Remember that it uses public APIs for account discovery, which means you have to set your account as discoverable to be, well, discoverable.
in reply to Štěpán Škorpil

Also remember that, to be found, you have to fill your bio with the info you want to be discoverable by...
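
To illustrate the two conditions above, here is a minimal sketch in Python. It assumes account records shaped like the Mastodon API account entity, with a `discoverable` flag and an HTML `note` field holding the bio. The function name `searchable_bio` is hypothetical, and this is not necessarily how FediSearch itself decides what to index.

```python
import re
from typing import Any, Dict, Optional


def searchable_bio(account: Dict[str, Any]) -> Optional[str]:
    """Return the plain-text bio of a discoverable account, else None.

    Assumption: `account` looks like a Mastodon API account entity.
    """
    # Accounts that have not opted in are skipped entirely.
    if account.get("discoverable") is not True:
        return None
    # The bio is delivered as HTML; drop the tags and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", account.get("note") or "")
    return " ".join(text.split())
```

An account that is discoverable but has an empty bio yields an empty string here, so there is nothing for a search query to match.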
in reply to Štěpán Škorpil

Is there any way to force the crawler to fetch the whole list of users for some instances? (For example, all updated CZ instances.)
I'm not sure how it works on pre-4.0, but now the user directory is limited to 80 records per page, and that can't be overridden with the 'limit' argument.
in reply to Štěpán Škorpil

If I understand it correctly, the crawler sets the limit to 500 users per page, but the server has its own internal limit of 80.
So if the instance has fewer than 500 users, it only shows the first 80, and with more than 500 it probably undercounts by a lot if the same limit applies everywhere.

github.com/Stopka/fedicrawl/bl…
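
One way around the capped page size described above is to advance the offset by the number of records actually returned, rather than by the requested limit. The following Python sketch shows the idea under stated assumptions: `fetch_page(offset, limit)` is a hypothetical callable standing in for a request to the instance's directory endpoint, and `fake_server` is a stand-in that silently caps pages at 80. This is not the actual FediCrawl code.

```python
from typing import Callable, Dict, List

Account = Dict[str, object]
# Hypothetical signature: fetch_page(offset, limit) -> accounts on that page.
FetchPage = Callable[[int, int], List[Account]]


def crawl_directory(fetch_page: FetchPage, requested_limit: int = 500) -> List[Account]:
    """Collect all directory records, whatever page size the server really uses."""
    accounts: List[Account] = []
    offset = 0
    while True:
        page = fetch_page(offset, requested_limit)
        if not page:
            break  # an empty page means the directory is exhausted
        accounts.extend(page)
        # Advance by the real page size, not by requested_limit, so a
        # server-side cap does not cause records to be skipped.
        offset += len(page)
    return accounts


def fake_server(total: int, server_cap: int = 80) -> FetchPage:
    """Stand-in for an instance that ignores limits above server_cap."""
    records: List[Account] = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[Account]:
        return records[offset:offset + min(limit, server_cap)]

    return fetch_page
```

For example, `crawl_directory(fake_server(200))` walks three pages (80, 80, 40 records) and returns all 200 accounts, whereas stepping the offset by 500 would stop after the first 80.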

in reply to Štěpán Škorpil

Does this respect robots.txt and opt-outs, and limit itself to profiles?

Would be nice to have a statement about this on the site.

Unknown parent

Štěpán Škorpil
It's "FediCrawl/1.0"
This entry was edited (2 years ago)
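
For admins who want to see how a robots.txt rule would apply to that user agent string, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `example.social` URL and the helper name `is_allowed` are made up for illustration. Whether FediCrawl actually consults robots.txt is not confirmed in this thread.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "FediCrawl/1.0"  # the string quoted in the thread

# Hypothetical robots.txt: block FediCrawl, allow every other crawler.
EXAMPLE_ROBOTS_TXT = """\
User-agent: FediCrawl
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the example file, `is_allowed(EXAMPLE_ROBOTS_TXT, "https://example.social/api/v1/directory")` is false for FediCrawl, while the same call with `user_agent="Googlebot"` is true.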
Unknown parent

in reply to Štěpán Škorpil

@admin What is the difference between regular search engines and a specialized fediverse engine? Why would you want to allow Google and disallow FediSearch?