Yesterday I deleted the whole #FediSearch index and started crawling the #fediverse from scratch.

So many new accounts should be discoverable now.

https://fedisearch.skorpil.cz
in reply to Štěpán Škorpil :skorpil_cz:

The index was polluted by a huge number of fake domains pointing to a badly configured Mastodon instance. This combination of problems overloaded the crawler, so even after 14 days new accounts were still not being discovered.
in reply to Štěpán Škorpil :skorpil_cz:

I need to improve the crawler so it can handle this situation, but I don't have time right now.
For now I have added the badly configured instance to the blacklist.
in reply to Štěpán Škorpil :skorpil_cz:

Remember that it uses public APIs for account discovery, which means you have to set your account as discoverable to be, well, discoverable.
in reply to Štěpán Škorpil :skorpil_cz:

And also remember that, to be discoverable, you have to fill your bio with the info you want to be discoverable by...
in reply to Štěpán Škorpil :skorpil_cz:

Is there any way to force the crawler to get the whole list of users for some instances? (Like all updated CZ instances)
I'm not sure how it works on pre-4.0 versions, but now the user directory is limited to 80 records per page and that can't be overridden with the 'limit' argument.
in reply to Štěpán Škorpil :skorpil_cz:

If I understand it correctly, the crawler sets the limit to 500 users per page, but the server has its own internal limit of 80.
So if the instance has fewer than 500 users, it only sees the first 80, and with more than 500 it probably undercounts by a lot if the same limit applies everywhere.

https://github.com/Stopka/fedicrawl/blob/29acce39063d1dbfbe69bab22348855ff5ca21c2/application/src/Fediverse/Providers/Mastodon/retrieveLocalPublicUsersPage.ts#L9
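To illustrate the undercounting described above, here is a minimal sketch (not the actual FediCrawl code) of offset pagination that advances by the number of records the server really returned instead of by the requested limit. The names `DirectoryAccount`, `FetchPage` and `collectAllLocalUsers` are hypothetical; `fetchPage` is assumed to wrap a request to the instance's public directory endpoint with `local`, `offset` and `limit` parameters.

```typescript
// Hypothetical minimal shape of a directory record; only the id is used here.
interface DirectoryAccount {
  id: string;
}

// Assumption: a caller-supplied function that requests one directory page,
// e.g. by wrapping a GET to the instance's user directory endpoint.
type FetchPage = (offset: number, limit: number) => Promise<DirectoryAccount[]>;

// Walks the directory until an empty page comes back.
// The key point: the offset grows by batch.length (what the server actually
// sent), so a server-side cap such as 80 does not cause skipped users.
async function collectAllLocalUsers(
  fetchPage: FetchPage,
  requestedLimit: number = 500,
  maxPages: number = 10000 // safety stop for misbehaving servers
): Promise<DirectoryAccount[]> {
  const all: DirectoryAccount[] = [];
  let offset = 0;
  for (let page = 0; page < maxPages; page++) {
    const batch = await fetchPage(offset, requestedLimit);
    if (batch.length === 0) {
      break; // no more records
    }
    for (const account of batch) {
      all.push(account);
    }
    offset += batch.length; // NOT "offset += requestedLimit"
  }
  return all;
}
```

With a server that silently caps pages at 80, stepping the offset by 500 would read records 0-79, then 500-579, and so on; stepping by the returned batch size reads every record.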
in reply to Štěpán Škorpil :skorpil_cz:

Does this respect robots.txt and opt-outs, and limit itself to profiles?

Would be nice to have a statement about this on the site.
in reply to stop genocide in gaza

@nikodemus I've opted out from search engine indexing and I could still find my profile.

Do. Not. Like. This.
Unknown parent

It's "FediCrawl/1.0"
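For admins who want to write a robots.txt rule for that user agent, here is a rough sketch of how a crawler that honors robots.txt could evaluate the rules for its product token ("FediCrawl", taken from the user agent string without the version). Whether FediCrawl actually does this is not confirmed anywhere in this thread. The function names `rulesFor` and `isAllowed` are made up for illustration, and the sketch only does plain prefix matching (no `*` or `$` patterns).

```typescript
interface Rule {
  allow: boolean;
  path: string;
}

// Collects the rules of the group that names the product token exactly
// (case-insensitive); falls back to the "*" group if no such group exists.
function rulesFor(robotsTxt: string, productToken: string): Rule[] {
  const token = productToken.toLowerCase();
  const specific: Rule[] = [];
  const wildcard: Rule[] = [];
  let agents: string[] = [];
  let readingAgents = false;
  let sawSpecific = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === "user-agent") {
      if (!readingAgents) agents = []; // a new group starts here
      agents.push(value.toLowerCase());
      readingAgents = true;
      if (value.toLowerCase() === token) sawSpecific = true;
    } else if (key === "allow" || key === "disallow") {
      readingAgents = false;
      if (value === "") continue; // an empty rule matches nothing
      const rule: Rule = { allow: key === "allow", path: value };
      if (agents.indexOf(token) !== -1) specific.push(rule);
      else if (agents.indexOf("*") !== -1) wildcard.push(rule);
    }
  }
  return sawSpecific ? specific : wildcard;
}

// The longest matching path prefix wins; on a tie, Allow wins.
// If no rule matches, the path is allowed.
function isAllowed(robotsTxt: string, productToken: string, path: string): boolean {
  let best: Rule | undefined;
  for (const rule of rulesFor(robotsTxt, productToken)) {
    if (path.indexOf(rule.path) !== 0) continue; // not a prefix of the path
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === undefined ? true : best.allow;
}
```

Under these assumptions, a robots.txt group with `User-agent: FediCrawl` and `Disallow: /` would make `isAllowed` return false for every path for that token, while other crawlers would still be judged by the `*` group.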
in reply to Štěpán Škorpil :skorpil_cz:

@admin What is the difference between regular search engines and a specialized fediverse search engine? Why would you want to allow Google and disallow FediSearch?