There will also be the ability to filter to top links for a particular tag - so for example you could say "show me the top links people have posted in the last 12 hours for #sports"
@ummjackson will it cache the data, or actively query against the instances?
@colossus It caches the links, tags and languages but no identifying data about who shared it. All from public timeline APIs.
@ummjackson @colossus that's a pretty neat idea, I like that this makes the data accessible without just being a crawler feeding full-text search. In your screenshot, is that a query executed by your tool, or run directly against the cache?
@colossus Yeah the goal is to do it in a way that’s still useful for discovery, without compromising privacy or storing any data tied to an individual. It’s just a SQL query executed directly against the cache, after it crawled for a few minutes.
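A minimal sketch of what a "top links for a tag in the last 12 hours" query against such a cache might look like. The table name, columns, and sample data here are all assumptions for illustration, not taken from the actual tool; the key property from the thread is that only the link, tag, and timestamp are stored, with no user identifiers.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# In-memory cache of shared links: only the URL, the tag it was seen
# with, and when it was seen -- no data tied to an individual.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE links (
        url     TEXT NOT NULL,
        tag     TEXT NOT NULL,
        seen_at TEXT NOT NULL   -- ISO 8601 UTC timestamp
    )
""")

now = datetime.now(timezone.utc)
sample = [
    ("https://example.com/match-report",  "sports", now - timedelta(hours=1)),
    ("https://example.com/match-report",  "sports", now - timedelta(hours=2)),
    ("https://example.com/transfer-news", "sports", now - timedelta(hours=3)),
    ("https://example.com/old-story",     "sports", now - timedelta(hours=20)),
]
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [(url, tag, ts.isoformat()) for url, tag, ts in sample],
)

def top_links(tag, hours=12, limit=10):
    """Most-shared links for a tag within the last `hours` hours."""
    cutoff = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
    return conn.execute(
        """
        SELECT url, COUNT(*) AS shares
        FROM links
        WHERE tag = ? AND seen_at >= ?
        GROUP BY url
        ORDER BY shares DESC
        LIMIT ?
        """,
        (tag, cutoff, limit),
    ).fetchall()

print(top_links("sports"))
```

The 20-hour-old entry falls outside the 12-hour window, so only the two recent links are returned, ranked by share count.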
@ummjackson @colossus
I appreciate your efforts are well intentioned, but I can't help but wonder if #analysis tools end up being bad (socially) for #fediverse.
You mention #privacy a lot, which implies consideration for the users' data you are going to analyse, but "#trending" tools will still possibly (probably?) herd fediverse members into filter bubbles.
Can you consider that what you are doing is replicating the existing "social media" structures in the fediverse?
It seems to me that an important part of the #fediverse is the ability to de-federate.
As someone involved in attempts to build #community networks, I find this feature attractive. I support the creation of small instances, where most users would have some kind of real world connection and can carry on their business without having to participate in the #myrmecology aspects of the internet.
@colossus @fragrancesensitive
As @ummjackson mentions elsewhere in this thread, disabling the public timeline is a potential workaround to avoid getting scraped, since that endpoint is plain #http, not #activitypub, and so carries no federation rules.
However, I don't see why disabling this feature for everyone should be the answer to a few bad actors. iptables block lists, perhaps? Or maybe a network-wide request to respect something more HTTP-ish, like a robots.txt, and reserve firewall blocking for violators?
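To illustrate the robots.txt idea: an instance admin could disallow the public timeline endpoint for crawlers, and a well-behaved scraper would check before fetching. The hostname and user-agent below are made up; the endpoint path follows Mastodon's public timeline API, and the check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might publish to opt the
# public timeline out of scraping while leaving the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/v1/timelines/public
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A cooperating crawler consults the rules before each fetch.
print(rp.can_fetch("trends-bot", "https://example.social/api/v1/timelines/public"))
print(rp.can_fetch("trends-bot", "https://example.social/@someone"))
```

The first check comes back False (disallowed) and the second True, so respecting robots.txt would let admins opt out per-endpoint without firewalling anyone, keeping iptables in reserve for crawlers that ignore it.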