There will also be the ability to filter to top links for a particular tag - so for example you could say "show me the top links people have posted in the last 12 hours for #sports"
@ummjackson will it cache the data, or actively query against the instances?
@colossus It caches the links, tags and languages but no identifying data about who shared it. All from public timeline APIs.
@ummjackson that's a pretty neat idea, I like that this makes the data accessible without just being a crawler feeding full text search. In your screenshot, is that a query to be executed by your tool or running against the cache?
@colossus Yeah the goal is to do it in a way that’s still useful for discovery, without compromising privacy or storing any data tied to an individual. It’s just a SQL query executed directly against the cache, after it crawled for a few minutes.
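[The thread describes a SQL query run directly against a cache of links, tags, and languages, with no user-identifying data stored. The actual schema and tooling aren't shown anywhere in the thread, so this is a minimal sketch under assumed table and column names (`shared_links`, `url`, `tag`, `lang`, `shared_at`), using SQLite to reproduce the "top links for #sports in the last 12 hours" example:]

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical cache schema -- note there is no column tying a share to a user.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shared_links (
        url       TEXT NOT NULL,
        tag       TEXT,
        lang      TEXT,
        shared_at TEXT NOT NULL  -- ISO 8601 timestamp
    )
""")

# Seed a few sample rows, as if a crawler had run for a few minutes.
now = datetime.utcnow()
rows = [
    ("https://example.com/match-report",  "sports", "en", now - timedelta(hours=2)),
    ("https://example.com/match-report",  "sports", "en", now - timedelta(hours=5)),
    ("https://example.com/transfer-news", "sports", "en", now - timedelta(hours=1)),
    ("https://example.com/old-story",     "sports", "en", now - timedelta(hours=30)),
]
conn.executemany(
    "INSERT INTO shared_links VALUES (?, ?, ?, ?)",
    [(u, t, l, ts.isoformat()) for u, t, l, ts in rows],
)

# "Top links people have posted in the last 12 hours for #sports",
# ranked by share count; the 30-hour-old row falls outside the window.
cutoff = (now - timedelta(hours=12)).isoformat()
top = conn.execute(
    """
    SELECT url, COUNT(*) AS shares
    FROM shared_links
    WHERE tag = ? AND shared_at >= ?
    GROUP BY url
    ORDER BY shares DESC
    """,
    ("sports", cutoff),
).fetchall()
print(top)  # [('https://example.com/match-report', 2), ('https://example.com/transfer-news', 1)]
```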
@ummjackson @colossus
I appreciate your efforts are well intentioned, but I can't help but wonder if #analysis tools end up being bad (socially) for the #fediverse.
You mention #privacy a lot, which implies consideration for the users' data you are going to analyse, but "#trending" tools will still possibly (probably?) herd fediverse members into filter bubbles.
Can you consider that what you are doing is replicating the existing "social media" structures in the fediverse?
@fragrancesensitive I'm not sure @ummjackson 's intent is to build those tools, but they're going to get built.
Someone is going to make sites that replicate those existing structures on data pulled from the #fediverse. E.g., Google's already indexing a bunch of instances, and they'll decide what to prioritize in search results
As @ummjackson mentions elsewhere on this thread, disabling the public timeline is a potential workaround to avoid getting scraped, since plain #http is not #activitypub and doesn't follow federation rules.
However, I don't see why disabling this feature for everyone should be the answer to a few bad actors. iptables block lists, perhaps? Or maybe a network-wide request to respect something more HTTP-ish, like a robots.txt, and reserve firewall blocking for violators?
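[The robots.txt idea above could look like the following on an instance. The exact path is an assumption (Mastodon's public timeline API lives under `/api/v1/timelines/public`, but other fediverse software differs), and robots.txt is purely advisory: well-behaved crawlers honor it, bad actors ignore it, which is why the suggestion reserves firewall blocking for violators.]

```
# Served at https://instance.example/robots.txt
# Ask crawlers not to scrape the public timeline API.
User-agent: *
Disallow: /api/v1/timelines/public
```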