@dansup We're stronger together, Dan. It's not worth throwing stones.
-
It would seem like this 'global scale' difficulty relates to the aforesaid 'quadratic scaling' issue raised by @cwebber
If this is in fact true, it is very hard to see how the protocol is actually viable as a broadly decentralized protocol.
Would love to have someone knowledgeable address this.
https://mastodon.online/@mastodonmigration/116064809568107112
@mastodonmigration @baralheia @cwebber no, I mean, processing 2.4 billion posts, 3.4 billion follows, and 13.6 billion likes is a metric shittone of data to process. Serving up feeds to 42 million users (10-15 million monthly active) requires a lot of processing.
Stats from: https://bsky.jazco.dev/stats
It's not even talking about communication at a network layer between PDSes, Relays, and AppViews. That's a different matter, which is where Christine was mostly talking, iirc.
-
@thisismissem @evan @dansup @quillmatiq and for my MassiveWiki project I want both AT and AP interop. I think we just need a few bridges to use and experiment with. (nonetheless, he persists)
@band @evan @dansup @quillmatiq well, @quillmatiq is involved in Bridgy Fed, and it's open source, so, there's maybe a starting point. You could probably also re-use the https://standard.site lexicon
-
@evan @boris @reflex @dansup @quillmatiq will you forgive me cos I asked Gemini😂: Destroy as suitable. Under dependency challenges, it says:
- Identity Dependency: did:plc directory Bsky owned
- "Centralized Indexing: users can host their own PDS, but rely on "relays" to discover other users. Currently, the main relay is operated by Bsky. Replacing this requires significant compute power."
- "Atproto's adoption depends on it having a "killer app" other than the initial microblogging client"
@sheislaurence did:plc is spinning out to an independent org, relays are only necessary for things at scale (& aren’t used for user discovery), and relays currently cost $20/month for 42M accounts.
I presented at Fedicon last year about a selection of the many apps being built https://bmannconsulting.com/notes/beyond-microblogging-atproto/
For completeness, because of account architecture, ATProto doesn’t have a private data option today.
-
@mastodonmigration @baralheia @cwebber no, I mean, processing 2.4 billion posts, 3.4 billion follows, and 13.6 billion likes is a metric shittone of data to process. Serving up feeds to 42 million users (10-15 million monthly active) requires a lot of processing.
Stats from: https://bsky.jazco.dev/stats
It's not even talking about communication at a network layer between PDSes, Relays, and AppViews. That's a different matter, which is where Christine was mostly talking, iirc.
@thisismissem @baralheia @cwebber
Hmmm... please excuse the layman's understanding of these matters, but it does seem like this relates to traffic at the network layer. What you are saying is that the difficulty for the smaller instance in scaling is "serving up feeds for 42 million users." With ActivityPub the first request would transfer a single copy to a cache on the requesting instance and subsequent requests would not generate any traffic. This is why AP is able to scale linearly.
@mastodonmigration @baralheia @cwebber so ingesting all the data for Bluesky does take time & resources, but it is doable: @FedicaHQ have actually done this, as have Blacksky. There's probably others too.
The AppView acts as a cache for this data. The cost is due to the sheer scale of the dataset, and in computing the feeds & notifications for however many million users.
The other costs are CDN and moderation, which are kinda expensive at scale, but those definitely aren't unfamiliar costs for AP servers either.
Mastodon makes an interesting design choice here: it stops producing feeds for users who haven't been active for a while.
There have also been plenty of fediverse applications that have had issues with feed generation (Firefish, Hollo, and others have had issues in the past, if memory serves).
So yeah, if you only need to serve feeds for say a dozen users and don't need the full network's worth of data, then it's cheaper.
But the article that Christine wrote was more about the network bandwidth between the components and how that scales. Which is a very different matter.
@mastodonmigration @baralheia @cwebber An AppView typically consumes data from a single full-network relay, with failover. A PDS typically has subscriptions from 1 or more relays for data. There are also some relays that just consume other relays.
Adding more PDSes means more connections for relays, adding more relays means more subscriptions to individual PDSes for data.
There's like, a dozen or so relays operating in full-network mode, as far as I know, and relays don't do archival anymore, which was the largest cost.
@thisismissem @mastodonmigration @cwebber is there a list or directory of independent Bluesky relays and AppViews somewhere?
-
@sheislaurence did:plc is spinning out to an independent org, relays are only necessary for things at scale (& aren’t used for user discovery), and relays currently cost $20/month for 42M accounts.
I presented at Fedicon last year about a selection of the many apps being built https://bmannconsulting.com/notes/beyond-microblogging-atproto/
For completeness, because of account architecture, ATProto doesn’t have a private data option today.
@boris thank you!
-
@mastodonmigration @baralheia @cwebber An AppView typically consumes data from a single full-network relay, with failover. A PDS typically has subscriptions from 1 or more relays for data. There are also some relays that just consume other relays.
Adding more PDSes means more connections for relays, adding more relays means more subscriptions to individual PDSes for data.
There's like, a dozen or so relays operating in full-network mode, as far as I know, and relays don't do archival anymore, which was the largest cost.
@thisismissem @baralheia @cwebber
Still not seeing it. On the ingest side, traffic should only be proportional to total users on that node. If a node is smaller it should only generate network data traffic to service its own users. Actually less than that due to caching.
The beauty of AP seems to be precisely that it uses caching to enable individual nodes to scale storage and processing linearly both as regards serving and ingesting data.
@mastodonmigration @baralheia @cwebber right, so on the ingest side, if you want to build an application that is ingesting all the data from bluesky, then you'd be asking the relay for all records targeting the app.bsky.* NSID and all events about repositories that contain the app.bsky.actor.profile record.
That's 42 million accounts across however many PDSes.
That's specifically for an AppView where you *want* a full network copy of all microblogging data. That's obviously going to be expensive.
You can also build a system where you say "Actually, only give me data from these accounts" (partial network copy). Konbini is one such project: https://github.com/whyrusleeping/konbini
Doll's Aurora Prism is another project in this space: https://github.com/dollspace-gay/Aurora-Prism
If I build an app with my own lexicon, I don't need to process all that bluesky data. I process only the data for accounts using my application.
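To make the full-network vs. partial-network distinction concrete, here is a minimal Python sketch. The `RepoEvent` shape, the `select_events` helper, and the `com.example.*` lexicon are all made up for illustration; this is not the relay API, nor how Konbini or Aurora Prism actually do it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class RepoEvent:
    """Hypothetical, simplified firehose event: who wrote which kind of record."""
    did: str         # account that owns the repository
    collection: str  # NSID of the record type, e.g. "app.bsky.feed.post"


def select_events(
    events: Iterable[RepoEvent],
    nsid_prefix: str,
    accounts: Optional[Set[str]] = None,
) -> Iterator[RepoEvent]:
    """Keep events under one NSID prefix, optionally limited to known accounts."""
    for event in events:
        if not event.collection.startswith(nsid_prefix):
            continue  # a different lexicon, nothing this app needs to process
        if accounts is not None and event.did not in accounts:
            continue  # partial network copy: skip accounts we don't track
        yield event


firehose = [
    RepoEvent("did:plc:alice", "app.bsky.feed.post"),
    RepoEvent("did:plc:bob", "app.bsky.feed.like"),
    RepoEvent("did:plc:alice", "com.example.review"),
    RepoEvent("did:plc:carol", "com.example.review"),
]

# Full-network Bluesky AppView: everything under app.bsky.*
full_copy = list(select_events(firehose, "app.bsky."))
# Partial network copy: same lexicon, but only the accounts we care about
partial_copy = list(select_events(firehose, "app.bsky.", {"did:plc:alice"}))
# App with its own lexicon: ignores Bluesky data entirely
own_lexicon = list(select_events(firehose, "com.example."))
```

The cost difference in the thread comes down to how many events survive the filter: the full copy has to keep up with every account, while the other two only pay for what they select.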
-
@dansup I'm laughing out loud at how bad the Nostr homepage copy is. What's that law where techbros building communication tools don't understand that communication is important....?
-
@mastodonmigration @baralheia @cwebber right, so on the ingest side, if you want to build an application that is ingesting all the data from bluesky, then you'd be asking the relay for all records targeting the app.bsky.* NSID and all events about repositories that contain the app.bsky.actor.profile record.
That's 42 million accounts across however many PDSes.
That's specifically for an AppView where you *want* a full network copy of all microblogging data. That's obviously going to be expensive.
You can also build a system where you say "Actually, only give me data from these accounts" (partial network copy). Konbini is one such project: https://github.com/whyrusleeping/konbini
Doll's Aurora Prism is another project in this space: https://github.com/dollspace-gay/Aurora-Prism
If I build an app with my own lexicon, I don't need to process all that bluesky data. I process only the data for accounts using my application.
@mastodonmigration @baralheia @cwebber in Christine's article (and I've just spoken with her about it), it assumes a network topology that does not exist in the real world.
It assumes that every user is on a different PDS, and every user runs a full network relay. The reality is that multiple users are usually on a single PDS, and there's only like 12 relays.
- 2 from bluesky (+ 1 deprecated)
- 2 from hose.cam
- 1 from blacksky
- 1 from upcloud
- 3 from firehose.network
plus a few more from various people.
In the ActivityPub ecosystem for every user to message every other user, you need connections between 30,000 servers.
For the same in AT Protocol, you need connections between N PDS to one or more relays (most use the bluesky relay, which others get their list of PDSes from).
@mastodonmigration @baralheia @cwebber on activitypub, if I have 30,000 followers (1 follower per server), and I want to post a message, my server has to send out 30,000 messages.
In AT Protocol, if I want to do the same write operation, I send one HTTP request to my PDS, and the PDS then publishes that message to N connected relays (where N <= 12).
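The numbers in this comparison can be put side by side with a bit of back-of-the-envelope Python. This is only a sketch: the 30,000 servers and 12 relays come from the thread, while `PDSES` and `USERS` are placeholder values chosen for illustration, and the full-mesh count is an upper bound rather than a measurement of the real fediverse.

```python
def full_mesh_links(servers: int) -> int:
    """Server pairs that must be able to talk for everyone to reach everyone."""
    return servers * (servers - 1) // 2


def pds_relay_links(pdses: int, relays: int) -> int:
    """Subscriptions if every relay subscribes to every PDS."""
    return pdses * relays


def ap_post_deliveries(follower_servers: int) -> int:
    """ActivityPub: one delivery per server hosting a follower."""
    return follower_servers


def at_post_messages(connected_relays: int) -> int:
    """AT Protocol: one request to the PDS, then one event per connected relay."""
    return 1 + connected_relays


AP_SERVERS = 30_000  # figure used in the thread
RELAYS = 12          # "only like 12 relays"
PDSES = 2_000        # assumption for illustration, not a measured count
USERS = 1_000        # assumption, for the one-PDS-and-one-relay-per-user case

ap_links = full_mesh_links(AP_SERVERS)
at_links = pds_relay_links(PDSES, RELAYS)
per_user_links = pds_relay_links(USERS, USERS)  # the topology the article assumes
ap_per_post = ap_post_deliveries(30_000)
at_per_post = at_post_messages(RELAYS)

print(ap_links, at_links, per_user_links, ap_per_post, at_per_post)
# prints: 449985000 24000 1000000 30000 13
```

The quadratic term only shows up when the number of relays grows with the number of users (`per_user_links`); with a small, fixed set of relays the link count grows linearly with PDSes. This only counts the write path out of the author's server, and leaves out relay-to-AppView traffic and everything on the read side.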
-
@quillmatiq @evan @dansup I have been thinking about trying to do some sort of protocol bridging with my project Fedi+, but that runs the risk of people like FediTips getting on the wrong side of things, being like "oh, Fedi+ interacts with fascists" or whatever, all because the protocol is associated with Bluesky, which verified ICE and other US government accounts, and so on. My goal with Fedi+ is to not only create that vibe people loved when Google+ was around, but also to make it super easy for people who don't care about Mastodon or ActivityPub or whatever to join and not even need to think about the protocols behind the scenes.
@alexchapman @quillmatiq @evan @dansup
Have you looked at WAFRN for inspiration on the dual protocol side of things
@gbargoud @quillmatiq @evan @dansup Ye they do something where you enable the integration and it sets up some sort of additional account thing, kinda complex.
-
@mastodonmigration @baralheia @cwebber right, so on the ingest side, if you want to build an application that is ingesting all the data from bluesky, then you'd be asking the relay for all records targeting the app.bsky.* NSID and all events about repositories that contain the app.bsky.actor.profile record.
That's 42 million accounts across however many PDSes.
That's specifically for an AppView where you *want* a full network copy of all microblogging data. That's obviously going to be expensive.
You can also build a system where you say "Actually, only give me data from these accounts" (partial network copy). Konbini is one such project: https://github.com/whyrusleeping/konbini
Doll's Aurora Prism is another project in this space: https://github.com/dollspace-gay/Aurora-Prism
If I build an app with my own lexicon, I don't need to process all that bluesky data. I process only the data for accounts using my application.
@thisismissem @mastodonmigration @cwebber Side tangent: if you build your own app that uses a different Lexicon than Bluesky, can you still interoperate with Bluesky users? Because for many, the desire is to have things operate like the Fediverse, where (for one example) a post on Loops can be seen in the home timeline for a Mastodon user, et al.
@baralheia @mastodonmigration @cwebber you'd need to write a record in your own lexicon and then write a cross-post record in the Bluesky lexicon, for the post to show up on bluesky feeds.
For instance, I wrote a review on popfeed.social: https://popfeed.social/review/at:/did:plc:5w4eqcxzw5jv5qfnmzxcakfy/social.popfeed.feed.review/3mezfspxcbk2j
And when I did that, I opted to cross-post to bluesky: https://bsky.app/profile/thisismissem.social/post/3mezfv3ydp22j
However, such a conversion is inherently lossy. This is true for ActivityPub as well.
You can also write an application that uses the bluesky social graph whilst writing records to your own lexicon without doing bluesky posts.
Or you can have your own social graph. Maybe instead of following people (actors in AP) you're actually following topics, or hashtags, or a website. The AP concept of "following" is limited to following an Actor, which is something that can send and receive activities, whereas on AT Protocol, "following" is an application concern where you work with links between data.
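As a rough sketch of the "write your own record, then cross-post" flow described here, the Python below builds both records as plain dicts. The `com.example.review` lexicon and its fields are hypothetical (this is not Popfeed's actual schema); `app.bsky.feed.post` with `text` and `createdAt`, and the `repo` / `collection` / `record` shape of a `com.atproto.repo.createRecord` request, are the parts taken from the real protocol. Nothing is sent over the network.

```python
from datetime import datetime, timezone


def now_iso() -> str:
    """UTC timestamp in the ISO 8601 form records usually carry."""
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


def build_review(title: str, rating: int, body: str) -> dict:
    """Record in a made-up lexicon; com.example.review is hypothetical."""
    return {
        "$type": "com.example.review",
        "title": title,
        "rating": rating,
        "body": body,
        "createdAt": now_iso(),
    }


def build_crosspost(review: dict, url: str) -> dict:
    """Lossy projection of the review into a plain Bluesky post."""
    text = f"Reviewed {review['title']}: {review['rating']}/5 {url}"
    return {
        "$type": "app.bsky.feed.post",
        "text": text[:300],  # rough cut; the real limit is counted in graphemes
        "createdAt": review["createdAt"],
    }


def create_record_body(did: str, record: dict) -> dict:
    """Request body for com.atproto.repo.createRecord (not sent here)."""
    return {"repo": did, "collection": record["$type"], "record": record}


review = build_review("Some Film", 4, "A long, structured review body...")
post = build_crosspost(review, "https://reviews.example/r/1")
requests_to_send = [
    create_record_body("did:plc:example", review),
    create_record_body("did:plc:example", post),
]
```

The lossiness is visible in the dicts: the `rating` and `body` fields exist only in the review record, and the Bluesky post carries just a sentence and a link back to it.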
-
@mastodonmigration @baralheia @cwebber on activitypub, if I have 30,000 followers (1 follower per server), and I want to post a message, my server has to send out 30,000 messages.
In AT Protocol, if I want to do the same write operation, I send one HTTP request to my PDS, and the PDS then publishes that message to N connected relays (where N <= 12).
@thisismissem @baralheia @cwebber
Really appreciate you engaging in this discussion. Will take some time to reflect on what you are saying and try to frame it in generic network model language along the lines of what Christine has put forward.
For this purpose, I propose we agree that independent nodes be defined as fully autonomous: capable of operating on the network independently of all other components and interacting with all users on the network (except the PLC directory).
Again, thanks.
-
@baralheia @mastodonmigration @cwebber you'd need to write a record in your own lexicon and then write a cross-post record in the Bluesky lexicon, for the post to show up on bluesky feeds.
For instance, I wrote a review on popfeed.social: https://popfeed.social/review/at:/did:plc:5w4eqcxzw5jv5qfnmzxcakfy/social.popfeed.feed.review/3mezfspxcbk2j
And when I did that, I opted to cross-post to bluesky: https://bsky.app/profile/thisismissem.social/post/3mezfv3ydp22j
However, such a conversion is inherently lossy. This is true for ActivityPub as well.
You can also write an application that uses the bluesky social graph whilst writing records to your own lexicon without doing bluesky posts.
Or you can have your own social graph. Maybe instead of following people (actors in AP) you're actually following topics, or hashtags, or a website. The AP concept of "following" is limited to following an Actor, which is something that can send and receive activities, whereas on AT Protocol, "following" is an application concern where you work with links between data.
@baralheia @mastodonmigration like, say I publish a song on Bandwagon, maybe I publish it with album art, I include the track listing, the credits for songwriting, production, etc. Maybe I also include the lyrics for each track.
If Bandwagon cross-posts that to Mastodon, or wants to publish an activity in a form that Mastodon understands, then that data obviously can't all be sent to Mastodon, so instead you post something like:
> Introducing our new album “Music for the soul”, available now on our bandwagon: https://...
And that's actually perfectly fine. In fact, the anti-pattern in ActivityPub is the reduction of literally everything to a Note, just to be compatible with Mastodon.
Like, you'd expect Loops to publish a Video object, but no, it publishes a Note: https://blog.joinloops.org/loops-joins-the-fediverse/#:~:text=Smart%20Content%20Representation
This is an anti-pattern that's been repeated across the fediverse ad infinitum, and it reduces all our content to what can be represented in a Note, which is designed for microblogging.
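To illustrate what gets lost, here is a small Python sketch that represents the same clip once as an ActivityStreams `Video` and once as a `Note`. The `clip` dict and both helper functions are invented for this example; the `Note` shape is a generic reduction, not a copy of what Loops or any other project actually emits.

```python
AS2 = "https://www.w3.org/ns/activitystreams"


def as_video(clip: dict) -> dict:
    """Typed representation: the object says what it is and carries its metadata."""
    return {
        "@context": AS2,
        "type": "Video",
        "name": clip["title"],
        "url": clip["media_url"],
        "duration": clip["duration"],
    }


def as_note(clip: dict) -> dict:
    """Microblog-shaped reduction: everything squeezed into a text field."""
    return {
        "@context": AS2,
        "type": "Note",
        "content": f"{clip['title']} {clip['page_url']}",
    }


clip = {
    "title": "Sunset timelapse",
    "media_url": "https://video.example/clips/1.mp4",
    "page_url": "https://video.example/clips/1",
    "duration": "PT30S",
}

video, note = as_video(clip), as_note(clip)
# Structured properties a Note-only consumer never sees as fields
lost_in_note = sorted(set(video) - set(note))
```

Here `lost_in_note` lists `duration`, `name`, and `url`: a receiving server that only understands Notes gets a sentence with a link, and anything that wanted to treat the object as a video has nothing structured to work with.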