I suspect/hope most of Reddit’s infra costs come from the massive processes they run to consume and correlate user data into sellable data, or from the massive moderator tools using full-text search they probably use to hunt down undesirables.
I feel like just serving up text-based information shouldn’t be that intensive if done right. But I definitely don’t have the experience to say so for a program handling millions of requests.
Even “just text” at a sufficient scale introduces significant technical challenges. I’m sure some of Reddit’s resources go toward ads and some scraping of user data, but even just the basic user experience at the scale of Reddit takes thousands of servers… and that was back in 2018 when Reddit’s infrastructure team did an AMA. I’m sure it’s grown substantially since then.
Back then, on average, Reddit was sending out 32 gigabytes per second to support all of the users connecting. That text, at Reddit scale, becomes incredibly substantial.
And as you grow beyond what a single server can handle, you get into clusters, load balancing, availability, consistency, and all kinds of other things that pop up to make a single application like Reddit operate at the scale it does.
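To put that 32 gigabytes per second in perspective, here is a rough back-of-envelope sketch. The 32 GB/s figure is the one quoted above; the 50 KB average response size is purely my own assumption for illustration, not something Reddit has published, and I'm using decimal units (1 PB = 1,000,000 GB).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_egress_petabytes(gigabytes_per_second: float) -> float:
    """Convert a sustained GB/s rate into petabytes sent per day (decimal units)."""
    return gigabytes_per_second * SECONDS_PER_DAY / 1_000_000


def responses_per_second(gigabytes_per_second: float, avg_response_kilobytes: float) -> float:
    """Estimate how many responses per second a GB/s rate implies.

    avg_response_kilobytes is a hypothetical average payload size,
    not a measured Reddit number.
    """
    return gigabytes_per_second * 1_000_000 / avg_response_kilobytes


if __name__ == "__main__":
    rate = 32  # GB/s, the average quoted from the 2018 AMA
    assumed_kb = 50  # hypothetical average response size in KB
    print(f"{daily_egress_petabytes(rate):.2f} PB per day")
    print(f"{responses_per_second(rate, assumed_kb):,.0f} responses per second")
```

Under those assumptions that works out to a bit under 3 petabytes a day and several hundred thousand responses every second, which is far more than a single server can push out and is why the clusters and load balancing come into play.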
Very insightful, thanks