Consider the messaging problem:
Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has roughly two ways to dispatch that message:
- PUSH the message to the queues of each of his 6,864 followers, or
- Wait for the 6,864 followers to log in, then PULL the message.
The trouble with the second option (pull) is that people like Robert also follow 6,800 people. And it’s unacceptable for him to log in and then have to wait for the system to open records on 6,800 people (across multiple DB shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.
So, the Twitter model is almost certainly the first option (push). Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.
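To make the trade-off concrete, here is a minimal in-memory sketch of both options. It is not Twitter's actual design; the class names `PushTimeline` and `PullTimeline`, the dict-based "storage", and the `writes`/`reads` counters are all assumptions made up for illustration.

```python
from collections import defaultdict


class PushTimeline:
    """Fan-out on write: copy each message into every follower's inbox."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.inboxes = defaultdict(list)   # user -> [(timestamp, author, text)]
        self.writes = 0                    # hypothetical counter of inbox writes

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text, ts):
        # One write per follower: cheap to read later, expensive to send now.
        for follower in self.followers[author]:
            self.inboxes[follower].append((ts, author, text))
            self.writes += 1

    def timeline(self, user):
        # The messages are already waiting; just show newest first.
        return sorted(self.inboxes[user], reverse=True)


class PullTimeline:
    """Fan-out on read: store each message once, gather at login time."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> authors they follow
        self.outboxes = defaultdict(list)  # author -> [(timestamp, author, text)]
        self.reads = 0                     # hypothetical counter of outbox reads

    def follow(self, follower, author):
        self.following[follower].add(author)

    def post(self, author, text, ts):
        # A single write, no matter how many followers the author has.
        self.outboxes[author].append((ts, author, text))

    def timeline(self, user):
        # One read per followed author, then sort by date: the slow part.
        merged = []
        for author in self.following[user]:
            self.reads += 1
            merged.extend(self.outboxes[author])
        return sorted(merged, reverse=True)
```

With 6,864 followers, a single `post` on `PushTimeline` performs 6,864 inbox writes, while a single `timeline` call on `PullTimeline` for someone following 6,800 people performs 6,800 outbox reads before it can sort anything.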
How many writes?
A 6000X multiplication factor:
Do you see a scaling problem with this scenario?
Scoble writes something–boom–6,800 writes are kicked off. 1 for each follower.
Michael Arrington replies–boom–another 6,600 writes.
Jason Calacanis jumps in–boom–another 6,500 writes.
Beyond the 19,900 writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the 19,900 reads + 19,900 writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.
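The arithmetic is easy to play with. The sketch below is a back-of-the-envelope estimate only: the follower counts are the rounded figures from the three messages above, and the per-delivery costs (two reads for the follower and shard lookups, two writes for the delivery and the success mark) plus the `replication` multiplier are assumptions, not measurements of Twitter's system.

```python
# Rounded follower counts from the three messages above.
FOLLOWERS = {"Scoble": 6800, "Arrington": 6600, "Calacanis": 6500}


def estimate_io(follower_counts, reads_per_delivery=2,
                writes_per_delivery=2, replication=1):
    """Rough IO estimate for pushing one message from each author.

    All per-delivery costs are assumed values for illustration:
    reads  = follower lookup + shard lookup,
    writes = the delivery itself + marking it successful.
    """
    deliveries = sum(follower_counts.values())
    reads = deliveries * reads_per_delivery
    writes = deliveries * writes_per_delivery * replication
    return {
        "deliveries": deliveries,
        "reads": reads,
        "writes": writes,
        "total": reads + writes,
    }


print(estimate_io(FOLLOWERS))
# {'deliveries': 19900, 'reads': 39800, 'writes': 39800, 'total': 79600}
print(estimate_io(FOLLOWERS, replication=2)["total"])
# 119400
```

Under these assumed costs, three short messages land somewhere between roughly 80K and 120K operations, which is the same ballpark as the 100K figure.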
And here’s the kicker: that giant processing & delivery effort–possibly a combined 100K disk IOs–was caused by 3 users, each just sending one tiny, 140-character message. How innocent it all seemed.
Now, are there any questions about why Twitter goes down when there’s any kind of event?
See, this is the difference between "small pieces loosely joined," e.g., lots of blogs on various platforms connected by RSS feeds, and a centralized service that does everything, for free no less. Keeping the centralized system going gets harder and harder, until they finally sell out to megacorp, who try to make some money off the thing, and it starts to suck and you leave if you can.