Friday, May 23, 2008

You Need To Understand This, At Least A Little

Assetbar:

Consider the messaging problem:

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has essentially two choices for dispatching that message:

  1. PUSH the message to the queues of each of his 6,864 followers, or
  2. Wait for the 6,864 followers to log in, then PULL the message.

The trouble with #2 is that people like Robert also follow 6,800 people. And it’s unacceptable for him to log in and then have to wait for the system to open records on 6,800 people (across multiple DB shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.
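To make the pull option concrete, here's a minimal sketch of a read-time merge, with toy in-memory dictionaries standing in for the DB shards (every name is hypothetical; this shows the general technique, not Twitter's actual code):

```python
from collections import defaultdict

FOLLOWING = defaultdict(set)   # user -> accounts that user follows
MESSAGES = defaultdict(list)   # author -> [(timestamp, text), ...]

def build_timeline_pull(user, limit=20):
    """Pull model: do all the work at read time, on every page load."""
    timeline = []
    for author in FOLLOWING[user]:          # ~6,800 lookups for Robert
        for ts, text in MESSAGES[author]:   # read each author's messages
            timeline.append((ts, author, text))
    timeline.sort(reverse=True)             # sort by date, newest first
    return timeline[:limit]
```

With ~6,800 follows, every single page load repeats all of that work before anything can render.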

So, the Twitter model is almost certainly #1. Robert’s message is copied (or pre-fetched) to 6,864 users, so when those users open their page/client, Scoble’s message is right there, waiting for them. The users are loving the speed, but Twitter is hating on the writes. All of the writes.
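And here's the mirror-image sketch of option #1: the expensive loop moves to write time, and the read becomes a cheap, local fetch (again, all names are made up for illustration):

```python
from collections import defaultdict

FOLLOWERS = defaultdict(set)   # author -> accounts following that author
INBOX = defaultdict(list)      # user -> pre-copied timeline entries

def post_message_push(author, text, ts):
    """Push model: one post fans out into every follower's inbox."""
    for follower in FOLLOWERS[author]:          # 6,864 appends for Scoble
        INBOX[follower].append((ts, author, text))

def read_timeline_push(user, limit=20):
    """Reading is now trivial; the sort is over one small, local inbox."""
    return sorted(INBOX[user], reverse=True)[:limit]

FOLLOWERS["scoble"] = {f"user{i}" for i in range(6864)}
post_message_push("scoble", "I'm hanging out with...", ts=1)
print(len(INBOX))   # 6,864 copies of a single message
```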

How many writes?

A 6000X multiplication factor:

Do you see a scaling problem with this scenario?

Scoble writes something–boom–6,800 writes are kicked off, one for each follower.

Michael Arrington replies–boom–another 6,600 writes.

Jason Calacanis jumps in–boom–another 6,500 writes.

Beyond the 19,900 writes, there’s a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 followers are. Read, read, read. Then possibly hit another DB to find out which shard they live on. Read, read, read. Then you make a connection and write to that DB host, and on success, go back and mark the update as successful. Depending on the details of their messaging system, all the overhead of lookup and accounting could be an even bigger task than the 19,900 reads + 19,900 writes. Do you even want to think about the replication issues (multiply by 2 or 3)? Watch out for locking, too.
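Spelled out as a runnable sketch, the delivery pipeline in that paragraph looks something like this (a guess at the general shape, with invented store and function names, not Twitter's actual schema):

```python
from collections import defaultdict

FOLLOWERS = {"scoble": [f"user{i}" for i in range(6800)]}
SHARD_OF = defaultdict(lambda: "shard-1")   # user -> DB shard
INBOXES = defaultdict(list)                 # (shard, user) -> messages
DELIVERED = []                              # delivery accounting

def deliver(author, message):
    for follower in FOLLOWERS[author]:      # read: who are the followers?
        shard = SHARD_OF[follower]          # read: which shard do they live on?
        INBOXES[(shard, follower)].append((author, message))  # write: copy message
        DELIVERED.append((author, follower))                  # write: mark success

deliver("scoble", "I'm hanging out with...")
print(len(DELIVERED))   # 6800 deliveries -- before replication multiplies the writes
```

Every line of that loop is a disk-backed operation in the real system, and replication and locking only make it worse.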

And here’s the kicker: that giant processing & delivery effort–possibly a combined 100K disk IOs–was caused by 3 users, each just sending one tiny, 140-character message. How innocent it all seemed.
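For what it's worth, the 100K figure holds up as a back-of-envelope estimate; this is just my arithmetic on the follower counts given above:

```python
deliveries = 6800 + 6600 + 6500     # 19,900 fan-out deliveries
reads = deliveries                  # the "19,900 reads" above
writes = deliveries                 # the "19,900 writes" above
for replication in (2, 3):          # "multiply by 2 or 3"
    print((reads + writes) * replication)
# prints 79600 and 119400 -- straddling the "combined 100K disk IOs"
```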

Now, are there any questions why Twitter goes down when there’s any kind of event?

See, this is the difference between "small pieces loosely joined," e.g., lots of blogs on various platforms connected by RSS feeds, and a centralized service that does everything, for free no less. Keeping the centralized system going gets harder and harder, until it finally sells out to a megacorp, which tries to make some money off the thing, and it starts to suck and you leave if you can.

6 comments:

kellan said...

What drives me nuts is that Twitter is both a harder problem than the nattering classes can conceive, and an easier problem than TwitterHQ keeps claiming it is.

Sylvia said...

Thanks, that's very helpful.

Anonymous said...

Great post: a simple observation, with significant consequences. This should be mandatory reading for everyone complaining about Twitter--which seems to be pretty much everyone.

I like your point about the implications of the type of service Twitter is offering. The distinction between free as in beer and free as in speech often, I think, sounds a bit fussy to folks outside the Free Software community (and sometimes to those in it). I think that Twitter may provide a great object lesson in how relevant that difference can be.

Maybe in the future, users of 'free' services like Twitter will keep in mind The Dude's advice: "you look for the person who will benefit..."

Graham Wegner said...

Tom, as a techno-layman, your explanation was extremely helpful. Pointing out that Twitter is so not "small pieces loosely joined" was also a great point, because so many edubloggers are calling it the showpiece of their online existence and they don't realise that it could all be gone tomorrow (or in the next five minutes), while blogs and other RSS-enabled and distributed content will be recoverable in most cases. (Correct me if I'm wrong - layman speaking, remember.)

Tom Hoffman said...

Graham,

It isn't so much that Twitter might disappear and that your blog host won't. The difference is that there is only one Twitter host -- Twitter, so when it becomes more popular, the burdens are placed entirely on its shoulders, and the entire community is dependent on how it handles that and responds.

Also, Kellan works at flickr, so I listen to what he says about scaling.

Gnuosphere said...

Thanks Tom, I think I now understand (at least a little). Here's a related link.