Saturday, February 28, 2009

Herdict Web -- Mapping Web Filtering

Let me preface this by emphasizing that the subject of this post is of extreme importance to the implementation of all web technologies in schools, and to encourage my ed-tech colleagues to give this service some linky love. If you don't want to slog through my verbose rambling, just watch the video below.

Long-time readers with excellent memories may remember a little Python script I wrote a few years ago called Filtr Chckr, which simply tried to access a list of URLs and printed a report of which ones were blocked. I set up a list of about 100 sites, mostly news sites and blogs, and it would quickly map out which ones were and weren't blocked. Ideally, this would be the first step in getting a handle on what kind of blocking is actually taking place in schools. It never got beyond this step because:

  1. The whole project would be a lot of work.
  2. You would have to be some kind of respectable entity to get a lot of people to participate, as opposed to whatever kind of entity I prefer to present myself as.
  3. I realized the right way to do it was using JavaScript to access the sites from within a browser window, meaning the user doesn't have to install a script. But then I'd have to learn more JavaScript.
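
For the curious, here is roughly what a script like that boils down to. This is a minimal sketch written from memory of the idea, not the original Filtr Chckr code; the function names (is_reachable, build_report) and the "possibly blocked" label are my own inventions for illustration. It also assumes that a blocked site shows up as a failed connection, which is not true of every filter: plenty of them return a normal-looking block page instead.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=10):
    """Return True if the server at `url` sends back any HTTP response.

    Assumption: a filter blocks by refusing or dropping the connection.
    Filters that serve their own block page will look "reachable" here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (404, 500, ...),
        # so the site itself was reached.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


def build_report(urls, probe=is_reachable):
    """Map each URL to "ok" or "possibly blocked" using `probe`.

    `probe` is injectable so the report logic can be tried out
    without touching the network.
    """
    report = {}
    for url in urls:
        report[url] = "ok" if probe(url) else "possibly blocked"
    return report


def format_report(report):
    """Render the report as one line per URL, sorted for stable output."""
    return "\n".join(f"{status:17} {url}" for url, status in sorted(report.items()))
```

To use it, you would call build_report with your own list of URLs and print the result of format_report. Note that this is exactly the "install a script" approach that point 3 argues against; an in-browser version would do the same loop in JavaScript.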

Happily, some folks at Harvard, that most respectable entity, have finally realized my vision with a site called Herdict.

What we need people to do is use Herdict behind school firewalls to explore and report what sites are blocked. When testing sites you can specify that you're at a school, and add additional notes. Right now, nobody knows what sites are being blocked across the country, what the patterns are, how much political speech is being blocked, etc. Getting a handle on what's actually being implemented on the ground in schools is the first step.

In my initial fiddlings, the only problem I see is that some legitimate 404 (File Not Found) errors, where the site is not blocked but someone is requesting a file that isn't there, appear to be counted as blocked sites. Also, since getting onto their list of sites to check means a lot of people will see your URL, try it, etc., there is an incentive to spam your site onto the check list. Those are things they'll have to sort out.
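
To make the 404 point concrete, here is a small sketch of the distinction I have in mind. I have no idea how Herdict actually classifies responses, so the function name, the categories, and the choice of which status codes mean what are all my own assumptions, not their method.

```python
def classify_response(status=None):
    """Turn an HTTP status code into a rough accessibility verdict.

    `status` is the numeric HTTP status, or None if no response
    arrived at all (connection refused, timed out, DNS failure).
    The categories are hypothetical, chosen only for illustration.
    """
    if status is None:
        # Nothing came back: the site may be blocked or simply down.
        return "inaccessible"
    if status == 404:
        # The server answered, so the site is reachable;
        # only the requested file is missing.
        return "reachable, page missing"
    if status == 403:
        # Could be a filter, could be the site's own access rules.
        return "possibly blocked"
    if 200 <= status < 400:
        return "accessible"
    return "reachable, other error"
```

The point is just that a 404 means the server was reached, so it belongs with "accessible" rather than "blocked" when counting. A filter that answers with its own 200 block page would still fool this check.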
