Sunday, March 30, 2008

Benchmarking HTML/XML Parsing in Python

If you do any HTML or XML wrangling in Python, you should check out Ian Bicking's comparison of the performance of various libraries. His methodology seems a bit loose, but the differences in performance are wide enough that the results are clear cut:

So in conclusion: lxml kicks ass. You can use it in ways you couldn’t use other systems. You can parse, serialize, parse, serialize, and repeat the process a couple times with your HTML before the performance will hurt you. With high-level constructs many constructs can happen in very fast C code without calling out to Python. As an example, if you do an XPath query, the query string is compiled into something native and traverses the native libxml2 objects, only creating Python objects to wrap the query results. In addition, things like the modest memory use make me more confident that lxml will act reliably even under unexpected load.

Also, lxml is really easy to use, very Pythonic. The only drawback is that it can be a headache to install on Macs.
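
For anyone who wants to try this kind of comparison themselves, here is a rough sketch of a parse/serialize round-trip timing. It is not Ian's benchmark: it uses the standard library's xml.etree.ElementTree as a stand-in (lxml.etree offers a largely compatible API, so swapping the import should be close to a drop-in change), and the document generator, function names, and iteration counts are all made up for illustration.

```python
import timeit
import xml.etree.ElementTree as ET  # stdlib stand-in; swap in lxml.etree to compare


def make_document(rows=200):
    """Build a small synthetic XML document (hypothetical test data)."""
    items = "".join(
        f'<item id="{i}"><name>row {i}</name></item>' for i in range(rows)
    )
    return f"<root>{items}</root>"


def round_trip(text, times=3):
    """Parse and serialize the document repeatedly, returning the final text."""
    for _ in range(times):
        root = ET.fromstring(text)
        text = ET.tostring(root, encoding="unicode")
    return text


def find_names(text):
    """Run a simple path query and return the text of each matched element."""
    root = ET.fromstring(text)
    return [el.text for el in root.findall("./item/name")]


if __name__ == "__main__":
    doc = make_document()
    # Time 50 runs of three parse/serialize cycles each.
    seconds = timeit.timeit(lambda: round_trip(doc), number=50)
    print(f"50 round trips: {seconds:.3f}s")
    print(f"names found: {len(find_names(doc))}")
```

A synthetic document like this says little about messy real-world HTML, so treat any numbers it produces as a sanity check rather than a verdict.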

Probably Not Good News

Ivan Krstić

For a couple of years, Andres has been OLPC’s kernel maintainer extraordinaire. Now he’s decided to move on.

Friday, March 28, 2008

Blowing Off NECC

Incidentally, I've decided to skip NECC this year. I just need to find another excuse for having a beer with Lauer.

Noted Without Comment

Casey Adams (via email):

We are officially announcing organizational changes specific to Glen Abbey Software and The Miller Group, represented primarily by Mr. Jack Miller. Glen Abbey Software was an original partner with Jack Miller providing the primary representation for that firm. Earlier this month, Mr. Miller decided to officially withdraw himself and Glen Abbey Software from the OS4Ed organization. We were sad to see Mr. Miller leave and wish him the best of luck in his personal endeavors as he continues to foster the development of Centre SIS. If you are engaged in conversations or negotiations with Mr. Miller currently, please understand you are not speaking with a representative of OS4Ed. If you are discussing products listed on the website with Mr. Miller, they are being misrepresented.

It has recently come to our attention that many of you have received one or more emails from a mailing list controlled by The Miller Group with marketing content specific to the Centre SIS, which in our opinion contains some dubious claims about solution capability. We wanted to bring it to your attention, that if you have not personally registered on the website or placed your email address conspicuously in the public domain, that your email address may have been added to that list without your permission. In his time with OS4Ed, Mr. Miller had complete access to our OS4Ed registered user database and to our Sugar CRM database where we actively kept our sales leads and public relations contact information. In our investigation, we have come to a confident conclusion that Mr. Miller may have likely added your email addresses to his mailing list without ours or your permission. We have asked Mr. Miller directly if he has taken those email addresses and used them inappropriately without permission, but he will not give us a direct answer nor will he provide evidence that the emails were obtained in a different manner. This refusal to give us a direct answer makes us even more suspicious that he has done just that without any of our consent. We have asked Mr. Miller to remove and delete your email addresses, sent him a cease and desist email and are following up with a formal cease and desist letter.

Visiting Threadless and Other Gifts

Pictured above, Ignas and me at the beginning of our sprint, sans easel.

After the SchoolTool sprint at PyCon, I had the opportunity to take our four high school interns into Chicago to visit the headquarters of Threadless. Their CTO, Harper Reed, had gotten in touch with me about hosting community Jabber servers and invited us over for a tour. It is pretty much the wacky hip startup environment you've read about many times, except paid for with several years of actual profits instead of rapidly burning VC money. It was good for the kids to see; certainly more glamorous than the bowels of the Crowne Plaza O'Hare, where we'd been hacking away all week. Made me feel old and square though. I think blurting out that I played drums for Unrest at the Czar Bar in 1990 (or something like that) would only make things worse.

Anyhow, thanks to Harper for the tour, and it looks like skinnyCorp is going to do some community server hosting for the OLPC community.

In other news, I've received a prototype active antenna from OLPC. I think I'm going to clear a partition on my ThinkPad for the XS server image so I can see what this thing can do.

Thursday, March 27, 2008

Alan Kay on Balancing Kernel and Community

Alan Kay was gracious enough to write me an email last year that addresses some points relevant to yesterday's OLPC rant, but more eloquently:

Our experience at Viewpoints Research Institute (a non-profit 501(c)(3) org) is that there are at least two very distinct processes with regard to "open source" (and these are often phased in time) . The first is to make something interesting in its own right that can also serve as a kernel. The second is to empower a community of interest...

We did Squeak, EToys, and Croquet pretty much the same way. First, we used our small research and development team of "extreme talent, knowledge and skill" to design and build and test the kernels. Because design at this level is not a simple matter of deciding what to do, but is much more error-prone and requires quite a bit of fast iteration and close communication, it is not possible or desirable to have a large open source community of interest kibitzing during this phase (even though a small percentage of the advice and help would be useful). Basically, there is too much noise of various kinds to permit real progress to be made. And we've found in our group that we can't really do design within our own process over the Internet, we have to get together physically for long periods of time to be effective.

It takes a few years to make a deep kernel.

We have put artifacts out on the net for people to play with early on, and this has sometimes led to misunderstandings and cloggings. The cloggings come when people want to develop on a kernel that is still unfinished. They want to do new stuff but get very upset if the kernel developers also want to do new stuff. Social pressure often leads to premature poor to bad defacto standards that impede progress (the current web and PC world -- and Java, etc. -- is almost completely at a conservative incremental standstill because of this).

Basically, my observation is that Sugar development is "clogged" (in case it is not obvious, Dr. Kay's comments were not in reference to OLPC).

The Future of the Internet and How To Stop It -- This is Not a Metaphor

On Will's recommendation I picked up The Future of the Internet and How to Stop It and just read the first couple chapters in the tub. It is very good so far; I'd just caution Will and other ed-tech types against reading with this orientation toward the text:

I’m not sure yet how much this parallels the lock down vs. open up choice that schools are facing right now, but I have a feeling the conversations will parallel in many ways.

My reading thus far is that it is directly focused on the issues central to education IT, its failings over the past couple decades and where we may end up if we just give up and use cell phones. Just don't read this looking to construct glib metaphors to things you already understand.

Wednesday, March 26, 2008

More Idle Speculation on Something that Won't Happen: How Microsoft could p0wn OLPC

I was impressed by the demos I saw at PyCon of IronPython and Silverlight. IronPython is Microsoft's open source implementation of Python for the .Net Common Language Runtime (CLR). Silverlight is, essentially, Microsoft's Flash rival. Silverlight is not open source, but Novell is working on an open source implementation called Moonlight.

I can't help but wonder how an implementation of Sugar based on Silverlight would work. In particular, it is hard not to also wonder what an XO running a CLR based operating system would act like, as Chris Dawson has suggested. Of course, such a thing doesn't exist, although there are some very immature attempts (Singularity, Cosmos, SharpOS). It seems to me that Microsoft needs a greenfield OS project. They need to mothball XP, and all these cheap laptops that can't run Vista will just drag them down further.

The tricky thing here is that Microsoft could work with Novell to provide both proprietary Microsoft and open source versions of this stack, virtually ensuring their tools remain at the center of this not terribly profitable, but potentially strategic market segment.

Back in reality, Sugar is already demonstrating the difficulty of pulling off a greenfield desktop environment, even without writing a new OS at the same time, and Microsoft seems no more capable of pulling off this sort of move than, say, General Motors does, so I don't expect it. Just had to get the idea off my chest.

Where Does Doug Johnson Get These Ideas?


In general, when it comes to intellectual property, a difficult concept for most people, especially younger ones, is that the creator does have the right to control the use of his product. He does not have an obligation to sell it or make it available for use if he chooses not to. If I make a chair and even let you look at my chair, there is nothing that requires me sell you my chair, a copy of my chair or the design plans for my chair. I can legally stop you from making a copy of my chair if the design has been copyright/trademark protected, I believe.

"The creator does have the right to control the use of his product!?" Based on what? If I go over to your house and like the new chair you created, of course I can't compel you to sell it to me or give me the plans, but I can certainly go home and make one just like it. Whether or not I can sell or redistribute the copies would depend on whether you patented the chair, but getting a patent on a chair can't be easy at this point. Regardless, you can't stop me from making a chair just like it and putting it in my house, and if the chair can't be patented, what's the basis for preventing me from making and selling them? You can't copyright a chair. Or a piece of clothing or a recipe, for that matter.

I find it difficult to understand why school librarians seem so anxious to ape Raymond Ty and promote the strictest possible interpretations of intellectual property law. I just don't get the motivation.

Also, I didn't take enough philosophy courses to really get in an argument beyond the common sense level about the definition of "ethics," but I think people who like to talk about digital ethics are conflating "ethical" and "legal" behavior. Don't ethics transcend law? For example, most people would agree that going into Doug's house and taking his chair without his permission is both illegal and unethical. If I take the chair and leave money equal to the replacement cost of the chair, plus some compensation for his trouble, it is still illegal and unethical. If he has insurance that covers the replacement cost of the chair, it is still illegal and unethical. If a law is passed that says it is ok to steal chairs from people named Doug, then taking his chair would be legal, but most people would still consider it unethical, at least those with some perspective on the scenario. If my baby was dying for want of a chair, and Doug had extra chairs, stealing one might be ethically justifiable, but still completely illegal.

Here's the thing about, say, file sharing copyrighted music. Right now, it is clearly illegal and many people argue it is unethical. If tomorrow a law is passed that makes file sharing legal and tracks what files are downloaded and compensates artists from a common pool of money, then file sharing would not only become recognized as legal but also ethical, right? The problem people have with file sharing music isn't ethical but economic. If the economic issues were resolved, people would stop pretending there were ethical ones.

One final example. It is ethically wrong to plagiarize. It continues to be ethically wrong to plagiarize regardless of the economics of the situation. It is not less wrong to plagiarize if I compensate the original author. It is not less wrong if I have his or her permission. It is just ethically wrong, period. That's the way ethics works; it doesn't depend on who is getting paid.

Time for Rambling OLPC Speculation

What's going on at OLPC? I have no idea.

Actually, that's not quite true. We've got insight into what's going on in the software development process; not so much in the overarching internal politics of OLPC. In particular, the most important thing is unknowable to me: the fundamental financial pressures that they and their partners -- funders, producers, commercial spin-offs -- face. Obviously, Google alone has the capacity to keep the lights on at OLPC as long as they feel like it.

Also, it remains difficult to tell just how much the experience of an XO deployment in the developing world is different from that of a geek in the first world. Obviously, the answer is very, but it is difficult to say in what precise ways it is different, and what kinds of bugs can and can't be tolerated in different environments, and by both the people who use the devices and those who pay for them.

How much time OLPC has is the schwerpunkt here, because the one thing we do have plenty of insight into is OLPC's software development process, and that effort is demonstrably in the weeds. The software problems will not be solved quickly.

Brief pause for a quote by Mike Fletcher:

In case you missed it, this project is rather visible. Every little sneeze, every cough, is used by people standing on the sidelines to declare the death of the project. There are commentators who take every opportunity to seem supportive, while attempting to twist in a knife, cackling in schadenfreude and veiling all of their potentially useful critiques in a veil of false friendship.

The effects of that toxic critique is to shut down the channels of communication, so that all external critique is rejected far too easily. External developers and commentators who are not universally positive are seen as, at best, not "getting it", and at worst, as being intent upon derailing the project via false motives. The project needs some way to separate out the critiques intended to derail it from those intent on righting the path.

I want to at least make sure I'm being consistent here. I believe my position over the past couple years has been this:

  1. There are some brilliant, long overdue ideas in the design of Sugar.
  2. It doesn't look like there is anywhere near enough time to implement them.
  3. Maybe there'll be another OLPC miracle (a la the XO display).

I've felt the goals of Sugar were too important to simply say "They're foolish to try." I still can't quite bring myself to say it wasn't worth a shot. And there is still a chance it'll all work out, depending on how much time is left on the clock. I don't think they've done a bad job; they've just failed to pull off the nigh impossible.

However, that leaves them in a bad spot, and it is one that is not unfamiliar to me as project manager of SchoolTool. Frederick Brooks is famous for pointing out that adding programmers to a late project will only make it later. It is also true that prematurely releasing a framework for building other applications, what Brooks calls a "programming systems product," will only make the eventual stable release of the framework later. When you're forced to do this, you end up with end users who have to be supported, require documentation, and bugfixes; and hopefully you also have third party developers who want to write applications (or "activities") using your framework, who also require support, documentation and bugfixes. The increasing problem becomes that on both levels, you either have to maintain backward compatibility with your admittedly premature release, and/or you have to impose a significant load on users and developers to keep up with the changes you incur in finishing your programming system product. Also, at each step you have to keep user and developer documentation up to date. This won't necessarily wreck your project, but it will slow it down.

In particular, you lose a lot of the advantages of the open source process and open API's. Developers who try to build applications on your API's will get frustrated, complain, and probably leave and not come back until they hear that you've straightened out your problems. Developers who come along at this point who are interested in your core product will see that it is incomplete, behind schedule, and in flux, and will have a strong natural tendency to spend their time discussing what you're doing wrong; instead of sitting down and implementing feature or optimization X for application Y, they'll be discussing the nature of application Y. If they do write code, since the core vision hasn't been realized in code, it is more likely that their vision will be out of sync and peripheral or useless to the project. I'd note that these are observations based on a number of projects, including my own, not just OLPC, and not aimed at anyone specific at OLPC.

In my opinion, what a project has to do at this point is hunker down with its core team, narrow its vision, and plow through. Stay rigorously focused on stabilizing the core API's and clearing up serious user facing bugs. The thing is, there are no quick turnarounds in software. I can't say "Oh, maybe they'll pull a FooSoft," where "FooSoft" is the canonical example of a seemingly doomed project that was quickly saved by enlightened management. There is no such canonical example. There are certainly examples of seemingly failed projects which turned around -- Mozilla/Firefox leaps to mind -- but the time span is measured in years, not months.

Tuesday, March 18, 2008


The wheels appear to be coming off OLPC. Ivan Krstić:

I cannot subscribe to the organization’s new aims or structure in good faith, nor can I reconcile them with my personal ethic. Having exhausted other options, three weeks ago I resigned my post at OLPC.

OTOH, it is clear significant changes in software development are necessary at OLPC, so, I guess we'll see...

The PyCon-OLPC Zeitgeist

Titus Brown:

My OLPC interactions were interesting:

On Sunday, I gave a talk on automated testing and the OLPC GUI, Sugar. I'll post slides and a screencast later, but a brief summary goes like this: Sugar development is a bit of a disaster, with very little in the way of any software engineering principles being applied. In particular, there's my particular bugaboo: they have no automated tests, at all. My talk discussed the situation and talked a bit about using technology to remedy the situation; ultimately, though, the choice the OLPC people have to make is whether or not their software is going to suck. (This version of my argument is intentionally provocative, but I strongly believe that this is indeed the choice they face. See "jwz CADT" and also my future posts on this topic.) In particular, their testing plans consist of this: "really hope that other people step up and test our shit." In stark contrast to some of their other detractors, I'm trying to become one of those people that does test their shit, but it also seems to me that without a sea change in the focus of the software management layer at OLPC, I will be wasting my time.

Anyway, so that's a mildly obnoxious talk to give and I did my best to leaven it with humor and some rilly rilly cool testing tech. What was interesting to me, though, was the private advice from a number of people -- there appears to be a large undercurrent of dissatisfaction with the OLPC project in the Python community. In particular, one group of people basically said "burn the f$$!ckers to the ground". (I largely ignored this advice and tried to focus on the positive.) These are not normally mean-spirited people, so from this, if nothing else, I conclude that the OLPC has mismanaged its interactions with the Python community. I'm not sure exactly where things have gone awry, but I hope it's not too late to get back some community luuuuuurve: for all their software failings, the OLPC is an awesome awesome project that has changed, and hopefully will continue to change, this world we live in. Advice and thoughts on this issue welcome; I will post (or re-post) those that I think are especially worthy of attention.

One interesting idea: one person suggested that after having done so many impossible things already, the OLPC folk think that software is going to be one more example where they have to break the mold. Well, guys, if you think you can break out of the Software Death Spiral without building in any automated testing, I think you're batshit crazy...

Sunday, March 16, 2008

Math Report in a Nutshell

Region 19 BOE Gazette:

The report decides that the existing education regime has failed and recommends yet more of it.

Saturday, March 15, 2008


Laura Miller

In the concluding pages of "Superclass" it becomes increasingly difficult to dispel the impression that you have just read what amounts to a 380-page business card. Many recent nonfiction books on "current affairs" are little more than that. Organized around a catchy concept and extensively researched by underlings, they win their authors jobs in think tanks and speaking engagements at corporate workshops and conferences -- all of which pay much, much more than anyone can expect to make on a book. There are a handful of important ideas in "Superclass," it's true, but many of them have been gleaned from other, more original thinkers. There are also a lot of facts and statistics, presumably gathered by Rothkopf's assistants.

Friday, March 14, 2008

At PyCon

I'm at PyCon for the next week. It seems to be over twice as big as last year, which was significantly bigger than the previous year, so overall I'm feeling pretty good about the bet I placed on Python almost a decade ago. This will also be a significant meeting of XO hackers. I'm going to use the XO as my main computer over the next couple of days, which I haven't done for a while...

Gruber on "One App at a Time"

I suspect Sugar ought to take a few pages from the iPhone:

Why has Apple imposed this limitation? Easy: the iPhone is severely resource constrained. Battery, RAM, and CPU cycles are all severely limited. If third-party apps could run in the background, all three could suffer. RAM would suffer for sure; all running apps consume memory. The iPhone has just 128 MB of RAM, and no swap space. CPU performance and battery life would suffer when background apps do something — and if they’re not doing anything, what’s the point of keeping them running? I noticed a significant increase in battery life after I switched the Mail app’s auto-checking interval from 15 minutes to 60 minutes. That’s just one app...
And, the iPhone engineering team has gone to extraordinary lengths to make sure these background apps have a minimal CPU/battery/memory footprint while they are running. Call them hypocrites if you will for disallowing third-parties from creating background-capable apps, but Apple only uses background processes itself for a handful of flagship apps. (The iPod app, for example, only runs in the background if you switch to another app while it’s playing an audio track. Otherwise, it too quits on switch.)

In particular, the XO would benefit from somebody taking some extraordinary lengths to optimize its Browser's footprint.

Wednesday, March 12, 2008

Ethan Zuckerman's Cute Cat Theory

Read the whole thing:

Blocking banal content on the internet is a self-defeating proposition. It teaches people how to become dissidents - they learn to find and use anonymous proxies, which happens to be a key first step in learning how to blog anonymously. Every time you force a government to block a web 2.0 site - cutting off people’s access to cute cats - you spend political capital. Our job as online advocates is to raise that cost of censorship as high as possible.

Tuesday, March 11, 2008

Luckily, I'm not in the Basement Anymore


Students at Nathanael Greene, the district’s largest middle school with 820 pupils, are feeling the effects of the so-called leadership vacuum. Since Thomas left, there have been three major food fights, during which apples and milk cartons were hurled across the cafeteria, and a couple of fights after school, one of which was videotaped and posted on a popular social networking site, according to Mary Beth Calabro, a special education teacher and vice president of the Providence Teachers Union.

It was my year in a room off the cafeteria at Nathanael Greene that broke my will to teach English. Don't recall having any food fights going on outside the room, though. Unfortunately, there doesn't seem to be any video of the food fight online.

Advocating for Boring

I'm about half-way through Clay Shirky's Here Comes Everybody. In my dreams I'll write a proper review, but for the moment I'll just say if you want to understand what's going on on the internet today, you can't do better than this book. It is an optimistic, but distinctly post-utopian and largely hype-free explanation of the landscape. Much better than A Whole New World is Miscellaneous.

This quote leaped out at me last night:

Communication tools don't get socially interesting until they get technologically boring. The invention of a tool doesn't create change; it has to have been around long enough that most of society is using it. It's when a technology becomes normal, then ubiquitous, and finally so pervasive as to be invisible, that the really profound changes happen, and for young people today, our new social tools have passed normal and are heading to ubiquitous, and invisible is coming.

The problem is, this cuts against the conventional wisdom about how to do computing in schools, which is to have an ambitious, expensive program front loaded with professional development, aiming to meet short term goals. I've long argued that we would be better off aiming for inexpensive, sustainable, ubiquitous computing that is sufficiently cheap that it doesn't carry significant short-term performance pressures.

I believe that many initiatives in the current 1:3 or 1:4 student to computer ratio schools have failed not because of the lack of training, but the lack of ubiquity and consistency. A teacher must feel that the computers are here, they work, and they aren't going away, ever. EVER. If they can't believe all three, how can you argue that they aren't wasting their time? Why should they care about your professional development? The process has to be bootstrapped by putting technology in the hands of teachers and students. This is obvious, but we prefer to pretend it isn't true, mostly, I suppose, because it opens the door for charlatans to just sell loads of crappy computers to schools. Nonetheless, we just have to get over it and buy better technology, or we'll just continue walking around in smaller and smaller circles.

Some might say that cell phones fit the role of ubiquitous computers for kids today. They do socially, but it is harder to be institutionally ubiquitous. If I can't say, "ok kids, take out your X and do Y," with a reasonable expectation that every student has X and can do Y, then they aren't ubiquitous in a school. Cell phone technology and its commercial implementation in the US is a long way from that point.

Thursday, March 06, 2008

I Prefer a Closed Primary


The decisive swing vote will come in the Philadelphia suburbs, which is just about the only growing area of the state. It is also a region trending blue very quickly, and has a lot of independents and Republicans who are voting for Democrats in general elections, but who can't vote in the primary unless they register as Democrats by March 24th. This three-week registration period could decide the election, since these "new" Democrats will favor Obama, while the currently registered Democrats probably favor Clinton.

Wednesday, March 05, 2008

Sugar Redesign Proposal

I put off looking at this set of proposals on the OLPC wiki for refining the Sugar UI, in part because it is hard to say what the best case timeframe for getting them done would be, in part because the number of things that ought to be changed makes me vaguely nauseous. But, having looked at the proposal, it does seem like a good set of improvements that wouldn't be too difficult to implement.

I'd be curious to see if Mark's kids could grok this and offer feedback from a kid's point of view. Of course, discussing UI elements is hard enough for adults -- we've been doing it for the past couple of weeks for SchoolTool -- so I'm not sure what a reasonable expectation is from kids.

Kill Me Now


whiz, wit.

Tuesday, March 04, 2008


Michael Connery:

Ironically, while progressives talk about a more fair and equal society, the progressive movement itself prior to 2003 was structured to the benefit of the children of the wealthy and well-connected.

The longer context might be necessary though:

While progressive youth activism existed, it was disorganized, disconnected from the party apparatus, understaffed and underfunded with little strategic vision. Ironically, while progressives talk about a more fair and equal society, the progressive movement itself prior to 2003 was structured to the benefit of the children of the wealthy and well-connected. With so little money or institutional support, what few training opportunities existed often came at the expense of the trainee. Internships at progressive institutions (think tanks, magazines), which provide valuable experience and networking connections, are frequently non-paying, with little in the way of economic help for those not from privileged families. Other entry points into the progressive movement – like canvassing operations – offered slave-labor wages and little in the way of upward mobility or skills training.


Chris Melissinos:

This is the first generation of gamers raising gamers.



A telecom and/or cable company is basically a PAC attached to a billing service...

Does This Include the Profession of Teaching?

Cathy Stasch:

For MacArthur, cohesion between projects is important because we believe that one of the primary barriers to the growth of the field of digital media and learning is the fragmentation that persists between both academic disciplines and professional fields.

Saturday, March 01, 2008

Meanwhile, My Other Team...

Ursha'Kahn gets a win in the first round of the Eve alliance tournament.

Note that I'm still way too much of a noob to participate at this level.