Archive for the ‘sipb’ tag

Strace: The Lost Chapter


I wrote another post last week for the Ksplice blog: Strace — The Sysadmin’s Microscope. If you’re running a Linux system, or just writing or maintaining a complex program, sometimes strace is indispensable — it’s the tool that tells you what a program is really doing. My post explains why strace is so good at showing the interesting events in a program (hint: it gets to sit between the program and everything else in the universe), describes some of its key options, and shows a few ways you can use it to solve problems.

Unfortunately there’s only so much you can say in a blog post of reasonable length, so I had to cut some of my favorite uses down to bullet points. Here’s one such use, which I can’t bear to keep off of the Web, just because I thought I was so clever when I came up with it in real life a couple of months ago.

(If you haven’t already, I encourage you to go read the main post first. I’ll be here when you come back.)

Strace As A Progress Bar

Sometimes you start a command, and it turns out to take forever. It’s been three hours, and you don’t know if it’s going to be another three hours, or ten minutes, or a day.

This is what progress bars were invented for. But you didn’t know this command was going to need a progress bar when you started it.

Strace to the rescue. What work is your program doing? If it’s touching anything in the filesystem while it works, or anything on the network, then strace will tell you exactly what it’s up to. And in a lot of cases, you can deduce how far into its job it’s gotten.

For example, suppose our program is walking a big directory tree and doing something slow. Let’s simulate that with a synthetic directory tree and a find that just sleeps for each directory:

  $ mkdir tree && cd tree
  $ for i in $(seq 1000); do mktemp -d -p .; done >/dev/null
  $ find . -exec sleep 1 \;

Well, this is taking a while. Let’s open up another terminal and ask strace what’s going on:

  $ pgrep find
  2714
  $ strace -p 2714
  fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
  fchdir(5)                               = 0
  getdents(5, /* 2 entries */, 4096)      = 48
  getdents(5, /* 0 entries */, 4096)      = 0
  close(5)                                = 0
  fstat(5, {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
  fchdir(5)                               = 0
  close(5)                                = 0
  newfstatat(AT_FDCWD, "tmp.MiHDWiBURu", {st_mode=02, st_size=17592186044416, ...}, AT_SYMLINK_NOFOLLOW) = 0
  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb19c92a770) = 13044
  wait4(13044, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13044
  --- SIGCHLD (Child exited) @ 0 (0) ---
  fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
  fchdir(5)                               = 0

The find just looked at tmp.HvbzfbbWSa, and now it’s going into tmp.MiHDWiBURu. How far is that into the total? ls will tell us the list of directories that the find is working from; we just have to tell it to give them to us in the raw, unsorted order that the parent directory lists them in, with the -U flag. And then grep -n will tell us where in that list the entry tmp.HvbzfbbWSa appears:

  $ ls -U | grep -n tmp.HvbzfbbWSa
  258:tmp.HvbzfbbWSa
  $ ls -U | grep -n tmp.MiHDWiBURu
  259:tmp.MiHDWiBURu
  $ ls -U | wc -l
  1000

So tmp.HvbzfbbWSa is entry 258 out of 1000 entries in this directory — we’re 25.8% of the way there. If it’s been four minutes so far, then we should expect about twelve more minutes to go.
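
The arithmetic behind that estimate can be sketched in a few lines of shell. This is only an illustration, assuming each entry takes roughly the same amount of time; progress_eta is a made-up helper name, not part of strace or find.

```shell
# progress_eta POSITION TOTAL ELAPSED_MINUTES
# Prints the percentage done and a linear estimate of the minutes remaining.
# Assumes every entry costs about the same, which is only an approximation.
progress_eta() {
  awk -v pos="$1" -v total="$2" -v elapsed="$3" 'BEGIN {
    printf "%.1f%% done, about %.1f minutes to go\n",
           100 * pos / total, elapsed * (total - pos) / pos
  }'
}

# Entry 258 of 1000, four minutes in.
progress_eta 258 1000 4
```

With the numbers above this reports 25.8% done and about 11.5 minutes to go, which is where "about twelve more minutes" comes from.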

(But With The Benefit Of Foresight…)

I’d be remiss if I taught you this hackish approach without mentioning that if you realize you want a progress bar before you start the command, you can do it much better — after all, the ‘progress bar’ above doesn’t even have a bar, except in your head.

Check out pv, the pipe viewer. In my little example, you’d have the command itself print out where it is, like so:

  $ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \;

and then you could get a real, live, automatically-updated progress bar, like so:

  $ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \; \
     | pv --line-mode --size=$(find . | wc -l) >/dev/null
   175 0:02:57 [0.987/s ] [=====>                               ] 17% ETA 0:13:55

Here we’ve passed --line-mode to make pv count lines instead of its default of bytes, and --size with an argument to tell it how many lines to expect in total. Even if you can’t estimate the size, pv will cheerfully tell you how far you’ve gone, how long it’s been, and how fast it’s moving right now, which can still be handy. pv is a pretty versatile tool in its own right — explaining all the ways to use it could be another whole blog post. But the pv man page is a good start.

That’s Just One

There’s lots of other ways to use strace — starting with the two I described in my main post, and the three more, besides this one, that I only mentioned there. I don’t really know anymore how I used to manage without it.

Written by Greg Price

August 16th, 2010 at 12:50 am

Preventing SQL Undeath (Killing a MySQL Query For Real)


Sometimes a MySQL query doesn’t die when you think it does.

If you’ve spent much time with MySQL, you’ve probably tried a query from the mysql command line, changed your mind after it didn’t return for a while, and hit C-c:

$ mysql -h -u guest youtomb

mysql> SELECT COUNT(DISTINCT status) FROM artifacts;
^CCtrl-C -- sending "KILL QUERY 183" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted

That kills the query. But notice what the mysql program is telling you after you hit that C-c: it’s sending a separate command, namely the “query” “KILL QUERY 183”, to the server.

In fact, that KILL query is the only way to get the MySQL server to stop running a query. In particular, the MySQL server is really bad at noticing when a client goes away. Suppose instead of hitting C-c, which the mysql program traps and handles in a smart way, I simply kill the program by hitting C-\:

mysql> SELECT COUNT(DISTINCT status) FROM artifacts;

Then in fact the server keeps running the query. If I fire up the MySQL client anew and issue the query SHOW PROCESSLIST, I can see the query still chugging away:

$ mysql -h -u guest youtomb

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 183
   User: guest
   Host: OPUS.MIT.EDU:37938
     db: youtomb
Command: Query
   Time: 4
  State: Sending data
   Info: SELECT COUNT(DISTINCT status) FROM artifacts
*************************** 2. row ***************************
     Id: 185
   User: guest
   Host: OPUS.MIT.EDU:37940
     db: youtomb
Command: Query
   Time: 0
  State: NULL
2 rows in set (0.00 sec)

That \G is an alternative to the semicolon that makes the format more readable here.

If I want to actually kill the query, I can do it with the same meta-query my client used automatically upon C-c:

mysql> KILL QUERY 183;
Query OK, 0 rows affected (0.02 sec)

Et voilà:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 185
   User: guest
   Host: OPUS.MIT.EDU:37940
     db: youtomb
Command: Query
   Time: 0
  State: NULL
1 row in set (0.00 sec)

Now, why would you or I care? After all, nobody in their right mind goes about hitting control-backslash or employing equally messy means to kill their MySQL clients. And control-C behaves just as you’d hope — so long as you are using the mysql command-line client.

Where the story isn’t so good is on a typical other client program. The KILL behavior on control-C is a feature of the mysql program, not of the MySQL C API. (If you think about it, it involves installing a signal handler — not something a well-behaved library will just do.) And because it’s not a feature of the MySQL C API, it’s probably not a feature of your favorite language’s MySQL bindings, which wrap that API. In particular, I know it’s not a feature of MySQLdb, the leading Python bindings.

So suppose you write a Python script to do some MySQL queries… and you have a big honking table in your database, and you write an inefficient query… and the query planner resorts to copying most of the table to a temporary table… and after a couple of hours you kill the Python script with control-C or kill or some other means because it’s taking forever. The query will keep running. And the next day maybe it’s copied enough that it fills up your disk, and the database has an outage.

I wish that were a hypothetical. Fortunately, the MySQL server will then remove the temporary table and the disk will have space again. If you’re lucky, the server will even come back up.

Lesson: when you want to kill a MySQL query, make sure it dies. Use SHOW PROCESSLIST to check and KILL QUERY to kill.
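
One way to act on that lesson from a script is to turn the process list into KILL statements. The sketch below is an illustration under assumptions, not a feature of the mysql client: kill_statements is a hypothetical helper, and it expects tab-separated rows in the column order of the listing above (Id, User, Host, db, Command, Time, State, Info).

```shell
# kill_statements MIN_SECONDS
# Reads tab-separated SHOW PROCESSLIST rows on stdin and prints one
# KILL QUERY statement for each row whose Command is "Query" and whose
# Time is at least MIN_SECONDS. It only prints; it does not talk to a server.
kill_statements() {
  awk -F '\t' -v min="$1" \
    '$5 == "Query" && $6 + 0 >= min + 0 { print "KILL QUERY " $1 ";" }'
}

# Sample rows shaped like the listing above (made-up input for the demo).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  183 guest OPUS.MIT.EDU:37938 youtomb Query 4 'Sending data' 'SELECT COUNT(DISTINCT status) FROM artifacts' \
  185 guest OPUS.MIT.EDU:37940 youtomb Query 0 NULL 'SHOW PROCESSLIST' \
  | kill_statements 1
```

For the sample rows this prints KILL QUERY 183; and skips the zero-second row. In practice the rows could come from mysql --batch --skip-column-names -e 'SHOW PROCESSLIST' (connection options omitted here), and the printed statements are worth reading before they go back to the server.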

Written by Greg Price

June 28th, 2010 at 12:35 am

Posted in Uncategorized

Ksplice and the intern army meet the Internet

This week I wrote a post on the Ksplice blog, our first substantive post, following an intro post by Waseem. As I mentioned last month, we swelled from 8 to 20 people this January with interns, and were triumphant in making the whole scheme work productively. If you want to know how we did it, read the post. In fact, just go read it. I’ll wait.

The crackerjack Ksplice PR team (*) got my post to show prominently all day Wednesday on Reddit and Hacker News, and then it went up on Slashdot all Wednesday evening and Thursday during the day. Traffic numbers were much, much more than anything else I’ve ever written, except YouTomb.

Naturally, we learned some things about interacting with your average comment-leaving reader on the Internet. The first wave of comments, a few both on link aggregators and on the post itself, were vicious denunciations of us for the (apparently) illegal practice of employing unpaid interns to do real work. These commenters were of course wrong: you can’t get any intern in software for free, let alone the kind of people we wanted, and we paid as much as or more than they could make with their skills in research jobs on campus. I clarified that, others and I replied, and the comments shifted to mostly positive. Then when we landed on Slashdot, the text was a classic opposite-of-the-article Slashdot item, saying we had claimed to “bust” Fred Brooks’ pioneering observations on software project management. Dozens of commenters poured in to grouch that we hadn’t disproved his law, only sidestepped it, which was of course our point.

Fortunately, not all commenters are just being wrong. We had several good comments, but this afternoon came one last comment from a source far beyond any response I imagined. I feel a twinge of regret now for comparing the OS/360 project to Windows Vista, apt though it was. Prof. Brooks, of course, did far better than the Vista managers in the end, in that he learned lessons from the experience and put them in a book that the whole profession learned from.

How we’re going to top that comment in our next post, I don’t know—it might be tough, for example, to get a comment from a man who hasn’t used email since before blogging was invented.

(*) Namely, us and our friends on zephyr/twitter lending a few upvotes to our posts. Several others at Ksplice made substantial comments and edits before the post was published, too, which greatly improved it.

[Update, 2010-03-18: there is now a straight-up newspaper-style article about... the comment threads on my post. The Internet never ceases to amaze me.]

Written by Greg Price

March 15th, 2010 at 3:23 am

How to pretend bash is a real programming language, tip #13


I wrote some throwaway shell code tonight that looked like this:

   for oo in $(cd .git/objects/ && ls ??/*); do
     o=${oo%/*}${oo#*/}
     # do something horrible with the Git object $o, which is in the file $oo
   done

It doesn’t matter now exactly what the code was for. But a collaborator wrote back to me:

> > o=${oo%/*}${oo#*/}
> How does this line work/what is it supposed to accomplish?  In
> particular not sure what the %foo and #foo do.

Stop for a moment: do you know how that line works? I wouldn’t have in my first years writing shell scripts.

This line demonstrates one of a repertoire of tricks I’ve picked up to get some things done in bash that might otherwise require invoking a separate program. None of these will be news to shell-programming experts, but I sure didn’t know all of them when I started writing in shell. Here’s a little braindump on one of my favorite tricks, and where to read about more.

The best documentation for Bash is the info page—the specific pages I find myself referring to most often are under “info bash” -> “Basic Shell Features” -> “Shell Expansions”. (If you’ve never tried it, you’ve been missing out! Type “info bash” at your favorite prompt. But not on a Debian or Ubuntu machine, where the info page is missing due to a stupid licensing dispute. Info is the home of the best documentation available for Bash, GCC, GDB, Emacs, miscellaneous GNU utilities, and Info itself.)

This feature is under “Shell Parameter Expansion” there.

     The WORD is expanded to produce a pattern just as in filename
     expansion (*note Filename Expansion::).  If the pattern matches
     the beginning of the expanded value of PARAMETER, then the result
     of the expansion is the expanded value of PARAMETER with the
     shortest matching pattern (the `#' case) or the longest matching
     pattern (the `##' case) deleted.

The % and %% features work similarly, with “beginning” replaced by “end”.

My mnemonic for # versus % is that $ is for the variable; # is to the left of $, so it strips from the left, and % is to the right, so it strips from the right. I suspect this is the actual motivation for the choice of # and %, though I’m curious to see evidence to confirm or refute that thought.

So after my line o=${oo%/*}${oo#*/}, o consists of the part of oo to the left of the last slash, and then the part of oo to the right of the first slash. Since there should be just one slash in oo, it has the effect of making o be everything but the slash.
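
Here is a small demonstration of these expansions, as a sketch you can paste into a shell. The value ab/cdef0123 is a made-up stand-in for one of the names that ls ??/* prints, and a/b/c is there only to show how the one-character and two-character forms differ when there is more than one slash.

```shell
oo=ab/cdef0123           # hypothetical name of the form ??/*

echo "${oo%/*}"          # ab          (shortest "/*" stripped from the end)
echo "${oo#*/}"          # cdef0123    (shortest "*/" stripped from the start)

o=${oo%/*}${oo#*/}
echo "$o"                # abcdef0123  (everything but the slash)

path=a/b/c               # more than one slash, to compare all four forms
echo "${path#*/}"        # b/c   (shortest match from the left)
echo "${path##*/}"       # c     (longest match from the left)
echo "${path%/*}"        # a/b   (shortest match from the right)
echo "${path%%/*}"       # a     (longest match from the right)
```

The last four lines are the whole family: one character for the shortest match, two for the longest.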

That makes one trick I use all the time. There’s plenty more, and those Info pages explain many of them. I’m not sure all these tricks are a good thing on balance—they serve as a crutch to make the shell go further, when maybe I should just be quicker to switch to a real programming language. But they sure come in handy.

Written by Greg Price

February 27th, 2010 at 4:56 am

Posted in Uncategorized

Unlocking the Clubhouse, part 1: it’s not about innate differences

If you work on computing in school, on the side, or in industry and you’ve been paying attention to the people around you, you’ve probably wondered why so many fewer women than men enter our field and stay in it.

This is no immutable law. In fact, the proportion of women in computer science in the United States was once much higher. Of people receiving bachelor’s degrees in computer science, women made up nearly 40% in the mid-1980s, declining to 20% in 2006. (graphs, NSF data.) And it varies among cultures, too—in Malaysia, women actually outnumber men in computer science. (data, analysis)

So the natural way to ask the question is in this form: What are we doing in computer science that causes so many fewer women than men to enter our field and to stay in it? And what can we do differently to change that?

Recently I picked up a book on this subject. Unlocking the Clubhouse is the product of a collaboration between Jane Margolis, a social scientist studying gender and education, and Allan Fisher, the founding dean of the undergraduate program in computer science at Carnegie Mellon University.

The authors draw on scores of previous studies, and they did their own research from a privileged position at the helm of the undergraduate program at Carnegie Mellon University. Their success at answering these questions may be indicated by the reversal of national trends that they achieved at CMU in the five years of their research:

  • Before the authors’ work, the proportion of women among entering freshmen ranged from 5% to 7% over the five years 1991-1995. At the conclusion of their project in 2000, this proportion had reached 42%.
    [Figure: percentage of women in the computer science freshman class at CMU, 1989-2000]

  • Of students entering the program at the start of the project in 1995, only 42% of women remained after two years. This rate rose to 80% for women entering in 1996, and stabilized at nearly 90%. The rate among men was steady around 90%.
    [Figure: persistence rate for women and men in undergraduate computer science at CMU, 1995-1998]

With that kind of success in practice, it’s clear their scientific findings and their recommendations have earned serious consideration. In a future post I’ll say more about those, and I’ll also look at what some other people have found on the subject. Ironically, it turns out one result of Margolis and Fisher’s success may have been to invalidate some of their findings in the new environment they created.

Written by Greg Price

February 15th, 2010 at 1:59 am

How many MIT students does it take to change computing?


A new SIPB chair and vice-chair are taking office tomorrow, and the other night several of their predecessors took an evening to give them an orientation.

SIPB has two priorities: people and projects. Each active project has its own organizers, maintainers, and/or developers who move it forward and make its decisions, so the role of the chair and vice-chair is about keeping track of how things go, helping connect the project to outside resources and connect new contributors to the project, mediating shared resources like the machine room, and making sure that key projects get passed on from year to year.

It makes sense, then, that we spent most of our time talking about people—bringing people in the door at SIPB, making the office a welcoming place for them, drawing them into our community, and electing them as members. We hear in almost every membership election about how the organization could do better at this. Here’s a quick version of why it’s so important:

Every year, about 1/3 of student SIPB members graduate.

Put another way, in steady state:

Size of SIPB = 3 * (# new members / year)

For example, right now SIPB has 26 student members, and by my count 9 are planning to leave MIT in June. So the only way SIPB can stay as strong as it is is to get 9 new members this year, and about as many again the next year, and the next year, and so on. Fewer new members ⇒ fewer members ⇒ fewer awesome projects, fewer people to learn from, fewer people to hire away to Ksplice (ahem, maybe not everyone shares that motivation).
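
To see why the steady-state size works out to three times the yearly intake, here is a small sketch that applies the rule year after year. It assumes exactly one third of the members leave each year and a constant number join; steady_state is a hypothetical helper name.

```shell
# steady_state MEMBERS NEW_PER_YEAR YEARS
# Each year one third of the members leave and NEW_PER_YEAR join.
# Prints the membership after YEARS years, rounded to a whole number.
steady_state() {
  awk -v m="$1" -v n="$2" -v years="$3" 'BEGIN {
    for (y = 0; y < years; y++) m = m - m / 3 + n
    printf "%.0f\n", m
  }'
}

steady_state 26 9 10   # 9 new members a year
steady_state 26 6 10   # only 6 new members a year
```

Starting from 26 members, 9 new members a year holds the size at about 27, while 6 a year lets it drift down toward 18.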

Fortunately, we built a good track record over the last few years:

    academic year          freshmen &
      starting     total   sophomores

        2010        ???       ???
        2009         7+        4+
        2008         9         6
        2007         8         2
        2006        10         3
        2005        10         2
        2004         5         3
        2003         4         2
        2002        10         3
        2001         8         2
        2000         3         2

From those numbers in the last five years, it’s not hard to see how we got the organization to the point where three strong candidates stood at the last election for chair, and where the office is full to crowding at nearly every Monday’s meeting. It’s also clear how it wasn’t always this way—the numbers from the 2004 and 2003 academic years led directly to the election of 2005 in which the nine-member EC comprised every student member of the SIPB.

But my favorite aspect of these numbers is in the column on the right. When I was the chair in 2008-9, I put an emphasis on getting people involved in SIPB in their first and second years. I’ve heard a lot of people’s stories over the years of showing up at SIPB as a freshman or sophomore, going away for a variety of reasons, and finally coming back two or three or more years later and becoming members. Some of them went on to become highly active and valued contributors, and it’s too bad for everyone that we didn’t succeed in bringing them in the first time around. With the record 6 freshman and sophomore new members in the 2008 academic year, I think we succeeded in turning a lot of those stories around into members who will be active students for a long time. Edward and Evan have gotten this 2009 year to outpace 2008 so far, so the new team of Jess and Greg have the chance to finish it at another record. 2010 will be theirs to create, and I wish them the best of luck in outdoing 2008 and 2009 both.

Written by Greg Price

February 15th, 2010 at 1:58 am

Posted in Uncategorized

Web design goes global


Last month, we decided at Ksplice that it was time to redesign our website. We had a very clean website that we had developed in-house, but we’re finally selling our Linux security product directly on the Web, we’re beginning to seek greater publicity, and it was time to make a website focused completely on selling Ksplice-Uptrack-the-product rather than explaining Ksplice-the-company and Ksplice-the-technology. This time, we also wanted to get the design from a professional web designer, to see what they do differently.

In the olden days, I gather the way a company might have done this is to find a web design firm in SoHo (or maybe Beacon Hill or Brookline for us) and pay them $20K to make an array of gorgeous mockups on their 30″ Apple Cinema Displays in their loft offices, display them in a presentation for our admiration and our selection, and then turn out perfect XHTML.

I’m not sure it ever worked quite like that in the real world. But in any case it turns out that’s not the world we live in anymore—a good thing, too, because as a bootstrapped startup it’d be hard for us to justify spending $20K on a website at this stage. Instead of a firm, we hired a freelancer. Instead of SoHo, he lives in Sri Lanka. Instead of $20K, we paid on the order of $1K all told—and it would have been under $1K if we had started with a clear sense of how much things should cost in this market. And instead of a theatrically delivered client presentation in our office, the mockups were delivered in a string of emails as I communicated with our designer—let’s call him “Sajith”—over Google Talk to iterate through designs while North America was deep in the night. It’s not hard to see the logic that drives this new way of doing things: $1K is six months’ worth of per capita income in Sri Lanka. Sajith is making a nice living for himself on fees that even a cash-conserving startup like us can easily pay.

Here’s how it works:

  1. Post a project on a freelance job board—I used
  2. Wait for bids. We were running on our characteristic tight schedule, so I set a 24-hour deadline, and in that interval 26 people and firms offered to do the project.
  3. Pick someone. Most of the bidders’ portfolios were terrible; some qualified as mediocre. I went for the only one whose portfolio looked good.
  4. Get mockups and iterate. This was the next-to-longest stage of the process. I spent many late hours with Sajith as he presented a mockup image of the page, I made comments, he went off for a few minutes to implement them and I half-worked on something else until he came back, etc. One of the pitfalls: English language skills may not be what you hope for. I learned to give instructions in short sentences, and gave up on trying to get text right in favor of us correcting it later.
  5. Slice. This is apparently the word for taking a mockup and producing an HTML document, CSS, etc, that implements it. More on this step below.
  6. Integrate and polish. Our sites comprised a Django application with a handful of distinct pages, and our mostly-static main website with a large number of pages, of which we were only redesigning/adding a few of the most critical for selling our product. Taking Sajith’s static HTML and turning it into Django templates that behave in all the right ways, getting all the right text in place, and dealing with final improvements to the HTML and the CSS to the point where we were happy, consumed about three person-weeks of engineering time on our end. I knew there would be work to do here, but we were completely unprepared for how much work it was.

I’ll close with the episode in this process that really made me feel I was living inside a Tom Friedman book. We were humming along on our aggressive schedule, and one day Sajith failed to deliver on a deadline: he was to get a page of sliced HTML to me that morning, his evening. I pressed him as the hours passed—OK, I understand some details aren’t finished, can you get me what you have? We want to start on integrating it. Eventually he confessed that he wasn’t doing this work himself; rather, he had himself gone out and found a subcontractor, somewhere else in the world, who had promised to do the slicing, and the subcontractor had failed to come through. (Apparently he had done the job and then accidentally deleted it—someone needs to learn about source control.) After another failed subcontract attempt the next day, Sajith gave up and did the slicing himself. Maybe the outcome of Sajith’s experiment suggests we don’t live in Friedman-flattopia just yet.

Written by Greg Price

February 8th, 2010 at 5:46 am

Posted in Uncategorized

A simple code review script for Git


This January, Ksplice swelled from 8 people to 20 people. You can imagine what that did to the office—it’s a good thing that Tim and Jeff have been practicing the art of rearranging a space to fit more people than ever thought possible since their days in the SIPB office. Fortunately, because we have at our disposal a computer-systems training and recruitment machine of awesome effectiveness, our interns defied Fred Brooks and produced a great deal of useful code.

The problem: how to keep track of all that new code and get it all reviewed smoothly? Our ad hoc practices relying on one-on-one exchanges clearly were not going to scale. The solution: on the first day the new interns showed up, I took a couple of hours and threw together a script to request code reviews. The key design considerations were

  • Public visibility. The script sends mail, and CCs a list going to the whole team.
  • Non-diffusion of responsibility. The user must identify someone to be the reviewer, and the request is addressed to them.
  • Git friendliness. Being a kernel shop, we use Git for everything, so the script assumes a basic Git workflow: you make some commits, and then you request a whole series of commits to be reviewed at once.

We looked at some existing code review tools like Gerrit and Rietveld, but we weren’t happy with any of them because the ones we found don’t work on branches—they work on individual commits—and we have drunk too deeply of Git to be satisfied working that way.

On the other hand, being the product of a few hours’ work, there’s several things that could be made better. The interaction with the user could be better to prevent mis-sends. The script could do better at detecting what the repository in question is called, it could take advantage of commit messages for better subject lines, and it could try to give the reviewer a command line that will show the commits in question. (Until then: it’s usually git log -p --reverse origin/master..origin/branchname.) Someday we may also want a system that tracks what commits have been reviewed by whom and blocks commits from going in without review; that will be a bigger project.
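
As a rough sketch of that last idea, a helper along these lines could print the command for the reviewer. The function name review_command, and the assumption that the branch is pushed to a remote called origin and based on origin/master, are mine rather than anything taken from the released script.

```shell
# review_command BRANCH
# Prints a command line that shows BRANCH's commits, oldest first, with diffs.
# Assumes the branch is pushed to "origin" and based on origin/master.
review_command() {
  printf 'git log -p --reverse origin/master..origin/%s\n' "$1"
}

review_command my-feature   # my-feature is a made-up branch name
```

Printing the command rather than running it keeps the helper safe to drop into a mail template.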

Apparently we did something right with this script, because I heard a couple of people say they’d like to use it outside of Ksplice. So the other day we decided we were happy to release it as free software. As of last night you can get it from GitHub. Enjoy.

Let me know if you use it, and patches welcome, of course.

Written by Greg Price

February 1st, 2010 at 3:56 am

Posted in Uncategorized

Read-Write Software

My favorite moments with free software are when I get annoyed with some manual task that a tool leaves me to do for myself, and then invent a feature that the tool should have to handle the task for me.

With any software, free or proprietary, if I’m lucky the tool might have a configuration system powerful enough to let me effectively add the feature from the outside. But with free software, I don’t need the authors to have anticipated my needs: I can reach into the guts of the software itself and change it to work the way I want. If it’s a friendly codebase or if I’ve hacked on it before, I may be able to add my change in a few minutes. And hey presto: software that does exactly what I wanted. It’s a lot more fun than praying to the vendor and waiting a few years, and it’s faster and more reliable too.

So it went with Git one night last October. I was repeatedly revising a branch with git rebase -i. A couple of points along the branch were marked as branches of their own, so every time I changed something I would have to either

  • rebase the full branch, then do a dance with checkout and reset to update the sub-branches, carefully typing the correct new commit IDs;
  • rebase the full branch, then muck with update-ref with the same care about getting commit IDs right; or
  • rebase the first sub-branch, then use rebase --onto to move the next sub-branch on top of it, then rebase --onto again for the main branch

What I really wanted to do was just

  • rebase the full branch, and tell the sub-branches to come along for
    the ride.
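
For comparison, the third manual option might look something like the sketch below. It is only an illustration under assumptions: restack is a hypothetical helper (it is not the feature described in this post), and the branch names in the usage note are invented.

```shell
# restack OLD_BASE NEW_BASE BRANCH...
# Replays each BRANCH in turn onto the rewritten tip of the branch below it.
# OLD_BASE is the commit the first BRANCH used to sit on; NEW_BASE is what
# it should sit on now. Stops at the first rebase that fails.
restack() {
  old=$1
  new=$2
  shift 2
  for branch in "$@"; do
    next_old=$(git rev-parse "$branch") || return   # tip before rewriting
    git rebase --onto "$new" "$old" "$branch" || return
    old=$next_old    # the next branch used to sit on this old tip
    new=$branch      # and should now sit on the rewritten branch
  done
}
```

After rewriting the first sub-branch, with its old tip saved beforehand as in old=$(git rev-parse sub1), running restack "$old" sub1 sub2 topic would replay sub2 onto the new sub1 and then topic onto the new sub2.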

Fortunately I’d worked on the code for Git’s interactive rebase before—at Ksplice we push Git to its limits in six different directions, and rebase -i we push beyond the limits of stock Git—so I knew where to find the moving parts that could do what I needed. Four minutes after having the idea, I was happily using the new feature.

If you want the feature too, it’s up on my Git git repo. Or you can wait until I get it upstream. Why haven’t I done that already? That’s another old story about software. My 4-minute, 4-line patch turned into 29 lines with documentation and with proper error handling, then 147 lines to make the feature easy to invoke, and then 231 lines with test cases. So I just finished all that work today. Maybe you’ll see the feature in Git 1.7.1 this spring.

Written by Greg Price

January 25th, 2010 at 2:30 am

Posted in Uncategorized

The Soul of a New Machine

At Ksplice, we put a lot of effort and discussion into how we manage projects, in part because we know we aren’t as good at it yet as we’d like to be. Books by managers and books for managers lie scattered around the office and employees’ rooms. So imagine my surprise and delight the day after Christmas when I opened The Soul of a New Machine, Tracy Kidder’s 1981 classic about the machines that made the computer age and the geeks who built them, and discovered it was about project management.

Keith sometimes remarks that Ksplice needs a documentarian. The Soul of a New Machine is the result of a project that had a documentarian, one who produced prose. The Eagle project at Data General set out to build a new computer, a 32-bit version of the existing 16-bit Eclipse line, just as DEC raced into the lead in the minicomputer market with the 32-bit VAX. Tracy Kidder, a writer for The Atlantic looking to write a story about technology, connected with his editor’s old college roommate, the leader of the Eagle project, Tom West.

The story that Kidder tells is full of implicit lessons that look as current today as they were in the computing projects of thirty years ago.

Schedules. Everyone knows that computing projects run slow. When the Eagle project started in July of 1978, it set an insanely fast timeline to have the whole computer architected, designed, built, and debugged by April 1979. They didn’t make it, of course, and many team members expected that from the start. But at every stage West, the engineer in charge, insisted on treating the schedule seriously (“come on, this schedule’s real”), setting intermediate deadlines as if the April date would be met. And though April slipped, the project was done in October, just fifteen months after it started and still a rapid turnaround by anyone’s count.

Delegation. On a big project, the person in charge can’t do everything themselves, or even keep a close eye on all of the work. They have to rely on others to do it right. Good managers know this, but it’s nerve-wracking to actually put it into practice. West itched to get into the lab and start debugging the prototype himself, telling Kidder, “Rocking back here in my chair and talking about doing it is one thing, but it makes me worry. It gives me a nauseous feeling, because I’m not doing it.” Eventually, as one lieutenant puts it, he “gripped the arms of his chair and decided to trust” the engineer leading the hardware team.

Perfect vs. done. Before Eagle, some of the same engineers had worked on projects to build a 32-bit machine from scratch. None of these projects were completed. Eagle would be tied by backward compatibility to the 16-bit Eclipse, and at the outset some engineers saw the idea as “a kludge on a kludge on a kludge”, or “a paper bag on the side of the Eclipse”, and wanted nothing to do with it. Yet West convinced them all to sign on to the project anyway, and in the end it was the compatible computer that was completed, sold well, and rescued Data General.

It’s not only the lessons that seem not to have changed. Some of the engineers on the Eagle project recall how as undergraduates they would “stay up all night and experience … the game of programming”, and we can all think of people thirty years later who, like a few of them, “started sleeping days and missed all their classes, thereby ruining their grades.” One comforting thought: Carl Alsing, the engineer in charge of the microcode team, was one of those who actually flunked out of school.

Finally, a word about the writing. The technical exposition is incredible. On the one hand, the reviewer for the New York Times heaped praise on the prose that enabled him to “follow every step” despite knowing nothing about computers (and a reviewer writing in 1981 could mean that in a much stronger sense than any reviewer typing into their laptop today). From my very different perspective, I was fascinated to learn details about the faraway architectures and design constraints of a different era. And in 291 pages delving frequently into technical aspects of computer architecture, digital logic, and software, I never felt condescended to and I found not one mistake.

Maybe Ksplice should get a documentarian after all.

Written by Greg Price

January 11th, 2010 at 1:27 am

Posted in Uncategorized
