price.mit.edu/blog

Archive for February, 2010

How to pretend bash is a real programming language, tip #13

2 comments

I wrote some throwaway shell code tonight that looked like this:

   for oo in $(cd .git/objects/ && ls ??/*); do
     o=${oo%/*}${oo#*/}
     # do something horrible with the Git object $o, which is in the file $oo
   done

It doesn’t matter now exactly what the code was for. But a collaborator wrote back to me:

> > o=${oo%/*}${oo#*/}
> How does this line work/what is it supposed to accomplish?  In
> particular not sure what the %foo and #foo do.

Stop for a moment: do you know how that line works? I wouldn’t have in my first years writing shell scripts.

This line demonstrates one of a repertoire of tricks I’ve picked up to get some things done in bash that might otherwise require invoking a separate program. None of these will be news to shell-programming experts, but I sure didn’t know all of them when I started writing in shell. Here’s a little braindump on one of my favorite tricks, and where to read about more.

The best documentation for Bash is the info page—the specific pages I find myself referring to most often are under “info bash” -> “Basic Shell Features” -> “Shell Expansions”. (If you’ve never tried it, you’ve been missing out! Type “info bash” at your favorite prompt. But not on a Debian or Ubuntu machine, where the info page is missing due to a stupid licensing dispute. Info is the home of the best documentation available for Bash, GCC, GDB, Emacs, miscellaneous GNU utilities, and Info itself.)

This feature is under “Shell Parameter Expansion” there.

`${PARAMETER#WORD}'
`${PARAMETER##WORD}'
     The WORD is expanded to produce a pattern just as in filename
     expansion (*note Filename Expansion::).  If the pattern matches
     the beginning of the expanded value of PARAMETER, then the result
     of the expansion is the expanded value of PARAMETER with the
     shortest matching pattern (the `#' case) or the longest matching
     pattern (the `##' case) deleted.

The % and %% features work similarly, with “beginning” replaced by “end”.
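To see all four forms side by side, here is a minimal sketch. The variable name `path` and its value are made up for illustration; any string with several slashes shows the difference between the shortest and the longest match.

```shell
# Hypothetical value, chosen only because it contains several slashes.
path=usr/local/share/doc

echo "${path#*/}"    # shortest leading match of */ removed  -> local/share/doc
echo "${path##*/}"   # longest leading match of */ removed   -> doc
echo "${path%/*}"    # shortest trailing match of /* removed -> usr/local/share
echo "${path%%/*}"   # longest trailing match of /* removed  -> usr
```

The `##*/` and `%/*` forms behave roughly like `basename` and `dirname` for simple paths such as this one, without starting a separate process.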

My mnemonic for # versus % is that $ is for the variable; on a US keyboard, # sits just to the left of $, so it strips from the left, and % sits just to the right, so it strips from the right. I suspect this is the actual motivation for the choice of # and %, though I’m curious to see evidence to confirm or refute that thought.

So after my line o=${oo%/*}${oo#*/}, o consists of the part of oo to the left of the last slash, and then the part of oo to the right of the first slash. Since there should be just one slash in oo, it has the effect of making o be everything but the slash.
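Here is the same line run on a single value, as a small sketch. The object path is a shortened, made-up example and not a real Git object name.

```shell
# Hypothetical path of the form that `ls ??/*` would print inside .git/objects.
oo=ab/cdef0123

left=${oo%/*}    # everything left of the last slash   -> ab
right=${oo#*/}   # everything right of the first slash -> cdef0123
o=$left$right    # same result as o=${oo%/*}${oo#*/}

echo "$o"        # abcdef0123
```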

That makes one trick I use all the time. There are plenty more, and those Info pages explain many of them. I’m not sure all these tricks are a good thing on balance: they serve as a crutch to make the shell go further, when maybe I should just be quicker to switch to a real programming language. But they sure come in handy.

Written by Greg Price

February 27th, 2010 at 4:56 am

Posted in Uncategorized


Unlocking the Clubhouse, part 1: it’s not about innate differences


If you work on computing in school, on the side, or in industry and you’ve been paying attention to the people around you, you’ve probably wondered why so many fewer women than men enter our field and stay in it.

This is no immutable law. In fact, the proportion of women in computer science in the United States was once much higher. Of people receiving bachelor’s degrees in computer science, women made up nearly 40% in the mid-1980s, declining to 20% in 2006. (graphs, NSF data.) And it varies among cultures, too—in Malaysia, women actually outnumber men in computer science. (data, analysis)

So the natural way to ask the question is in this form: What are we doing in computer science that causes so many fewer women than men to enter our field and to stay in it? And what can we do differently to change that?

Recently I picked up a book on this subject. Unlocking the Clubhouse is the product of a collaboration between Jane Margolis, a social scientist studying gender and education, and Allan Fisher, the founding dean of the undergraduate program in computer science at Carnegie Mellon University.

The authors draw on scores of previous studies, and they conducted their own research from a privileged position at the helm of the undergraduate program at Carnegie Mellon University. One indication of their success at answering these questions is the way they reversed national trends at CMU during the five years of their research:

  • Before the authors’ work, the proportion of women among entering freshmen ranged from 5% to 7% over the five years 1991-1995. At the conclusion of their project in 2000, this proportion had reached 42%.
    (Figure: graph of the percentage of women in the computer science freshman class at CMU, 1989-2000.)

  • Of students entering the program at the start of the project in 1995, only 42% of women remained after two years. This rate rose to 80% for women entering in 1996, and stabilized at nearly 90%. The rate among men was steady around 90%.
    (Figure: graph of the persistence rate for women and men in undergraduate computer science at CMU, 1995-1998.)

With that kind of success in practice, it’s clear their scientific findings and their recommendations have earned serious consideration. In a future post I’ll say more about those, and I’ll also look at what some other people have found on the subject. Ironically, it turns out one result of Margolis and Fisher’s success may have been to invalidate some of their findings in the new environment they created.

Written by Greg Price

February 15th, 2010 at 1:59 am

How many MIT students does it take to change computing?

3 comments

A new SIPB chair and vice-chair are taking office tomorrow, and the other night several of their predecessors took an evening to give them an orientation.

SIPB has two priorities: people and projects. Each active project has its own organizers, maintainers, and/or developers who move it forward and make its decisions, so the role of the chair and vice-chair is about keeping track of how things go, helping connect the project to outside resources and connect new contributors to the project, mediating shared resources like the machine room, and making sure that key projects get passed on from year to year.

It makes sense, then, that we spent most of our time talking about people—bringing people in the door at SIPB, making the office a welcoming place for them, drawing them into our community, and electing them as members. We hear in almost every membership election about how the organization could do better at this. Here’s a quick version of why it’s so important:

Every year, about 1/3 of student SIPB members graduate.

Put another way, in steady state:

Size of SIPB = 3 * (# new members / year)

For example, right now SIPB has 26 student members, and by my count 9 are planning to leave MIT in June. So the only way SIPB can stay as strong as it is is to get 9 new members this year, and about as many again the next year, and the next year, and so on. Fewer new members ⇒ fewer members ⇒ fewer awesome projects, fewer people to learn from, fewer people to hire away to Ksplice (ahem, maybe not everyone shares that motivation).
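The arithmetic is simple enough to check in the shell. This is only a sketch of the steady-state rule above, assuming exactly one third of members leave each year; the function names are made up for illustration.

```shell
# steady_state_size NEW_PER_YEAR
# Size the group settles at if a third of members leave every year.
steady_state_size() { echo $(( 3 * $1 )); }

# recruits_needed SIZE
# New members per year needed to hold SIZE steady, rounded up.
recruits_needed() { echo $(( ($1 + 2) / 3 )); }

recruits_needed 26     # 9
steady_state_size 9    # 27
steady_state_size 5    # 15
```

So 9 new members a year sustains a group of about the current size, while a run of years with 5 new members would eventually shrink it to about 15.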

Fortunately, we built a good track record over the last few years:


    academic year          freshmen &
      starting     total   sophomores

        2010        ???       ???
        2009         7+        4+
        2008         9         6
        2007         8         2
        2006        10         3
        2005        10         2
        2004         5         3
        2003         4         2
        2002        10         3
        2001         8         2
        2000         3         2

From those numbers in the last five years, it’s not hard to see how we got the organization to the point where three strong candidates stood at the last election for chair, and where the office is full to crowding at nearly every Monday’s meeting. It’s also clear how it wasn’t always this way—the numbers from the 2004 and 2003 academic years led directly to the election of 2005 in which the nine-member EC comprised every student member of the SIPB.

But my favorite aspect of these numbers is in the column on the right. When I was the chair in 2008-9, I put an emphasis on getting people involved in SIPB in their first and second years. I’ve heard a lot of people’s stories over the years of showing up at SIPB as a freshman or sophomore, going away for a variety of reasons, and finally coming back two or three or more years later and becoming members. Some of them went on to become highly active and valued contributors, and it’s too bad for everyone that we didn’t succeed in bringing them in the first time around. With the record 6 freshman and sophomore new members in the 2008 academic year, I think we succeeded in turning a lot of those stories around into members who will be active students for a long time. Edward and Evan have gotten this 2009 year to outpace 2008 so far, so the new team of Jess and Greg have the chance to finish it at another record. 2010 will be theirs to create, and I wish them the best of luck in outdoing 2008 and 2009 both.

Written by Greg Price

February 15th, 2010 at 1:58 am

Posted in Uncategorized


Web design goes global

4 comments

Last month, we decided at Ksplice that it was time to redesign our website. We had a very clean website that we had developed in-house, but we’re finally selling our Linux security product directly on the Web, we’re beginning to seek greater publicity, and it was time to make a website focused completely on selling Ksplice-Uptrack-the-product rather than explaining Ksplice-the-company and Ksplice-the-technology. This time, we also wanted to get the design from a professional web designer, to see what they do differently.

In the olden days, I gather the way a company might have done this is to find a web design firm in SoHo (or maybe Beacon Hill or Brookline for us) and pay them $20K to make an array of gorgeous mockups on their 30″ Apple Cinema Displays in their loft offices, display them in a presentation for our admiration and our selection, and then turn out perfect XHTML.

I’m not sure it ever worked quite like that in the real world. But in any case it turns out that’s not the world we live in anymore—a good thing, too, because as a bootstrapped startup it’d be hard for us to justify spending $20K on a website at this stage. Instead of a firm, we hired a freelancer. Instead of SoHo, he lives in Sri Lanka. Instead of $20K, we paid on the order of $1K all told—and it would have been under $1K if we had started with a clear sense of how much things should cost in this market. And instead of a theatrically delivered client presentation in our office, the mockups were delivered in a string of emails as I communicated with our designer—let’s call him “Sajith”—over Google Talk to iterate through designs while North America was deep in the night. It’s not hard to see the logic that drives this new way of doing things: $1K is six months’ worth of per capita income in Sri Lanka. Sajith is making a nice living for himself on fees that even a cash-conserving startup like us can easily pay.

Here’s how it works:

  1. Post a project on a freelance job board—I used getafreelancer.com.
  2. Wait for bids. We were running on our characteristic tight schedule, so I set a 24-hour deadline, and in that interval 26 people and firms offered to do the project.
  3. Pick someone. Most of the bidders’ portfolios were terrible; some qualified as mediocre. I went for the only one whose portfolio looked good.
  4. Get mockups and iterate. This was the next-to-longest stage of the process. I spent many late hours with Sajith as he presented a mockup image of the page, I made comments, he went off for a few minutes to implement them and I half-worked on something else until he came back, etc. One of the pitfalls: English language skills may not be what you hope for. I learned to give instructions in short sentences, and gave up on trying to get text right in favor of us correcting it later.
  5. Slice. This is apparently the word for taking a mockup and producing an HTML document, CSS, etc., that implements it. More on this step below.
  6. Integrate and polish. Our sites comprised a Django application with a handful of distinct pages, and our mostly-static main website with a large number of pages, of which we were only redesigning/adding a few of the most critical for selling our product. Taking Sajith’s static HTML and turning it into Django templates that behave in all the right ways, getting all the right text in place, and dealing with final improvements to the HTML and the CSS to the point where we were happy, consumed about three person-weeks of engineering time on our end. I knew there would be work to do here, but we were completely unprepared for how much work it was.

I’ll close with the episode in this process that really made me feel I was living inside a Tom Friedman book. We were humming along on our aggressive schedule, and one day Sajith failed to deliver on a deadline: he was to get a page of sliced HTML to me that morning, his evening. I pressed him as the hours passed—OK, I understand some details aren’t finished, can you get me what you have? We want to start on integrating it. Eventually he confessed that he wasn’t doing this work himself; rather, he had himself gone out and found a subcontractor, somewhere else in the world, who had promised to do the slicing, and the subcontractor had failed to come through. (Apparently he had done the job and then accidentally deleted it—someone needs to learn about source control.) After another failed subcontract attempt the next day, Sajith gave up and did the slicing himself. Maybe the outcome of Sajith’s experiment suggests we don’t live in Friedman-flattopia just yet.

Written by Greg Price

February 8th, 2010 at 5:46 am

Posted in Uncategorized


A simple code review script for Git

3 comments

This January, Ksplice swelled from 8 people to 20 people. You can imagine what that did to the office—it’s a good thing that Tim and Jeff have been practicing the art of rearranging a space to fit more people than ever thought possible since their days in the SIPB office. Fortunately, because we have at our disposal a computer-systems training and recruitment machine of awesome effectiveness, our interns defied Fred Brooks and produced a great deal of useful code.

The problem: how to keep track of all that new code and get it all reviewed smoothly? Our ad hoc practices relying on one-on-one exchanges clearly were not going to scale. The solution: on the first day the new interns showed up, I took a couple of hours and threw together a script to request code reviews. The key design considerations were

  • Public visibility. The script sends mail, and CCs a list going to the whole team.
  • Non-diffusion of responsibility. The user must identify someone to be the reviewer, and the request is addressed to them.
  • Git friendliness. Being a kernel shop, we use Git for everything, so the script assumes a basic Git workflow: you make some commits, and then you request a whole series of commits to be reviewed at once.

We looked at some existing code review tools like Gerrit and Rietveld, but we weren’t happy with any of them because the ones we found don’t work on branches—they work on individual commits—and we have drunk too deeply of Git to be satisfied working that way.

On the other hand, being the product of a few hours’ work, there are several things that could be made better. The interaction with the user could be better to prevent mis-sends. The script could do better at detecting what the repository in question is called, it could take advantage of commit messages for better subject lines, and it could try to give the reviewer a command line that will show the commits in question. (Until then: it’s usually git log -p --reverse origin/master..origin/branchname.) Someday we may also want a system that tracks what commits have been reviewed by whom and blocks commits from going in without review; that will be a bigger project.
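For a feel of the workflow, here is a minimal sketch of a request that names one reviewer, CCs a team list, and lists a whole series of commits at once. It is not the released script: the function name `review_request`, the addresses, and the subject format are all assumptions made for illustration, and it only prints the message instead of sending mail. It builds a throwaway repository so that it runs anywhere git is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical team list address; the real recipient is not given in the post.
TEAM_LIST="team@example.com"

# review_request REVIEWER BASE BRANCH
# Print a plain-text request covering every commit on BRANCH that is not in BASE.
review_request() {
  local reviewer=$1 base=$2 branch=$3
  printf 'To: %s\nCc: %s\nSubject: review request: %s\n\n' \
    "$reviewer" "$TEAM_LIST" "$branch"
  # Oldest first, so the reviewer reads the series in the order it was written.
  git log --reverse --format='%h %s' "$base..$branch"
}

# Demo in a throwaway repository with one base commit and a two-commit branch.
demo=$(mktemp -d)
cd "$demo"
git init -q
commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "$1"
}
commit 'initial commit'
git branch -M master
git checkout -q -b topic
commit 'add parser'
commit 'add tests for parser'

review_request reviewer@example.com master topic
```

The `base..branch` range is the same idea as the `origin/master..origin/branchname` range in the command above: everything reachable from the branch that is not already in the base.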

Apparently we did something right with this script, because I heard a couple of people say they’d like to use it outside of Ksplice. So the other day we decided we were happy to release it as free software. As of last night you can get it from Github—enjoy.

Let me know if you use it, and patches welcome, of course.

Written by Greg Price

February 1st, 2010 at 3:56 am

Posted in Uncategorized
