Strace: The Lost Chapter


I wrote another post last week for the Ksplice blog: Strace — The Sysadmin’s Microscope. If you’re running a Linux system, or just writing or maintaining a complex program, sometimes strace is indispensable — it’s the tool that tells you what a program is really doing. My post explains why strace is so good at showing the interesting events in a program (hint: it gets to sit between the program and everything else in the universe), describes some of its key options, and shows a few ways you can use it to solve problems.

Unfortunately there’s only so much you can say in a blog post of reasonable length, so I had to cut some of my favorite uses down to bullet points. Here’s one such use, which I can’t bear to keep off of the Web, just because I thought I was so clever when I came up with it in real life a couple of months ago.

(If you haven’t already, I encourage you to go read the main post first. I’ll be here when you come back.)

Strace As A Progress Bar

Sometimes you start a command, and it turns out to take forever. It’s been three hours, and you don’t know if it’s going to be another three hours, or ten minutes, or a day.

This is what progress bars were invented for. But you didn’t know this command was going to need a progress bar when you started it.

Strace to the rescue. What work is your program doing? If it’s touching anything in the filesystem while it works, or anything on the network, then strace will tell you exactly what it’s up to. And in a lot of cases, you can deduce how far into its job it’s gotten.

For example, suppose our program is walking a big directory tree and doing something slow. Let’s simulate that with a synthetic directory tree and a find that just sleeps for each directory:

  $ mkdir tree && cd tree
  $ for i in $(seq 1000); do mktemp -d -p .; done >/dev/null
  $ find . -exec sleep 1 \;

Well, this is taking a while. Let’s open up another terminal and ask strace what’s going on:

  $ pgrep find
  2714
  $ strace -p 2714
  fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
  fchdir(5)                               = 0
  getdents(5, /* 2 entries */, 4096)      = 48
  getdents(5, /* 0 entries */, 4096)      = 0
  close(5)                                = 0
  fstat(5, {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
  fchdir(5)                               = 0
  close(5)                                = 0
  newfstatat(AT_FDCWD, "tmp.MiHDWiBURu", {st_mode=02, st_size=17592186044416, ...}, AT_SYMLINK_NOFOLLOW) = 0
  clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb19c92a770) = 13044
  wait4(13044, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13044
  --- SIGCHLD (Child exited) @ 0 (0) ---
  fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
  fchdir(5)                               = 0

The find just looked at tmp.HvbzfbbWSa, and now it’s going into tmp.MiHDWiBURu. How far is that into the total? ls will tell us the list of directories that the find is working from; we just have to tell it to give them to us in the raw, unsorted order that the parent directory lists them in, with the -U flag. And then grep -n will tell us where in that list the entry tmp.HvbzfbbWSa appears:

  $ ls -U | grep -n tmp.HvbzfbbWSa
  $ ls -U | grep -n tmp.MiHDWiBURu
  $ ls -U | wc -l

So tmp.HvbzfbbWSa is entry 258 out of 1000 entries in this directory — we’re 25.8% of the way there. If it’s been four minutes so far, then we should expect about twelve more minutes to go.
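The arithmetic behind that estimate can be sketched as a small shell function. This is only an illustration: the name `eta` and its argument order are made up here, and it assumes the work per entry is roughly constant, which real jobs often violate.

```shell
# eta POSITION TOTAL ELAPSED_MINUTES
# Hypothetical helper: report percent complete and a rough time remaining,
# assuming every entry takes about the same time to process.
eta() {
  awk -v pos="$1" -v total="$2" -v elapsed="$3" 'BEGIN {
    printf "%.1f%% done, about %.0f more minutes\n",
      100 * pos / total, elapsed * (total - pos) / pos
  }'
}

eta 258 1000 4   # prints: 25.8% done, about 12 more minutes
```

In practice the first two arguments would come from the `ls -U | grep -n` and `ls -U | wc -l` commands above.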

(But With The Benefit Of Foresight…)

I’d be remiss if I taught you this hackish approach without mentioning that if you realize you want a progress bar before you start the command, you can do it much better — after all, the ‘progress bar’ above doesn’t even have a bar, except in your head.

Check out pv, the pipe viewer. In my little example, you’d have the command itself print out where it is, like so:

  $ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \;

and then you could get a real, live, automatically-updated progress bar, like so:

  $ find . -exec sh -c 'echo $1 && sleep 1' -- \{\} \; \
     | pv --line-mode --size=$(find . | wc -l) >/dev/null
   175 0:02:57 [0.987/s ] [=====>                               ] 17% ETA 0:13:55

Here we’ve passed --line-mode to make pv count lines instead of its default of bytes, and --size with an argument to tell it how many lines to expect in total. Even if you can’t estimate the size, pv will cheerfully tell you how far you’ve gone, how long it’s been, and how fast it’s moving right now, which can still be handy. pv is a pretty versatile tool in its own right — explaining all the ways to use it could be another whole blog post. But the pv man page is a good start.

That’s Just One

There’s lots of other ways to use strace — starting with the two I described in my main post, and the three more, besides this one, that I only mentioned there. I don’t really know anymore how I used to manage without it.

Written by Greg Price

August 16th, 2010 at 12:50 am

Seeing a song spread

A few weeks ago I blogged about how many people say they “can’t sing”, and don’t sing even in a crowd, so they miss out on the fun and stay aloof from the shared expression of a crowd of lifted voices.

So the other day I was interested to run across some examples of the power of shared song, related by Dan Bricklin of VisiCalc fame. It starts with an account of the classic American civil-rights song We Shall Overcome. Wikipedia’s excellent and detailed article about the song has this 1867 account, from a minister and longtime abolitionist, of witnessing in action the spread of a new folk song:

I always wondered, about these, whether they had always a conscious and definite origin in some leading mind, or whether they grew by gradual accretion, in an almost unconscious way. On this point I could get no information, though I asked many questions, until at last, one day when I was being rowed across from Beaufort to Ladies’ Island, I found myself, with delight, on the actual trail of a song. One of the oarsmen, a brisk young fellow, not a soldier, on being asked for his theory of the matter, dropped out a coy confession. “Some good spirituals,” he said, “are start jess out o’ curiosity. I been a-raise a sing, myself, once.”

My dream was fulfilled, and I had traced out, not the poem alone, but the poet. I implored him to proceed.

“Once we boys,” he said, “went for to tote some rice, and de nigger-driver, he keep a-callin’ on us; and I say, ‘O, de ole nigger-driver!’ Den another said, ‘First thing my mammy told me was, notin’ so bad as a nigger-driver.’ Den I made a sing, just puttin’ a word, and den another word.”

Then he began singing, and the men, after listening a moment, joined in the chorus as if it were an old acquaintance, though they evidently had never heard it before. I saw how easily a new “sing” took root among them.

Does that happen so easily today? I’m not sure it does.

Bricklin finishes with a packed stadium singing We Shall Overcome at the 90th birthday of Pete Seeger, the folk singer and activist, who helped bring it to a wide audience. I’m sure not everyone in that stadium was singing in tune or knew all the words, but for the song to have its impact that didn’t matter.

Time to stop writing—it’s July 4th and I’m off to the Esplanade to await the fireworks. The crowd will sing the national anthem and other songs for fun and patriotism.

Written by Greg Price

July 4th, 2010 at 7:04 pm

Preventing SQL Undeath (Killing a MySQL Query For Real)


Sometimes a MySQL query doesn’t die when you think it does.

If you’ve spent much time with MySQL, you’ve probably tried a query from the mysql command line, changed your mind after it didn’t return for a while, and hit C-c:

$ mysql -h -u guest youtomb

mysql> SELECT COUNT(DISTINCT status) FROM artifacts;
^CCtrl-C -- sending "KILL QUERY 183" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted

That kills the query. But notice what the mysql program is telling you after you hit that C-c: it’s sending a separate command, namely the “query” “KILL QUERY 183”, to the server.

In fact, that KILL query is the only way to get the MySQL server to stop running a query. In particular, the MySQL server is really bad at noticing when a client goes away. Suppose instead of hitting C-c, which the mysql program traps and handles in a smart way, I simply kill the program by hitting C-\:

mysql> SELECT COUNT(DISTINCT status) FROM artifacts;

Then in fact the server keeps running the query. If I fire up the MySQL client anew and issue the query SHOW PROCESSLIST, I can see the query still chugging away:

$ mysql -h -u guest youtomb

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 183
   User: guest
   Host: OPUS.MIT.EDU:37938
     db: youtomb
Command: Query
   Time: 4
  State: Sending data
   Info: SELECT COUNT(DISTINCT status) FROM artifacts
*************************** 2. row ***************************
     Id: 185
   User: guest
   Host: OPUS.MIT.EDU:37940
     db: youtomb
Command: Query
   Time: 0
  State: NULL
   Info: SHOW PROCESSLIST
2 rows in set (0.00 sec)

That \G is an alternative to the semicolon that makes the format more readable here.

If I want to actually kill the query, I can do it with the same meta-query my client used automatically upon C-c:

mysql> KILL QUERY 183;
Query OK, 0 rows affected (0.02 sec)

Et voilà:

mysql> SHOW PROCESSLIST\G
*************************** 1. row ***************************
     Id: 185
   User: guest
   Host: OPUS.MIT.EDU:37940
     db: youtomb
Command: Query
   Time: 0
  State: NULL
   Info: SHOW PROCESSLIST
1 row in set (0.00 sec)

Now, why would you or I care? After all, nobody in their right mind goes about hitting control-backslash or employing equally messy means to kill their MySQL clients. And control-C behaves just as you’d hope — so long as you are using the mysql command-line client.

Where the story isn’t so good is with a typical client program other than mysql. The KILL behavior on control-C is a feature of the mysql program, not of the MySQL C API. (If you think about it, it involves installing a signal handler, which is not something a well-behaved library will just do.) And because it’s not a feature of the MySQL C API, it’s probably not a feature of your favorite language’s MySQL bindings, which wrap that API. In particular, I know it’s not a feature of MySQLdb, the leading Python bindings.

So suppose you write a Python script to do some MySQL queries… and you have a big honking table in your database, and you write an inefficient query… and the query planner resorts to copying most of the table to a temporary table… and after a couple of hours you kill the Python script with control-C or kill or some other means because it’s taking forever. The query will keep running. And the next day maybe it’s copied enough that it fills up your disk, and the database has an outage.

I wish that were a hypothetical. Fortunately, the MySQL server will then remove the temporary table and the disk will have space again. If you’re lucky, the server will even come back up.

Lesson: when you want to kill a MySQL query, make sure it dies. Use SHOW PROCESSLIST to check and KILL QUERY to kill.
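One way to act on that lesson from a shell script is sketched below, under a few assumptions. The function name `kill_statements` is made up for this example, and it expects tab-separated rows in the column order shown above (Id, User, Host, db, Command, Time, State, Info), which is what the mysql client's batch mode produces for SHOW PROCESSLIST. It only prints the KILL statements, so you can read them before running anything.

```shell
# kill_statements LIMIT_SECONDS
# Hypothetical helper: read tab-separated SHOW PROCESSLIST rows on stdin and
# print a KILL QUERY statement for each query running longer than the limit.
kill_statements() {
  awk -F '\t' -v limit="$1" \
    '$5 == "Query" && $6 + 0 > limit + 0 { print "KILL QUERY " $1 ";" }'
}

# Possible usage against a live server. $DB_HOST and the one-hour limit are
# placeholders, not values from this post:
#   mysql -B -N -h "$DB_HOST" -u guest -e 'SHOW FULL PROCESSLIST' \
#     | kill_statements 3600
```

Printing the statements rather than piping them straight back into mysql is deliberate: a long-running query is not always a runaway one.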

Written by Greg Price

June 28th, 2010 at 12:35 am

Who can’t sing?


Sometimes after singing songs with a crowd, I hear friends or strangers compliment me on my singing. That’s always nice to hear. But then sometimes they add something else—“I can’t sing”. And I don’t think that feeling is limited to a few people I talk to; standing on the crowded banks of the Charles River for the July 4 celebration, I’ve marveled every year at how my little knot of friends and I are the only ones in the vicinity who sing along, even to the national anthem.

That’s a sad situation. Singing is fun, singing is a social activity that brings people together, and singing is a tool of expression that social movements from the Reformation to the Civil Rights Movement and into the present day have used for its power to stir emotions and affirm common purpose. I think it’s a recent one, too. A century ago, all kinds of people sang—working or playing, on a ship or at home with friends and a banjo. In those days the most skillful singer you’d hear all week was someone who lived in your community. Today, I think many of those people who say they “can’t sing” really mean they can’t sing like Michael Jackson, or Lady Gaga, or maybe Plácido Domingo. And how many of us can? But why should we have to? Singing is for the singers, and not only for a hushed audience.

So I was intrigued today by the following anecdote from a musician I respect:

Back when I was a touring musician I met a lot of people who insisted they were monotones. Sometimes I had time to sit down with them and with considerable encouragement they matched every pitch I gave them. With still more encouragement they carried a tune. They were victims not of genetic impoverishment but of cultural theft: the theft of their birthright of singing.

May every self-described monotone or nonsinger receive such personal encouragement. In the meantime, all of us who do sing should take time to step off our pedestals, if we have any, and make singing a part of the social life available to everyone.

In this spirit, I love how the MIT Concert Choir sings Messiah each year right in Lobby 10 for passersby to join in. What steps like that can you take?

Written by Greg Price

May 24th, 2010 at 1:16 am

Two celebrations, two observations

It’s late and I’m going to bed, so this will be short.

This weekend was packed for me with fantastic, joyous events that don’t happen every year. An old friend got married, and the Harvard Glee Club and its two sister choruses held a huge reunion to mark the retirement of their longtime conductor. (The latter was joyous in reunion and in remembrance of his career—not in the fact that he’s leaving!) At each event I caught up with a number of people I hadn’t seen in several years, and at the wedding celebrations I also met a large number of young mathematicians, colleagues of the bride and groom.

Two observations. First, at the Glee Club reunion I saw a cross-section of Harvard graduates between one and five years out, and heard what they’re all up to. One thing I heard from very few of them was that they have today a job directly on the career path they want to follow for life. Maybe half are in graduate school of one sort or another, some after working a job, some not. Many of the rest talk about entering a program next fall, or applying next year. One guy quit his job last week, another is going to quit next month—either to write plays, or work in politics, he’s not yet sure which—and another has realized he hates his job and the whole career path it lies in and is trying to figure out his next move so he can quit too. While in school, particularly senior year, it often seemed like everyone had their whole future paths exactly figured out. It’s comforting in a way to know that so many of them were wrong.

Second, everybody still loves Ksplice. I must have told at least thirty people, either that I’d just met or just caught up with after years, that we make those ‘reboot’ popups obsolete. Most everyone was suitably glad to learn of the idea; one reached out and shook my hand a second time. And then I think I broke new ground for Ksplice when one of them was so happy for the death of reboots that he gave me a hug.

Written by Greg Price

May 3rd, 2010 at 2:45 am

Music and memory


This Thursday afternoon, I created a Slashdot account for the first time. One of the signup steps prompted me for about ten different forms of contact—email, AIM, ICQ, and “Warcraft Main”. I take it that last is a way of asking for some form of identity in World of Warcraft.

An hour or two later, there was a bit of music running through my head. It took me a few minutes to place it: it was a line of background music from Warcraft II, which Wikipedia helpfully reminds me was released in 1995. (In other words, when Newt Gingrich was shutting down the federal government, encyclopedias came on CD-ROM, and dinosaurs walked the earth.) The last time I heard that music through my ears was probably about 2000. Somehow, apparently, after I saw that name my brain decided to make a field trip browsing through some of the dustier shelves of the library, and it put this music on without my noticing.

On the other hand, right now as I’m actually thinking about this, I can’t recall the music even in fragments. Maybe in half an hour I’ll catch it playing in my head again.

I’ve had similar experiences before where I notice that the music playing in my head is connected to some stimulus from minutes or an hour earlier. I don’t think it’s ever come from a shelf this dusty before.

Written by Greg Price

March 29th, 2010 at 2:23 am

Ksplice and the intern army meet the Internet

This week I wrote a post on the Ksplice blog, our first substantive post, following an intro post by Waseem. As I mentioned last month, we swelled from 8 to 20 people this January with interns, and were triumphant in making the whole scheme work productively. If you want to know how we did it, read the post. In fact, just go read it. I’ll wait.

The crackerjack Ksplice PR team (*) got my post to show prominently all day Wednesday on Reddit and Hacker News, and then it went up on Slashdot all Wednesday evening and Thursday during the day. Traffic numbers were much, much higher than for anything else I’ve ever written, except YouTomb.

Naturally, we learned some things about interacting with your average comment-leaving reader on the Internet. The first wave of comments, a few both on link aggregators and on the post itself, were vicious denunciations of us for the (apparently) illegal practice of employing unpaid interns to do real work. These commenters were of course wrong: you can’t get any intern in software for free, let alone the kind of people we wanted, and we paid as much as or more than they could make with their skills in research jobs on campus. I clarified that, others and I replied, and the comments shifted to mostly positive. Then when we landed on Slashdot, the text was a classic opposite-of-the-article Slashdot item: we had claimed to “bust” Fred Brooks’ pioneering observations on software project management. Dozens of commenters poured in to grouch that we hadn’t disproved his law, only sidestepped it, which was of course our point.

Fortunately, not all commenters are just being wrong. We had several good comments, but this afternoon came one last comment from a source far beyond any response I imagined. I feel a twinge of regret now for comparing the OS/360 project to Windows Vista, apt though it was. Prof. Brooks, of course, did far better than the Vista managers in the end, in that he learned lessons from the experience and put them in a book that the whole profession learned from.

How we’re going to top that comment in our next post, I don’t know—it might be tough, for example, to get a comment from a man who hasn’t used email since before blogging was invented.

(*) Namely, us and our friends on zephyr/twitter lending a few upvotes to our posts. Several others at Ksplice made substantial comments and edits before the post was published, too, which greatly improved it.

[Update, 2010-03-18: there is now a straight-up newspaper-style article about... the comment threads on my post. The Internet never ceases to amaze me.]

Written by Greg Price

March 15th, 2010 at 3:23 am

How to pretend bash is a real programming language, tip #13


I wrote some throwaway shell code tonight that looked like this:

   for oo in $(cd .git/objects/ && ls ??/*); do
     o=${oo%/*}${oo#*/}
     # do something horrible with the Git object $o, which is in the file $oo
   done

It doesn’t matter now exactly what the code was for. But a collaborator wrote back to me:

> > o=${oo%/*}${oo#*/}
> How does this line work/what is it supposed to accomplish?  In
> particular not sure what the %foo and #foo do.

Stop for a moment: do you know how that line works? I wouldn’t have in my first years writing shell scripts.

This line demonstrates one of a repertoire of tricks I’ve picked up to get some things done in bash that might otherwise require invoking a separate program. None of these will be news to shell-programming experts, but I sure didn’t know all of them when I started writing in shell. Here’s a little braindump on one of my favorite tricks, and where to read about more.

The best documentation for Bash is the info page—the specific pages I find myself referring to most often are under “info bash” -> “Basic Shell Features” -> “Shell Expansions”. (If you’ve never tried it, you’ve been missing out! Type “info bash” at your favorite prompt. But not on a Debian or Ubuntu machine, where the info page is missing due to a stupid licensing dispute. Info is the home of the best documentation available for Bash, GCC, GDB, Emacs, miscellaneous GNU utilities, and Info itself.)

This feature is under “Shell Parameter Expansion” there.

`${PARAMETER#WORD}'
`${PARAMETER##WORD}'
     The WORD is expanded to produce a pattern just as in filename
     expansion (*note Filename Expansion::).  If the pattern matches
     the beginning of the expanded value of PARAMETER, then the result
     of the expansion is the expanded value of PARAMETER with the
     shortest matching pattern (the `#' case) or the longest matching
     pattern (the `##' case) deleted.

The % and %% features work similarly, with “beginning” replaced by “end”.
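A quick way to see all four operators side by side, using a made-up sample value (the variable name `path` and its contents are just for illustration):

```shell
path='a/b/c.tar.gz'

echo "${path#*/}"    # shortest match of */ stripped from the front: b/c.tar.gz
echo "${path##*/}"   # longest match of */ stripped from the front:  c.tar.gz
echo "${path%.*}"    # shortest match of .* stripped from the end:   a/b/c.tar
echo "${path%%.*}"   # longest match of .* stripped from the end:    a/b/c
```

The single-character forms strip as little as the pattern allows and the doubled forms strip as much as it allows.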

My mnemonic for # versus % is that $ is for the variable; on a US keyboard # is to the left of $, so it strips from the left, and % is to the right, so it strips from the right. I suspect this is the actual motivation for the choice of # and %, though I’m curious to see evidence to confirm or refute that thought.

So after my line o=${oo%/*}${oo#*/}, o consists of the part of oo to the left of the last slash, and then the part of oo to the right of the first slash. Since there should be just one slash in oo, it has the effect of making o be everything but the slash.
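To watch that happen outside a repository, here is the same line applied to one sample value. The value is an assumed example in the two-character-directory layout that .git/objects uses; no Git commands are involved.

```shell
# Sample relative path in the .git/objects layout: 2-character directory,
# then the remaining 38 characters of the object name.
oo='4b/825dc642cb6eb9a060e54bf8d69288fbee4904'

left=${oo%/*}     # everything left of the last slash:   4b
right=${oo#*/}    # everything right of the first slash: 825dc6...4904
o=${oo%/*}${oo#*/}

echo "$left + $right -> $o"
```

If the value ever contained two slashes, the two halves would overlap, so this trick depends on the one-slash layout.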

That makes one trick I use all the time. There’s plenty more, and those Info pages explain many of them. I’m not sure all these tricks are a good thing on balance—they serve as a crutch to make the shell go further, when maybe I should just be quicker to switch to a real programming language. But they sure come in handy.

Written by Greg Price

February 27th, 2010 at 4:56 am

Unlocking the Clubhouse, part 1: it’s not about innate differences

If you work on computing in school, on the side, or in industry and you’ve been paying attention to the people around you, you’ve probably wondered why so many fewer women than men enter our field and stay in it.

This is no immutable law. In fact, the proportion of women in computer science in the United States was once much higher. Of people receiving bachelor’s degrees in computer science, women made up nearly 40% in the mid-1980s, declining to 20% in 2006. (graphs, NSF data.) And it varies among cultures, too—in Malaysia, women actually outnumber men in computer science. (data, analysis)

So the natural way to ask the question is in this form: What are we doing in computer science that causes so many fewer women than men to enter our field and to stay in it? And what can we do differently to change that?

Recently I picked up a book on this subject. Unlocking the Clubhouse is the product of a collaboration between Jane Margolis, a social scientist studying gender and education, and Allan Fisher, the founding dean of the undergraduate program in computer science at Carnegie Mellon University.

The authors gathered scores of previous studies, and they did their own work from a privileged position at the helm of the undergraduate program at Carnegie Mellon University. Their success at answering these questions may be indicated by the reversal of national trends that they achieved at CMU in the five years of their research:

  • Before the authors’ work, the proportion of women among entering freshmen ranged from 5% to 7% over the five years 1991-1995. At the conclusion of their project in 2000, this proportion had reached 42%.
    [Graph: percentage of women in computer science freshman class at CMU, 1989-2000]

  • Of students entering the program at the start of the project in 1995, only 42% of women remained after two years. This rate rose to 80% for women entering in 1996, and stabilized at nearly 90%. The rate among men was steady around 90%.
    [Graph: persistence rate for women and men in undergraduate computer science at CMU, 1995-1998]

With that kind of success in practice, it’s clear their scientific findings and their recommendations have earned serious consideration. In a future post I’ll say more about those, and I’ll also look at what some other people have found on the subject. Ironically, it turns out one result of Margolis and Fisher’s success may have been to invalidate some of their findings in the new environment they created.

Written by Greg Price

February 15th, 2010 at 1:59 am

How many MIT students does it take to change computing?


A new SIPB chair and vice-chair are taking office tomorrow, and the other night several of their predecessors took an evening to give them an orientation.

SIPB has two priorities: people and projects. Each active project has its own organizers, maintainers, and/or developers who move it forward and make its decisions, so the role of the chair and vice-chair is about keeping track of how things go, helping connect the project to outside resources and connect new contributors to the project, mediating shared resources like the machine room, and making sure that key projects get passed on from year to year.

It makes sense, then, that we spent most of our time talking about people—bringing people in the door at SIPB, making the office a welcoming place for them, drawing them into our community, and electing them as members. We hear in almost every membership election about how the organization could do better at this. Here’s a quick version of why it’s so important:

Every year, about 1/3 of student SIPB members graduate.

Put another way, in steady state:

Size of SIPB = 3 * (# new members / year)

For example, right now SIPB has 26 student members, and by my count 9 are planning to leave MIT in June. So the only way SIPB can stay as strong as it is is to get 9 new members this year, and about as many again the next year, and the next year, and so on. Fewer new members ⇒ fewer members ⇒ fewer awesome projects, fewer people to learn from, fewer people to hire away to Ksplice (ahem, maybe not everyone shares that motivation).
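The steady-state claim is easy to check with a toy simulation. This is only a sketch: the function name `simulate` is made up, and the model assumes exactly one third of members leave each year and a fixed number join, which real years will not match.

```shell
# simulate START_SIZE NEW_PER_YEAR YEARS
# Toy model: each year a third of the members graduate, then new ones join.
simulate() {
  awk -v size="$1" -v joins="$2" -v years="$3" 'BEGIN {
    for (y = 1; y <= years; y++) {
      size = size - size / 3 + joins
    }
    printf "%.1f\n", size
  }'
}

simulate 26 9 20   # prints 27.0, i.e. 3 * 9
simulate 26 5 20   # prints 15.0, i.e. 3 * 5
```

Whatever the starting size, the membership drifts toward three times the yearly intake, so a few lean recruiting years are enough to shrink the group substantially.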

Fortunately, we built a good track record over the last few years:

    academic year          freshmen &
      starting     total   sophomores

        2010        ???       ???
        2009         7+        4+
        2008         9         6
        2007         8         2
        2006        10         3
        2005        10         2
        2004         5         3
        2003         4         2
        2002        10         3
        2001         8         2
        2000         3         2

From those numbers in the last five years, it’s not hard to see how we got the organization to the point where three strong candidates stood at the last election for chair, and where the office is full to crowding at nearly every Monday’s meeting. It’s also clear how it wasn’t always this way—the numbers from the 2004 and 2003 academic years led directly to the election of 2005 in which the nine-member EC comprised every student member of the SIPB.

But my favorite aspect of these numbers is in the column on the right. When I was the chair in 2008-9, I put an emphasis on getting people involved in SIPB in their first and second years. I’ve heard a lot of people’s stories over the years of showing up at SIPB as a freshman or sophomore, going away for a variety of reasons, and finally coming back two or three or more years later and becoming members. Some of them went on to become highly active and valued contributors, and it’s too bad for everyone that we didn’t succeed in bringing them in the first time around. With the record 6 freshman and sophomore new members in the 2008 academic year, I think we succeeded in turning a lot of those stories around into members who will be active students for a long time. Edward and Evan have gotten this 2009 year to outpace 2008 so far, so the new team of Jess and Greg have the chance to finish it at another record. 2010 will be theirs to create, and I wish them the best of luck in outdoing 2008 and 2009 both.

Written by Greg Price

February 15th, 2010 at 1:58 am
