What is That Process Doing?

Greg Price

Hacker School, Feb 2014

Debugging Your Code is "Easy"

Tests

Logging

Debug symbols

REPL backdoor

Debugging Everything Else

This talk

Goal

Too many tools for detail in 30m

Just prime you on what's possible

Can do hands-on in rest of week

Outside

Outside

Watching from OS.

Great because

Tools from Outside

/proc/$PID/: The Outside, Now

/proc/$PID/

$ echo $$     # This shell's PID
31492

$ ls -l /proc/31492
[...]
-r--r--r-- 1 greg greg 0 Feb 13 18:56 cmdline
lrwxrwxrwx 1 greg greg 0 Feb 16 13:56 cwd ->
                        /home/greg/w/talk/trace/slides
-r-------- 1 greg greg 0 Feb 16 13:56 environ
lrwxrwxrwx 1 greg greg 0 Feb 13 19:30 exe -> /bin/bash
dr-x------ 2 greg greg 0 Feb 13 18:44 fd
dr-x------ 2 greg greg 0 Feb 16 13:56 fdinfo
-r--r--r-- 1 greg greg 0 Feb 16 13:56 maps
-rw------- 1 greg greg 0 Feb 16 13:56 mem
-r--r--r-- 1 greg greg 0 Feb 13 21:03 status

/proc/: Files

/proc/: Files II

Where's wpa_supplicant's logfile?

$ pgrep wpa_supplicant
1300

$ sudo ls -l /proc/1300/fd/
total 0
lrwx------ 1 root root 64 Feb 16 19:37 0 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 1 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 2 -> /dev/null
l-wx------ 1 root root 64 Feb 16 19:37 3 -> /var/log/wpa_supplicant.log
lrwx------ 1 root root 64 Feb 16 19:37 4 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 5 -> socket:[10952]
lrwx------ 1 root root 64 Feb 16 19:37 6 -> socket:[10953]
[...]

/proc/: File Progress

Problem: It's working on some file; how far has it gotten?

/proc/: File Progress

Problem: It's working on some file; how far has it gotten?

$ slowjob < $bigfile &
[1] 18061

$ cat /proc/18061/fdinfo/0
pos:    344064
flags:  0100000

$ for i in {1..4}; do
>   grep pos /proc/18061/fdinfo/0; sleep 0.1
> done
pos:    10567680
pos:    22192128
pos:    33923072
pos:    44941312

/proc/: Reviving Dead Files

Problem: It's using some file which is now deleted. What's in there?

/proc/: Reviving Dead Files

Problem: It's using some file which is now deleted. What's in there?

$ echo hi >foo; tail -f foo & \
>   sleep 0.1; rm foo
[1] 16346
hi
$ ls -l /proc/16346/fd
[...]
lr-x------ 1 greg greg 64 Feb 16 23:56 3 ->
 /home/greg/w/talk/trace/slides/foo (deleted)
$ cat /proc/16346/fd/3
hi

/proc/: Library Hell

Problem: It's misbehaving and it seems to be a library's fault.

But you installed the library version that fixes this bug. Are you getting an old version?

You could carefully study load paths. But really you want ground truth.

/proc/: Library Hell II

Libraries are mmap'd in. So look at maps.

$ python -c 'import json, time; time.sleep(3)'
[1] 22227

$ cat /proc/22227/maps
08048000-0829e000 r-xp 00000000 fc:01 3907 /usr/bin/python2.7
0829e000-0829f000 r--p 00256000 fc:01 3907 /usr/bin/python2.7
0829f000-082f4000 rw-p 00257000 fc:01 3907 /usr/bin/python2.7
082f4000-08300000 rw-p 00000000 00:00 0
0a23e000-0a321000 rw-p 00000000 00:00 0    [heap]
b7096000-b70a0000 r-xp 00000000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
b70a0000-b70a1000 r--p 00009000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
b70a1000-b70a2000 rw-p 0000a000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
[...]

$ </proc/22227/maps perl -lane 'print $F[5]' | sort -u | grep json
/usr/lib/python2.7/lib-dynload/_json.so

/proc/ digest

/proc/$PID/fd

/proc/$PID/fdinfo

/proc/$PID/fd even when deleted

/proc/$PID/maps

many more...

ls /proc/$$ ;  man proc

/proc/ another way: lsof

Scans /proc/*/, nicely summarizes

$ sudo lsof | perl -lane 'print if
       ($F[3] == 2 && $F[8] ne "/dev/null")'
[edited]
modem-man 1066 2u CHR     12 /dev/pts/9
rsyslogd  1070 2w REG 940180 /var/log/auth.log
getty     1211 2u CHR   1047 /dev/tty4
getty     1224 2u CHR   1045 /dev/tty2
getty     1226 2u CHR   1046 /dev/tty3
postgres  1309 2w REG 925403 /var/log/postgresql/postgresql-9.1-main.log
[...]

strace: The Outside, Through Time

strace

Traces system calls:

Files: stat64 open read write ...

Network: socket connect ...

Processes: clone execve wait4 ...

Even the clock: clock_gettime

strace: Simple demo

$ strace true
execve("/bin/true", ["true"], [/* 42 vars */]) = 0
brk(0)                                  = 0x8d6e000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7738000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=99024, ...}) = 0
mmap2(NULL, 99024, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb771f000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7575000
mprotect(0xb7718000, 4096, PROT_NONE)   = 0
mmap2(0xb7719000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7719000
mmap2(0xb771c000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb771c000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7574000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7574900, limit:1048575, seg_32bit:1,
             contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7719000, 8192, PROT_READ)   = 0
mprotect(0x804d000, 4096, PROT_READ)    = 0
mprotect(0xb775b000, 4096, PROT_READ)   = 0
munmap(0xb771f000, 99024)               = 0
exit_group(0)                           = ?

strace: A stuck process

Problem: it seems not to be doing anything

strace: A stuck process II

Problem: it seems not to be doing anything

$ strace -p 22067
Process 22067 attached - interrupt to quit
flock(3, LOCK_EX

$ readlink /proc/22067/fd/3
/tmp/foobar.lock

$ lsof | grep /tmp/foobar.lock
 command  21856 price 3uW  REG 253,88  0 34443743  /tmp/foobar.lock
 command  22067 price 3u   REG 253,88  0 34443743  /tmp/foobar.lock

strace: Finding config file

Problem: it's not finding the config file you expected

strace: Finding config file II

Problem: it's not finding the config file you expected

Tame example: Git

$ strace -efile git rev-parse HEAD 2>&1 | grep config
access("/etc/gitconfig", R_OK)          = -1 ENOENT (No such file or directory)
access("/home/greg/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.gitconfig", R_OK)   = 0
open("/home/greg/.gitconfig", O_RDONLY|O_LARGEFILE) = 3
access(".git/config", R_OK)             = 0
open(".git/config", O_RDONLY|O_LARGEFILE) = 3
access("/etc/gitconfig", R_OK)          = -1 ENOENT (No such file or directory)
access("/home/greg/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.gitconfig", R_OK)   = 0
open("/home/greg/.gitconfig", O_RDONLY|O_LARGEFILE) = 3
access(".git/config", R_OK)             = 0
open(".git/config", O_RDONLY|O_LARGEFILE) = 3

strace: Processes

Problem: it's invoking some other program wrong

strace: Processes II

Problem: it's invoking some other program wrong

$ strace -eprocess -f make
execve("/usr/bin/make", ["make"], [/* 42 vars */]) = 0
sphinx-build -b slides -d _build/doctrees  . _build/slides
vfork(Process 12174 attached
)                                 = 12174
[pid 12173] wait4(-1, Process 12173 suspended
 <unfinished ...>
[pid 12174] execve("/home/greg/bin/sphinx-build", ["sphinx-build",
  "-b", "slides", "-d", "_build/doctrees", ".", "_build/slides"],
  [/* 45 vars */]) = -1 ENOENT (No such file or directory)
[...]
[pid 12174] execve("/usr/bin/sphinx-build", ["sphinx-build",
  "-b", "slides", "-d", "_build/doctrees", ".", "_build/slides"],
  [/* 45 vars */]) = 0

strace digest

strace CMD ARGS...

strace -p PID

-e file,process,...

-f

Much more: timing, longer strings, more filters

man strace, [strace ksplice]

Outside digest

/proc/$PID/, lsof

strace

Outside is great when it works

Sometimes you need the inside

Inside

Inside: perf

Fantastic profiler

In cahoots with kernel and CPU

True call-graph profile

perf record

$ perf record $SLOWCMD

$ perf record ruby -I reqtest -e \
>    '(1..10000).each {|i| require "#{i}"}'

$ ls perf.data
perf.data

perf report

$ perf report

58.68% ruby libc-2.15.so 0x25820
12.36% ruby ruby         rb_string_value_ptr
 7.90% ruby ruby         rb_feature_p
 6.19% ruby ruby         loaded_feature_path
 5.83% ruby [kernel]     0xc10359d8

perf: Call-Graph Profiling

$ perf record -g ruby ...

$ perf report
  -  32.16%  ruby  rb_string_value_ptr
     - rb_string_value_ptr
        + 28.38% rb_feature_p
        + 3.78% rb_require_safe
  +  27.41%  ruby  rb_feature_p
  -  22.10%  ruby  loaded_feature_path
     - loaded_feature_path
        + 21.00% rb_feature_p
        + 1.11% rb_require_safe

Inside: gdb

Debugger; a subject in itself

Today just one class of tricks

gdb: Debugging Python

pdb is great

. . . when you can edit and restart

import pdb; pdb.set_trace()

(Ruby a similar story)

gdb: Debugging Python II

  1. apt-get install python2.7-dbg

    (i.e. python-gdb.py + a debug build)

  2. python2.7-dbg $PROGRAM

  3. $ gdb -p $PID
    (gdb) py-bt     # Python-level backtrace
    (gdb) py-list   # Python source
    (gdb) py-print  # print Python-level variables

Thanks!

Possible workshops, tell me if interested: