What is That Process Doing?
Greg Price
Hacker School, Feb 2014
Greg Price
Hacker School, Feb 2014
Tests
Logging
Debug symbols
REPL backdoor
This talk
Too many tools for detail in 30m
Just prime you on what's possible
Can do hands-on in rest of week
Watching from OS.
Great because
$ echo $$ # This shell's PID
31492
$ ls -l /proc/31492
[...]
-r--r--r-- 1 greg greg 0 Feb 13 18:56 cmdline
lrwxrwxrwx 1 greg greg 0 Feb 16 13:56 cwd ->
/home/greg/w/talk/trace/slides
-r-------- 1 greg greg 0 Feb 16 13:56 environ
lrwxrwxrwx 1 greg greg 0 Feb 13 19:30 exe -> /bin/bash
dr-x------ 2 greg greg 0 Feb 13 18:44 fd
dr-x------ 2 greg greg 0 Feb 16 13:56 fdinfo
-r--r--r-- 1 greg greg 0 Feb 16 13:56 maps
-rw------- 1 greg greg 0 Feb 16 13:56 mem
-r--r--r-- 1 greg greg 0 Feb 13 21:03 status
Where's wpa_supplicant's logfile?
$ pgrep wpa_supplicant
1300
$ sudo ls -l /proc/1300/fd/
total 0
lrwx------ 1 root root 64 Feb 16 19:37 0 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 1 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 2 -> /dev/null
l-wx------ 1 root root 64 Feb 16 19:37 3 -> /var/log/wpa_supplicant.log
lrwx------ 1 root root 64 Feb 16 19:37 4 -> /dev/null
lrwx------ 1 root root 64 Feb 16 19:37 5 -> socket:[10952]
lrwx------ 1 root root 64 Feb 16 19:37 6 -> socket:[10953]
[...]
Problem: It's working on some file; how far has it gotten?
Problem: It's working on some file; how far has it gotten?
$ slowjob < $bigfile &
[1] 18061
$ cat /proc/18061/fdinfo/0
pos: 344064
flags: 0100000
$ for i in {1..4}; do
> grep pos /proc/18061/fdinfo/0; sleep 0.1
> done
pos: 10567680
pos: 22192128
pos: 33923072
pos: 44941312
Problem: It's using some file which is now deleted. What's in there?
Problem: It's using some file which is now deleted. What's in there?
$ echo hi >foo; tail -f foo & \
> sleep 0.1; rm foo
[1] 16346
hi
$ ls -l /proc/16346/fd
[...]
lr-x------ 1 greg greg 64 Feb 16 23:56 3 ->
/home/greg/w/talk/trace/slides/foo (deleted)
$ cat /proc/16346/fd/3
hi
Problem: It's misbehaving and it seems to be a library's fault.
But you installed the library version that fixes this bug. Are you getting an old version?
You could carefully study load paths. But really you want ground truth.
Libraries are mmap'd in. So look at maps.
$ python -c 'import json, time; time.sleep(3)'
[1] 22227
$ cat /proc/22227/maps
08048000-0829e000 r-xp 00000000 fc:01 3907 /usr/bin/python2.7
0829e000-0829f000 r--p 00256000 fc:01 3907 /usr/bin/python2.7
0829f000-082f4000 rw-p 00257000 fc:01 3907 /usr/bin/python2.7
082f4000-08300000 rw-p 00000000 00:00 0
0a23e000-0a321000 rw-p 00000000 00:00 0 [heap]
b7096000-b70a0000 r-xp 00000000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
b70a0000-b70a1000 r--p 00009000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
b70a1000-b70a2000 rw-p 0000a000 fc:01 3962 /usr/lib/python2.7/lib-dynload/_json.so
[...]
$ </proc/22227/maps perl -lane 'print $F[5]' | sort -u | grep json
/usr/lib/python2.7/lib-dynload/_json.so
/proc/$PID/fd
/proc/$PID/fdinfo
/proc/$PID/fd even when deleted
/proc/$PID/maps
many more...
ls /proc/$$ ; man proc
Scans /proc/*/, nicely summarizes
$ sudo lsof | perl -lane 'print if
($F[3] == 2 && $F[8] ne "/dev/null")'
[edited]
modem-man 1066 2u CHR 12 /dev/pts/9
rsyslogd 1070 2w REG 940180 /var/log/auth.log
getty 1211 2u CHR 1047 /dev/tty4
getty 1224 2u CHR 1045 /dev/tty2
getty 1226 2u CHR 1046 /dev/tty3
postgres 1309 2w REG 925403 /var/log/postgresql/postgresql-9.1-main.log
[...]
Traces system calls:
Files: stat64 open read write ...
Network: socket connect ...
Processes: clone execve wait4 ...
Even the clock: clock_gettime
$ strace true
execve("/bin/true", ["true"], [/* 42 vars */]) = 0
brk(0) = 0x8d6e000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7738000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=99024, ...}) = 0
mmap2(NULL, 99024, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb771f000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\226\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1730024, ...}) = 0
mmap2(NULL, 1743580, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7575000
mprotect(0xb7718000, 4096, PROT_NONE) = 0
mmap2(0xb7719000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a3) = 0xb7719000
mmap2(0xb771c000, 10972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb771c000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7574000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7574900, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7719000, 8192, PROT_READ) = 0
mprotect(0x804d000, 4096, PROT_READ) = 0
mprotect(0xb775b000, 4096, PROT_READ) = 0
munmap(0xb771f000, 99024) = 0
exit_group(0) = ?
Problem: it seems not to be doing anything
Problem: it seems not to be doing anything
$ strace -p 22067
Process 22067 attached - interrupt to quit
flock(3, LOCK_EX
$ readlink /proc/22067/fd/3
/tmp/foobar.lock
$ lsof | grep /tmp/foobar.lock
command 21856 price 3uW REG 253,88 0 34443743 /tmp/foobar.lock
command 22067 price 3u REG 253,88 0 34443743 /tmp/foobar.lock
Problem: it's not finding the config file you expected
Problem: it's not finding the config file you expected
Tame example: Git
$ strace -efile git rev-parse HEAD 2>&1 | grep config
access("/etc/gitconfig", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.gitconfig", R_OK) = 0
open("/home/greg/.gitconfig", O_RDONLY|O_LARGEFILE) = 3
access(".git/config", R_OK) = 0
open(".git/config", O_RDONLY|O_LARGEFILE) = 3
access("/etc/gitconfig", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
access("/home/greg/.gitconfig", R_OK) = 0
open("/home/greg/.gitconfig", O_RDONLY|O_LARGEFILE) = 3
access(".git/config", R_OK) = 0
open(".git/config", O_RDONLY|O_LARGEFILE) = 3
Problem: it's invoking some other program wrong
Problem: it's invoking some other program wrong
$ strace -eprocess -f make
execve("/usr/bin/make", ["make"], [/* 42 vars */]) = 0
sphinx-build -b slides -d _build/doctrees . _build/slides
vfork(Process 12174 attached
) = 12174
[pid 12173] wait4(-1, Process 12173 suspended
<unfinished ...>
[pid 12174] execve("/home/greg/bin/sphinx-build", ["sphinx-build",
"-b", "slides", "-d", "_build/doctrees", ".", "_build/slides"],
[/* 45 vars */]) = -1 ENOENT (No such file or directory)
[...]
[pid 12174] execve("/usr/bin/sphinx-build", ["sphinx-build",
"-b", "slides", "-d", "_build/doctrees", ".", "_build/slides"],
[/* 45 vars */]) = 0
strace CMD ARGS...
strace -p PID
-e file,process,...
-f
Much more: timing, longer strings, more filters
man strace, [strace ksplice]
/proc/$PID/, lsof
strace
Outside is great when it works
Sometimes you need the inside
Fantastic profiler
In cahoots with kernel and CPU
True call-graph profile
$ perf record $SLOWCMD
$ perf record ruby -I reqtest -e \
> '(1..10000).each {|i| require "#{i}"}'
$ ls perf.data
perf.data
$ perf report
58.68% ruby libc-2.15.so 0x25820
12.36% ruby ruby rb_string_value_ptr
7.90% ruby ruby rb_feature_p
6.19% ruby ruby loaded_feature_path
5.83% ruby [kernel] 0xc10359d8
$ perf record -g ruby ...
$ perf report
- 32.16% ruby rb_string_value_ptr
- rb_string_value_ptr
+ 28.38% rb_feature_p
+ 3.78% rb_require_safe
+ 27.41% ruby rb_feature_p
- 22.10% ruby loaded_feature_path
- loaded_feature_path
+ 21.00% rb_feature_p
+ 1.11% rb_require_safe
Debugger; a subject in itself
Today just one class of tricks
pdb is great
. . . when you can edit and restart
import pdb; pdb.set_trace()
(Ruby a similar story)
apt-get install python2.7-dbg
(i.e. python-gdb.py + a debug build)
python2.7-dbg $PROGRAM
$ gdb -p $PID
(gdb) py-bt # Python-level backtrace
(gdb) py-list # Python source
(gdb) py-print # print Python-level variables
Possible workshops, tell me if interested: