--- gdb notes, looking at small.c ---

--- All this is on my mac os 10.6 core i7 (64bit) laptop;
--- details may vary on other systems.
--- TL;DR : YMMV   (For the L33T among you.)

===== for Sep 30 ================================================

Within emacs, M-x gdb will run gdb. If the program was compiled
with "-g" (include symbols for gdb), then with emacs split into
2 windows, the source *.c in one and "M-x gdb" in the other,
breaks/steps will be shown symbolically in the source code.
Without -g, gdb can do only some of its tricks.

1. created small.c

2. compiled it & created small.i, small.o, small.s, small* :

   $ gcc -save-temps -m32 small.c -o small

3. debugged the executable:

   $ gdb small
   (gdb) info functions
   ... *long* list generated; it includes these addresses:
   0x00001d98  start
   0x00001dd6  algebra           * functions defined in small.c
   0x00001e0e  swap              *
   0x00001e30  print_string      *
   0x00001e58  main              *
   0x00001f82  dyld_stub_exit
   0x00001f88  dyld_stub_printf
   0x00001f8e   stub helpers

   (Note that running gdb on small.o and asking for "info functions"
    gives only the ones defined in small.c, without the myriad #include stuff.)

4. Here are some commands to try in gdb.
   Type "help ..." for any of these.

   (gdb) break swap       # set break point when we enter the swap function
   (gdb) run              # run until the break
   (gdb) disass swap      # show assembly code there
   (gdb) info registers   # show current value of registers
   (gdb) x/FMT address    # examine address using a variety of formats; see "help x"
   (gdb) step             # run to the next source line, entering function calls
                          #   ("finish" is the one that runs until function exit)
   (gdb) stepi N          # step 1 assembly instruction; N times (default 1)
   (gdb) nexti N          # next instruction, but straight through subroutine
                          #   calls (so one step can be many IA32 instructions)
   (gdb) info program     # show where we are

In gdb, registers are spelled with $, not %; e.g. $esp, not %esp .
   Assembler:  0x8(%ebp)
   gdb:        x/s 0x8+$ebp     # examine string starting at address ebp+8

==== for Oct 5 ===============================

 $ gcc -save-temps -m32 small.c -o small

Looking for strings
('-t x' means 'print each string's offset from the file beginning, in hex') :

 $ strings -t x small
   f10 string: '%s'
   f1e The first string.  I think.
   f3c This second string. So there.
   f5c x,y,z = %i,%i,%i, algebra(x,y,z)=%i

In *.s :
   .text       code sections , e.g. http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
   .data       data section
   .cstring    another data section

In gdb :

 $ gdb small
 (gdb) break main
 (gdb) run
 (gdb) disass algebra
 (gdb) disass swap
 (gdb) disass print_string
 (gdb) disass main

So I copied the "disass *" output from gdb into the small.s source code,
so I could compare the sections.
(Macro commands within emacs made this relatively painless.)

First thing I noticed was that the .cstring sections in small.s
are *not* in the same memory region as the rest of the code,
all of which is in a contiguous .text region.
This is pretty easy to see from the addresses in disass,
since the last "ret" instruction is one byte in each case,
so each function starts at the byte right after the previous one's "ret".
Or find an address explicitly with e.g. "(gdb) x algebra".

   algebra        0x1dd6  to  0x1e0d     low
   swap           0x1e0e  to  0x1e2f
   print_string   0x1e30  to  0x1e57
   main           0x1e58  to  0x1f0c     high

So all that actual code is  0x1f0c - 0x1dd6 + 1 = 0x137 = 311 bytes = 0.3kB

Note that if you try to compare these addresses with our stack picture,
you'll see that things are upside down: the images of the stack
have the low address at the bottom. That means that the execution thread
runs "upward" in the picture, in code at the bottom of this ascii art.

Here it is with a break at main (eip = 0x1e65), which is as close to the
beginning as gdb will easily let us get. (I tried setting a break at "start"
- see below - but trying to do "nexti" gives "Cannot access memory at..." errors,
and it loops without jumping to main.)

Some of the numbers here are from the analysis further down,
but I figured I'd put all of it in here.
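
The sizes in that table are easy to double-check at a python prompt.
This is only a sketch: the addresses are copied by hand from the disass
output above, and the names (extents, size, contiguous) are made up
for illustration.

```python
# First and last byte of each function, copied from the disass output.
extents = {
    "algebra":      (0x1dd6, 0x1e0d),
    "swap":         (0x1e0e, 0x1e2f),
    "print_string": (0x1e30, 0x1e57),
    "main":         (0x1e58, 0x1f0c),
}

def size(first, last):
    """Bytes in an inclusive address range; the +1 counts the last byte."""
    return last - first + 1

sizes = {name: size(*span) for name, span in extents.items()}
total = sum(sizes.values())

# The code is contiguous if each function starts one byte past the
# end of the one before it.
spans = sorted(extents.values())
contiguous = all(nxt[0] == prev[1] + 1 for prev, nxt in zip(spans, spans[1:]))

for name, nbytes in sizes.items():
    print(name, nbytes, "bytes")
print("total", total, "bytes =", hex(total), " contiguous:", contiguous)
```

The total agrees with the 0x137 = 311 bytes worked out by hand.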

Class exercise(?) : compile a short program that does
  a memory allocation with malloc, and look at what its address is.
  Compare with the memory address of
    (a) a function's local variable (on the stack), and
    (b) an initialized string (in a data section, near the code.)
  Can you get the address of the function entry point to print out in C code?

 +++++++++++++++++++++++++++
 |
 |         HIGH MEMORY
 |
 |----------
 | 0xbffff328    <= ebp (base pointer)
 |    stack frame main (callee)
 | 0xbffff2e0    <= esp (stack pointer) = 3_221_222_112 = 3.2GB
 |----------
 VVVVVV stack grows down VVVVVVV
 |
 |
 |   ~~~~~~~~~~~~
 |   ~ the heap ~
 |   ~~~~~~~~~~~~
 |
 |
 |----------
 |
 |   ... links to system library stuff ...
 |
 |
 | 0x1f82 = dyld_stub_exit code
 |----------
 | 0x1f5c   "x,y,z = %i,%i,%i, algebra(x,y,z)=%i \n"
 | 0x1f3c   "This second string. So there."
 | 0x1f10   "string: '%s'\n"
 |----------
 | 0x1f0c
 |    main
 | 0x1e58
 |             (0x1e58, i.e. 7_768 = 7.7kB)
 |----------
 | 0x1e57
 |    print_string
 | 0x1e30
 |----------
 | 0x1e2f
 |    swap
 | 0x1e0e
 |----------
 | 0x1e0d
 |    algebra
 | 0x1dd6
 |----------
 | 0x1dd5
 |    start
 | 0x1d98
 |----------
 |
 |         LOW MEMORY
 |
 ++++++++++++++++++++++++++++++++

So where is the data for the text strings
in the .cstring sections of small.s ?
And what lives above and below?

 (gdb) x/20i algebra-10     # look at 20 instructions, starting 10 bytes before algebra.
 0x1dcc <start+52>:   add    %cl,-0x5217dbfc(%ecx)
 0x1dd2 <start+58>:   add    %eax,(%eax)
 0x1dd4 <start+60>:   add    %dh,%ah
 0x1dd6 <algebra>:    push   %ebp
 0x1dd7 <algebra+1>:  mov    %esp,%ebp
 ...

Hmm. So there's a "start" routine in there;
we can disass that to see it runs 0x1d98 to 0x1dd5

So I look before that :

 (gdb) x/10i start-10
 0x1d8e:              add    %al,(%eax)
 0x1d90:              add    %al,(%eax)
 0x1d92:              add    %al,(%eax)
 0x1d94:              add    %al,(%eax)
 0x1d96:              add    %al,(%eax)
 0x1d98 <start>:      push   $0x0
 0x1d9a <start+2>:    mov    %esp,%ebp
 0x1d9c <start+4>:    and    $0xfffffff0,%esp
 0x1d9f <start+7>:    sub    $0x10,%esp
 0x1da2 <start+10>:   mov    0x4(%ebp),%ebx

And I see two things :
 (1) gdb doesn't have a name for things before <start>, and
 (2) it's the same instruction over and over; weird.

I wonder what the code for it is:

 (gdb) x/10x start-10
 0x1d8e:  0x00  0x00  0x00  0x00  0x00  0x00  0x00  0x00
 0x1d96:  0x00  0x00

Ah: they're all NULL.

So how far back can I look?
A little trial and error gives :

 (gdb) x 0xff
 0xff:  Cannot access memory at address 0xff

This is what the "segfault" error seen when a C program crashes
typically means: the program is trying to access a part of memory
that it's not allowed to; see wikipedia:Segmentation_fault

The lowest code that gdb lets me see is this :

 0x1000   into              # interrupt for overflow; see http://siyobik.info/index.php?module=x86&id=142
 0x1001   cli               # see http://en.wikipedia.org/wiki/IF_(x86_flag)#CLI
 0x1002   in (%dx),%eax     # read from I/O port %dx
 0x1003   incb (%edi)
 0x1004   0x0               # 0x0 = NULL = "add %cl,(%ebx)" as an instruction.
 0x1005   0x0               # same all the way to <start>

... but I am not convinced this ever actually runs. It looked at first
like system level interrupt handler setup, but 0x1000 is the very start
of the __TEXT segment (the code proper doesn't begin until 0x1d98),
so more likely this is file header data that gdb is disassembling
as if it were instructions.
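
One way to test whether the "instructions" at 0x1000 are really code:
glue their opcode bytes together into one 32-bit word and see if it
looks like anything. This is a sketch under an assumption. The notes
never asked gdb for the raw bytes, so the four values below are the
standard one-byte x86 opcodes for into, cli, "in (%dx),%eax" and the
first byte of incb, not a transcript.

```python
import struct

# ASSUMED opcode bytes for the four instructions gdb printed at 0x1000:
#   0xce = into, 0xfa = cli, 0xed = in (%dx),%eax, 0xfe = incb (first byte)
opcodes = bytes([0xCE, 0xFA, 0xED, 0xFE])

# x86 is little-endian, so read the bytes as a little-endian unsigned int.
(word,) = struct.unpack("<I", opcodes)
print(hex(word))    # 0xfeedface
```

0xfeedface is the magic number at the start of a 32-bit Mach-O file,
which fits the idea that this is header data and never executes.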

So much for lower than the code; let's look higher :

 (gdb) x/30s 0x1f0c         # highest byte of <main>,
                            #   from earlier disass
 0x1f0c <main+180>:              "?"
 0x1f0e:                         ""
 0x1f0f:                         ""
 0x1f10:                         "string: '%s'\n"
 0x1f1e:                         "The first string.  I think."
 0x1f3a:                         ""
 0x1f3b:                         ""
 0x1f3c:                         "This second string. So there."
 0x1f5b:                         ""
 0x1f5c:                         "x,y,z = %i,%i,%i, algebra(x,y,z)=%i \n"
 0x1f82 <dyld_stub_exit>:        "?%\034 "
 0x1f87 <dyld_stub_exit+5>:      ""
 0x1f88 <dyld_stub_printf>:      "?% "
 0x1f8d <dyld_stub_printf+5>:    ""
 0x1f8e < stub helpers>:         "h\030 "
 0x1f92 < stub helpers+4>:       ""
 0x1f93 < stub helpers+5>:       "?%\024 "

Ah ha: we've found our strings.
And above that, interfaces to system library calls.
Now I'll fill in those in my memory picture above.

The NULLs in 0x1f3a ("" is yet another way to write 0x0)
are there for alignment (the .align in small.s);
the strings are set to start at a multiple of 2 or 4 bytes.

=== info file

There's another way to get some of that same information :

 (gdb) info file
 Symbols from "/Users/mahoney/academics/term/2010-09-fall/systems/code/gdb_example/small".
 Mac OS X child process:
 macosx_debug_inferior_status: current status:
   inferior task: 0x3907
   [SIGNAL THREAD]
 macosx_debug_inferior_status: information on debugger task:
 macosx_debug_inferior_status: information on inferior task:
 macosx_debug_inferior_status: information on debugger threads:
   thread: 0x60f
   thread: 0x1703
   thread: 0x2003
   thread: 0x1b07
   thread: 0x4103
 macosx_debug_inferior_status: information on inferior threads:
   thread: 0x3c07
 While running this, GDB does not access memory from...
 Mac OS X executable:
   /Users/mahoney/academics/term/2010-09-fall/systems/code/gdb_example/small, file type mach-o-le.
   Entry point: 0x00001d98
   0x8fe00000 - 0x8fe42000 is LC_SEGMENT.__TEXT in /usr/lib/dyld
   0x8fe01000 - 0x8fe2ffe4 is LC_SEGMENT.__TEXT.__text in /usr/lib/dyld
   ...
   0x00001000 - 0x00002000 is LC_SEGMENT.__TEXT in .../small
   0x00001d98 - 0x00001f0d is LC_SEGMENT.__TEXT.__text in .../small
   0x00001f10 - 0x00001f82 is LC_SEGMENT.__TEXT.__cstring in .../small
   0x00001f82 - 0x00001f8e is LC_SEGMENT.__TEXT.__symbol_stub in .../small
   0x00001f8e - 0x00001fae is LC_SEGMENT.__TEXT.__stub_helper in .../small
   0x00001fb0 - 0x00001ff8 is LC_SEGMENT.__TEXT.__unwind_info in .../small
   0x00002000 - 0x00003000 is LC_SEGMENT.__DATA in /Users
   0x00002000 - 0x00002014 is LC_SEGMENT.__DATA.__program_vars in .../small
   0x00002014 - 0x0000201c is LC_SEGMENT.__DATA.__nl_symbol_ptr in .../small
   0x0000201c - 0x00002024 is LC_SEGMENT.__DATA.__la_symbol_ptr in .../small
   0x00002024 - 0x00002034 is LC_SEGMENT.__DATA.__data in .../small
   0x00003000 - 0x0000323c is LC_SEGMENT.__LINKEDIT in .../small
   0x90c9a000 - 0x90e41000 is LC_SEGMENT.__TEXT in /usr/lib/libSystem.B.dylib
   ....
   0x97dedfa0 - 0x97dedfe8 is LC_SEGMENT.__TEXT.__unwind_info in /usr/lib/system/libmathCommon.A.dylib
   0x99b5e000 - 0x99b5fca0 is LC_SEGMENT.__LINKEDIT in /usr/lib/system/libmathCommon.A.dylib

First, the entry point, 0x1d98 is <start> :

 (gdb) x/10i 0x1d98
 0x1d98 <start>:      push   $0x0
 0x1d9a <start+2>:    mov    %esp,%ebp
 0x1d9c <start+4>:    and    $0xfffffff0,%esp
 0x1d9f <start+7>:    sub    $0x10,%esp
 0x1da2 <start+10>:   mov    0x4(%ebp),%ebx

which (unlike the start of the other functions)
begins by pushing 0 onto the stack.

Second, the strings are in the .__cstring section, from 0x1f10 to 0x1f82.

Third, the code (.__TEXT) is in 0x

=== registers

Looking at the registers after setting a breakpoint at main :

 (gdb) start                  # same as "break main; run"
 (gdb) info registers
 eax            0x0          0
 ecx            0xbffff34c   -1073745076
 edx            0x0          0
 ebx            0x1e64       7780
 esp            0xbffff2e0   0xbffff2e0
 ebp            0xbffff328   0xbffff328
 esi            0x0          0
 edi            0x0          0
 eip            0x1e65       0x1e65 <main+13>
 eflags         0x282        642
 cs             0x17         23
 ss             0x1f         31
 ds             0x1f         31
 es             0x1f         31
 fs             0x0          0
 gs             0x37         55

The eflags register has all the condition codes;
see http://en.wikipedia.org/wiki/FLAGS_register_(computing) .
There are a bunch of flags that the text hasn't talked about;
the ones we've discussed are :

   bit   flag
   ---   ----
    0    CF   carry
    6    ZF   zero
    7    SF   sign
   11    OF   overflow

The other registers are mentioned briefly in wikipedia:x86 :

   16-bit : 4 segment registers (CS, DS, SS, ES) are used to form a memory address
   32-bit : 2 new segment registers (FS, GS) were added.

Basically all that has to do with old tricks for building up
relative addresses; see the "segment registers" discussion in wikipedia:x86.
We can safely ignore all that.

Quick quiz: if eflags is 0x282, which if any of (CF, ZF, SF, OF) are set?
Answer
   0x282 = 0010 1000 0010
           ba98 7654 3210
   Bits 9, 7 and 1 are set, so of those four only SF (bit 7) is set.
   (Bit 9 is IF, the interrupt enable flag; bit 1 is reserved and always reads 1.)

Now let's look at the stack, at this initial break point.

The registers %ebp and %esp give us its start and end,
which means that the stack size is (0xbffff328 - 0xbffff2e0)
(quick quiz 2 : how to do that calculation? answer: python prompt)
= 72 8-bit bytes, or 72/4 = 18 32-bit "extended" words.

But we have to round up 1 more word, because of the "fencepost" :
if %ebp = %esp, we'd still have 1 word to look at.
And since %esp is the lower address, and gdb calls that $esp,
we can look at the stack with

 (gdb) x/19x $esp
 0xbffff2e0:   0x0000000a   0x00000006   0x8fe0053c   0x0000000a
 0xbffff2f0:   0x0000000a   0x8fe42000   0xbffff3f0   0x8fe00638
 0xbffff300:   0x00000000   0x00000000   0x00000000   0x00000000
 0xbffff310:   0x00000000   0x00000000   0x00000000   0x00001000
 0xbffff320:   0x00000000   0xbffff3f0   0xbffff344

 (gdb) x $ebp                 # this is the stack bottom (highest address)
 0xbffff328:   0xbffff344

 (gdb) x $esp                 # this is the stack top (lowest address)
 0xbffff2e0:   0x0000000a

In class :

 With all that background and setup (whew!),
 step through small.c in detail, looking at
   * stack addresses and pointers
   * passed arguments
   * return values
   * which registers are saved and restored by whom.
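
Both quick quizzes above can be checked at a python prompt.
A rough sketch: the bit positions for CF, ZF, SF and OF come from the
table in these notes, IF at bit 9 is added from the standard x86 FLAGS
layout, and the function names are invented for illustration.

```python
# Bit position of each flag within eflags.
FLAG_BITS = {"CF": 0, "ZF": 6, "SF": 7, "IF": 9, "OF": 11}

def flags_set(eflags):
    """Names of the flags in FLAG_BITS whose bit is 1 in eflags."""
    return [name for name, bit in FLAG_BITS.items() if (eflags >> bit) & 1]

def stack_words(ebp, esp, word_bytes=4):
    """32-bit words from esp up to and including the one at ebp.

    The +1 is the fencepost: if ebp == esp there is still one word.
    """
    return (ebp - esp) // word_bytes + 1

print(bin(0x282))                             # 0b1010000010
print(flags_set(0x282))                       # ['SF', 'IF']
print(0xbffff328 - 0xbffff2e0)                # 72
print(stack_words(0xbffff328, 0xbffff2e0))    # 19
```

The 19 is the count used in "x/19x $esp".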