From time to time somebody asks me for help with debugging, or for information about how something can be debugged. Because bugs vary, there are many debugging techniques, and each fits a particular type of bug better or worse. However, there is one debugging technique, or rather a tool, that can be used on a large variety of problems. Although in many cases you can find a special, better-fitted tool for a particular type of bug, mastering this fairly universal tool will pay off many times in the future.
As the title already told you, I am writing about gdb. Gdb, the GNU Debugger, is a tool proven by time; it can be used both for post-mortem analysis and for anything from very basic to very advanced live debugging.
There is a lot of good documentation on gdb, so why write another one? A lot of people will point you to the gdb documentation. It is very good, but... it is very long!
When somebody asks me for help, they do not expect me to point them to hundreds of pages of documentation. They usually expect me either to help with the problem or to show them how to solve it themselves.
And because I am lazy, have limited time, and do not like to repeat myself too many times, I decided to write this, aiming to create a kind of compact training material for gdb.
This material is in no way complete in the sense of showing you all the gdb features. It aims to be a good starter that kicks off your gdb journey. And it should be the thing I point at, saying: 'You want me to explain how to use gdb? Did you go through <this>? No? Go through it, and ask me again if you still do not know what to do afterwards.'
That's it.
This article is free for any type of use (including printing, redistribution, changing, ...), commercial or not. Do not remove my name as the author (or original author, in case you create a derived work) or the HTTP link to this article (HTML form: http://www.bzz.cz/debug_text.html, PostScript/PDF: .ps/.pdf). Should you encounter any bug in this text, please let me know by emailing "incoming at <domain name of this site - bzz cz>".
First, the tour is aimed at those who already have some experience with programming. Although it could be of good value to a skilled newbie, an experienced programmer would probably value it much more. Just to give you a clue: you should know what a core (core file), a preprocessor, a symbol and a pointer are before going on with the tour.
You will be confronted with a short C program with 5 prepared failures. You will go through the failures and learn gdb from the very basics up to a reasonably good level of use.
I have chosen the field of pointer errors. I do not want to elaborate on why. If you are curious, ask me.
Run the compiled example:
$ ./example
Some message - alloc done
Some message - alloc done
Some message - alloc done
Segmentation fault (core dumped)
$ ls -l core
-rw---- 1 bazil bazil 282624 2008-01-08 00:00 core
Run gdb; I will paste the full output here for now:
$ gdb example core
GNU gdb 6.6.90.20070912-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
Using host libthread_db library "/lib/i686/cmov/libthread_db.so.1".
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `./example'.
Program terminated with signal 11, Segmentation fault.
#0 0xb7e70c3c in memcpy () from /lib/i686/cmov/libc.so.6
(gdb)
The Reading and Loaded lines tell you which binaries (libraries in this case) gdb opened to resolve the symbols. Usually you debug a more sophisticated program than our example, so you would see more of these messages.
Next line is really important:
Useful isn't it?
Next line is one of the possible gdb differences you can hit. Maybe you are looking at these two lines:
47 memcpy((char *)p_ok, text, text_size*1000000);
Now you know that the problem occurred in the call to memcpy(). So either the source or the destination pointer of memcpy was an invalid reference, or the copy ran past valid memory.
So what, are you going to check all the memcpy calls? You can do that for a program as small as this example, but not in a normal situation. You can try a runtime checker, e.g. valgrind, but then you need to be able to run the program and reproduce the issue. If you can only do post-mortem analysis (using the core file), valgrind cannot help you. You can try static code analysis (using e.g. lint), but this is a dynamic memory problem, so a static analysis tool probably won't report it.
But wait - we are already running a post-mortem in gdb! So it can do no harm to make three keystrokes here, right?
(gdb) bt
#0 0xb7e70c3c in memcpy () from /lib/i686/cmov/libc.so.6
#1 0x080486d9 in main (argc=1, argv=0xbfaedad4) at example.c:47
Let's look at this a bit more. You have some more useful tools built into the debugger.
(gdb) bt
#0 0xb7e70c3c in memcpy () from /lib/i686/cmov/libc.so.6
#1 0x080486d9 in main (argc=1, argv=0xbfaedad4) at example.c:47
(gdb) frame 1
#1 0x080486e3 in main (argc=1, argv=0xbf9ae194) at example.c:47
47 memcpy((char *)p_ok, text, text_size*1000000);
(gdb) info locals
p_ok = (void *) 0x804a008
p_clear = (void *) 0x804a048
p_stack = (void *) 0xbfc0d2c0
switcher = 0
(gdb) list
42 p_ok = alloc_this(FALSE, FALSE, FALSE, text_size);
43 p_clear = alloc_this(TRUE, FALSE, FALSE, text_size);
44 p_stack = alloc_this(TRUE, TRUE, FALSE, text_size);
45
46 if (!switcher) {
47 memcpy((char *)p_ok, TEXT, text_size*1000000);
48 }
49
50
51 memcpy((char *)p_ok, TEXT, text_size);
OK, now look at the arguments:
(gdb) print p_ok
$3 = (void *) 0x804a008
(gdb) print text
$4 = "This is some very clever and not at all short sample text\n"
(gdb) print text_size
$5 = 59
(gdb) print text_size*1000000
$6 = 59000000
We have found the first error and are able to fix it. Fix it and recompile if you want.
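The fix for failure 0 amounts to never copying more bytes than the destination holds. A minimal sketch of that idea, assuming a hypothetical helper bounded_copy() that is not part of example.c (build with -DDEMO_MAIN to get a standalone program):

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of example.c: copy n bytes only when they
   fit into a destination that holds dst_size bytes.
   Returns 0 on success and -1 when the copy is refused. */
int bounded_copy(void *dst, size_t dst_size, const void *src, size_t n) {
    if (dst == NULL || src == NULL || n > dst_size) {
        return -1;  /* refuse instead of running past the buffer */
    }
    memcpy(dst, src, n);
    return 0;
}

#ifdef DEMO_MAIN
int main(void) {
    char text[] = "This is some very clever and not at all short sample text\n";
    char copy[sizeof text];

    /* The failure 0 request: a million times more bytes than fit. */
    if (bounded_copy(copy, sizeof copy, text, sizeof text * 1000000) != 0) {
        printf("oversized copy refused\n");
    }
    /* The intended request: exactly the size of the text. */
    if (bounded_copy(copy, sizeof copy, text, sizeof text) == 0) {
        printf("copied %lu bytes\n", (unsigned long)sizeof text);
    }
    return 0;
}
#endif
```

The check only works when the caller knows the real size of the destination, which is exactly the information that line 47 of the example gets wrong.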
Run the example binary (whether or not you fixed failure 0 and recompiled) with the parameter 1:
$ ./example 1
Some message - alloc done
Some message - alloc done
Some message - alloc done
Segmentation fault (core dumped)
After looking at the backtrace, you should see something like
#0 0xb7ec7c35 in memcpy () from /lib/i686/cmov/libc.so.6
#1 0x080485bc in alloc_this (flag_clear=0, flag_onstack=0, flag_fail=1,
size=59) at example.c:19
#2 0x08048746 in main (argc=2, argv=0xbfff62d4) at example.c:54
size=59) at example.c:19
#1 0x080486e4 in main (argc=2, argv=0xbfcc17b4) at example.c:54
Now it is time for something new. Exit the debugger.
Run the debugger with this command line:
$ gdb example
First, we want to trigger failure 1, not 0, so we need to set the program arguments somehow.
(gdb) set args 1
(gdb) run
Okay, anything new?
Yes. You can stop the program just before it hits the bug. Looking at the backtrace, you want to stop the program somewhere in the alloc_this function, right?
To stop the program, the debugger uses a breakpoint. It is just a point in the instruction flow where program execution is suspended and the debugger takes charge again.
So set a breakpoint there:
(gdb) break alloc_this
Breakpoint 1 at 0x804844c: file example.c, line 14.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/bazil/actual/gdb/example 1
Breakpoint 1, alloc_this (flag_clear=0, flag_onstack=0, flag_fail=0, size=59)
at example.c:14
14 void * p_bug = NULL;
at example.c:14
#1 0x0804856e in main (argc=2, argv=0xbfca8284) at example.c:42
Yes, if you look at the first backtrace in the failure 1, you can see that it was
We do not want to inspect this particular alloc_this() call because there is no failure here.
Let's continue.
(gdb) cont
Continuing.
Some message - alloc done
Breakpoint 1, alloc_this (flag_clear=1, flag_onstack=0, flag_fail=0,
size=59) at example.c:14
14 void * p_bug = NULL;
(gdb) bt
#0 alloc_this (flag_clear=1, flag_onstack=0, flag_fail=0, size=59)
at example.c:14
#1 0x08048596 in main (argc=2, argv=0xbfca8284) at example.c:43
#0 alloc_this (flag_clear=0, flag_onstack=0, flag_fail=1, size=59)
at example.c:14
#1 0x0804863a in main (argc=2, argv=0xbfca8284) at example.c:54
16 void * mem_ptr = (flag_onstack) ? alloca(size) : malloc(size);
(gdb) next
19 if (flag_fail) memcpy((char *) p_bug, text, text_size);
Now you are at the failing line. Inspect things a little using the print command (I won't paste it here because it just repeats failure 0).
Now, let's say that you have a real program, and you know that function qweasd() is the failing one. But it fails only once in a thousand calls. You do not want to type cont one thousand times. In gdb, you have several options depending on what exactly you need. Let's look at several breakpoint commands now:
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x0804844c in alloc_this at example.c:14
breakpoint already hit 4 times
2 breakpoint keep y 0x0804849c in alloc_this at example.c:19
Remember the next command? It executes the source line you are at and stops before the next line of the current source file is executed. Suppose you are in main(), the current line contains an alloc_this call, and you do not want to jump over to the next line but want to see what is going on inside the alloc_this call. You can use the step command to step into the function. It moves you to the first line of the alloc_this function.
Now we will do some magic to end this chapter.
I suppose you either made the conditional breakpoint as described above or set some other kind of breakpoint that gets you to the failing case of the alloc_this() function call.
Run the program and get into the failing case, but do not execute the failing memcpy() call; just get inside the alloc_this function. Check that you have the failing case by inspecting the flag_fail value with the print command.
(gdb) print flag_fail
$1 = 1
(gdb) set flag_fail = 0
(gdb) print flag_fail
$2 = 0
(gdb) cont
Continuing.
Some message - alloc done
Program exited with code 01.
(gdb)
This failure is a simple one, but you will learn two new things here.
Let's call ./example 2. You get a core file; run gdb on it and execute the bt full command.
You should see something like this:
#0 0xb7eec4db in strlen () from /lib/i686/cmov/libc.so.6
No symbol table info available.
#1 0x08048775 in main (argc=2, argv=0xbfadeac4) at example.c:67
p_ok = (void *) 0x0
p_clear = (void *) 0x804a048
p_stack = (void *) 0xbfade980
switcher = 2
Looking at the local variables of main(), there is nothing suspicious; only p_ok points to null. Now look at the guilty line.
(gdb) frame 1
#1 0x08048775 in main (argc=2, argv=0xbfadeac4) at example.c:67
67 fprintf(stderr, "Kernel mem length is %d\n", strlen((char *) 1));
(gdb) print (char *) 1
$1 = 0x1 <Address 0x1 out of bounds>
Now, let's have a look at another handy gdb feature. Stop the debugger and run it on the binary (without the core file). Set the command-line argument to 2, set a breakpoint on line 67, and run the program.
You are now just before the failure.
Gdb can do something like this:
$1 = 4
Program received signal SIGSEGV, Segmentation fault.
0xb7e614db in strlen () from /lib/i686/cmov/libc.so.6
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (strlen) will be abandoned.
Now look at the backtrace; it gives you a hint that you manipulated the calls a little bit.
This failure is similar to failure number 2, but although this is the easiest chapter, you still have the possibility to learn something new here.
Now run the bugger with ./example 3
Look at the backtrace:
#0 0xb7e348eb in strlen () from /lib/tls/libc.so.6
#1 0xb7e0821e in vfprintf () from /lib/tls/libc.so.6
#2 0xb7e03d13 in cuserid () from /lib/tls/libc.so.6
#3 0xb7e0490f in vfprintf () from /lib/tls/libc.so.6
#4 0xb7e0d3c2 in fprintf () from /lib/tls/libc.so.6
#5 0x080486fb in main (argc=2, argv=0xbfd14c64) at example.c:71
Look around: select your main() frame and inspect the locals and so on (frame, info locals, list).
Because the libc has no symbol table available (try selecting the fprintf frame called from main() and looking at the local variables), you cannot see the arguments passed to the fprintf call.
Looking at the source line:
71 fprintf(stderr, "Kernel mem: %s\n", (char *) MACRO(1));
No symbol "MACRO" in current context.
(gdb) print MACRO
No symbol "MACRO" in current context.
It is possible to instruct the compiler to put even the macro processing information into the resulting binary. With gcc, you need to recompile the example with the -g3 option, for instance:
$ gcc -g3 -o example example.c
Now, rerun ./example 3 and issue gdb example core.
The backtrace is still the same; let's switch to the main() frame.
There are at least two ways to get the evaluated macro value passed to the fprintf call.
First, you can use the gdb command macro:
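For reference, gdb's macro expand command prints the expansion of a macro invocation when the binary carries macro information. What the expansion of MACRO(1) evaluates to can also be checked in plain C. A small sketch, where the helper name expanded_macro_value() is a hypothetical one and not part of example.c (build with -DDEMO_MAIN to get a standalone program):

```c
#include <stdio.h>

#define MACRO(x) (x+1)  /* same definition as in example.c */

/* Hypothetical helper: the value that MACRO(1) expands to and evaluates as. */
long expanded_macro_value(void) {
    return MACRO(1);    /* the preprocessor turns this into (1+1) */
}

#ifdef DEMO_MAIN
int main(void) {
    printf("MACRO(1) evaluates to %ld\n", expanded_macro_value());
    return 0;
}
#endif
```

So the "%s" conversion in line 71 is handed the address 0x2, which does not point to a valid string.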
Failure number 4 is a slightly more advanced topic. It does not need any extra skills, but because it illustrates something that can be classified as a kind of buffer overflow, and shows how your program can produce a completely useless core dump, it needs a bit of understanding of the (i386) stack.
But even if you are not interested in the above (you should be, because an overwritten stack can happen to you under completely different conditions), you will learn a bit of gdb beginner magic here.
However, I must note that, depending on several things, you may not overwrite the call stack when running on your hardware. You can try to increase the number of for cycles at line 58, but even then, depending on the situation, you may end up hitting SIGSEGV without overwriting the stack. Just try it. If you still cannot overwrite the stack, don't be sad; you can learn the promised magic without it.
Enough talk; run ./example 4
If you haven't forgotten to unset the core file limit, you should get a core dump file.
Go on, examine it with gdb. After running gdb, your bt output could look like:
#0 0x20656c70 in ?? ()
#1 0x74786574 in ?? ()
#2 0x0804000a in ?? ()
#3 0x0000003b in ?? ()
#4 0x0000003b in ?? ()
#5 0xbfb1aa80 in ?? ()
#6 0x08049a10 in ?? ()
#7 0xbfb1aa58 in ?? ()
#8 0x080483f0 in _init ()
#9 0xb7e3d050 in __libc_start_main () from /lib/i686/cmov/libc.so.6
#10 0x080484e1 in _start ()
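The bogus frame addresses are worth a second look: read as little-endian bytes (the byte order of i386 memory), some of them are printable ASCII. A small sketch of that check, assuming a hypothetical helper addr_to_ascii() that is not part of example.c (build with -DDEMO_MAIN to get a standalone program):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the four bytes of a 32-bit value, lowest byte
   first (little-endian order), into out[0..3] and terminate the string.
   Bytes that are not printable ASCII are shown as '.'. */
void addr_to_ascii(uint32_t addr, char out[5]) {
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((addr >> (8 * i)) & 0xffu);
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[4] = '\0';
}

#ifdef DEMO_MAIN
int main(void) {
    char out[5];

    addr_to_ascii(0x20656c70u, out);   /* frame #0 of the backtrace above */
    printf("frame #0: \"%s\"\n", out);
    addr_to_ascii(0x74786574u, out);   /* frame #1 of the backtrace above */
    printf("frame #1: \"%s\"\n", out);
    return 0;
}
#endif
```

Frames #0 and #1 decode to "ple " and "text", two consecutive fragments of the sample string, which suggests that the saved return address was overwritten by the copied text.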
Let's show how that happened.
Start gdb, set the command-line argument to 4, and before running the program, set some breakpoints, e.g. the first one on main() and the second on alloc_this().
Have an editor or viewer open on the example source file.
Now run it and continue several times until you get the SIGSEGV error.
You should see something like this:
Starting program: /home/bazil/actual/gdb/example 4
Breakpoint 1, main (argc=2, argv=0xbfd05524) at example.c:32
32 int switcher = 0;
(gdb) c
Continuing.
Breakpoint 2, alloc_this (flag_clear=0, flag_onstack=0, flag_fail=0, size=59)
at example.c:14
14 void * p_bug = NULL;
(gdb) c
Continuing.
Some message - alloc done
Breakpoint 2, alloc_this (flag_clear=1, flag_onstack=0, flag_fail=0, size=59)
at example.c:14
14 void * p_bug = NULL;
(gdb) c
Continuing.
Some message - alloc done
Breakpoint 2, alloc_this (flag_clear=1, flag_onstack=1, flag_fail=0, size=59)
at example.c:14
14 void * p_bug = NULL;
(gdb) c
Continuing.
Some message - alloc done
Program received signal SIGSEGV, Segmentation fault.
0x20656c70 in ?? ()
(gdb)
What now? Look at the output above. You can say that at least some of the alloc_this() calls went fine. Here you can see it was the last one in the source file, but in a real situation it could be the x-th one (neither the first nor the last of the alloc_this() calls). You probably want to inspect the situation at the last successful alloc_this() call.
Let's try some promised beginner magic now.
Let's go the fastest way.
You can specify commands to be executed on each breakpoint hit.
So.
So what?
It's easy, try to think a little bit before reading further.
Let's suppose the alloc_this() breakpoint is number 2. If you do something like:
(gdb) commands 2
Type commands for when breakpoint 2 is hit, one per line.
End with a line saying just "end".
>bt
>continue
>end
So you will end up with the same overwritten stack, but the last backtrace will identify the position in the source file where the problem arose. (In other cases, such as a problem arising in a call executed from a loop, you can have other commands executed as well, e.g. display the loop counter to identify the exact iteration.)
Let's run it, possibly switching off the breakpoint 1 (in main()).
The end of gdb output should look like:
at example.c:14
14 void * p_bug = NULL;
#0 alloc_this (flag_clear=1, flag_onstack=1, flag_fail=0, size=59)
at example.c:14
#1 0x08048678 in main (argc=2, argv=0xbff16f34) at example.c:44
Some message - alloc done
Program received signal SIGSEGV, Segmentation fault.
0x20656c70 in ?? ()
(gdb)
Let's put a breakpoint on that line, disable the alloc_this() breakpoint and re-run the program.
You can step into the alloc_this() function, but to speed things up, let's step over it with next. Repeat next until you get to the failure:
44 p_stack = alloc_this(TRUE, TRUE, FALSE, text_size);
(gdb) n
Some message - alloc done
46 if (!switcher) {
(gdb) n
51 memcpy((char *)p_ok, text, text_size);
(gdb) n
52 memcpy((char *)p_clear, text, text_size);
(gdb) n
54 if (switcher==1) (void) alloc_this(FALSE, FALSE, TRUE, text_size);
(gdb) n
56 if (switcher==4) {
(gdb) n
58 for (x=0; x<5; x++) {
(gdb) n
59 memcpy((char *)(p_stack+x*text_size), text, text_size);
(gdb) n
58 for (x=0; x<5; x++) {
(gdb) n
59 memcpy((char *)(p_stack+x*text_size), text, text_size);
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x20656c70 in ?? ()
Fine, it is identified, now we are going to look at it.
Delete all breakpoints and set a new one at line 59.
Set a display of some variables like:
(gdb) display p_stack
(gdb) display text
(gdb) display text_size
(gdb) display p_stack+x*text_size
Now, for the last time in this failure, let the program die. Use cont commands. You will see something like this; only the pointer addresses will be different:
59 memcpy((char *)(p_stack+x*text_size), text, text_size);
5: p_stack + x * text_size = (void *) 0xbfd103f0
4: text_size = 59
3: text = "This is some very clever and not at all short sample text\n"
2: p_stack = (void *) 0xbfd103f0
1: x = 0
(gdb) c
Continuing.
Breakpoint 4, main (argc=2, argv=0xbfd10534) at example.c:59
59 memcpy((char *)(p_stack+x*text_size), text, text_size);
5: p_stack + x * text_size = (void *) 0xbfd1042b
4: text_size = 59
3: text = "This is some very clever and not at all short sample text\n"
2: p_stack = (void *) 0xbfd103f0
1: x = 1
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x20656c70 in ?? ()
Originally, I wanted to go through the stack addresses, compare them to the p_stack pointer and do some pointer math to show you what happened. But since this is long enough already and your patience is probably gone by now, I will give you the start and you can inspect the rest yourself. So:
(gdb) info frame
Stack level 0, frame at 0xbfd6a4f0:
eip = 0x8048715 in main (example.c:59); saved eip 0xb7db9050
source language c.
Arglist at 0xbfd6a4e8, args: argc=2, argv=0xbfd6a584
Locals at 0xbfd6a4e8, Previous frame's sp at 0xbfd6a4e4
Saved registers:
ebp at 0xbfd6a4e8, eip at 0xbfd6a4ec
(gdb) display
5: p_stack + x * text_size = (void *) 0xbfd6a47b
4: text_size = 59
3: text = "This is some very clever and not at all short sample text\n"
2: p_stack = (void *) 0xbfd6a440
1: x = 1
(gdb) print p_ok
$1 = (void *) 0x804a008
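The pointer math can be started from the numbers in the session above. A rough sketch, assuming the five copies land back to back starting at p_stack; the helper names overrun_bytes() and offset_from() are hypothetical and not part of example.c (build with -DDEMO_MAIN to get a standalone program):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes written past the end of a buffer of alloc_size bytes when `count`
   chunks of chunk_size bytes are copied back to back from its start. */
size_t overrun_bytes(size_t alloc_size, size_t chunk_size, size_t count) {
    size_t written = chunk_size * count;
    return (written > alloc_size) ? written - alloc_size : 0;
}

/* Distance in bytes from the buffer start to an address above it. */
size_t offset_from(uintptr_t base, uintptr_t addr) {
    return (size_t)(addr - base);
}

#ifdef DEMO_MAIN
int main(void) {
    /* Values taken from the info frame and display output above. */
    uintptr_t p_stack = 0xbfd6a440u;
    uintptr_t saved_eip_slot = 0xbfd6a4ecu;

    printf("bytes written past the buffer: %lu\n",
           (unsigned long)overrun_bytes(59, 59, 5));
    printf("saved eip is %lu bytes above p_stack\n",
           (unsigned long)offset_from(p_stack, saved_eip_slot));
    return 0;
}
#endif
```

With these values, the saved eip slot at 0xbfd6a4ec lies 172 bytes above p_stack (0xbfd6a440), while the five copies would write 295 bytes starting at p_stack, 236 of them beyond the 59 bytes that were requested.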
Run the example with argument 5 to get the last prepared failure.
It should fail on a SIGSEGV again. Your backtrace could look like
#1 0x08048712 in main (argc=2, argv=0xbfa6a9b4) at example.c:75
First, inspect the situation a bit. Look at the frame and the locals, and print the source code.
Then, let's say you decide you need to see what happens live.
Exit the debugger and run it with the binary only.
Set the proper arguments.
Wait, can we put a breakpoint inside a dynamic library? Try it:
(gdb) break free
Function "free" not defined.
Make breakpoint pending on future shared library load? (y or [n])
Now run the program and inspect each free() call a bit. Let's say you are paranoid, so at each free() call select your main() frame and look at the address about to be freed.
When you get to the failing free() call, switch to the main() frame, look at the p_stack value, and compare it with the stack frame address. You will see the problem: this pointer was not allocated by malloc() or any of its wrappers, because it points somewhere into the stack. Because it was not allocated via malloc, it cannot be freed.
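One way to avoid this class of bug is to remember where each buffer came from. Memory obtained from alloca() lives in the stack frame of the function that called alloca() and must never be passed to free(). A minimal sketch of such bookkeeping, assuming a hypothetical tracked_buf wrapper that is not part of example.c (build with -DDEMO_MAIN to get a standalone program):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: a pointer plus a note about its origin. */
struct tracked_buf {
    void *ptr;
    int on_heap;   /* 1 when ptr came from malloc(), 0 otherwise */
};

/* Allocate on the heap and mark the buffer as freeable. */
struct tracked_buf heap_buf(size_t size) {
    struct tracked_buf b;
    b.ptr = malloc(size);
    b.on_heap = (b.ptr != NULL);
    return b;
}

/* Wrap storage owned by somebody else (for example a local array). */
struct tracked_buf borrowed_buf(void *storage) {
    struct tracked_buf b;
    b.ptr = storage;
    b.on_heap = 0;
    return b;
}

/* Free only heap memory. Returns 1 when free() was called, 0 otherwise. */
int release_buf(struct tracked_buf *b) {
    int freed = 0;
    if (b->ptr != NULL && b->on_heap) {
        free(b->ptr);
        freed = 1;
    }
    b->ptr = NULL;
    b->on_heap = 0;
    return freed;
}

#ifdef DEMO_MAIN
int main(void) {
    char local[16];
    struct tracked_buf on_stack = borrowed_buf(local);
    struct tracked_buf on_heap = heap_buf(16);

    printf("stack buffer freed: %d\n", release_buf(&on_stack));
    printf("heap buffer freed: %d\n", release_buf(&on_heap));
    return 0;
}
#endif
```

The flag costs one int per buffer and makes the release path safe to call on either kind of pointer, and also safe to call twice.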
That's all of the prepared failures I have for you. However, don't miss the next chapter.
Let me give you one little piece of advice: you learn gdb best by trying to use it. Where can you find real problems if you are not a full-time developer and are not solving customer problems?
If you are looking for additional information (and you should be, because this article is just meant to get you started), consider:
There is always a lot more that could be written about gdb or debugging in general, but I hope this short (compared to the documentation) text helped you with learning gdb. Farewell!
-
Jan 'bazil' Otte, 2007
Should you need to contact me, write to "incoming at <domain name - bzz cz>".
You should be able to get this online at http://www.bzz.cz/data/example.c
#include <stdio.h>
#include <stdlib.h>
#include <alloca.h>
#include <string.h>
#define FALSE 0
#define TRUE 1
#define MACRO(x) (x+1)
int text_size = 0;
char text[] = "This is some very clever and not at all short sample text\n";
void * alloc_this(int flag_clear, int flag_onstack, int flag_fail, size_t size) {
/* little allocation function, can alloc on heap and on stack,
clear the memory and simulate an error (copy to NULL) */
void * p_bug = NULL;
/* allocate memory */
void * mem_ptr = (flag_onstack) ? alloca(size) : malloc(size);
/* simple failure */
if (flag_fail) memcpy((char *) p_bug, text, text_size);
/* clear memory if flag set and memory alloc succeeded */
if (mem_ptr && flag_clear) mem_ptr = memset(mem_ptr, 0, size);
/* just a clue - you can break here */
printf("Some message - alloc done\n");
return mem_ptr;
}
int main(int argc, char ** argv) {
void * p_ok, * p_clear, * p_stack;
int switcher = 0;
text_size = strlen(text) + 1;
/* get the switcher if any */
if (argc>1) {
switcher=atoi(argv[1]);
}
/* allocate it */
p_ok = alloc_this(FALSE, FALSE, FALSE, text_size);
p_clear = alloc_this(TRUE, FALSE, FALSE, text_size);
p_stack = alloc_this(TRUE, TRUE, FALSE, text_size);
if (!switcher) {
memcpy((char *)p_ok, text, text_size*1000000);
}
memcpy((char *)p_ok, text, text_size);
memcpy((char *)p_clear, text, text_size);
if (switcher==1) (void) alloc_this(FALSE, FALSE, TRUE, text_size);
if (switcher==4) {
int x;
for (x=0; x<5; x++) {
memcpy((char *)(p_stack+x*text_size), text, text_size);
}
}
free(p_ok); p_ok = NULL;
free(p_clear);
if (switcher == 2) {
fprintf(stderr, "Kernel mem length is %d\n", strlen((char *) 1));
}
if (switcher == 3) {
fprintf(stderr, "Kernel mem: %s\n", (char *) MACRO(1));
}
if (switcher>4) {
if (p_stack) free(p_stack);
}
return switcher;
}