gdb, backtrace and core dump

                                                                    Kernighan’s Law
“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

Backtraces

A backtrace is a summary of how your program got where it is. It shows one line per frame, for many frames, starting with the currently executing frame (frame zero), followed by its caller (frame one), and on up the stack.

backtrace
bt
Print a backtrace of the entire stack: one line per frame for all frames in the stack. You can stop the backtrace at any time by typing the system interrupt character, normally C-c.

backtrace n
bt n
Similar, but print only the innermost n frames.

backtrace -n
bt -n
Similar, but print only the outermost n frames.

The names where and info stack (abbreviated info s) are additional aliases for backtrace. Each line in the backtrace shows the frame number and the function name. The program counter value is also shown, unless you use set print address off. The backtrace also shows the source file name and line number, as well as the arguments to the function. The program counter value is omitted if it is at the beginning of the code for that line number.

Explaining Through Code and Setting a Breakpoint
We'll look at the stack again, this time using GDB. You may not understand all of this since you don't know about breakpoints yet, but it should be intuitive. Compile and run test.c:

  1 #include<stdio.h>
  2 static void display(int i, int *ptr);
  3
  4 int main(void) {
  5         int x = 10;
  6         int *xptr = &x;
  7         printf("In main():\n");
  8         printf("   x is %d and is stored at %p.\n", x, &x);
  9         printf("   xptr points to %p which holds %d.\n", xptr, *xptr);
 10         display(x, xptr);
 11         return 0;
 12 }
 13
 14 void display(int z, int *zptr) {
 15         printf("In display():\n");
 16         printf("   z is %d and is stored at %p.\n", z, &z);
 17         printf("   zptr points to %p which holds %d.\n", zptr, *zptr);
 18 }

Make sure you understand the output before continuing with this tutorial. Here's what I
see:
Output :

[root@CISCO cprog]# gcc -g test.c
[root@CISCO cprog]# ./a.out
In main():
   x is 10 and is stored at 0x7fffd1f1bc54.
   xptr points to 0x7fffd1f1bc54 which holds 10.
In display():
   z is 10 and is stored at 0x7fffd1f1bc3c.
   zptr points to 0x7fffd1f1bc54 which holds 10.

You debug an executable by invoking GDB with the name of the executable. Since gcc named our executable a.out, start a debugging session with a.out. You'll see a rather verbose copyright notice:

[root@CISCO cprog]# gdb ./a.out
GNU gdb (GDB) Fedora (7.1-18.fc13)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/cprog/a.out...done.
(gdb)


The (gdb) is GDB's prompt. It's now waiting for us to input commands. The program is  currently not running; to run it, type run. This runs the program from inside GDB:


(gdb) run
Starting program: /home/cprog/a.out
In main():
   x is 10 and is stored at 0x7fffffffe464.
   xptr points to 0x7fffffffe464 which holds 10.
In display():
   z is 10 and is stored at 0x7fffffffe44c.
   zptr points to 0x7fffffffe464 which holds 10.

Program exited normally.
(gdb)


Well, the program ran. It was a good start, but frankly, a little lackluster. We could've done the same thing by running the program ourselves. But one thing we can't do on our own is to pause the program in the middle of execution and take a look at the stack. We'll do this next.

You get GDB to pause execution by using breakpoints. We'll cover breakpoints later, but  for now, all you need to know is that when you tell GDB break 5, the program will pause  at line 5. You may ask: does the program execute line 5 (pause between 5 and 6) or does  the program not execute line 5 (pause between 4 and 5)? The answer is that line 5 is not  executed. Remember these principles:

break 5 means to pause at line 5.
This means GDB pauses between lines 4 and 5. Line 4 has executed. Line 5 has not.

Set a breakpoint at line 10 and rerun the program:
(gdb) break 10
Breakpoint 1 at 0x40055f: file test.c, line 10.
(gdb) run
Starting program: /home/cprog/a.out
In main():
   x is 10 and is stored at 0x7fffffffe464.
   xptr points to 0x7fffffffe464 which holds 10.

Breakpoint 1, main () at test.c:10
10          display(x, xptr);
(gdb)


The Backtrace Command

We set a breakpoint at line 10 of file test.c. GDB told us this line of code corresponds to memory address 0x40055f. We reran the program and got the first three lines of output. We're in main(), sitting before line 10. We can look at the stack by using GDB's backtrace command:
(gdb) backtrace
#0  main () at test.c:10
(gdb)

The gdb backtrace command simply lists all of the frames currently on the stack. In the  example above, there is one frame on the stack, numbered 0, and it belongs to main(). If  we execute the next line of code, we'll be in display(). From the previous section, you  should know exactly what should happen to the stack: another frame will be added to the  bottom. Let's see this in action. You can execute the next line of code using GDB's step  command:

(gdb) step
display (z=10, zptr=0x7fffffffe464) at test.c:15
15          printf("In display():\n");
(gdb)

Look at the stack again, and make sure you understand everything you see:
(gdb) backtrace
#0  display (z=10, zptr=0x7fffffffe464) at test.c:15
#1  0x0000000000400570 in main () at test.c:10
(gdb)


Some points to note:
  • We now have two stack frames: frame 1 belonging to main() and frame 0 belonging to display().
  • Each frame listing gives the arguments to that function. We see that main() took no arguments, but display() did (and we're shown the value of the arguments).
  • Each frame listing gives the line number that's currently being executed within that frame. Look back at the source code and verify you understand the line numbers shown in the backtrace.

Personally, I find the numbering system for the frames confusing. I'd prefer for main() to remain frame 0, and for additional frames to get higher numbers. But this is consistent with the idea that the stack grows "downward". Just remember that the lowest-numbered frame is the one belonging to the most recently called function. Execute the next line of code:

(gdb) step
In display():
16          printf("   z is %d and is stored at %p.\n", z, &z);


The Frame Command

Recall that the frame is where automatic variables for the function are stored. Unless  you tell it otherwise, GDB is always in the context of the frame corresponding to the  currently executing function. Since execution is currently in display(), GDB is in the  context of frame 0. We can ask GDB to tell us which frame its context is in by giving the  frame command without arguments:

(gdb) frame
#0  display (z=10, zptr=0x7fffffffe464) at test.c:16
16          printf("   z is %d and is stored at %p.\n", z, &z);
(gdb)

I didn't tell you what the word "context" means; now I'll explain. Since GDB's context is
in frame 0, we have access to all the local variables in frame 0. Conversely, we don't
have access to automatic variables in any other frame. Let's investigate this. GDB's
print command can be used to give us the value of any variable within the current frame.
Since z and zptr are variables in display(), and GDB is currently in the frame for
display(), we should be able to print their values:

(gdb) print z
$1 = 10
(gdb) print zptr
$2 = (int *) 0x7fffffffe464
(gdb)


But we do not have access to automatic variables stored in other frames. Try to look at
the variables in main(), which is frame 1:

(gdb) print x
No symbol "x" in current context.
(gdb) print xptr
No symbol "xptr" in current context.
(gdb)

Now for magic. We can tell GDB to switch from frame 0 to frame 1 using the frame command
with the frame number as an argument. This gives us access to the variables in frame 1.
As you can guess, after switching frames, we won't have access to variables stored in
frame 0. Follow along:

(gdb) frame 1
#1  0x0000000000400570 in main () at test.c:10
10          display(x, xptr);
(gdb) print x
$3 = 10
(gdb) print xptr
$4 = (int *) 0x7fffffffe464
(gdb) print z
No symbol "z" in current context.
(gdb) print zptr
No symbol "zptr" in current context.
(gdb)

By the way, one of the hardest things to get used to with GDB is seeing the program's output:

   x is 10 and is stored at 0x7fffffffe464.
   xptr points to 0x7fffffffe464 which holds 10.


intermixed with GDB's output:
Starting program: /home/cprog/a.out
In main():
   x is 10 and is stored at 0x7fffffffe464.
   xptr points to 0x7fffffffe464 which holds 10.

Breakpoint 1, main () at test.c:10
10          display(x, xptr);
(gdb)

intermixed with your input to GDB:

(gdb) run

intermixed with your input to the program (which would've been present had we called some
kind of input function). This can get confusing, but the more you use GDB, the more you  get used to it. Things get tricky when the program does terminal handling (e.g. ncurses  or svga libraries), but there are always ways around it.

Exercises

Continuing from the previous example, switch back to display()'s frame. Verify that you have access to automatic variables in display()'s frame, but not main()'s frame.

Figure out how to quit GDB on your own. Control-d works, but I want you to guess the command that quits GDB.

GDB has a help feature. If you type help foo, GDB will print a description of command foo. Enter GDB (don't give GDB any arguments) and read the help blurb for all GDB commands we've used in this section.

Debug "test" again and set a breakpoint anywhere in display(), then run the program. Figure out how to display the stack along with the values of every local variable for each frame at the same time. Hint: If you did the previous exercise, and read each blurb, this should be easy.

Core Dump (in Unix parlance)/Memory Dump/System Dump:

Core dumps are often used to assist in diagnosing and debugging errors in computer programs or in operating systems. Put very simply, a core dump is the "storing of a large amount of raw data for further examination". A core dump is the contents of random access memory (RAM) at one moment in time. One can think of it as a full-length "snapshot" of RAM.

Typically, a core dump, or rather the report that results from the core dump, presents the RAM contents as a formatted series of lines that indicate memory locations and the hexadecimal values recorded at each location. Additional information tells exactly which instruction was executing at the time the core dump was initiated.

In computing, a core dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally (crashed). In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information.

Uses of core dumps :
  • Core dumps can serve as useful debugging aids in several situations. On early standalone or batch-processing systems, core dumps allowed a user to debug a program without monopolizing the (very expensive) computing facility for debugging; a printout could also be more convenient than debugging using switches and lights.


  • On shared computers, whether time-sharing, batch processing, or server systems, core dumps allow off-line debugging of the operating system, so that the system can go back into operation immediately.


  • Core dumps allow a user to save a crash for later or off-site analysis, or comparison with other crashes. For embedded computers, it may be impractical to support debugging on the computer itself, so analysis of a dump may take place on a different computer. Some operating systems such as early versions of Unix did not support attaching debuggers to running processes, so core dumps were necessary to run a debugger on a process's memory contents.


  • Core dumps can be used to capture data freed during dynamic memory allocation and may thus be used to retrieve information from a program that is no longer running. In the absence of an interactive debugger, the core dump may be used by an assiduous programmer to determine the error from direct examination.

If a process fails, most operating systems write the error information to a log file to alert system operators or users that the problem occurred. The operating system can also take a core dump, a capture of the memory of the process, and store it in a file for later analysis. (Memory was referred to as the “core” in the early days of computing.) Running programs and core dumps can be probed by a debugger, which allows a programmer to explore the code and memory of a process.
 
Debugging user-level process code is a challenge. Operating-system kernel debugging is even more complex because of the size and complexity of the kernel, its control of the hardware, and the lack of user-level debugging tools. A failure in the kernel is called a crash. When a crash occurs, error information is saved to a log file, and the memory state is saved to a crash dump.

Operating-system debugging and process debugging frequently use different tools and techniques due to the very different nature of these two tasks. Consider that a kernel failure in the file-system code would make it risky for the kernel to try to save its state to a file on the file system before rebooting. 

A common technique is to save the kernel’s memory state to a section of disk set aside for this purpose that contains no file system. If the kernel detects an unrecoverable error, it writes the entire contents of memory, or at least the kernel-owned parts of the system memory, to the disk area. When the system reboots, a process runs to gather the data from that area and write it to a crash dump file within a file system for analysis. Obviously, such strategies would be unnecessary for debugging ordinary user-level processes.
