Memory Layout of C Programs

In C program, there are four storage classes: automatic, register, external, and static.
Primary Storage : registers, cache, memory (Random Access Memory) 
Secondary Storage : magnetic and optical disk.

When a program is loaded into memory, it’s organized into three areas of memory, called segments:
  1. text/code segment
  2. Data segment
  3. Stack and heap
Memory organization of a typical program

1. Text Segment :
Compiled code of the program 
A text segment , also known as a code segment or simply as text, is one of the sections of a program in an object file or in memory, which contains executable instructions.

As a memory region, a text segment may be placed below the heap or stack in order to prevent heaps and stack overflows from overwriting it.

Usually, the text segment is shareable so that only a single copy needs to be in memory for frequently executed programs, such as text editors, the C compiler, the shells, and so on. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.

2. Data segment: 
Data segment is sub divided into two parts
– Initialized data segment: All the global, static and constant data are stored in the data segment.
– Uninitialized data segment: All the uninitialized data are stored in BSS.

A. Initialized Data Segment:
Initialized data segment, usually called simply the Data Segment. A data segment is a portion of virtual address space of a program, which contains the global variables and static variables that are initialized by the programmer.

Note that, data segment is not read-only, since the values of the variables can be altered at run time.

This segment can be further classified into initialized read-only area and initialized read-write area.

For instance the global string defined by char s[] = “hello world” in C and a C statement like int debug=1 outside the main (i.e. global) would be stored in initialized read-write area. And a global C statement like const char* string = “hello world” makes the string literal “hello world” to be stored in initialized read-only area and the character pointer variable string in initialized read-write area.

Ex: static int i = 10 will be stored in data segment and global int i = 10 will also be stored in data segment
 
B. Uninitialized Data Segment:
Uninitialized data segment, often called the “bss” segment, named after an ancient assembler operator that stood for “block started by symbol.” Data in this segment is initialized by the kernel to arithmetic 0 before the program starts executing

uninitialized data starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code.

For instance a variable declared static int i; would be contained in the BSS segment.
For instance a global variable declared int j; would be contained in the BSS segment.
3. Stack:
The stack area traditionally adjoined the heap area and grew the opposite direction; when the stack pointer met the heap pointer, free memory was exhausted. (With modern large address spaces and virtual memory techniques they may be placed almost anywhere, but they still typically grow opposite directions.)

The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory. On the standard PC x86 computer architecture it grows toward address zero; on some other architectures it grows the opposite direction. A “stack pointer” register tracks the top of the stack; it is adjusted each time a value is “pushed” onto the stack. The set of values pushed for one function call is termed a “stack frame”; A stack frame consists at minimum of a return address.

Stack, where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address of where to return to and certain information about the caller’s environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work. Each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn’t interfere with the variables from another instance of the function.
 
Heap:
Heap is the segment where dynamic memory allocation usually takes place.

The heap area begins at the end of the BSS segment and grows to larger addresses from there.The Heap area is managed by malloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size (note that the use of brk/sbrk and a single “heap area” is not required to fulfill the contract of malloc/realloc/free; they may also be implemented using mmap to reserve potentially non-contiguous regions of virtual memory into the process’ virtual address space). The Heap area is shared by all shared libraries and dynamically loaded modules in a process. 

In sort as a summary :

local variables can be stored either on the stack or in a data segment depending on whether they are auto or static. (if neither auto or static is explicitly specified, auto is assumed)

global variables are stored in a data segment (unless the compiler can optimize them away, see const) and have visibility from the point of declaration to the end of the compilation unit.

static variables are stored in a data segment (again, unless the compiler can optimize them away) and have visibility from the point of declaration to the end of the enclosing scope. Global variables which are not static are also visible in other compilation units (see extern).

auto variables are always local and are stored on the stack.
the register modifier tells the compiler to do its best to keep the variable in a register if at all possible. Otherwise it is stored on the stack.

extern variables are stored in the data segment. The extern modifier tells the compiler that a different compilation unit is actually declaring the variable, so don't create another instance of it or there will be a name collision at link time.

const variables can be stored either on the stack or a read only data segment depending on whether they are auto or static. However, if the compiler can determine that they cannot be referenced from a different compilation unit, or that your code is not using the address of the const variable, it is free to optimize it away (each reference can be replaced by the constant value). In that case it's not stored anywhere.

the volatile modifier tells the compiler that the value of a variable may change at anytime from external influences (usually hardware) so it should not try to optimize away any reloads from memory into a register when that variable is referenced. This implies static storage.

No comments:

Post a Comment