ONLamp.com


Modern Memory Management, Part 2

by Howard Feldman
11/23/2005

My previous article discussed the layout of memory during program execution in modern computers. This article continues the discussion, looking at how programming languages store and access variables in memory. It also delves into swap memory, shared memory, and memory leaks.

Regardless of the programming language you use, you will use variables--a program would be quite inflexible without them. Although the names differ from language to language, all have some concept of global variables, which are accessible anywhere in the code, and local variables, which have a limited scope. Usually there is also some notion of a define or constant: a value that does not change and that is resolved by some sort of compile-time processing.

Constants

Constant symbols include things such as #define in C, Java literals, string constants such as the "hello" in string msg = "hello";, and even numbers such as the 2 in x = x + 2. Note that in C the preprocessor handles #defines: it goes through the code and literally replaces each instance of the defined symbol with its value. The compiler never sees the symbols, so for all intents and purposes they are the same as constants typed directly into the code.
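To see what literal replacement means in practice, consider the following minimal sketch. The macro names TWO_BAD and TWO_SAFE and the helper functions are made up for illustration and do not come from the article's examples.

```c
#include <stdio.h>

/* Hypothetical macros, for illustration only. */
#define TWO_BAD  1 + 1      /* pasted into the code exactly as written */
#define TWO_SAFE (1 + 1)    /* parentheses keep the value together */

/* Expands to 1 + 1 * 3, which C evaluates as 1 + (1 * 3). */
int triple_bad(void)
{
    return TWO_BAD * 3;
}

/* Expands to (1 + 1) * 3. */
int triple_safe(void)
{
    return TWO_SAFE * 3;
}

/* Prints both results side by side: "4 6". */
void show_expansion(void)
{
    printf("%d %d\n", triple_bad(), triple_safe());
}
```

Because the compiler only ever sees the expanded text, the first macro gives 4 rather than 6. Wrapping a macro's value in parentheses is the usual way to make it behave like a single constant.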

Although they are not variables, constants are an important type of program data and still must be stored somewhere. As demonstrated in the previous article, constants are stored in the code segment of memory, right along with the program instructions. As such, they should be considered read-only.

Local Variables

A local variable has a limited scope--it is automatically destroyed once that scope ends. Often, this will be the duration of one function. You can also think of arguments passed to a function as local to that function, because you're really passing only temporary copies of the variables; the exception is arguments passed by reference. Local variables are the most commonly used variables, so it is important to understand their storage in memory. Due to their transient nature, they are stored on the stack. This is convenient, because it lets them avoid the time-consuming overhead of using malloc and the heap to obtain their storage locations. When their scope ends, all the computer needs to do is pop them off the stack; this effectively destroys them. Even if several scopes are nested, local variables are created and destroyed in a LIFO manner: the innermost scope always ends before the outer ones, making a stack ideal for supplying this temporary storage.
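The behavior of copies, references, and nested scopes can be sketched in a few lines of C. The function names here are hypothetical, and C has no true pass by reference, so a pointer stands in for it.

```c
/* Receives a copy of the caller's value. The copy is local to this
   function and disappears when the function returns. */
int set_by_value(int n)
{
    n = 99;             /* changes only the local copy */
    return n;
}

/* Receives the address of the caller's variable, so the write goes
   through to the original. */
void set_by_pointer(int *n)
{
    *n = 99;
}

/* Nested scopes end in the reverse of the order in which they began. */
int nested_scopes(void)
{
    int outer = 1;
    {
        int inner = 2;  /* created after outer ... */
        outer += inner;
    }                   /* ... and its scope ends before outer's does */
    return outer;
}
```

Calling set_by_value(a) leaves the caller's a untouched, while set_by_pointer(&a) changes it to 99.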

Recall that return addresses for function calls are stored on the stack as well. Because all local variables (including function arguments) will be destroyed by popping them off the stack when the function terminates, the return address will then be at the top of the stack ready to be popped off. There is, however, one undesirable side effect of using the stack for variable storage--the stack is of limited size. As a result, large local variables can overflow the stack, bringing the program to an abrupt halt. Consider the following C program to sum 2,000,000 random numbers and print out the result:

#include <stdio.h>
#include <stdlib.h>
#define BUFSZ 2000000
double GetSum(void)
{
    double num[BUFSZ];
    int i;
    double sum = 0.0;

    for (i = 0; i < BUFSZ; i++) {
        num[i] = random();
    }
    for (i = 0; i < BUFSZ; i++) {
        sum += num[i];
    }
    return sum;
}

int main(void)
{
    printf("%f\n", GetSum());
    return 0;       
}

When executed on a 64-bit Solaris machine with the stack size limited to 8MB using ulimit, the program fails with a segmentation fault. Specifically:

signal SEGV (no mapping at the fault address) in GetSum at line 8:
    8           double sum=0.0;
dbx: read of 4 bytes at address fecbd5a0 failed -- Error 0

If you reduce the array size to 1,000,000, the program works fine. Because a double uses eight bytes, the original array needs 16,000,000 bytes, roughly twice the 8MB (8,388,608-byte) stack, while the smaller array needs only 8,000,000 bytes and fits. Similarly, doubling the stack size would avoid the crash. Note how the error message seems to indicate a problem in the allocation of the sum variable, although it is not at all clear from the error what is really happening here. To avoid overflowing the stack like this, always allocate large local variables dynamically, using malloc, so that their storage comes from the heap.
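The size comparison can also be made at run time. The following is a rough sketch that assumes a POSIX system where getrlimit() and RLIMIT_STACK are available; the helper names array_bytes and fits_in_stack are invented for this illustration. It is optimistic, because the stack also holds return addresses and other locals.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Bytes needed for an array of `count` doubles. */
size_t array_bytes(size_t count)
{
    return count * sizeof(double);
}

/* Returns 1 if `bytes` is below the soft stack limit, 0 if it is not,
   and -1 if the limit cannot be read. An unlimited stack counts as a
   fit. */
int fits_in_stack(size_t bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return bytes < lim.rlim_cur;
}
```

With an 8MB limit, fits_in_stack(array_bytes(2000000)) would report 0, and the same call with 1000000 would report 1.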

In the GetSum example, declare num as double *num; instead, and then allocate it with num = (double *) malloc(BUFSZ * sizeof(double));. Check that malloc did not return NULL, and be sure to call free() when you finish with num. Alternatively, you could declare num static (in C) or global, so that it is not stored on the stack and is not destroyed when the function exits (see below).
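Put together, a heap-based version of the function might look like the sketch below. The name GetSumHeap and the -1.0 failure value are assumptions made for this illustration, and standard rand() is used in place of random() so that the code depends only on ISO C.

```c
#include <stdlib.h>

#define BUFSZ 2000000

/* Heap-based variant of GetSum (hypothetical name). Returns the sum,
   or -1.0 if the allocation fails. */
double GetSumHeap(void)
{
    double *num;
    double sum = 0.0;
    size_t i;

    num = malloc(BUFSZ * sizeof *num);  /* storage comes from the heap */
    if (num == NULL) {
        return -1.0;
    }
    for (i = 0; i < BUFSZ; i++) {
        num[i] = rand();                /* rand() is never negative */
    }
    for (i = 0; i < BUFSZ; i++) {
        sum += num[i];
    }
    free(num);                          /* release before returning */
    return sum;
}
```

Only the pointer num and the two small locals live on the stack here, so the function no longer depends on the stack limit.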
