

Modern Memory Management

by Howard Feldman
10/27/2005

Computer memory is perhaps the most rapidly changing component in computer systems. Back in the early '80s, 1K memory chips were quite common in the first home microcomputers. Now, 25 years later, the average home computer may contain memory modules of 1GB--a millionfold increase, compared with only a thousandfold growth in CPU speed.

Despite this enormous increase in memory capacity, many of the problems on today's machines are the same as those of their early predecessors--namely, running out of memory. The more RAM that is available to a software developer, the more the developer tends to use. Certainly, techniques of memory conservation have changed with the times. In the early days of computing, programmers used clever tactics to tighten code loops or reuse blocks of memory at the expense of code readability, in order to cram impressive amounts of code into tiny amounts of RAM. Today, code size is rarely a concern, and attention has shifted to storing large amounts of data--images, video, audio, and tables--but little else has changed. The articles in this series deal with memory management issues on modern Unix-like systems, though most of the topics apply to all computers and operating systems.

This article, the first in the series, discusses the Unix dynamic memory allocation system along with the concept of memory segmentation. It also reviews the utilities top and ulimit, giving special attention to their role in memory management.

Memory management is an important concept to grasp regardless of which programming language you use. You must be most careful with C, where you control all memory allocation and freeing. Languages such as C++, Java, Perl, and PHP take care of a lot of the housekeeping automatically. Nevertheless, all of these languages and others can allocate memory dynamically, and thus the following discussion applies to them all.

malloc

The majority of programming languages ultimately rely on a single function to allocate memory: malloc. Though often described as a system call, malloc is actually part of the standard C library--it obtains raw memory from the kernel with true system calls such as brk or mmap--and most other languages internally end up calling malloc to reserve blocks of memory as well. The process works as follows:

1. The program asks malloc for a block of memory of a given size, in bytes.
2. malloc searches its internal tables for a contiguous stretch of free heap space at least that large, asking the operating system to enlarge the heap if none is available.
3. malloc marks the chosen region as allocated and returns a single pointer to its start (or NULL if the request cannot be satisfied).
4. When the program is finished with the block, it passes the pointer back to free, which marks the region as available for reuse.
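
In C this is simply a matched malloc and free pair. A minimal sketch (the 1,000-integer request size is arbitrary):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* step 1: request a contiguous 1,000-integer block from the heap */
    int *data = malloc(1000 * sizeof(int));

    if (data == NULL) {    /* no contiguous block was available */
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    data[0] = 42;          /* use the block ... */
    free(data);            /* ... then hand it back to malloc's pool */
    return 0;
}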

This brings up two important points. First is the concept of memory fragmentation. malloc always returns memory as one contiguous block. This may sound odd at first--why not use whatever memory is available, regardless of its address? Remember, though, that malloc returns only a single pointer to the memory block. If the block were not contiguous, how would you know when to stop reading or writing sequentially and jump to the next piece? The CPU would have to do a lot of extra work to give the illusion that such a block were contiguous if in fact it were not.

For example, consider a machine with 1,000MB of RAM. You request three blocks of 300MB each and get back pointers starting at 0MB, 350MB, and 700MB. This leaves free a block from 300MB to 350MB, and 650MB to 700MB, for a total of 100MB free, as you would expect. However, any further request for a block larger than 50MB will fail, as this is the largest contiguous piece available. Only after freeing one of the earlier 300MB blocks could you request something bigger than 50MB. Thus it is not enough that memory be available; it must be in a contiguous block.

To avoid problems with memory fragmentation, try to avoid making many small memory allocations and instead replace them with one or two large allocations where possible.
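
For instance, rather than calling malloc once per record, reserve a single block big enough to hold all of them. A sketch, with a made-up record type:

#include <stdlib.h>

struct record {
    int    id;
    double value;
};

struct record *alloc_records(size_t n)
{
    /* One large allocation instead of n small ones: the records stay
       contiguous, and malloc's free space is not chopped into pieces. */
    return malloc(n * sizeof(struct record));
}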

Second, note that freeing a block of memory does little more than mark it as unallocated in malloc's internal allocation tables. The contents of the memory generally remain intact, although with C programs compiled in debug mode, for example, you can have memory zeroed automatically upon freeing to aid in debugging. Often a debug allocator fills freed blocks with a value such as 0xDEADBEEF or some similarly clever mnemonic, so that the programmer can easily recognize reads of freed memory while debugging. When building release code, however, these cleanup tasks are not present--mainly because they slow down the program significantly. This leaves it up to programmers to clean up after themselves.
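
A hypothetical debug wrapper makes the idea concrete--poison the block with a recognizable fill pattern just before handing it back (the name debug_free and the 0xDB fill byte are choices made here for illustration):

#include <stdlib.h>
#include <string.h>

/* Hypothetical debug-build helper: stamp a block before freeing it,
   so stale reads show an obvious pattern instead of old data. */
void debug_free(void *p, size_t size)
{
    memset(p, 0xDB, size);   /* fill with a recognizable byte */
    free(p);
}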

It is impossible to tell whether a block of memory has been freed just by looking at it; you must keep track of that yourself. An additional point is that malloc does not normally return freed memory to the operating system; it remains owned by the process until the process terminates. The process can reuse it the next time it requests more memory, but other programs will not have access to it, even if no other memory is available. As a corollary, the memory footprint of a program is set by its peak total allocation at any one time. Thus it is always wise to free objects you no longer need--especially large ones--as soon as possible, to minimize this footprint.
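
Since nothing in the block itself says "freed," one common defensive habit is to overwrite the pointer with NULL immediately after freeing it. A small helper (the name release is made up here) shows the pattern:

#include <stdlib.h>

void release(char **p)
{
    free(*p);     /* marks the block unallocated in malloc's tables */
    *p = NULL;    /* a NULL pointer is visibly "freed"; a later
                     free(NULL) is defined to be harmless */
}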

There are also alternatives to malloc. For example, a program linked against the mapmalloc library will return dynamically allocated memory to the operating system when it calls free. No change to the code is necessary, but the price is that memory allocation takes about five times longer.

Memory Segmentation

As far back as the days of 8-bit computing, CPUs have always contained a few critical registers. These include the accumulator, where mathematical operations actually occur; the program counter (PC), which points to the command being executed; some index registers or data counters (DCs), which refer to data tables or variables in memory; and the stack pointer (SP), which always points to the top of the stack, a last-in, first-out (LIFO) data structure used for temporary storage.

The 8088 (IBM XT) CPU contained four special 16-bit registers called segment registers: code segment (CS), data segment (DS), stack segment (SS), and extra segment (ES). The name segment refers to memory access in real mode. A 20-bit address, allowing access to up to 1MB of RAM, came from a 16-bit segment and a 16-bit index to give the final address, by multiplying the segment by 16 and adding the index to the result. Thus each segment address corresponded to a single (overlapping) 64K chunk of memory (plenty for that time). The idea was that CS would always point to the 64K block that contained the machine code program being executed, DS to the 64K block where the program stored variables and temporary data, and SS to the program stack. ES was usable for other purposes, such as pointing to a second data area. The segment registers could of course change during the execution of the program if needed, but until programs using more than 256K became common, this was largely unnecessary.
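
The arithmetic is easy to verify. A small sketch of the real-mode address calculation (the segment and offset values are chosen arbitrarily for illustration):

#include <stdio.h>

/* Real-mode 8088 addressing: 20-bit address = segment * 16 + offset */
unsigned long physical(unsigned int segment, unsigned int offset)
{
    return (unsigned long)segment * 16 + offset;
}

int main(void)
{
    /* segment 0x1234, index 0x0005 -> address 0x12345 */
    printf("%05lx\n", physical(0x1234, 0x0005));
    return 0;
}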

While modern CPUs have far more registers, most of which are 32-bit or more, the concept of memory segmentation has not really changed all that much. Compiled programs still separate code from data from the stack, though the programmer may never see this. Nevertheless, understanding this separation and exactly how it works can be critical in debugging code, as well as writing portable applications.

Consider for a moment a program written for a dedicated system; for example, one to control all the automatic doors in an airport. Such programs generally read inputs from a number of sensors--in this case, ones that trigger when someone approaches a door. The program must then respond to the input by activating some external device, a motor perhaps, to open the door. The program will normally run from ROM or an EPROM chip--these are cheap, and robust because the program stays in memory even when the power is shut off. However, the program must store temporary data, including readings from the sensors, in writable memory--RAM. The stack must live in RAM as well, since it too is written at runtime. In this situation, there is a true physical separation between program code and data. Any attempt to write data to the code segment, stored in a read-only medium, will fail. This was at least one early motivation for enforcing a strict separation between code and data.

The only real nomenclature change on contemporary systems is that the data segment, which is often the largest block of memory required, is called the heap. When malloc divvies up memory and returns it to your program, this memory comes off the heap. The compiled code still resides in its own section of memory, separate from the heap. The stack's job has not changed in 30 years, either: it passes arguments to subroutines, stores the return address for subroutine calls, and holds temporary values, such as the registers saved just before a CPU interrupt occurs. Often the stack and heap share the same memory space but, starting at opposite ends, grow toward each other. The stack has a preset size limit (sometimes customizable), and it is possible to overflow it, which is almost always fatal to the program. In fact, buffer overflow security vulnerabilities caused by smashing the stack are the subject of many other articles.
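
You can observe this layout directly by printing the address of a local variable next to the address of a heap block. A quick sketch--the exact addresses vary by platform, but on most Unix systems the two land far apart, with the stack at the high end:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int  on_stack;                        /* lives in the stack */
    int *on_heap = malloc(sizeof(int));   /* lives in the heap */

    printf("stack: %p\n", (void *)&on_stack);
    printf("heap:  %p\n", (void *)on_heap);
    free(on_heap);
    return 0;
}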

How can writing to the code segment become a problem? Consider the following short C program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char *buf;

    /* warning: buggy code: do not copy and paste */
    buf = "hello";          /* buf now points at a string constant */
    printf(buf);
    strcpy(buf, "world");   /* writes over that constant! */
    printf(buf);
    return 0;
}

It is simple enough to follow, and the expected output might be helloworld. However, run it in a debugger (in this case, on a Solaris machine) and examine the variables a bit more closely. Upon initialization, buf = 0xffbffa60--an apparently random memory location--while &buf = 0xffbffa58. That is, the character pointer variable itself is stored at this address, which lies on the stack, where local variables live. The initial value of buf is not so random after all--it is simply whatever garbage happened to occupy that stack slot, which, for an uninitialized variable, is to be expected.

Watch what happens after assigning buf the value hello, however. Now printing the contents of buf reveals that buf = 0x00020898--not on the stack or heap, but way at the other end of memory, in the code segment! This makes sense because, as the next article in the series will show, the code segment actually stores constants such as hello. The assignment simply made buf point to it. So far so good; nothing bad has happened ... yet. The code then prints out buf, and again there is no trouble here.

The code then copies the next word to print into buf. Did you catch it? Granted, this certainly is not the recommended way to code, but the second word is the same length as the first, so there is no trouble with writing past the end of the buffer--just remember the address buf contains. The strcpy attempts to write 6 bytes (five characters plus the terminating null) to 0x00020898, an address in the code segment--which is potentially ROM! This is a big no-no. On a protected-memory Unix system, the program will typically die with a segmentation fault, because the OS maps the code segment read-only. And if this code really were stored in a ROM chip, what would happen? The strcpy would effectively be a nonoperation, because writes to read-only memory make no sense, and the succeeding printf would print hello a second time.

Of course, you should always allocate string buffers in C as character arrays of a fixed size (if known) or else dynamically with malloc, never as shown in the previous example. That code is merely to illustrate how a seemingly correct program can produce unexpected results. A corrected program might be:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[6];   /* room for five characters plus the null */

    strcpy(buf, "hello");
    printf(buf);
    strcpy(buf, "world");
    printf(buf);
    return 0;
}

top and ulimit

The most universal method for checking memory usage in Unix is probably top. This useful utility is available on most operating systems and displays several important statistics about each process running on the machine. The memory-related columns are RSS/RES and SIZE (the exact names vary slightly between versions). RSS stands for resident set size and indicates the amount of physical RAM the program is using at a given time. SIZE, on the other hand, gives the total memory allocated to the process, including code, heap, and stack. When the numbers differ, the missing data is stored in swap, on disk--usually memory that has gone unaccessed for a while and so need not consume physical RAM. Normally the OS decides automatically which pages to swap in and out of virtual memory. As mentioned earlier, freed memory goes back into malloc's pool but is not returned to the operating system until the program terminates; as a result, the memory use indicated by SIZE will never decrease over the lifetime of a program. It only ever goes one way--up.

A handy method for controlling memory use by processes under Unix is the ulimit command. With it you can cap the size of a process's data segment, RSS, stack, number of open files, and more. Limits come in two flavors: hard limits, which an unprivileged process can lower but never raise again, and soft limits, adjustable up to the current hard limit. Limiting memory usage can keep runaway processes from eating up all the machine's memory, and it is often a good safeguard to have in place while debugging a program.
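
ulimit is the shell's interface to the same per-process limits a program can impose on itself with the setrlimit() call. A minimal sketch capping the data segment--the 64MB figure is an arbitrary example:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    rl.rlim_cur = 64 * 1024 * 1024;   /* soft limit: 64MB of heap */
    rl.rlim_max = 64 * 1024 * 1024;   /* hard limit: cannot be raised */
    if (setrlimit(RLIMIT_DATA, &rl) != 0)
        perror("setrlimit");

    /* from here on, growing the heap past 64MB makes malloc fail */
    return 0;
}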

It may be useful to increase the stack size when running a highly recursive algorithm, because each recursive call pushes more data onto the stack. Another good time to use ulimit is on a web server, where a CGI program may run in hundreds of copies at once, one for each connection the server allows. In this case, it is wise to limit per-process memory usage to avoid accidentally exhausting the server's memory. Set the limit to roughly the amount of RAM in the server divided by the maximum number of simultaneous connections, leaving some RAM for other processes too. If this limit is less than the peak RSS of an active process, reduce the number of connections permitted to the web server accordingly.

A useful way to reduce the memory footprint of a CGI is to use a technique such as streaming or chunking. The exact implementation may vary depending on the task, but essentially this entails breaking the problem into smaller pieces, working on one piece at a time, and freeing each piece before the next is started. In this way, you only ever need enough memory to hold a single piece. A simple example would be a CGI that needs to read a 100MB file from disk and return it to the user. Instead of reading it all in at once, then writing it to stdout using a full 100MB of RAM, the program can read 1MB at a time, write this to stdout, and free it before reading the next chunk. This limits the memory usage to only 1MB. The size of the chunks is usually adjustable, giving complete and flexible control over how much memory each process will require, while not limiting the size of data that the program can manage.
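
A sketch of that chunked CGI in C--the filename and the 1MB chunk size are placeholders:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1024 * 1024)   /* work on 1MB at a time */

int main(void)
{
    char  *buf = malloc(CHUNK);
    FILE  *fp  = fopen("bigfile.dat", "rb");   /* hypothetical 100MB file */
    size_t n;

    if (buf == NULL || fp == NULL)
        return 1;

    /* Read, write, and reuse the same 1MB buffer until end of file:
       peak memory stays at CHUNK no matter how large the file grows. */
    while ((n = fread(buf, 1, CHUNK, fp)) > 0)
        fwrite(buf, 1, n, stdout);

    fclose(fp);
    free(buf);
    return 0;
}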

Summary

This article has discussed the fundamentals of memory management in PCs, and how they have changed over the past 30 years since the inception of the microcomputer. While memory capacity has increased drastically, the need to manage its use efficiently has not changed. We have simply started applying computers to much bigger problems to take advantage of all the available RAM.

To understand how to detect and remedy memory management issues, it is necessary to understand how code and data are stored in memory. The next article in the series will cover the storage and representation of variables in memory, and will show how to use this knowledge to identify certain errors that are otherwise extremely difficult to debug. It will also cover swap memory, shared memory, and memory leaks in detail.

Howard Feldman is a research scientist at the Chemical Computing Group in Montreal, Quebec.

