Modern Memory Management
by Howard Feldman
Computer memory is perhaps the most rapidly changing component in computer systems. Back in the early '80s, 1K memory chips were quite common in the first home microcomputers. Now, 25 years later, the average home computer may contain memory modules of 1GB, a millionfold increase, compared with only a 1,000-fold growth in CPU speed.
Despite this enormous increase in memory capacity, many of the problems that exist on today's machines are the same as those of their early predecessors--namely, running out of memory. The more RAM that is available to a software developer, the more the developer tends to use. Certainly techniques of memory conservation have changed with the times. In the early days of computing, programmers used clever tactics to tighten code loops or reuse blocks of memory at the expense of code readability, in order to cram impressive amounts of code into tiny amounts of RAM. Today, code size is rarely a concern anymore, and attention has shifted to storing large amounts of data--images, video, audio, and tables--but little else has changed. The articles in this series deal with memory management issues on modern Unix-like systems. Most of the topics apply to all computers and operating systems.
This article, the first in the series, discusses the Unix dynamic memory allocation system along with the concept of memory segmentation. It also reviews utilities such as ulimit, giving special attention to their role in memory management.
Memory management is an important concept to grasp regardless of which programming language you use. You must be most careful with C, where you control all memory allocation and freeing. Languages such as C++, Java, Perl, and PHP take care of a lot of the housekeeping automatically. Nevertheless, all of these languages and others can allocate memory dynamically, and thus the following discussion applies to them all.
The majority of programming languages ultimately end up using a single routine to allocate memory: malloc. Strictly speaking, malloc is a function in the standard C library rather than a system call, but most other languages internally end up calling malloc to reserve blocks of memory as well. The process works as follows:
- The program requests a block of memory of size n bytes from malloc.
- If a contiguous block of n bytes is available, malloc marks the block as occupied in its allocation tables and returns a pointer to the memory. If no such block is available, malloc returns a null pointer.
- The calling program uses the memory as needed.
- The calling program calls free, with the pointer returned by malloc as its argument.
- free looks up the pointer in the allocation tables and, if it finds it, removes it from the tables, effectively marking the block as freed.
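The request, use, and free cycle described above can be sketched in C roughly as follows. The function name sum_of_squares and its -1 error convention are made up for illustration; only malloc and free come from the standard library.

```c
#include <stdlib.h>

/* Hypothetical example: sums the squares of 0..n-1 using a temporary
   heap array. Returns -1 if the allocation fails. */
long sum_of_squares(size_t n)
{
    if (n == 0) {
        return 0;               /* nothing to allocate */
    }

    /* Request a contiguous block large enough for n longs. */
    long *values = malloc(n * sizeof *values);
    if (values == NULL) {
        return -1;              /* malloc found no suitable block */
    }

    /* Use the memory as needed. */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        values[i] = (long)(i * i);
        total += values[i];
    }

    /* Hand the block back to the allocator using the same pointer. */
    free(values);
    return total;
}
```

A caller would check the result for -1 before using it. For very large n the multiplication inside the malloc call could overflow; a production version would guard against that.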
This brings up two important points. First is the concept of memory fragmentation, which arises because malloc always returns memory in contiguous blocks. This may sound odd at first--why not use all the memory available, regardless of the address? Remember, though, it returns only a single pointer to the memory block. If it were not one big contiguous block, how would you know when to stop reading or writing sequentially and jump to the next block? The CPU would have to do a lot of extra work to give the illusion that the returned block was contiguous when in fact it was not.
For example, consider a machine with 1,000MB of RAM. You request three blocks of 300MB each and get back pointers starting at 0MB, 350MB, and 700MB. This leaves free a block from 300MB to 350MB, and 650MB to 700MB, for a total of 100MB free, as you would expect. However, any further request for a block larger than 50MB will fail, as this is the largest contiguous piece available. Only after freeing one of the earlier 300MB blocks could you request something bigger than 50MB. Thus it is not enough that memory be available; it must be in a contiguous block.
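The arithmetic in this example can be checked with a small sketch. It does not touch the real allocator; it only models the layout described above, and the names struct region and free_space are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One allocated region in a hypothetical arena, measured in MB. */
struct region {
    unsigned start;
    unsigned size;
};

/* Given allocated regions sorted by start address (and not overlapping)
   inside an arena of arena_size MB, report the total free space and the
   largest contiguous free gap. */
void free_space(const struct region *used, size_t count, unsigned arena_size,
                unsigned *total_free, unsigned *largest_gap)
{
    unsigned cursor = 0;    /* end of the previous allocated region */

    *total_free = 0;
    *largest_gap = 0;
    for (size_t i = 0; i <= count; i++) {
        /* A gap runs from cursor to the next region, or to the arena end. */
        unsigned next_start = (i < count) ? used[i].start : arena_size;
        assert(next_start >= cursor);   /* input must be sorted */

        unsigned gap = next_start - cursor;
        *total_free += gap;
        if (gap > *largest_gap) {
            *largest_gap = gap;
        }
        if (i < count) {
            cursor = used[i].start + used[i].size;
        }
    }
}
```

With regions of 300MB starting at 0, 350, and 700 in a 1,000MB arena, the function reports 100MB free in total but a largest gap of only 50MB, matching the figures in the text.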
To avoid problems with memory fragmentation, try to avoid making many small memory allocations and instead replace them with one or two large allocations where possible.
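As one possible illustration of this advice, a two-dimensional table can be allocated as a single block instead of one block per row. The helper name matrix_alloc is hypothetical, and the row-major indexing scheme is an assumption of this sketch rather than anything prescribed by the article.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a rows x cols table of doubles as ONE
   zero-filled contiguous block instead of one allocation per row.
   Element (r, c) lives at index r * cols + c.
   Returns NULL for empty dimensions or if the allocation fails. */
double *matrix_alloc(size_t rows, size_t cols)
{
    /* Reject empty tables and sizes whose product would overflow. */
    if (rows == 0 || cols == 0 || cols > SIZE_MAX / rows) {
        return NULL;
    }
    return calloc(rows * cols, sizeof(double));
}
```

The whole table is then released with a single call to free, so the allocator has one entry to track instead of one per row.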
Second, note that freeing a block of memory does little more than mark it as unallocated in malloc's internal allocation tables. The contents of the memory generally remain intact, although with C programs compiled in debug mode, for example, you can have memory zeroed automatically after freeing to aid in debugging. Often freed memory will instead be filled with a value such as 0xDEADBEEF or some similarly clever mnemonic to help the programmer spot stale pointers easily when debugging. When building release code, however, these cleanup tasks are not present--mainly because they slow down the program significantly. This leaves the cleanup to the programmer.
It is impossible to tell whether a block of memory has been freed just by looking at it. You must keep track of that yourself. An additional point is that free does not normally return the memory to the operating system; it remains owned by the process until the process terminates. The process can reuse it the next time it requests more memory, but other programs will not have access to it, even if no other memory is available. As a corollary, then, the memory footprint of a program is determined by the largest total amount of memory it has allocated at any one time. Thus it is always wise to free objects you do not need, especially large ones, as soon as possible, to minimize this footprint.
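One common convention for keeping track of freed blocks yourself is to reset the pointer to NULL at the moment it is freed, so later code can test the pointer instead of inspecting the memory. The macro name FREE_AND_CLEAR is invented for this sketch; it is not part of any standard library.

```c
#include <stdlib.h>

/* Hypothetical convenience macro: frees the block and then clears the
   pointer variable, so a freed pointer is always recognizable as NULL.
   Calling it again on the same variable is harmless, because free(NULL)
   does nothing. The argument must be a plain pointer variable, since it
   is evaluated twice. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

This only clears the one variable passed in. Any other copies of the pointer held elsewhere in the program still refer to the freed block and must be tracked separately.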
There are also alternatives to malloc. For example, a program linked to the mapmalloc library will return dynamically allocated memory to the operating system when it calls free. While no change in the code is necessary, the price is that memory allocation takes about five times longer.
As far back as the days of 8-bit computing, CPUs have always contained a few critical registers. These include the accumulator, where mathematical operations actually occur; the program counter (PC), which points to the command being executed; some index registers or data counters (DCs), which refer to data tables or variables in memory; and the stack pointer (SP), which always points to the top of the stack, a last-in, first-out (LIFO) data structure used for temporary storage.
The 8088 (IBM XT) CPU contained four special 16-bit registers called segment registers: code segment (CS), data segment (DS), stack segment (SS), and extra segment (ES). The name segment refers to memory access in real mode. A 20-bit address, allowing access to up to 1MB of RAM, came from a 16-bit segment and a 16-bit index to give the final address, by multiplying the segment by 16 and adding the index to the result. Thus each segment address corresponded to a single (overlapping) 64K chunk of memory (plenty for that time). The idea was that CS would always point to the 64K block that contained the machine code program being executed, DS to the 64K block where the program stored variables and temporary data, and SS to the program stack. ES was usable for other purposes, such as pointing to a second data area. The segment registers could of course change during the execution of the program if needed, but until programs using more than 256K became common, this was largely unnecessary.
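The address calculation described above is small enough to write out directly. The function name real_mode_address is made up for this sketch; the mask models the 20 address lines, so results past 1MB wrap around to zero.

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + index, kept to 20 bits. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    return linear & 0xFFFFFu;   /* only 20 address bits exist */
}
```

For instance, segment 0x1234 with index 0x0010 and segment 0x1235 with index 0x0000 both give physical address 0x12350, which is what it means for the 64K chunks to overlap.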