Animation in SDL: OpenGL

Memory Usage

The amount of video memory built into a video card is a scarce resource. If your program asks for an OpenGL visual that requires more memory than is available on the video card, the request will fail. This can happen even if the card supports the visual you asked for, because you might ask for a display size that is too large to fit. To help you pick a memory budget, the following describes the effect each OpenGL attribute has on memory usage.



  • Number of Pixels. The width in pixels times the height in pixels. This number tends to be rather large: a 640x480 video mode has 307,200 pixels, while a 1600x1200 video mode has 1,920,000. This value is used as a multiplier to compute the size of a buffer.

  • Buffer Depth. The elements in each buffer you ask for have a specific size. It might be 1 bit, it might be 32 bits. The size of the buffer is usually the buffer depth times the number of pixels. The result can be a large number of bytes of memory.

  • Selected Buffers. There are several different buffers you can ask for. You can have multiple frame buffers, stencil buffers, Z (depth) buffers and so on. Using more buffers uses more memory.

  • Rendering Attributes. Using full-scene antialiasing multiplies the amount of memory needed for rendering.

  • Your Video Card. The video card hardware arranges different buffers in memory to make drawing as fast as possible. That means that if you use one kind of buffer, you may get the memory for another type of buffer, even if you don't use it. The most common examples are the alpha and stencil buffers. These buffers are built into the color and depth buffers in some modes. It doesn't matter if you use them; you pay for them anyway.

When thinking about video memory usage, many programmers fail to consider how much video memory is already in use before the program starts. The graphics processor can work with data that is in video memory much faster than it can work with the computer's main memory. The OS and/or windowing system will try to take advantage of that speed by moving every graphic resource, from fonts to images, into video memory. Competition for video memory can cause your program to slow down or stall at odd times. It is a good idea to decide on a specific minimum video memory size, and both design to that size and regularly test on a machine with that amount of video memory.

Now let's take a look at each attribute and see how using it can affect the total video memory used by your program.

  • SDL_GL_RED_SIZE, SDL_GL_GREEN_SIZE, SDL_GL_BLUE_SIZE, and SDL_GL_ALPHA_SIZE specify the size of the color and alpha fields in a pixel. The sum of these sizes must be less than or equal to the buffer size of a visual. These values set the minimum possible depth for the color buffer.

  • SDL_GL_BUFFER_SIZE is the depth of the color buffer in bits. Divide by 8 to get bytes and multiply by the number of pixels to find out how much memory you need to store the color information for your application.

    This size may or may not include the size of the alpha buffer. If you look at one of the 16-bit visuals listed above, you see that they have space reserved for red, green, and blue, but not for alpha, while the 32-bit visuals include 24 bits for red, green, and blue, and 8 bits for alpha. There are 16-bit color modes that include space for alpha and there are 24-bit color modes that do not. Any time you get 24 bits of color in a 32-bit buffer size, you are paying the cost of the alpha buffer, whether you use it or not. The extra space is added to make pixels line up on 16- or 32-bit boundaries.

  • SDL_GL_DOUBLEBUFFER. Double buffering is required for smooth animation, but it requires two display buffers and doubles the memory used for color information.

  • SDL_GL_STEREO. Stereo viewing is used in virtual reality and design systems. Stereo viewing works by showing one image to the left eye and a different image to the right eye. That means you need a color buffer for each eye. If you want to do smooth stereo animation, then you need a back buffer and a front buffer for each eye. Smoothly animating stereo requires four buffers and four times the memory of a single buffered application.

  • SDL_GL_DEPTH_SIZE and SDL_GL_STENCIL_SIZE. I lump the depth (Z) buffer and the stencil (S) buffer together because they are closely related. In computer graphics literature, you will often find these buffers referred to as the ZS buffer rather than as two separate buffers. Looking at the example visuals listed above, you see that there are visuals with a 16-bit Z buffer and no stencil buffer, and there are visuals with a 24-bit Z buffer and an 8-bit stencil buffer. There are two things going on here. Both the Z buffer and the stencil buffer have values that have to be read and tested before a color value can be written. Since they have to be read, hardware designers work to make the read as fast as possible. In the 16-bit case, they can put two 16-bit Z values in a 32-bit word and read them two at a time. Or, they can put an 8-bit stencil value and a 24-bit Z value together and read a value from each buffer with only one read. Either way, you get a speedup by packing two values into a single 32-bit word.

    The size of the Z, or ZS, buffer is the number of pixels times either 16 or 32 bits. For all of the example visuals, it is the same size as a color buffer. There is no rule that forces that to be the case, but I've never seen it any other way. I'm not sure if that is the result of design tradeoffs or simply that no one expects a 24-bit Z buffer with a 16-bit color buffer, or vice versa.

    The next question is: how many ZS buffers do you get when you ask for one? Normally you only draw into one color buffer at a time, so you only need a single ZS buffer. But there have been video cards that allocated a ZS buffer with each color buffer. Programs written to use a single ZS buffer will work if there are multiple ZS buffers, but the reverse is not true. Poking around in the documentation for my video card, I found out how to configure it to allocate a ZS buffer for both the front and back color buffers. The moral of the story is that using a ZS buffer may use much more memory than you expect. The total memory used by ZS buffers may be equal to the number of color buffers times the size of a ZS buffer.

  • SDL_GL_MULTISAMPLESAMPLES and SDL_GL_MULTISAMPLEBUFFERS. Multisampling is a technique used to implement full-scene antialiasing. The simplest way to implement full-scene antialiasing is to render the scene at a higher resolution than the final image and then average the colors of adjacent pixels in the sample buffer to get the value of the antialiased pixels for the final image. That is how multisampling works. The size of each multisample buffer is roughly the number of samples times the size of a color buffer. For example, if the color buffer is 640 by 480 pixels, then a 4x multisample buffer is 1280 by 960 pixels. To render an image, it is first drawn into the large multisample buffer, and then every 2-by-2 piece of the multisample buffer is averaged to generate a pixel in the final image. The ZS buffer has to be the same size as the multisample buffer, because you have to have depth and stencil values for each pixel in the multisample buffer. Using multisampling adds the memory needed for a multisample buffer and the increased size of the ZS buffer.

  • SDL_GL_ACCUM_RED_SIZE, SDL_GL_ACCUM_GREEN_SIZE, SDL_GL_ACCUM_BLUE_SIZE, and SDL_GL_ACCUM_ALPHA_SIZE. Use these attributes to ask for an accumulation buffer. An accumulation buffer has the same number of pixels as a color buffer, but each pixel has more color information. Typically, an accumulation buffer has 16, or more, bits of red, green, blue, and alpha. That means that each pixel is at least 64 bits long. Accumulation buffers are used for graphic effects, such as motion blur, that require more color precision than is available in 32-bit or 16-bit color buffers. These effects are done by summing, or accumulating, a series of color values. The accumulation buffer has large color fields so that the sums will not overflow. The size of an accumulation buffer is the number of pixels in the buffer times the pixel size, just like a normal color buffer, except that the pixels are much bigger.

Attributes and Speed

The maximum performance of a video card is determined by how fast it can read from and write to video memory. That speed is controlled by two factors: the raw speed of the memory, and the width of the memory bus. The raw speed of the memory sets the read/write time, and the width of the bus is how many bits can be read or written at a time. The total memory bandwidth is the speed in reads or writes per second multiplied by the number of bits that can be read or written at one time. That means that if the memory can be written 100 million times per second (10-nanosecond write time) and the bus is 256 bits wide, then the system-write bandwidth is 25.6 billion bits per second. If a pixel is 32 bits wide, then this system can write 8 pixels at a time and write 800 million pixels per second. That number is the absolute limit to the number of pixels that can be written to this hypothetical video memory in a single second. I'm skipping over a lot of details, but no matter what designers do to get around this limit, at some point the architecture is limited by the bandwidth of the video memory chips.

The attributes you select control the amount of video memory bandwidth that is needed to draw a pixel on the screen. Let's look at a few cases in terms of a 256-bit memory bus. Note that to write a single pixel on such a bus, you may have to read 256 bits, modify the bits that represent your pixel, and write it back. But if you are writing more than one pixel, such as when a polygon is being filled, the odds are in your favor that you will be able to write more than one pixel per write. And, if you are lucky, you will be able to write a whole 256 bits' worth of pixels and not have to do a read at all. Most game programs draw rather large polygons, so they get lucky most of the time. The following describes the effect that using different buffers has on graphics performance. The examples ignore the effects of texture mapping and bump mapping, both of which increase the number of reads that have to be done for each pixel that is drawn.

  • Buffer Depth. A 16-bit pixel is half the size of a 32-bit pixel, which means that each write to memory can move twice as many 16-bit pixels as 32-bit pixels, which means you can draw twice as fast. The same is true for the other buffers; smaller is faster.

  • Alpha Blending. When you use an alpha buffer, each pixel you draw is a combination of the pixel already stored in the buffer and the pixel you are drawing. That means that you must do a read/modify/write operation for each pixel that is drawn. The speed effect is most visible when you are drawing large polygons. If you weren't using alpha, you wouldn't have to read most of the pixels on the inside of a polygon. Using alpha forces you to read every pixel. If you don't need it, don't use it.

  • Depth and Stencil. Using these buffers hits you twice. When you are using either or both of these buffers, the graphic processor must read and compare the stored value against a new value before a pixel can be written. Then, when a pixel is written, a Z value must also be written, along with the color value. Essentially, using these buffers doubles the cost of drawing a pixel. You have to read twice as much data for each pixel you might change, and if you change it, you have to write twice as much data per pixel.

  • Multisample Buffer. As described above, the multisample buffer has several times the number of pixels that a normal color buffer has. If you have twice the number of pixels to draw, it takes twice as long to draw them. Once they are drawn, the pixels have to be averaged to generate the final image. (The averaging can be, and often is, done on the output side of the video system so that it appears to be very fast, even free.) To estimate the effect on the speed of your program, just divide its speed without multisampling by the number of samples you want to use. The actual slowdown depends on details of the implementation on specific video cards.

  • Accumulation Buffer. Smaller is faster, bigger is slower. If the pixels in the accumulation buffer are twice as big as normal pixels, drawing into it has to be half as fast.

The preceding list only considers speed effects that result from memory bandwidth limitations. There are architectural effects that also come into play. For example, an architecture designed for using the Z buffer can mask the size effects of the Z buffer by placing it in a special block of memory that can be read and written in parallel with the color buffers. Or, a system designed to draw polygons may be surprisingly slow at drawing lines. There are many tradeoffs that have to be made. The actual performance of any given video card depends on a combination of its target market, target price, and the technology available at the time it was designed.

Create the OpenGL Surface

Once the OpenGL attributes are set the way we want, then we call SDL_SetVideoMode() to create the OpenGL surface. When you specify the SDL_OPENGL flag, the bits-per-pixel parameter is ignored; that information is instead provided by OpenGL attributes. The same is true for the SDL_DOUBLEBUF flag. The size parameters are used to specify the size of the display. The other SDL video mode flags work as expected.

if (NULL == (screen = SDL_SetVideoMode(640, 480, 0, SDL_OPENGL)))
{
    printf("Can't set OpenGL mode: %s\n", SDL_GetError());
    SDL_Quit();
    exit(1);
}

Screen Flipping

The last thing we need before we can start animating is a way to swap the buffers. For SDL hardware and software buffers, we use SDL_Flip(). For OpenGL, we use SDL_GL_SwapBuffers().

I keep running into people who are confused by the way buffer swapping interacts with the video display. The video display hardware is constantly reading the contents of video memory and converting it to a video signal that your monitor then turns into the pattern of colored light that you see on the screen. The process of painting an image on the screen takes time. At 85 frames per second, it takes just under 12 milliseconds to draw the frame on your screen. The process is broken up into several phases, but the ones we are interested in are the frame time and the vertical retrace period. The frame time is the length of time from when the hardware starts displaying the current image on the screen until it starts displaying the next image on the screen. The vertical retrace period is a brief period at the end of the frame time when the video system has finished displaying one image but hasn't started displaying the next image.

If we change the display buffer during the frame time, the hardware will display part of the front buffer at the top of the screen and part of the back buffer at the bottom of the screen. Splitting the image like that causes a visual effect called tearing. We want the buffers to switch during the vertical retrace period so we never see parts of two frames on the screen at the same time. That means that when we measure the frames per second rate of a program it should never be faster than the vertical refresh rate of the monitor. Sadly, that is almost never what we see.

The problem is that some OpenGL drivers perform buffer swaps during the vertical retrace period and some do not. Others can be configured by the customer to synchronize or not. That means that a program that runs at 80 frames per second on one computer will run at 800 frames per second on an identical computer with a tiny change in the configuration of the video card. Of course, on the second computer, you will never see most of the frames that were drawn but they will be drawn. This difference in frames-per-second speed drives people nuts, because they don't understand what causes it.

The confusion is worsened by the manufacturers' habit of configuring cards to ignore vertical retrace. Why do they do that? My guess is that they know that, given the choice between a video card that claims to run your favorite video game at 1,000 frames per second and one that only claims to draw 85 frames per second, most people will buy the 1,000 frames-per-second card. More must be better, right?

OpenGL Extensions

Hardware vendors have added many extensions to OpenGL. Some extensions were added as standard extensions by the ARB, and others were added by specific vendors. Many of the vendor extensions have been adopted by other vendors and have become de facto standards. It is not possible for SDL to provide a function call for every extension; I'm not sure it is possible for the SDL developers to even know about every extension. Instead, SDL provides SDL_GL_GetProcAddress(), which looks up functions by name and returns a pointer to the named function. You can then use that pointer to call the function. Using SDL_GL_GetProcAddress() allows you to access every OpenGL extension.

Custom Libraries

The idea that an application would load its own special OpenGL library seems a little odd when you first think about it. After all, don't you want to use the system library to get the best performance out of your video card? Well, no, not always. It may be that you know that a lot of the machines your application is going to run on do not have OpenGL support, so you include your own OpenGL library. (There is at least one commercial implementation of OpenGL that is based entirely on DirectX. It is designed to let OpenGL programs run on Windows without any other OpenGL support.) Or you may need to use software rendering, because you are generating images that are larger and have more bits of color or Z buffer than can be rendered on any existing video card. In those cases, you must load your own OpenGL library.

SDL provides SDL_GL_LoadLibrary() for loading custom OpenGL libraries. After you have loaded the library you must use SDL_GL_GetProcAddress() to retrieve pointers to all of the OpenGL functions your program will use. This function is not for the amateur or the faint of heart.

Animation with OpenGL

In the previous articles in this series, I used a simple animation program, softlines.cpp, to demonstrate SDL software surfaces. Then I converted it into hardlines.cpp to highlight the differences between hardware and software buffers. In this article, I'll continue with that theme by modifying the same program to do the same animation using OpenGL. The new program is called gllines.cpp.

The first change I have to make is to include the SDL OpenGL support library by adding an include statement:

#include "SDL.h"
#include "SDL_opengl.h"

Without this tiny change, the program won't compile.

The biggest change in the program is the removal of all of the software line drawing code. The first two programs had several hundred lines of code devoted to line-drawing routines. OpenGL has its own line-drawing code, so my old code can be deleted.

The next change is in the sweepLine class. SDL and OpenGL have very different ways of handling color, and I had to make several changes in the program to accommodate those differences. In the SDL versions of the program, we used SDL_MapRGB() to convert colors to pixel values and then used those pixel values to draw colored lines. In OpenGL, you set the color using the red, green, and blue color components just before you draw something. So I changed sweepLine to keep around the color components instead of just using a single pixel value. Then I had to change the actual line-drawing code to use OpenGL.

The next set of changes show up in main() before and after the SDL_SetVideoMode() call. Before setting the video mode, I had to add calls to SDL_GL_SetAttribute() to configure the OpenGL surface. To make sure that the program runs on as many machines as possible, I only ask for 1 bit each of red, green, and blue.

After setting the video mode, there are calls to glViewport() and glOrtho(). These calls put the (0,0) coordinate in the upper-left-hand corner of the screen and force the Y coordinate to increase down the screen. They also set the logical width and height of the window to be the same as the width and height measured in pixels. Programs tend to have built-in assumptions. If changes violate those assumptions, you are likely to introduce bugs. I'm trying to avoid that problem by making the coordinate system of the OpenGL surface match the coordinate system of an SDL software surface.

The rest of the changes are pretty small. The SDL code for creating pixel values for colors has been deleted. The call to SDL_FillRect() used to clear the back buffer has been replaced with a call to glClear(), which has the same effect. And the call to SDL_Flip() has been replaced with a call to SDL_GL_SwapBuffers().

When I tested that version of the program, it reported that it was running at over 500 frames per second and my system monitor showed that it was using 100% of the CPU. Every so often, the animation would stop and then jerk forward as the OS took the CPU away from my program to run another task. To fix that, I added one more short piece of code at the top of the main animation loop.

now = SDL_GetTicks();
if ((now - ticks) < 10)
{
  SDL_Delay(5);
}
ticks = now;

The code tests to see how long it has been since the last time the top of the loop was reached. If it is less than 10 milliseconds, then the program executes SDL_Delay(5) to slow the program down. After adding that change, the program still reports running at over 140 frames per second, but it only uses about 2% of the CPU.

Note: I used the magic value 5 for the delay because on a system where the clock ticks every 10 milliseconds (a lot of systems), you will get an average delay of 5 milliseconds (half of 10), and on systems where the clock ticks every millisecond (another large group of systems), you will get an average delay of 4.5 milliseconds. So asking for a delay of 5 milliseconds will, on average, give me a delay pretty close to 5 milliseconds. No other integer less than 10 has that property.

The final program is several hundred lines shorter than the original program. Many of these changes make the program smaller and simpler; I like changes like that. The majority of the changes were one-for-one substitutions of OpenGL APIs for SDL APIs, which did not force any structural changes on the program.

Conclusion

In this article, I have introduced all of the functions in SDL that interface directly with OpenGL. The gllines.cpp program demonstrates the use of those functions. Equally as important, I have described the effects on memory usage and performance that can result from using the many different kinds of OpenGL buffers.

My next article will cover using SDL with OpenGL to solve common graphic programming problems. SDL has a small set of graphics operations that are ideal for preparing graphics for use in OpenGL applications.

Bob Pendleton has been fascinated by computer games ever since his first paid programming job -- porting games from an HP 2100 minicomputer to a UNIVAC 1108 mainframe.

