Planning for Disaster Recovery on LAMP Systems

Unfortunately, not all system files have an include feature. Of these, crontab is the most significant. In this case, you can still store the configuration data under /proj, but you will need to cut and paste it into the appropriate file as part of your recovery procedure.
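As a minimal sketch of that recovery step, the fragment below merges saved cron entries into the current user's crontab. The path /proj/linux_config/crontab is a placeholder for wherever you choose to keep the entries.

TMPFILE=/tmp/crontab.$$

# Dump the existing crontab (crontab -l exits non-zero if there is
# none, so fall through in that case), append the saved entries,
# and install the merged file
crontab -l > $TMPFILE 2>/dev/null || true
cat /proj/linux_config/crontab >> $TMPFILE
crontab $TMPFILE
rm -f $TMPFILE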

Regardless of how you incorporate the information, you need some way of distinguishing your changes from the default configuration. My preference is to be explicit and to bracket any changes with comment lines like these:

## Craic modification BEGIN

[...]

## Craic modification END

This also gives me tags that I can look for, should I want to uninstall the changes or check whether they are already in place.
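For example, a one-liner like this (the search path is illustrative) lists every file under /etc that already carries the marker:

grep -rl '## Craic modification BEGIN' /etc 2>/dev/null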

You should store all of the included files on the /proj partition. You could create a configuration directory for each application, but my preference is a single directory for all the application settings. In the examples above, this is /proj/linux_config. Within that I have subdirectories for httpd, samba, mysql, and so on. Bringing the configuration data for all my applications together lets me manage their interactions more readily than separate directories would. Additionally, I can refer to that single directory in the recovery plan, which is reassuring for our friends in IT.
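One plausible layout, using the names from this article; the annotations are only suggestions:

/proj/linux_config/
    httpd/       Apache include files and virtual host definitions
    samba/       smb.conf fragments
    mysql/       my.cnf settings
    crontab      cron entries to paste in during recovery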

The Installation Script

The disaster recovery plan people really, really want installation scripts for your applications. These scripts should wrap up all the messy details and make the recovery process look much more like that of the Windows applications they are used to.

By implementing the ideas given above, we can now give them what they want. All the script needs to do is insert the include directives, or blocks of code, into the system configuration files and restart the appropriate daemons. Make sure that your script tests whether the changes have already been made and then creates a backup copy of the system file before doing anything else. Here is an example block of code from an install script, written in bash, which inserts an include directive into the Apache configuration file.

HTTPDCONF=/etc/httpd/conf/httpd.conf

# Look for our marker so we never apply the change twice
FLAG=`grep '## Craic modification BEGIN' $HTTPDCONF`

if [ -n "$FLAG" ] ; then
   echo '... no changes needed'
else
   # Back up the original file before touching it
   cp $HTTPDCONF ${HTTPDCONF}.bak

   # Append the include directive, bracketed by our markers
   echo -e '## Craic modification BEGIN\n'  >> $HTTPDCONF
   echo "Include /proj/linux_config/httpd " >> $HTTPDCONF
   echo -e '\n## Craic modification END'    >> $HTTPDCONF

   # Pick up the new configuration
   /etc/rc.d/init.d/httpd restart
fi

Using comments to delimit each block makes it easy to excise the modification as part of an uninstall script. This is extremely useful during the testing phase of your disaster recovery plan.
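A minimal sketch of that excision, assuming GNU sed, whose -i.bak flag edits the file in place and leaves a backup:

# Delete everything from the BEGIN marker through the END marker,
# then pick up the restored configuration
sed -i.bak '/## Craic modification BEGIN/,/## Craic modification END/d' \
    /etc/httpd/conf/httpd.conf
/etc/rc.d/init.d/httpd restart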

Why not just create an RPM or other package for each application? That would be great but it involves quite a bit of effort for an application that will only ever be installed at a single site. To my mind, a simple shell script is at the right level of complexity. Should you choose to go the extra mile and create an RPM installation package, then the ideas described here should serve as a good foundation.

Documentation

We all strive for good documentation but, if we are honest, we don't usually do a very good job of it, especially when our applications are still under active development. So how much documentation is enough in the context of disaster recovery?

Our friends in corporate IT want to know what an application does, where its software and data reside, what other software it depends on, and exactly what to do to rebuild it. They want this in a form they can print out and put in a binder on the shelf in the computer room, along with a copy in off-site storage.

I need something different. I want to see ReadMe files scattered throughout the directories that tell me what the files represent, what the scripts do, and what other parts of the application and system they interact with. These make up the informal map that another developer can use to decipher my work if I get hit by a bus. Perhaps more importantly, they will refresh my memory three years from now when I need to make some changes.

Don't be shy about what you put in the ReadMe files. If part of the application is a complete hack that depends on an obsolete Perl module, or if you know it will crash next time we have a leap year, then say so. You don't need to tell the world about the gory details but you do need to capture that information in a form that you or another developer can access. In the midst of a real disaster recovery, those notes can make all the difference to someone trying to fix things.

Please remember, ALWAYS put a date and a name next to your comments. A problem that required a huge workaround last year might well have been fixed in the current release of the operating system. This is a widespread problem with Linux HOWTOs and web sites. A date helps me assess whether the information is still relevant and a name gives me someone to contact if I need more information.
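Something as simple as this, next to the change itself, is enough (the date, name, and details here are made up):

## 2004-06-15 R.Jones: pinned the XML parser module at version 1.02
## to work around a segfault; check whether newer releases fix it.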

If you can create and maintain this level of documentation as you develop the application then it is not too much effort to rework it into the form that the good people in corporate IT are looking for.

Use DNS Aliases for Multiple Applications

If you have several applications and multiple servers then you should consider setting up Apache virtual hosts for each of them along with DNS aliases that relate each application to the physical host.

For example, let us say I have two machines (server1 and server2) with the application app1 on server1 and app2 on server2. The default way to access the start page for each application would be to use the URLs http://server1/app1 and http://server2/app2.

If server1 blows up then I either need to replace it or move the app1 application to server2. But then all my users will have to update their bookmarks to point to the new machine. The better alternative is to create the hosts app1 and app2 as DNS aliases that point to server1 and server2 respectively. In the Apache config for each server I create virtual hosts for ALL of these applications, in essence replicating their configuration even though the application itself may not be present. Users now access the applications as http://app1 and http://app2. The DNS alias determines which machine each request is directed to.
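Here is a sketch of the two pieces, with hypothetical hostnames and paths: first the zone-file aliases, then the name-based virtual hosts that every server carries.

; DNS zone-file entries (illustrative)
app1    IN  CNAME   server1.example.com.
app2    IN  CNAME   server2.example.com.

# httpd.conf on each server, using Apache name-based virtual hosting
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName   app1
    DocumentRoot /proj/app1/htdocs
</VirtualHost>

<VirtualHost *:80>
    ServerName   app2
    DocumentRoot /proj/app2/htdocs
</VirtualHost>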

If I need to move either application to another machine I simply install the software, set up the virtual host, and then change the DNS alias to point to the new machine. All the existing bookmarks and links continue to work. Users are none the wiser to the change in venue.

Mirrored Servers

Live replication of applications and data is a great way to ensure that your applications are available. Rsync lets you maintain duplicate copies of directories on different machines, with regular updates. The directory layout I've discussed here fits in perfectly with rsync's abilities. MySQL replication can take that one step further with live mirroring of the contents of a database to another machine. The setup is more involved than rsync but it can be well worth the effort.
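As a sketch, a cron entry like this one (hostname and paths are placeholders) keeps a nightly duplicate of the /proj partition on a second machine, and the fragment below it is the minimal my.cnf starting point for replication on the MySQL master:

# Mirror /proj to server2 every night at 2 a.m. (root's crontab)
0 2 * * * rsync -az --delete /proj/ server2:/proj/

# my.cnf on the master; the slave needs its own unique server-id
# and a CHANGE MASTER TO statement pointing back at the master
[mysqld]
server-id = 1
log-bin   = mysql-bin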

Be careful not to confuse high availability with disaster recovery. Mirrored servers will do a great job of replicating your data, good or bad. Replication can easily result in two corrupt databases instead of one. Things are different with the high-end commercial databases that you'll find in the banks, but that's not what we're dealing with here. At our level, replication is great, but nothing beats having a tape on a shelf in off-site storage.

Final Thoughts

Disaster recovery planning should be just as important to developers as it is to corporate IT. While the cultural differences between "us" and "them" can be frustrating, we need to address their needs head-on if our style of application is going to find a place in their world.

By designing our apps with disaster recovery in mind right from the start, we become an ally of corporate IT rather than a thorn in their side. A little bit of forethought pays big dividends.

Robert Jones runs Craic Computing, a small bioinformatics company in Seattle that provides advanced software and data analysis services to the biotechnology industry. He was a bench molecular biologist for many years before programming got the better of him.

