Distributed Computing Sanity Checking
by Howard Feldman
Dozens of distributed computing (DC) projects have recently become available for interested parties to download and run on their machines. As DC is becoming more and more popular, groups are jumping on the bandwagon to take advantage of this wondrous opportunity for "free" computer time. However, as with any other new technology these days, security and privacy have become important issues.
The security problem can be divided into two distinct facets: client security and server security. The former involves security on the computers of the volunteers running the distributed application. This is extremely important, perhaps even more important than with other types of software, because DC clients often communicate over the network. A poorly designed client could be hijacked into a back door for hackers and miscreants. By simply downloading and installing the application, volunteers implicitly trust the authors of the software not to do nasty things to their computers. After all, these programs are usually written by research groups, not software companies, and do not go through the same level of QA testing as commercial software. How can you be certain a client will not accidentally format the hard drive? Perhaps a particularly malicious programmer could release a DC app that secretly steals all of your credit card info the next time you enter it into a web application. None of this is beyond the realm of possibility.
Server security is another issue altogether. Most projects seek to answer a question or solve some scientific problem. The experiment is compromised unless the integrity of results returned by clients can be guaranteed. Similarly, if someone breaks into the server and changes results, the experiment is invalidated. Some users have also been clever enough to find ways to cheat, for example, uploading the same "work units" multiple times. This gives them extra credits and makes it appear as if they are doing lots of work. While perhaps impressing their friends, this sort of behavior can often be destructive to the project, biasing or perhaps completely ruining results computed on the server end. The SETI@Home project has had several problems with users cheating and exploiting loopholes. The project managers' delay in closing these loopholes cost them many users, who became upset by the rampant cheating.
The purpose of this article is not to make you paranoid about running DC projects or to turn you off to them. By all means, donate your CPU cycles to worthy projects! It is also not to reveal any secrets or security holes of existing DC projects; any security compromised by this article was never really secure to begin with. I do not claim to be an expert in computer and network security, by any stretch of the imagination. However, with new projects appearing weekly, you should be cautious and evaluate new projects from a security standpoint before signing up. The following sections discuss things to look for to ensure a DC project is secure, as well as some things to do to improve the security of your own DC project.
Signatures and Hashes
The server will often need to send data to the clients running the DC application. Any time data is sent to a machine, security measures must be put in place. This data may be new work to process or a new version of the application; either way, sending unprotected raw data over the network is just asking for trouble. It would be relatively easy for a third party to pose as the server and deliver arbitrary code to your computer, especially in the case of client updates to the executable, which would then be automatically (or manually) executed. This may be a lesser problem on Unix if the software is run as a user with minimal permissions, but clearly this is still unacceptable.
This is where digital signatures come in handy. Digital signatures are a way of guaranteeing to a client that a certain message came from a trusted individual. They work through the use of an encryption keypair. A trusted individual, usually the author of the DC client, has the private, secret key. He signs the document to be protected with this key. The document is then delivered to the client, who has the corresponding public key. The client can verify the signature on the document using the public key. Only someone who knows the private key could have signed the document such that the public key verifies the signature, and so the document must have come from the trusted source. Most importantly, a document cannot be modified once signed or the verification will fail.
By signing all updates sent to users and embedding the public key in the DC client, users can be reassured that no one can easily intercept or change update files for their own purposes. As long as users trust the document signer, their computers will be safe.
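As a rough illustration of the sign-then-verify flow described above, the sketch below uses textbook RSA in pure Python. Everything in it is an assumption made for illustration: the toy key built from two small Mersenne primes, the function names `sign` and `verify`, and the choice of SHA-256 as the digest. It is far too weak for real use; an actual project would rely on a vetted cryptographic library rather than anything like this.

```python
import hashlib

# Toy key material (illustration only, NOT secure): two known Mersenne primes.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                           # public modulus, embedded in the client
E = 65537                           # public exponent, embedded in the client
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the author


def _digest_as_int(document: bytes) -> int:
    """Hash the document and reduce the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Author side: sign the document's digest with the private exponent."""
    return pow(_digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Client side: check the signature using only the public values N and E."""
    return pow(signature, E, N) == _digest_as_int(document)


update = b"client update, version 2"
signature = sign(update)
print(verify(update, signature))                    # update exactly as signed
print(verify(update + b" (tampered)", signature))   # update modified after signing
```

The client never needs `D`, which is why embedding the public key in the client gives nothing away.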
On the other side of the fence, project managers must ensure the integrity of the data being returned by the users. They should be reasonably confident that all incoming data is in fact being generated by the clients (and not manually, to cheat), that work is not being duplicated, and that in fact data integrity is maintained from when it is generated by the client until it is stored on the server. While the latter point may sound a bit excessive, remember thousands of people will likely be sending data to the server and you must be prepared for the unexpected. Modern network connections are generally quite reliable and transfer data flawlessly but a bad cable or weak connection along the way could still cause a few bad bits to sneak in. More commonly, users may have overclocked CPUs or bad RAM chips (more often than you may think). These can lead to corrupt data files which will then be uploaded. Depending on the exact nature of the problem, the data file may still look entirely legitimate though it contains incorrect data. This is perhaps the scariest problem of all for a DC project organizer.
Again, digital signatures can be used to identify packets as having come from the DC client. As a bonus you get a free data integrity check along with it at the server end. However, there is a catch. You must include the private key with the client, so it can do the signing. A clever computer whiz could then extract the private key, make some fake data files, and sign them, making them indistinguishable from normal packets. Although this is difficult and unlikely, you must consider all possibilities when dealing with security issues.
Another, simpler approach in this case would be to use hashes. A hash, or checksum, is simply a function which takes an input, usually an arbitrarily complex set of data, and outputs a simple summary or representation of that information. Popular hashing algorithms include SHA1, MD5 (message digest 5) and CRC32 (cyclic redundancy check). Each of these digests input files or "messages", producing a fixed-length checksum value for each file, usually written in hexadecimal: 32 bits in length for CRC32, 128 bits for MD5, and 160 bits for SHA1. Since there is a finite number of outputs (2^32, 2^128, or 2^160) and an infinite number of inputs, the digest value is not guaranteed to be unique. However, it has been shown that it is extremely unlikely that any two files will have the same checksum, unless carefully contrived to produce such a result. It is even less likely that the same checksum will occur if minor errors or changes are made to a file, due to faulty RAM for example, because of the way in which they are computed.
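To make this concrete, here is a minimal sketch using Python's standard `hashlib` and `zlib` modules. The helper name `checksums` and the sample work-unit contents are made up for illustration. It computes all three checksums for a message, then for a copy with a single bit flipped, as faulty RAM might produce.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Return the CRC32, MD5, and SHA1 checksums of data as hex strings."""
    return {
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


original = b"work unit 42: energy = -1234.5678\n"   # hypothetical contents
# Flip the lowest bit of the first byte to simulate a one-bit memory error.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

good = checksums(original)
bad = checksums(corrupted)
for name in good:
    print(name, good[name], bad[name])
```

Each line of output shows the checksum before and after the bit flip, so the two columns can be compared directly.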
Sending a message hash along with the message allows the recipient to compute the hash independently and validate the integrity of the message. This prevents messages from being corrupted in transit, and also forces the sender to know what hash to send and when and where to send it. Again, a clever user could likely find a way to forge an incoming message, sending the proper hash with the upload to make it look as if it were coming from the client software, but this will at least stop the average user from doing so. The hash can also be used to track uploaded data, to avoid counting duplicates (for example, a user trying to upload the same work twice for credit). To do this, only store the hashes of previously uploaded work instead of all the work itself and check newly uploaded work against the list of hashes to see whether it has already been received before.
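The server side of this scheme might look something like the following sketch. The class name `UploadTracker`, its `receive` method, and the three status strings are hypothetical, and SHA1 is just one possible choice of hash. A real server would keep the set of hashes in a database rather than in memory.

```python
import hashlib


class UploadTracker:
    """Server-side bookkeeping: check integrity and reject duplicate uploads."""

    def __init__(self):
        self.seen = set()   # hex digests of work that has already been credited

    def receive(self, payload: bytes, claimed_digest: str) -> str:
        """Classify an upload as 'corrupt', 'duplicate', or 'accepted'."""
        actual = hashlib.sha1(payload).hexdigest()
        if actual != claimed_digest.lower():
            return "corrupt"        # damaged in transit, or hash not supplied correctly
        if actual in self.seen:
            return "duplicate"      # same bytes were uploaded before
        self.seen.add(actual)       # store only the hash, not the work itself
        return "accepted"


tracker = UploadTracker()
work = b"result for work unit 17"
digest = hashlib.sha1(work).hexdigest()
print(tracker.receive(work, digest))
print(tracker.receive(work, digest))
print(tracker.receive(work + b"?", digest))
```

Note that hashing the raw upload only catches byte-for-byte duplicates; if a trivially altered copy would still earn credit, the hash should cover only the meaningful part of the result.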
Because users possess the binary executable of the client, anything within it that it may use to identify itself to the server as the source of the upload could theoretically be spoofed. Outgoing network packets can be sniffed and, even if encrypted, the encryption key must then be present on the user machines. The best we can do is make this task non-trivial. However, this does not mean the success of the project is in jeopardy. Logic and common sense will normally still prevail. Regardless of the nature of the project, the designers generally have some idea of what to expect, and in some cases can verify results. For example when searching for new prime numbers, once one is allegedly found, it can easily and quickly be validated by hand. When generating molecular simulations, they can be tested for discontinuities or unrealistic parameter values. The point is that, no matter how careful the project managers may be, all results should still be manually verified at some point. At a minimum, they must ask themselves "Are the results what we expected? Why or why not? Is this reasonable?" After all, if the results cannot be reproduced or verified, then from a scientific point of view, the experiment is useless and poorly designed.
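For the prime number example, an independent recheck on the server could be as simple as the sketch below. The function name `recheck_claimed_prime` is made up, and plain trial division is used only because it is easy to follow; it is practical only for modest-sized numbers, and a real prime search would need a far faster test.

```python
import math


def recheck_claimed_prime(n: int) -> bool:
    """Independently confirm a client's claim that n is prime (trial division)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Any composite n has a divisor no larger than its integer square root.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


for claim in (97, 91, 7919):
    print(claim, recheck_claimed_prime(claim))
```

The point is not the particular test but that the server never takes a client's word for a result it can check cheaply itself.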
Maintaining Project Integrity
Many projects are likely to use data files of some sort or another to help them do their work. These may be parameter files, configuration files, and so forth. While some, such as configuration files, may change over time, some will not and are effectively read-only. These will be for the most part large tables of numbers or words (a character set, an energy force field, dictionary, etc.), which have not been directly hardcoded into the program for one reason or another. To ensure the integrity of the project, these data files must be protected from accidental modification. Binary files are less likely to be changed than plain text ones, but both types should be protected from unauthorized modification to avoid cheating or, more generally, invalid results.
These files lie on the user's file system, however, so making them read-only will stop only the most neophyte hacker. The solution? Again, checksums are our friends. Simply compute MD5, CRC, or some other checksums for the correct files and hard code them into the program. If the checksum match fails, the program should exit with an appropriate error message. Of course someone could always go into the binary with a hex editor, find the checksum, and change it (maybe), but again this requires a much more ambitious and knowledgeable individual. Similar measures could even be taken to ensure no one tampers with the binary itself, like checksumming itself or checking the date stamp or file size. Keep in mind here that text files may have different checksums and sizes on Windows compared to Unix, due to the extra carriage return characters in Windows text files. In this case you may want to compute checksums for both environments and allow either one to be valid.
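A minimal sketch of this check follows, assuming MD5 and accepting either the Unix or the Windows form of a text file. The function names, the file name `params.dat`, and the parameter table are all hypothetical. In a real client the allowed checksums would be hard-coded hex strings computed when the release is built; here they are derived from the known-good contents so the example can run on its own.

```python
import hashlib
import os
import tempfile


def allowed_digests(canonical_text: bytes) -> set:
    """MD5 digests of the LF (Unix) and CRLF (Windows) forms of a text file."""
    unix = canonical_text.replace(b"\r\n", b"\n")
    windows = unix.replace(b"\n", b"\r\n")
    return {hashlib.md5(unix).hexdigest(), hashlib.md5(windows).hexdigest()}


def file_is_intact(path: str, allowed: set) -> bool:
    """True if the file's MD5 matches one of the allowed checksums."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest() in allowed


CANONICAL = b"C 12.011\nN 14.007\nO 15.999\n"   # made-up read-only data table
ALLOWED = allowed_digests(CANONICAL)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "params.dat")
    with open(path, "wb") as handle:
        handle.write(CANONICAL.replace(b"\n", b"\r\n"))   # a Windows-style copy
    print(file_is_intact(path, ALLOWED))    # unmodified file
    with open(path, "ab") as handle:
        handle.write(b"H 99.0\r\n")                       # someone edits the table
    print(file_is_intact(path, ALLOWED))    # modified file
```

A client that finds a mismatch would then exit with an error message instead of continuing with suspect data.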
The biggest security hole in most any application or environment is people. Regardless of how deep an encryption you use, how many layers of security there are, and how complex your code, the human element will always remain the one unpredictable thing. Social engineering is the art of getting information out of people or getting them to do things for you, without them realizing it. You may think, "I would never give away any secrets of my project if I ran a DC project". Many very intelligent people have been fooled in the past by clever social engineers, and many more will no doubt follow. Kevin D. Mitnick wrote an excellent book on the subject, The Art of Deception: Controlling the Human Element of Security.
As an example, suppose, as DC project technical support, you receive a message from firstname.lastname@example.org, saying he forgot his password and would like you to send it to him. This sounds perfectly innocent, so you look up his password and send it to him. After all, he signed up with that address so the password must belong to him. But you don't notice that the reply-to address is in fact email@example.com. In fact maybe it's a different address altogether. You have just given joeblow's password away to a stranger, perhaps his biggest competitor.
By spoofing mail header fields (including To:), trickery, deception, and outright lying, you can very easily be caught off guard and fooled into giving away information that you should not. As a general rule, systems for sending lost passwords, registration, and so on, should be fully automated. After all, computers don't make mistakes and cannot be tricked into revealing information that they are not programmed to.
If you must give out information manually, be careful about what you give out. Keep written records of everything you give out so you can always go back to it if there is a problem later, and if a request sounds the least bit suspicious, check the message headers to see if they appear to be spoofed. You can always verify that they really sent you the message in question as well. They will be glad you were extra cautious before revealing any of their personal information. Lastly, never send out requests yourself by email for people to provide you with any sort of personal info. Legitimate companies never ask customers to reveal private information by email. Neither should you.
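One header check that is easy to automate is the reply-to trick from the example above. The sketch below uses Python's standard `email` package; the function name `reply_goes_elsewhere` and the sample addresses are made up. It flags only this one symptom: the From: field itself can also be forged, so a clean result proves nothing.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_goes_elsewhere(raw_message: str) -> bool:
    """True if a Reply-To header is present and differs from the From address."""
    message = message_from_string(raw_message)
    _, from_address = parseaddr(message.get("From", ""))
    _, reply_address = parseaddr(message.get("Reply-To", ""))
    return bool(reply_address) and reply_address.lower() != from_address.lower()


# Hypothetical support request whose replies would go to a different mailbox.
suspicious = (
    "From: Joe Blow <joe@example.org>\n"
    "Reply-To: joe@example.net\n"
    "Subject: Lost password\n"
    "\n"
    "Please send me my password.\n"
)
print(reply_goes_elsewhere(suspicious))
```

A request flagged this way deserves the manual verification described above before anything is sent back.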
In the end, security in a DC project boils down to common sense. Always check the final results turned in to the server. Results turned in that seem too good to be true or seem like major outliers should be reproduced by hand. If that is not possible, reconsider how the project generates data in the first place. A non-reproducible experiment is not science.
Users caught cheating or trying to compromise the integrity of the project should be dealt with swiftly and removed from the project. However, be sure the apparent cheating is indeed the user's doing and not the result of a bug in the program code or even faulty computer hardware! Often, talking to the user via e-mail will quickly establish which is the case.
If you're still paranoid about security after reading this article, firms exist which will perform professional security audits on any system you desire. They will look both for software and hardware issues and inform you of any places that they feel are insecure and need some work. These are professionals who do this for a living, and are generally quite good at what they do. You can never be 100% certain that your system is secure from attack but a thorough security audit, if you have the money for it, will get you 99% of the way there.
Howard Feldman is a research scientist at the Chemical Computing Group in Montreal, Quebec.