Distributing Content with BitTorrent
by Robert Bernier
Growing up with the internet, it's a bit hard to relate to an earlier time when the prime means of sharing knowledge was through the medium of paper. It wasn't that long ago that the only way to brush up on the latest research was to go down to the university library and read your favorite trade magazine.
Now all you need to do to share your latest and greatest is to mount a web site on the space your ISP provides. But what happens if you don't have the available space for something large? And what do you do if the rest of the internet suddenly discovers your site and your bandwidth usage exceeds the acceptable limit? Your ISP will either shut you down or, worse yet, charge you mightily for "excessive" bandwidth usage.
Everybody knows where I'm going with this: peer-to-peer file sharing is a technique allowing anybody to publish and access files such as documents, videos, CD ISOs, even music, without the restriction of bandwidth. It does so by harnessing the power of other people's computers that also have clients currently running on the file-sharing system at the same time.
There are many peer-to-peer protocols, each with its own strengths and weaknesses. Some are not well known, others are infamous, and still others have faded out of use.
This article shows how easy it is to publish your content online by using BitTorrent.
How BitTorrent Works
BitTorrent has three distinct components: the client, the web server, and the tracker. The client is the person or machine that downloads the content. The web server provides a link to a file called a torrent. The torrent is a specially created file that describes the shared file and the location of the tracker. This third component is a service that waits for a connection from a client. It listens on a user-assigned port and can run either on the same machine as the web server or at another location. The tracker not only supervises the sharing of the content between multiple clients, but also logs all downloading activities. The tracker can manage many files at the same time from many different torrents on many different web servers. You can even reach the tracker through a torrent that you already have as a file on your machine, eliminating the need for the web server.
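To make the torrent less of a black box, here is a minimal Python sketch of what such a file contains. A torrent is a dictionary serialized in a simple format called bencoding, with an announce key holding the tracker's URL and an info key describing the shared file. The tracker host, file name, sizes, and the all-zero hash data below are placeholders made up for illustration, and the encoder is a simplified sketch rather than the code BitTorrent itself uses.

```python
def bencode(value):
    """Serialize ints, strings, bytes, lists, and dicts with bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are written as <length>:<data>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Hypothetical single-file torrent; every value here is a placeholder.
metainfo = {
    "announce": "http://tracker.example.com:6969/announce",
    "info": {
        "name": "example.iso",
        "length": 1048576,            # size of the shared file in bytes
        "piece length": 262144,       # the file is split into pieces of this size
        "pieces": b"\x00" * 20 * 4,   # one 20-byte SHA-1 hash per piece
    },
}

torrent_bytes = bencode(metainfo)
```

Writing torrent_bytes to a file with a .torrent extension would yield something with the same overall structure as a real torrent, although the placeholder hashes would never verify against real data.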
Beware the trap of false assumptions: surfing and downloading are so familiar that people take it for granted that supplying a file via BitTorrent is pretty much a case of uploading it to some server. No! Peer-to-peer file sharing means that files come from other clients and not from a server. Instead, the server manages the mechanics of sharing the file between clients.
Suppose you wish to share an ISO image and you're the only person who has it. Others can have it only if you are running a client yourself. This first client is the seed. If another client comes along and wants to download your content, the tracker guides it over to the first one and its download begins. Now suppose a third client shows up and wants the same content; the tracker coordinates all three clients so that the first two both provide content to client No. 3. The more clients connect, the faster the downloads become, because each client provides additional bandwidth.
The Scenario: Sharing an ISO
The company I work for, Software Research Associates (SRA), has supported the creation of a community project, a live CD called pg_live. This Knoppix-based distribution profiles PostgreSQL. It comes equipped with replication, a half dozen programming languages, and documentation that includes how-tos, FAQs, references, and a book. It's the only live CD distro I know of that boasts a full-scale enterprise-ready RDBMS. It first saw use at OSCON, when the PostgreSQL community gave away pressed CDs at the booth. Recently, we updated the ISO, and it made a big splash at Linux World Boston.
In early 2005, SRA decided to continue updating the CD and make sure the PostgreSQL community had full access to the most recent version. The easiest way to do this was by providing the ISO via BitTorrent.
I've made a few assumptions in writing this article. If your setup is different, you'll have to adjust the steps accordingly.
- I've opted to use the Debian implementation of BitTorrent, which includes man pages as well as wrappers for the Python-based BitTorrent utilities.
- The current version of the pg_live image is between 300 and 400MB.
- My test server hosts both the web server and the tracker.
Installing BitTorrent on a Debian machine is easy:
# apt-get update
# apt-get install bittorrent
Don't worry about missing dependencies, because the installation procedure resolves them automatically, including the Python programming language.
The first step in sharing your file is to create the torrent. The utility is the Python script btmaketorrent.py. The Debian wrapper is btmakemetafile. The command line is:
$ btmakemetafile myfile tracker_announce_address
where myfile is the name of the file that I want to share and tracker_announce_address is the location where I'll install the tracker service. There is only one option switch, --piece_size_pow2, which sets the piece size as a power of two. A bigger number means that you can share/transfer more of the file at a time, but it requires that you limit the number of connections to your peer once you've reached the network's maximum bandwidth. Thus, with a smaller number your peer will accept more connections, though the file will transfer less quickly.
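The effect of this switch is easier to see with numbers. The following Python sketch assumes that the option is an exponent, so that the piece size is 2 raised to that value, and that the default is 18 (256KB pieces); the default is my assumption about the version described here, so check your own installation. The function name is made up for this example.

```python
def piece_layout(file_size, piece_size_pow2=18):
    """Return (piece_length, piece_count) for a file of file_size bytes.

    piece_size_pow2 is treated as an exponent; the default of 18 is an
    assumption, not a value taken from the article.
    """
    piece_length = 2 ** piece_size_pow2
    # Round up so that a short final piece is still counted
    piece_count = (file_size + piece_length - 1) // piece_length
    return piece_length, piece_count


iso_size = 380 * 1024 * 1024  # a 380MB image, as in the scenario above

for exponent in (16, 18, 20):
    length, count = piece_layout(iso_size, exponent)
    print("2^%d: %d-byte pieces, %d pieces" % (exponent, length, count))
```

Each step up in the exponent doubles the piece size and halves the number of pieces that clients have to exchange and verify.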
Because this was a test platform, I decided to use the localhost URL. The tracker can listen on any port. The default is port number 80, but the BitTorrent documentation recommends using port 6969. I chose port 8099. You can use either a domain name or an IP address as your URL to the tracker's server. Remember, you must be root to be able to set up a service that listens on any port below 1024.
From the directory containing the ISO, I used the command:
$ btmakemetafile pg_live.1.3.3-SRA.iso http://localhost:8099/announce
This produced a torrent file named pg_live.1.3.3-SRA.iso.torrent.
The btmakemetafile utility creates a hash used to verify the data's integrity as clients download it. The larger the file, the longer it takes to generate the torrent. The resulting torrent file size varies according to the size of the file you want to share. For example, an 11MB tarball will produce a torrent of approximately 1.1K. On the other hand, a large ISO of 380MB will increase the torrent to 31K.
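The relationship between file size and torrent size follows from how the hashing works: the file is cut into fixed-size pieces, and the torrent stores one 20-byte SHA-1 digest per piece. The Python sketch below illustrates the idea. It is not the code btmakemetafile runs, the function names are made up, and the 256KB piece size is an assumption.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Return the SHA-1 digest of each piece_length-sized slice of data."""
    return [hashlib.sha1(data[start:start + piece_length]).digest()
            for start in range(0, len(data), piece_length)]


def hash_bytes_needed(file_size, piece_length=2 ** 18):
    """Estimate the bytes of hash data stored in a torrent for one file.

    The 256KB default piece length is an assumption for illustration.
    """
    piece_count = (file_size + piece_length - 1) // piece_length
    return piece_count * 20  # a SHA-1 digest is 20 bytes long


# Hashing a small in-memory sample: 10 bytes in 4-byte pieces gives 3 digests
digests = piece_hashes(b"a" * 10, 4)

# Rough hash overhead for the two file sizes mentioned above
tarball_hash_bytes = hash_bytes_needed(11 * 1024 * 1024)
iso_hash_bytes = hash_bytes_needed(380 * 1024 * 1024)
```

Under these assumptions the hashes account for roughly 0.9K of the tarball's torrent and just under 30K of the ISO's, which is in line with the sizes quoted above; the remainder is the file name, the tracker URL, and other bookkeeping.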
Notice that I appended the path announce to my localhost's URL. This is a hardcoded value in BitTorrent that must always be present in the tracker's URL. A real directory by the name of announce is not required to exist on your web server.
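The reason no real directory is needed is that clients treat the announce URL as the address of the tracker service and contact it with an HTTP GET request carrying query parameters. Here is a hedged Python sketch of how such a request URL could be assembled. The parameter names follow the BitTorrent tracker protocol, but the hash, peer ID, and client port are placeholders invented for this example, and real clients send additional fields.

```python
from urllib.parse import urlencode


def build_announce_request(announce_url, info_hash, peer_id, port, left,
                           uploaded=0, downloaded=0):
    """Return the URL a client would GET to announce itself to the tracker."""
    query = urlencode({
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info section
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # port the client listens on for peers
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,              # bytes still needed to complete the file
    })
    return announce_url + "?" + query


# Placeholder values; a real client derives info_hash from the torrent file.
request_url = build_announce_request(
    "http://localhost:8099/announce",
    info_hash=b"\x00" * 20,
    peer_id=b"-XX0001-000000000000",
    port=6881,
    left=380 * 1024 * 1024,
)
```

The tracker answers such a request with a list of peers that are sharing the same content, which is how clients find one another.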