For those of us who grew up with the internet, it's a bit hard to relate to an earlier time when the primary means of sharing knowledge was paper. It wasn't that long ago that the only way to brush up on the latest research was to go down to the university library and read your favorite trade magazine.
Now all you need to do to share your latest and greatest is to put up a web site on the space your ISP provides. But what happens if you don't have enough space for something large? Worse yet, what do you do if the rest of the internet suddenly discovers your site and your bandwidth usage exceeds the acceptable limit? Your ISP will either shut you down or charge you mightily for "excessive" bandwidth usage.
Everybody knows where I'm going with this: peer-to-peer file sharing is a technique allowing anybody to publish and access files such as documents, videos, CD ISOs, even music, without the restriction of bandwidth. It does so by harnessing the power of other people's computers that also have clients currently running on the file-sharing system at the same time.
There are many peer-to-peer protocols, each with its own strengths and weaknesses. Some of them are not well known, others are infamous, while still others have faded away and gone out of use.
This article shows how easy it is to publish your content online by using BitTorrent.
BitTorrent has three distinct components: the client, the web server, and the tracker. The client is the person/machine that downloads the content. The web server provides a link to a file called a torrent. The torrent is a specially created file that describes the shared file and the location of the tracker. This third component is a service that waits for a connection from a client. It sits on a user-assigned socket that can be either on the same machine as the web server or at another location. The tracker not only supervises the sharing of the content between multiple clients, but also logs all downloading activities. The tracker can manage many files at the same time from many different torrents on many different web servers. You can even refer to the tracker by a torrent that you have downloaded as a file on your machine, eliminating the need for the web server.
Beware the trap of false assumptions: surfing and downloading is so familiar that people take it for granted that supplying a file via BitTorrent is pretty much a case of uploading it to some server. No! Peer-to-peer file sharing means that files come from other clients and not from a server. Instead, the server manages the mechanics of sharing the file between clients.
Suppose you wish to share an ISO image and you're the only person who has it. Others can have it only if you are running a client yourself. This first client is the seed. If another client comes along and wants to download your content, the tracker guides it to the seed and its download begins. Now suppose a third client shows up and wants the same content; the tracker coordinates all three clients so that the first two supply the content to client No. 3. The more clients connect, the faster downloads become, because each client provides additional bandwidth.
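The introductions the tracker makes can be sketched in a few lines of Python. This is a toy model only: the ToyTracker class, the torrent identifier, and the peer addresses are hypothetical names invented for illustration, and the real tracker speaks HTTP rather than exposing a method like this.

```python
from collections import defaultdict


class ToyTracker:
    """Hypothetical in-memory stand-in for a tracker; not the bttrack code."""

    def __init__(self):
        # Maps a torrent's identifier to the set of peers sharing it.
        self.swarms = defaultdict(set)

    def announce(self, torrent_id, peer):
        """Register a peer and return the peers it can fetch pieces from."""
        others = sorted(self.swarms[torrent_id] - {peer})
        self.swarms[torrent_id].add(peer)
        return others


tracker = ToyTracker()
first = tracker.announce("pg_live", "seed:6881")      # the seed: nobody else yet
second = tracker.announce("pg_live", "client2:6881")  # is pointed at the seed
third = tracker.announce("pg_live", "client3:6881")   # is pointed at both
print(first, second, third)
```

Note that the tracker never touches the file itself; it only tells each newcomer who else is already sharing.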
The company I work for, Software Research Associates (SRA), has supported the creation of a community project, a live CD called pg_live. This Knoppix-based distribution profiles PostgreSQL. It comes equipped with replication, a half dozen programming languages, and documentation that includes how-tos, FAQs, references, and a book. It's the only live CD distro I know of that boasts a full-scale enterprise-ready RDBMS. It first saw use at OSCON, when the PostgreSQL community gave away pressed CDs at the booth. Recently, we updated the ISO, which made a big splash at LinuxWorld Boston.
In early 2005, SRA decided to continue updating the CD and make sure the PostgreSQL community had full access to the most recent version. The easiest way to do this was by providing the ISO via BitTorrent.
I've made a few assumptions in writing this article. If your setup is different, you'll have to adjust it.
Installing BitTorrent on a Debian machine is easy:
# apt-get update
# apt-get install bittorrent
Don't worry about missing dependencies, because the installation procedure resolves them automatically, including the Python programming language.
The first step in sharing your file is to create the torrent. The utility is the Python script btmaketorrent.py. The Debian wrapper is btmakemetafile. The command line is:
$ btmakemetafile myfile tracker_announce_address
where myfile is the name of the file that I want to share and tracker_announce_address is the location where I'll install the tracker service. There is only one option switch, --piece_size_pow2, which sets the piece size as a power of two. A bigger number means that you can share/transfer more of the file at a time, but it requires that you limit the number of connections to your peer once you've reached the network's maximum bandwidth. Thus with a smaller number your peer will accept more connections, though the file will transfer less quickly.
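To get a feel for the numbers, here is a short Python sketch of how the switch's value translates into a piece size and a piece count. The piece_count helper is a hypothetical name for illustration, not part of BitTorrent, and the 380MB figure simply approximates the ISO used in this article.

```python
def piece_count(file_size, piece_size_pow2):
    """Number of pieces when each piece is 2**piece_size_pow2 bytes long."""
    piece_size = 2 ** piece_size_pow2
    # Integer ceiling division: a final partial piece still counts as one.
    return (file_size + piece_size - 1) // piece_size


iso_size = 380 * 1024 * 1024  # roughly the 380MB ISO used in this article

for power in (16, 18, 20):
    print(power, 2 ** power, piece_count(iso_size, power))
```

Raising the value by two makes each piece four times larger and cuts the number of pieces to a quarter.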
Because this was a test platform, I decided to use the localhost URL. The tracker can listen on any socket. The default is port number 80, but the BitTorrent documentation recommends using port 6969. I chose port 8099. You can use either a domain name or an IP address as your URL to the tracker's server. Remember, you must be root to set up a service that listens on any port below 1024.
From the directory containing the ISO, I used the command:
$ btmakemetafile pg_live.1.3.3-SRA.iso http://localhost:8099/announce
This produced a torrent file named pg_live.1.3.3-SRA.iso.torrent.
The btmakemetafile utility creates a hash used to verify the data's integrity as clients download it. The larger the file, the longer it takes to generate the torrent. The resulting torrent file size varies according to the size of the file you want to share. For example, an 11MB tarball will produce a torrent of approximately 1.1K. On the other hand, a large ISO of 380MB will increase the torrent to 31K.
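The reason the torrent grows with the shared file is that it stores one 20-byte SHA1 digest per piece. The Python sketch below illustrates the idea on in-memory data. The piece_hashes and hash_bytes helpers are hypothetical names, and the 2**18-byte piece size is an assumption about the default rather than something stated above.

```python
import hashlib


def piece_hashes(data, piece_size):
    """Concatenate the 20-byte SHA1 digest of every piece of data."""
    digests = []
    for start in range(0, len(data), piece_size):
        digests.append(hashlib.sha1(data[start:start + piece_size]).digest())
    return b"".join(digests)


# Small in-memory stand-in for a shared file (hypothetical data).
sample = b"x" * 1000
hashes = piece_hashes(sample, piece_size=256)
print(len(hashes))  # 4 pieces * 20 bytes = 80

PIECE_SIZE = 2 ** 18  # assumed default piece size (256 KiB)


def hash_bytes(file_size):
    """Bytes of piece digests needed for a file of the given size."""
    pieces = (file_size + PIECE_SIZE - 1) // PIECE_SIZE
    return 20 * pieces


print(hash_bytes(11 * 1024 * 1024))   # 880
print(hash_bytes(380 * 1024 * 1024))  # 30400
```

Under that assumption, the digests alone account for about 0.9K of the small torrent and about 30K of the large one, which is in line with the sizes quoted above once the file name and tracker URL are added.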
Notice that I appended the path announce to my localhost's URL. This is a hardcoded value in BitTorrent that must always be present in the tracker's URL. A real directory by the name of announce is not required to exist on your web server.
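The announce path matters because it is where each client sends its requests to the tracker. As a rough Python sketch, a client's first request might be assembled like this. The build_announce_url helper and the peer ID are hypothetical; the query parameter names follow the BitTorrent protocol as I understand it, and the info hash is the one from this article's tracker.

```python
from urllib.parse import urlencode


def build_announce_url(announce, info_hash, peer_id, port, left):
    """Append the query a client sends when it first contacts the tracker."""
    query = urlencode({
        "info_hash": info_hash,  # 20 raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "event": "started",
    })
    return announce + "?" + query


url = build_announce_url(
    "http://localhost:8099/announce",
    info_hash=bytes.fromhex("4e98ea442573f5b8868537e970fd3ce6321e9e81"),
    peer_id=b"-XX0001-000000000000",  # hypothetical 20-byte client ID
    port=6881,                        # hypothetical port the client listens on
    left=0,                           # nothing left to fetch: this is the seed
)
print(url)
```

The tracker answers such a request itself, which is why no announce directory has to exist on the web server.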
You need to make a minor addition to the web server's configuration settings to include the .torrent file extension as a new MIME type. Otherwise, the browser may attempt to read the torrent as a text file. If you're using Apache httpd, add the following to your httpd.conf file (and don't forget to restart the web server!):
AddType application/x-bittorrent .torrent
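If you happen to serve files from a Python script instead of Apache, the equivalent mapping can be registered with the standard mimetypes module. This is only a sketch of the same idea. It does not configure Apache, and the file name is just the torrent from this article.

```python
import mimetypes

# Register the same mapping that the AddType directive gives Apache.
mimetypes.add_type("application/x-bittorrent", ".torrent")

content_type, _ = mimetypes.guess_type("pg_live.1.3.3-SRA.iso.torrent")
print(content_type)
```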
Upload the torrent to your web server. You can install a link on any of your web pages referring to the torrent in much the same manner as you would for an ordinary HTML page:
<HTML>
<TITLE>Torrent Example</TITLE>
<BODY>
This is the <A HREF="./pg_live.1.3.3-SRA.iso.torrent">torrent</A>.
</BODY>
</HTML>
Think of the tracker as the middleman to a financial transaction; he doesn't add anything to the product itself, but he makes it possible for both seller and the buyer to meet and carry out the transaction.
The Debian wrapper for the Python program bttrack.py is bttrack. This invocation puts the tracker on port 8099, the port named in the torrent's announce URL, recording all download activity in the file mydownloadlogfile.txt:
$ bttrack --dfile mydownloadlogfile.txt --port 8099
The --dfile switch names the file in which the tracker records download information, and --port sets the port on which it listens. For further information, please refer to the source code or, in my case, the man pages, which the Debian distribution always includes.
The last piece of the puzzle is running the client itself. As I mentioned earlier, you need to run a client yourself if you are preparing to share a file for the first time; otherwise there will be no sharing. There are two kinds of command-line utilities that handle file sharing: single (btdownloadheadless.py) and multiple (btlaunchmany.py) file downloads. I will concentrate on the former.
The Debian command-line invocation is:
$ btdownloadheadless [ option ... ] torrent
The torrent can be either the URL or a file path. This invocation, used on my test platform, calls the torrent file directly from the web server. The --url switch refers to the torrent file, while the --saveas switch indicates the file path of the existing file I want to share with the world:
$ btdownloadheadless --url \
    http://localhost/pg_live.1.3.3-SRA.iso.torrent --saveas \
    ./pg_live.1.3.3-SRA.iso
The client now connects with the tracker, informing it that it is ready to share a copy of pg_live.1.3.3-SRA.iso.
Here's another client invocation that I could have used. This looks for the torrent file on the client's own machine located in the current directory:
$ btdownloadheadless --saveas ./pg_live.1.3.3-SRA.iso \
    pg_live.1.3.3-SRA.iso.torrent
The client connects to the tracker, where it will synchronize its data. The larger the file, the longer it takes. My test machine took about two minutes before the sync completed. Here's a sample of the messages:
saving:        pg_live.1.3.3-SRA.iso (382.6 MB)
percent done:  100
time left:     Download Succeeded!
download to:   /home/robert/tmp/pg_live.1.3.3-SRA.iso
upload rate:   0.00 kB/s
upload total:  0.0 MiB
BitTorrent has an amazing amount of flexibility. You can control upload and download bandwidths, the ports you use for socket connections, the number of connecting clients, and the refresh rate reflecting changed or new torrents that become available on the tracker. You can even configure the system to take into account clients who are behind firewalls.
This next section covers only some of the neat tricks that you can use with BitTorrent. Refer to the BitTorrent documentation for a complete listing of what you can do.
Getting tracker statistics from my tracker is easy. I just direct my browser to the tracker's port, http://localhost:8099. The resulting HTML page returns data similar to:
BitTorrent download info

* tracker version: 3.4.2
* server time: 2005-03-29 13:50 UTC

info hash                                 complete  downloading  downloaded
4e98ea442573f5b8868537e970fd3ce6321e9e81  1         0            0
0 files                                   1/1       0/0          0/0

* info hash: SHA1 hash of the "info" section of the metainfo (*.torrent)
* complete: number of connected clients with the complete file (total: unique IPs/total connections)
* downloading: number of connected clients still downloading (total: unique IPs/total connections)
* downloaded: reported complete downloads (total: current/all)
* transferred: torrent size * total downloaded (does not include partial transfers)
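The info hash in that report can be reproduced by hashing the bencoded "info" section of a torrent. Here is a minimal Python sketch. The bencode function is a small illustrative encoder rather than BitTorrent's own module, and the info dictionary holds made-up placeholder values, not the contents of the pg_live torrent.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes/str, lists, and dicts in the bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


info = {
    "name": "example.iso",   # hypothetical file name
    "length": 1000,          # hypothetical file size in bytes
    "piece length": 256,
    "pieces": b"\x00" * 80,  # placeholder for four SHA1 digests
}
info_hash = hashlib.sha1(bencode(info)).hexdigest()
print(info_hash)
```

Because the hash covers only the info section, changing the tracker URL in a torrent leaves its info hash untouched.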
Yes, that's right, you can jump onto other people's trackers instead of relying on one of your own! You can share your documents without having either a web server or a tracker. Suppose I want to use the postgresql.org tracker: all I need to do is download one of their latest PostgreSQL torrents and read the first couple of lines to identify the tracker URL:
$ wget http://bt.postgresql.org/postgresql-8.0.1.zip.torrent
$ cat postgresql-8.0.1.zip.torrent | less
In this case, the URL turns out to be http://bt.postgresql.org:6969/announce.
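If you'd rather not eyeball the file, the tracker URL can also be pulled out programmatically, since a torrent is a bencoded dictionary with an announce key. The following Python sketch uses a small illustrative decoder (bdecode is my own helper name here, not BitTorrent's module) and a hand-built sample rather than the real postgresql.org torrent.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # Otherwise a byte string of the form <length>:<bytes>.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


# Hand-built stand-in for a torrent file (hypothetical contents).
tracker_url = b"http://bt.postgresql.org:6969/announce"
sample = b"d8:announce%d:%s4:infod4:name3:isoee" % (len(tracker_url), tracker_url)

meta, _ = bdecode(sample)
print(meta[b"announce"].decode())
```

With a real torrent, read the file's bytes with open(path, "rb") and pass them to bdecode in place of the sample.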
I can point my existing torrent at a different tracker by using the torrent utility btreannounce. Here's the general format:
btreannounce url torrent [ torrent ... ]
I can choose this tracker for my torrent with:
$ btreannounce http://bt.postgresql.org:6969/announce \
    pg_live.1.3.3-SRA.iso.torrent
The postgresql.org tracker will now manage all subsequent connections to the pg_live ISO. You can review the tracker's statistics by using your browser and going to http://bt.postgresql.org:6969. (Some sites list only the info hash without naming the file, but you can compare it with the hash reported by your own tracker.)
It's good etiquette to request permission for using other people's trackers.
You can prevent other people from using your tracker (as postgresql.org does, by the way) by telling bttrack to track only those torrents of which it has copies in a user-defined directory. For example, if I place a copy of the pg_live torrent file in a directory where bttrack can see it, it will limit the tracker to file sharing for just pg_live. That's the purpose of the --allowed_dir switch:
$ bttrack --port 8099 --show_names 1 --allowed_dir ~/mytorrents \
    --dfile downloadinfo.txt
Another advantage of using this particular invocation is the detailed report that the tracker supplies.
|info hash||torrent name||size||complete||down-
I always seem to end articles by saying this, but I've only just
scratched the surface of what you can do. I guess you'll just have to play
with it to learn more. I recommend reading the man pages for
Robert Bernier is the PostgreSQL business intelligence analyst for SRA America, a subsidiary of Software Research Associates (SRA).
Copyright © 2009 O'Reilly Media, Inc.