Rawseeds and BitTorrent

Important note: the BitTorrent distribution system is no longer operational. A new dataset distribution system is available.

Rawseeds chose to distribute its datasets using the BitTorrent protocol. Although it requires a little extra effort from users, this choice has several key advantages (not least that, without it, Rawseeds would not have been possible :-) ).

To learn how to use BitTorrent to retrieve Rawseeds’ datasets, read the BitTorrent HOWTO. Here, we describe why we had to use a bit of lateral thinking when setting up the distribution system for Rawseeds’ datasets, and why using BitTorrent for it was, we think, a Really Good Idea.

The usual way to download a file from the web is, of course, HTTP download: the “click here to download the file” mechanism you find everywhere. Unfortunately, this method is unsuitable for Rawseeds for two reasons:

  1. it is reliable enough only if the files to download are not too big: say, a few hundred MBytes;
  2. as the number of downloaders rises, the upload bandwidth required from the server hosting the files rises accordingly.

Rawseeds’ datasets include very large files, up to several tens of GBytes. You really don’t want your download to stop after a full day of transferring… so HTTP was ruled out. Some of you are surely thinking, by now, “What about FTP?” Well, FTP solves the first problem but not the second. Bandwidth (especially upstream) is costly. Rawseeds is a small project, and it simply does not have the resources to pay for an expensive hosting service capable of the upload speeds needed when many users request huge files at the same time and don’t want to wait a month to receive them.
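To put the second problem in numbers, a standard back-of-the-envelope model from networking textbooks says that a single server with upload rate u_s needs at least N·F/u_s seconds to deliver a file of F bits to N users, because it must push N full copies through its own uplink. The figures below (a 20-GByte file, 100 simultaneous users, 100 Mbit/s of server upload) are hypothetical, chosen only for illustration; they are not Rawseeds’ actual values.

```latex
T_{\mathrm{cs}} \;\ge\; \frac{N\,F}{u_s}
  = \frac{100 \times (20 \times 8 \times 10^{9}\ \mathrm{bit})}{100 \times 10^{6}\ \mathrm{bit/s}}
  = 1.6 \times 10^{5}\ \mathrm{s} \approx 44\ \mathrm{h}
```

With N = 1000 users the same bound grows tenfold, to 1.6 × 10^6 s, or about 18.5 days, which is how a popular dataset ends up taking weeks to deliver from a modest server.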

By distributing Rawseeds’ data with the BitTorrent peer-to-peer protocol, we can overcome both of the problems described above, and users also benefit from additional useful features.
A data distribution system based on BitTorrent, such as ours, has many advantages (see the BitTorrent HOWTO for details):
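The first problem, reliability with very large files, is handled largely by how BitTorrent (version 1) organises data: the torrent’s metainfo splits the payload into fixed-size pieces and stores a 20-byte SHA-1 hash for each one, so an interrupted or partly corrupted download can be resumed by re-fetching only the pieces that fail verification. The sketch below illustrates that idea. The function names and the tiny piece size are hypothetical, and a real client tracks pieces at their offsets in preallocated files rather than treating local data as a contiguous prefix.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    In BitTorrent v1 these 20-byte digests are concatenated in the metainfo;
    the last piece may be shorter than piece_length.
    """
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def missing_pieces(local: bytes, expected: list[bytes], piece_length: int) -> list[int]:
    """Return indices of pieces that are absent or corrupt in the local copy.

    After an interruption only these pieces need to be downloaded again;
    every piece that already matches its expected hash is kept.
    """
    have = piece_hashes(local, piece_length)
    return [i for i, digest in enumerate(expected)
            if i >= len(have) or have[i] != digest]


if __name__ == "__main__":
    piece_len = 4                       # hypothetical, unrealistically small
    original = b"rawseeds-dataset"      # 16 bytes -> 4 pieces
    expected = piece_hashes(original, piece_len)
    # Simulate a transfer that stopped halfway with one corrupted byte.
    partial = b"rawXeeds"
    print(missing_pieces(partial, expected, piece_len))
```

Here the demo prints `[0, 2, 3]`: piece 1 (`eeds`) verified correctly and is kept, piece 0 is re-fetched because of the corrupted byte, and pieces 2 and 3 were never received.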

  • download is very reliable, even for extremely large files such as ours;
  • users (called “peers”) upload data as well as download it, which leads to:
    • faster downloads for all peers
    • self-adjusting available bandwidth: it grows as demand grows, since every new downloader is also an uploader
    • overall bandwidth available for download is not limited by the upload bandwidth of the project’s own servers
    • worst-case (i.e., single-user) download speed is similar to HTTP speed; in all other conditions it is higher… and usually much higher.
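The bullets above can be made concrete with a standard lower-bound model of file distribution time found in networking textbooks. Client-server delivery is limited by the server’s uplink alone; in an idealised peer-to-peer swarm the N copies instead share the combined upload of the server plus all peers. The sketch below assumes every peer has the same upload and download rate, and all numbers in the demo are hypothetical. The peer-to-peer figure is an ideal bound that real BitTorrent swarms only approach.

```python
def distribution_times(file_gbytes: float, n_users: int, server_up_mbps: float,
                       peer_up_mbps: float, peer_down_mbps: float) -> tuple[float, float]:
    """Lower bounds (seconds) on delivering one file to n_users.

    Returns (client_server, peer_to_peer). Assumes identical peers,
    which is a simplification of any real swarm.
    """
    f = file_gbytes * 8e9            # file size in bits (1 GByte = 10^9 bytes)
    u_s = server_up_mbps * 1e6       # server upload, bit/s
    u_p = peer_up_mbps * 1e6         # upload of each peer, bit/s
    d_min = peer_down_mbps * 1e6     # download of the slowest peer, bit/s

    # Client-server: the server alone must push n_users full copies.
    client_server = max(n_users * f / u_s, f / d_min)
    # Peer-to-peer: the server must send at least one copy, each peer must
    # receive a full copy, and the n_users copies share the aggregate upload
    # of the server plus all peers.
    peer_to_peer = max(f / u_s, f / d_min, n_users * f / (u_s + n_users * u_p))
    return client_server, peer_to_peer


if __name__ == "__main__":
    # Hypothetical figures: 20-GByte file, 100 Mbit/s server uplink,
    # peers with 10 Mbit/s up and 50 Mbit/s down.
    for n in (1, 100, 1000):
        cs, p2p = distribution_times(20, n, 100, 10, 50)
        print(f"N={n}: client-server >= {cs / 3600:.1f} h, "
              f"peer-to-peer >= {p2p / 3600:.1f} h")
```

With these assumed figures the two bounds coincide for a single user (about 0.9 h, limited by the user’s own download speed), while for 100 users it is roughly 44 h versus 4 h, and for 1000 users roughly 444 h versus 4.4 h. The peer-to-peer time levels off because each new peer adds upload capacity along with demand.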

Of course, a custom-designed file distribution system based on BitTorrent is far more complex to design, build and manage than a standard, off-the-shelf HTTP click-and-download system. This is the price we pay to give you a better service. And what is your price to pay? Simple: please leave your BitTorrent client running for as long as possible, even after your download from Rawseeds is complete. If your machine is not always on, make sure the client restarts automatically when the PC is turned on. In this way, you act as a BitTorrent seed for Rawseeds’ data: you donate a small fraction (chosen by you) of your upload bandwidth to help all Rawseeds users get faster, better service. This is very important to keep Rawseeds running!
All the details about this are here.