BitTorrent HOWTO for Rawseeds users

Important note: the BitTorrent distribution system is not operational anymore. A new dataset distribution system is available.

What is BitTorrent?

BitTorrent is a protocol for efficient peer-to-peer file sharing. The term “peer-to-peer” means that each user requesting a specific file also acts as a distributor of that file to other users. This leads to much faster downloads for every user, which is essential for Rawseeds, where extremely large files have to be distributed within a reasonable time. An explanation of why and how the use of BitTorrent to distribute the datasets is a key element of Rawseeds can be read here.

The key concepts behind BitTorrent are the following:

  • every file to be distributed is subdivided into chunks
  • the chunks are distributed to different users requesting that file
  • users exchange the portions they have already downloaded while downloading new ones
  • “smart” distribution policies (e.g., rarest chunk first) are implemented to speed things up further
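
To make the “rarest chunk first” idea concrete, here is a minimal sketch in Python. It is not how any particular BitTorrent client is written: the function name, the peer names and the data layout are assumptions made up for illustration. The idea is simply to count, for every chunk we still lack, how many known peers hold it, and to request the least available one first.

```python
from collections import Counter

def pick_rarest_chunk(my_chunks, peer_chunks):
    """Return the index of the missing chunk held by the fewest peers.

    my_chunks:   set of chunk indices already downloaded (hypothetical layout).
    peer_chunks: dict mapping a peer name to the set of chunk indices it holds.
    Returns None when no peer offers a chunk we still need.
    """
    availability = Counter()
    for chunks in peer_chunks.values():
        # Only count chunks we do not have yet.
        availability.update(chunks - my_chunks)
    if not availability:
        return None
    # Fewest holders first; ties go to the lowest index so the result is
    # deterministic (a real client would more likely pick at random).
    return min(availability, key=lambda idx: (availability[idx], idx))

# Made-up example: we hold chunk 0, three peers hold various chunks.
peers = {
    "peer_a": {0, 1, 2},
    "peer_b": {0, 1},
    "peer_c": {0, 3},
}
print(pick_rarest_chunk({0}, peers))  # prints 2
```

Chunk 1 is held by two peers, while chunks 2 and 3 are held by one peer each, so one of the latter is requested first. Fetching rare chunks early keeps them from disappearing if the only peer holding them goes offline.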

The elements of a BitTorrent system for the distribution of a file are:

  1. the peers, i.e., the PCs requesting the file;
  2. the seeds, i.e., the peers possessing a complete copy of the file;
  3. the tracker, i.e., the PC coordinating the sharing of the file.

The Rawseeds project set up the tracker and a few seeds. However, many other seeds are needed, so your help is crucial! We are a small project: we do not have the resources to set up many seeds by ourselves, or to buy high-bandwidth hosting services. So we ask you to share a little bit of your upload bandwidth to help other users get the files (just as you were helped by previous users). How? In a nutshell, by keeping your BitTorrent client (i.e., the program you used to get the data) running after you have finished the download, and by ensuring that it is restarted at every boot. Read here for the details. Thank you!

What you need to get Rawseeds’ datasets

To be able to download Rawseeds’ datasets, you need two things:

  1. a BitTorrent client program;
  2. an open TCP port towards that program (this is only required if you want fast downloads).

First, let’s see what a BitTorrent client is.
It is a small program that manages the exchange of data over the internet using the BitTorrent protocol. BitTorrent clients are available for free for any software platform (particularly Windows, Mac and Linux); in fact, they have become so commonplace that they are frequently installed by default together with the OS (this is the case for many Linux distributions, for instance). Here you can find a list of available BitTorrent clients. Some of the most commonly used are μTorrent (Windows only), Transmission (Mac and Linux only) and KTorrent (Linux, Mac, Windows). However, everyone has their own preferences, so let’s just say that any of them is good for Rawseeds (if we can say a word about this… prefer open source software!). Just ask your friends and colleagues: we’re sure that some (or possibly all…) of them already use BitTorrent.

Now, let’s describe what an open TCP port is and how to get it.
Your BitTorrent client will tell you if the port it is trying to use to receive connections is closed. If it isn’t, you don’t even need to read this paragraph. If it is closed, your PC is behind a firewall (be it hardware or software, such as Windows’ own), and an external PC cannot connect to your running applications unless you instruct your firewall to accept incoming connections. Incoming connections are directed to one of the 65536 ports defined by the TCP protocol, each of which is associated with a specific service or application (or with none). One of these ports is associated with your BitTorrent client. You must open it in order to receive connections from other Rawseeds users; in this way, your download speeds will be much higher.
The specific port to open is set in the Preferences (or Options or whatever) of your BitTorrent client. If you don’t have a reason to change it, use the default value set by the program. To open the port, you have to access your firewall and specify that incoming connections to that port must be allowed. Unfortunately, we can’t give you detailed instructions about this process, because it is heavily dependent on how your firewall is implemented. Usually, hardware firewalls are fitted with a web interface accessible with a web browser, while software firewalls have their own configuration menus.
Anyway, if you work in a company, a university, or any other large organization, the firewalls are almost surely not operated by you, but by your network administrators. So you simply have to ask them to open, for your machine, the port specified by the BitTorrent client you chose. A word of warning about this… network administrators do not usually like people using BitTorrent within their network, because this kind of protocol is mostly used to share media such as music and movies. Therefore, be prepared to be asked by a frowning technician: “Why exactly do you need this port to be opened?”.
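
If you want to check a port yourself, the following Python sketch tries to open a TCP connection to a given host and port and reports whether it succeeded. The function name is our own, and the port number 6881 is only a placeholder: use the one shown in the settings of your BitTorrent client. Keep in mind that a check run on your own PC only tells you whether a program is listening on that port; to verify that the firewall really lets incoming connections through, the same check has to be run from a machine outside your network, against your public address.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values: local machine, example port number.
print(is_port_open("127.0.0.1", 6881))
```

The result is True only while a program (for instance your BitTorrent client) is listening on that port and the connection is not blocked along the way.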

How to get Rawseeds’ datasets

Once you have installed the BitTorrent client of your choice, you’re ready to go.
While browsing Rawseeds’ website, if you find a link to a file you want to download, just click on it. As usual, your web browser will ask you what to do with it (e.g., save or open it). If the file has a “.torrent” extension, choose “Open with [your BitTorrent client program]”. The program will open, and the download of the file will begin (possibly after the BitTorrent client itself has asked you for a further confirmation).
The amount of resources (processor, RAM, …) used by a BitTorrent client is very small, so much so that it can be considered negligible on a modern PC. Therefore, you can leave the program running all the time while you do your work as usual. You can stop the BitTorrent program whenever you like without losing the data already downloaded; you can also, of course, turn off the PC. However, be sure to restart the BitTorrent client when you turn the PC back on, or the download will not resume (the best way to do this is to put the BitTorrent client in the list of applications that are automatically started every time the PC is turned on).
The BitTorrent client will put the downloaded files into a specific directory that is set in its own “Preferences” (or “Options” or whatever). It will also tell you how much of each file you have already downloaded. For extremely large files, such as some of Rawseeds’ datasets, the download can take many hours or even days. The download speed for a given file depends on the download bandwidth of your internet connection and on how many other users are helping you by seeding the same file. This is why your help as a seed is crucial: please take a few minutes from your work to read how easily you can become a seed for Rawseeds’ datasets and why you should do it.
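
To get a feeling for why large files take hours or days, here is a rough back-of-the-envelope calculation in Python. The file size and the speed are made-up numbers, not measurements of any Rawseeds dataset, and the function name is our own. The sketch assumes a constant average download speed and decimal units (1 gigabyte = 1000 megabytes, 1 byte = 8 bits).

```python
def estimate_download_hours(size_gigabytes, speed_megabits_per_s):
    """Estimate the download time in hours at a constant average speed."""
    size_megabits = size_gigabytes * 1000 * 8   # gigabytes -> megabits
    seconds = size_megabits / speed_megabits_per_s
    return seconds / 3600

# Hypothetical example: a 10 GB file at an average of 2 Mbit/s.
print(round(estimate_download_hours(10, 2), 1))  # prints 11.1
```

In practice the average speed changes over time, and it grows with the number of seeds sharing the file.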