How does the Transfer Tool work?
To understand how the Transfer Tool (TT) works, it helps to look at a diagram. In the example below, Lab1 wants to send DNA sequences of bacteria or viruses in a protected manner. To do so, the data exchange relies on asymmetric (public-key) encryption. The SPSP server has a public key and a private key, which are two mathematically related encryption keys. The public key can be shared with any laboratory, but only the SPSP server holds the private key.
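To make the idea of two mathematically related keys more concrete, here is a toy sketch of RSA-style public-key encryption in Python. It is only an illustration: the source does not state which algorithm the TT uses, the primes are far too small to be secure, and the names `encrypt`, `decrypt`, `public_key` and `private_key` are hypothetical.

```python
# Toy RSA with tiny primes: for illustration only, never for real data.
# Assumption: this is NOT the scheme used by the Transfer Tool, just a
# minimal example of a public/private key pair.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)          # can be shared with any laboratory
private_key = (d, n)         # stays on the server


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than the modulus with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Recover the original integer with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


if __name__ == "__main__":
    secret = 65
    scrambled = encrypt(secret, public_key)
    print(scrambled, decrypt(scrambled, private_key))
```

Anyone holding `public_key` can encrypt, but only the holder of `private_key` can reverse the operation, which is the property the TT relies on.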
First, Lab1 uses the TT to compress the sequence files and the metadata file into a tar.gz archive. Then, the TT computes a hash of that archive using the SHA-256 algorithm; if the content changes even slightly, the hash will be completely different. After that, the TT encrypts the archive with SPSP’s public key, turning it into scrambled data that only the holder of the private key can decrypt. Finally, the encrypted archive and the hash are uploaded over SFTP, which runs over the SSH protocol and thus provides communication security and strong encryption.
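The packaging and hashing steps can be sketched with the Python standard library. This is a minimal sketch, not the TT's actual code: the function names `make_archive` and `sha256_of_file`, the file names and the metadata layout are all hypothetical, and the encryption and SFTP upload steps are left out because they need third-party libraries.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def make_archive(files, archive_path):
    """Pack the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            # Store each file under its base name only.
            tar.add(f, arcname=Path(f).name)
    return Path(archive_path)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical input files, created here only for the demo.
        fasta = Path(tmp) / "sample1.fasta"
        fasta.write_text(">sample1\nACGTACGT\n")
        metadata = Path(tmp) / "metadata.tsv"
        metadata.write_text("sample_id\tspecies\nsample1\tE. coli\n")

        archive = make_archive([fasta, metadata], Path(tmp) / "upload.tar.gz")
        print(archive.name, sha256_of_file(archive))
```

The hash is computed on the archive before encryption, which matches the order described above. Reading the file in chunks keeps memory use low for large sequence archives.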
On the server side, once the archive has been uploaded, the server decrypts it using its own private key. It then compares the uploaded hash with the hash it computes from the decrypted archive, to ensure that the content was not changed during the transfer. Finally, if everything goes well, the metadata file is parsed and loaded into the database. As the server cannot guess which of the laboratory's projects the uploaded sequences belong to, an assignment task is generated.
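The integrity check on the server side could look roughly like the following Python sketch. It assumes the hash is uploaded as a hexadecimal SHA-256 string; the function name `verify_archive` is hypothetical and the real server code is not shown in this document.

```python
import hashlib
import hmac


def verify_archive(archive_bytes: bytes, uploaded_hash: str) -> bool:
    """Recompute the SHA-256 of the decrypted archive and compare it
    with the hash that was uploaded alongside it."""
    recomputed = hashlib.sha256(archive_bytes).hexdigest()
    # Normalise whitespace and case, then compare in constant time.
    return hmac.compare_digest(recomputed, uploaded_hash.strip().lower())


if __name__ == "__main__":
    original = b"decrypted archive content"
    sent_hash = hashlib.sha256(original).hexdigest()

    print(verify_archive(original, sent_hash))          # unchanged content
    print(verify_archive(original + b"x", sent_hash))   # content was altered
```

If the two hashes differ, the archive was altered or corrupted somewhere between the laboratory and the server, and the upload should not be loaded into the database.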
Advantages of the Transfer Tool and SFTP services:
- It keeps the data safe from hackers. Asymmetric (public-key) encryption means that only the sending laboratory and the SPSP server have access to the unencrypted data. While in transit, the data remains encrypted.
- It uses a secured tunnel. SFTP protects the integrity of the data using encryption and cryptographic hash functions, and authenticates both the server and the laboratory.
- It checks that the data is left untouched. Because the hash generated by the TT is compared with the one computed on the server, any modification of the data during the transfer is detected.