Commit 7161e701 authored by Dillenn Terumalai

Merge branch 'dev' into 'master'

Version 1.4.0

See merge request !5
parents d0b1b6e4 e00833fe
Pipeline #1305 passed with stages in 14 minutes and 26 seconds
@@ -10,4 +10,5 @@
!spsp
!.gitlab
!viruses
!bacteria
\ No newline at end of file
!bacteria
!builds
\ No newline at end of file
# see https://docs.gitlab.com/ce/ci/yaml/README.html for all available options
# you can delete this line if you're not using Docker
image: centos:latest
before_script:
- dnf install -y nc
- dnf install -y openssh-clients
- dnf install -y zip
- dnf install -y bzip2
test_init:
stage: test
script:
- echo "Testing init function"
- sh spsp init --without-env
test_compress:
# SPSP Transfer Tool CI/CD
stages:
- test
- deploy
default:
image: centos:latest
centos test:
before_script:
- dnf install -y nc
- dnf install -y openssh-clients
stage: test
script:
- echo "Testing compress function"
- echo "Hello World!" > test.txt
- sh spsp compress test.txt --without-env
- rm test.txt*
test_hash:
stage: test
script:
- echo "Testing hash function"
- echo "Hello World!" > test.txt
- sh spsp hash test.txt --without-env
test_encrypt:
- ./spsp init --without-env
- ./spsp test --without-env
artifacts:
paths:
- logs/error.log
- logs/spsp*.log
expire_in: 1 week
when: on_failure
debian test:
image: debian:latest
before_script:
- apt-get update
- apt-get install -y netcat
- apt-get install -y gpg
- apt-get install -y openssh-client
- apt-get install -y gpg
stage: test
script:
- echo "Testing encrypt function"
- echo "Hello World!" > test.txt
- gpg --import --fingerprint .pub
- sh spsp encrypt test.txt --without-env
test_help:
stage: test
script:
- echo "Testing help function"
- sh spsp help --without-env
production:
script: sh build.sh
artifacts:
paths:
- builds/transfer-tool*.zip
- builds/transfer-tool*.tar
- builds/transfer-tool*.tar.gz
- builds/transfer-tool*.tar.bz2
only:
- master
- ./spsp init --without-env
- ./spsp test --without-env
inherit:
default: false
artifacts:
paths:
- logs/error.log
- logs/spsp*.log
expire_in: 1 week
when: on_failure
release:
stage: deploy
before_script:
- dnf install -y zip
- dnf install -y bzip2
script: sh build.sh
artifacts:
paths:
- builds/transfer-tool*.zip
- builds/transfer-tool*.tar
- builds/transfer-tool*.tar.gz
- builds/transfer-tool*.tar.bz2
only:
- tags
@@ -2,247 +2,16 @@
[![pipeline status](https://gitlab.sib.swiss/SPSP/transfer-tool/badges/master/pipeline.svg)](https://gitlab.sib.swiss/SPSP/transfer-tool/-/commits/master)
This is the GitLab repo of the official Transfer Tool (TT) for SPSP.
The aim of the Transfer Tool (TT) is to give users of the Swiss Pathogen Surveillance Platform an easy and secure way to transfer sequencing files. The TT is a simple shell script relying on several tools (such as GPG and OpenSSH) to archive, hash, encrypt and transfer FASTQ files with their metadata.
## Table of Contents
# Documentation
https://gitlab.sib.swiss/SPSP/transfer-tool/-/wikis/home
- [How does the Transfer Tool (TT) work?](#how-does-the-transfer-tool-tt-work)
- [Getting Started](#getting-started)
- [Set up a shared drive between users of the same SPSP group](#set-up-a-shared-drive-between-users-of-the-same-spsp-group)
- [Upload the SSH public key](#upload-the-ssh-public-key)
- [Installation](#installation)
- [Configure the .env file](#configure-the-env-file)
- [Verify the public key](#verify-the-public-key)
- [Use the Transfer Tool](#use-the-transfer-tool)
- [How to prepare FASTQ files with the metadata file](#how-to-prepare-fastq-files-with-the-metadata-file)
- [How to transfer files easily](#how-to-transfer-files-easily)
- [Use the automatic mode in combination with a CRON task](#use-the-automatic-mode-in-combination-with-a-cron-task)
- [Debugging](#debugging)
- [Authors](#authors)
# License
## How does the Transfer Tool (TT) work?
# Reference
This tool is part of the Swiss Pathogen Surveillance Platform
To understand how the TT works, it helps to look at a diagram. In the example on the right, Lab1 wants to send DNA sequences of bacteria/viruses in a protected manner. To do so, the data exchange relies on asymmetric (public key) encryption. The SPSP server has a public key and a private key, which are two mathematically related encryption keys. The public key can be shared with any laboratory, but only the SPSP server has the private key.
First, Lab1 uses the TT to compress the sequence files (`*.fastq`) and the metadata file (`*.xlsx`) into a tar.gz archive.
Then, the TT generates a hash of that archive using the SHA-256 algorithm, meaning that if the content changes even slightly, the hash will be completely different.
After that, the TT uses SPSP’s public key to encrypt the archive, so that only the holder of the matching private key can read it.
Finally, the encrypted archive and the hash are uploaded using the SFTP protocol, which runs over SSH (providing communication security and strong encryption).
On the server side, once the archive is properly uploaded, the server decrypts it using its own private key. Then, it compares the uploaded hash to the hash it computes from the decrypted archive, to ensure that the content was not changed during the transfer. Finally, if everything goes well, the metadata file is parsed and loaded into the database. As the server cannot guess which of the laboratory's projects the uploaded sequences belong to, an assignment task is generated.
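The client-side steps described above can be sketched in plain shell. This is a minimal illustration and not the actual `spsp` script: the folder name, file names and scratch directory are hypothetical, and the encryption and upload steps appear only as comments because they need the SPSP public key and an authorized SSH key.

```bash
# Illustrative sketch only; names and paths below are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a folder holding sequence files and a metadata file.
mkdir batch
printf '@read1\nACGT\n+\nIIII\n' > batch/sequence1.fastq
printf 'placeholder metadata\n' > batch/metadata-file.xlsx

# 1) Compress the folder into a tar.gz archive.
tar -czf batch.tar.gz batch

# 2) Hash the archive with SHA-256 (sha256sum on Linux, shasum on macOS).
if command -v sha256sum >/dev/null 2>&1; then
    sha256sum batch.tar.gz > batch.tar.gz.sha256
else
    shasum -a 256 batch.tar.gz > batch.tar.gz.sha256
fi

# 3) Encrypt with the recipient's public key (not run here):
#    gpg --encrypt --recipient <SPSP key ID> batch.tar.gz
# 4) Upload the .gpg and .sha256 files over SFTP (not run here).

# Integrity check, as the receiving side would do after decryption:
# recompute the hash and compare it with the recorded one.
if command -v sha256sum >/dev/null 2>&1; then
    sha256sum -c batch.tar.gz.sha256
else
    shasum -a 256 -c batch.tar.gz.sha256
fi
```

If even one byte of the archive changes between hashing and checking, the final check fails, which is the property the hash comparison on the server relies on.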
Advantages of the Transfer Tool and SFTP services:
**It keeps the data safe from hackers.** Asymmetric (public key) encryption means only the sending laboratory and the SPSP server have access to the unencrypted data. While in transit, the data remains completely encrypted.
**It uses a secured tunnel.** The SFTP protocol protects the integrity of the data using encryption and cryptographic hash functions, and authenticates both the server and the laboratory.
**It checks that the data is left untouched.** By generating a hash, the TT makes sure that the data was never modified during the whole process.
![Transfer Tool Diagram](.gitlab/diagram.png)
## Getting Started
These instructions will allow you to use the tool locally to transfer your data.
### Prerequisites
If you want to be able to use the tool smoothly, make sure that you have:
- OS: macOS or Linux - This script can only run on those operating systems. It might be possible to run it on Windows 10 with bash installed, but this has not been tested
- SSH: public key uploaded - This script assumes that you have generated your SSH key pair and sent your public key to the SPSP SFTP Server. If not, please read the [Upload the SSH public key](#upload-the-ssh-public-key) chapter
- GPG: gpg available - This script uses [GnuPG](https://gnupg.org/) to encrypt the data. Make sure that it is installed and that you can run this command (`which gpg`)
- COMMANDS: commands available - The Transfer Tool relies on multiple commands, such as `sha256sum` or `shasum`, `tar`, `sftp`, `nc` and `gpg`, to perform the required operations. Make sure that all of them are available (`which sha256sum` for example)
- (Optional) CRON: automatic mode - If you want to activate the automatic mode of the Transfer Tool, make sure that you can set up a CRON task
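As a convenience, the command checks listed above can be done in one go. The sketch below is only an illustration: `missing_commands` is a hypothetical helper and not part of the Transfer Tool (the tool's own `init` command performs its own checks).

```bash
# Hypothetical helper: print the name of every command that is not available.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Either sha256sum or shasum is enough for hashing.
if command -v sha256sum >/dev/null 2>&1 || command -v shasum >/dev/null 2>&1; then
    echo "hash command: found"
else
    echo "hash command: missing (need sha256sum or shasum)"
fi

missing=$(missing_commands tar sftp nc gpg)
if [ -z "$missing" ]; then
    echo "All other required commands are available."
else
    echo "Missing commands:"
    echo "$missing"
fi
```

`command -v` is used here instead of `which` because it is a shell built-in specified by POSIX, so it behaves the same on macOS and Linux.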
Due to the secure environment where SPSP is hosted, data cannot be directly uploaded via the SPSP online platform. Instead SPSP users should use a dedicated drive within their institution to submit data to SPSP.
Note: The dedicated drive is to be set up by each institution, with the support of SIB ([see below](#set-up-a-shared-drive-between-users-of-the-same-spsp-group)). Upon registration of your group to SPSP, SIB will ask you to liaise with your IT department to set this up. Data cannot be submitted to SPSP before this drive has been set up.
## Set up a shared drive between users of the same SPSP group
SPSP users must belong to an SPSP group. All the data submitted by a user of a group is visible to all the users of this group. Thus, if multiple SPSP groups are registered to SPSP in your institution, please make sure to set up a separate shared drive for each SPSP group.
The shared drive should be hosted on a Linux server and require authentication using e.g. your institution's LDAP. As explained below, data is transferred to SPSP on behalf of the SPSP group rather than by an individual user. Hence, in order to be able to trace back the origin of potential malware submissions, it is essential that access to the shared drive be controlled at the user level.
## Upload the SSH public key
Before using the script, you need to make sure that you create an SSH key pair for user authentication.
Start by generating a key pair. Make sure to replace `user` with the specific ID provided to you by SPSP support. Open a terminal and type:
```bash
ssh-keygen -o -a 64 -t ed25519 -f ~/.ssh/id_ed25519 -C "user@spsp.sib.swiss" #PLEASE REPLACE user WITH YOUR OWN LAB/INSTITUTION ID GIVEN BY SPSP SUPPORT
```
You will be asked to `Enter file in which to save the key (/Users/user/.ssh/id_ed25519 or /home/user/.ssh/id_ed25519):`. Accept the default by pressing the Return key.
Then you will be asked to `Enter passphrase (empty for no passphrase):`. You can leave it empty or type your own passphrase.
You will then be informed that your SSH public key has been saved to `/Users/user/.ssh/id_ed25519.pub` or `/home/user/.ssh/id_ed25519.pub`. This is the public key that needs to be authorized on the SPSP SFTP Server.
For the next step, you will need to upload your key. Start by copying your key. Type the following to display the public key:
For macOS:
```bash
cat /Users/user/.ssh/id_ed25519.pub #PLEASE REPLACE user WITH YOUR LOCAL ACCOUNT
```
For Linux:
```bash
cat /home/user/.ssh/id_ed25519.pub #PLEASE REPLACE user WITH YOUR LOCAL ACCOUNT
```
Then click [here](mailto:spsp-support@sib.swiss?subject=[SPSP-SFTP]Request%20Authorization) to send your key. Once the key has been validated, you will be notified by mail.
## Installation
### Getting started
A step-by-step series of commands showing how to set up and properly use the Transfer Tool.
Start by downloading the latest version of the Transfer Tool to your local machine:
[Download](https://gitlab.sib.swiss/SPSP/transfer-tool/-/releases)
Extract the downloaded archive where you want and access it with your terminal:
```bash
cd ~/path/to/transfer-tool
ls -la
```
Your terminal should list 4 folders (logs, sent, viruses, bacteria), 2 files (README.md, spsp), 1 hidden folder (.outbox) and 1 hidden file (.pub). Here is a short description of each folder and file:
- **viruses** - main repository where you should copy your folder which contains your **viruses** fastq files and metadata file that you want to send
- **bacteria** - main repository where you should copy your folder which contains your **bacteria** fastq files and metadata file that you want to send
- **sent** - contains encrypted files with their SHA256 hash that have been properly sent
- **logs** - contains all the log files when you use the auto mode (log files record only errors)
- ***.outbox*** - contains files to be sent to the SPSP server through sftp
- README.md - user guide
- spsp - script containing all the commands to run, type `./spsp help` to display the commands
- *.pub* - public key of SPSP for encryption
Let's start by setting up the Transfer Tool. To do so, type:
```bash
sh spsp init
```
This will make sure that the needed commands are available, that the script is executable, that your .env file is properly set and it will also import the public key to your own list of keys. Refer to the terminal output in case of any error.
### Configure the .env file
/!\ **THIS STEP IS EXTREMELY IMPORTANT, WITHOUT THE CORRECT SETUP, THE TRANSFER WILL FAIL** /!\
You need a properly configured .env file to connect to the SFTP server of SPSP. Normally, you should have been prompted to fill in some information in the previous step, while running the command `sh spsp init`. **If that was not the case**, you can create the file manually. Create an .env file by using the following commands:
```bash
echo 'ID=LAB_ID' > .env #REPLACE LAB_ID BY YOUR OWN ID PROVIDED BY THE SPSP BOARD
echo 'HOST=spsp-sftp.vital-it.ch' >> .env #DO NOT CHANGE THIS LINE
echo 'SFTP_URL=${ID}@${HOST}:/data' >> .env #DO NOT CHANGE THIS LINE
```
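Because the third line is written with single quotes, `${ID}` and `${HOST}` are stored literally in `.env` and are only expanded when a shell reads the file. The sketch below illustrates that behaviour, assuming the file is loaded with the POSIX `.` (source) command. How the `spsp` script actually reads the file is not shown in this guide, and the scratch directory is only there so that no real `.env` is touched.

```bash
# Illustration in a scratch directory (hypothetical location).
envdir=$(mktemp -d)
cd "$envdir"

echo 'ID=LAB_ID' > .env
echo 'HOST=spsp-sftp.vital-it.ch' >> .env
echo 'SFTP_URL=${ID}@${HOST}:/data' >> .env

# Sourcing the file expands ${ID} and ${HOST} at load time.
. ./.env
echo "$SFTP_URL"   # prints LAB_ID@spsp-sftp.vital-it.ch:/data
```

This is also why the order of the lines matters: `ID` and `HOST` must be defined before `SFTP_URL` is read.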
### Verify the public key
At one point, the terminal should output the fingerprint (in green) of the imported key. Please make sure that the fingerprint corresponds to:
**ABC9 FC14 AAC9 52E7 767F D14A 48B7 0E72 4BAF E0A3**
If it doesn't, please [contact us](mailto:spsp-support@sib.swiss?subject=[SPSP-SFTP]Wrong%20Public%20Key) and send us the public key (.pub file in the directory).
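Comparing 40 hexadecimal characters by eye is error-prone, so a small comparison helper can be useful. The sketch below is only an illustration: `normalize_fpr` and `same_fingerprint` are hypothetical helpers, and the value of `shown` is a placeholder for whatever your terminal printed (for example in the output of `gpg --fingerprint`).

```bash
# Hypothetical helpers: compare two fingerprints, ignoring spaces and case.
normalize_fpr() {
    printf '%s' "$1" | tr -d ' ' | tr '[:lower:]' '[:upper:]'
}

same_fingerprint() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}

expected='ABC9 FC14 AAC9 52E7 767F D14A 48B7 0E72 4BAF E0A3'
# Placeholder: paste the fingerprint shown by your terminal here.
shown='abc9fc14aac952e7767fd14a48b70e724bafe0a3'

if same_fingerprint "$expected" "$shown"; then
    echo "Fingerprint matches."
else
    echo "Fingerprint mismatch: contact SPSP support."
fi
```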
### Conclusion
If everything went well, congratulations, you are ready to use the Transfer Tool. If not, please check the terminal output or contact the [support](mailto:spsp-support@sib.swiss?subject=[SPSP-SFTP]Support).
## Use the Transfer Tool
The following commands are available:
- `./spsp compress <folder>` - compresses a folder into a tar.gz archive
- `./spsp encrypt <file>` - encrypts a file using the gpg command and the SPSP public key (which needs to be in your own GPG keys list)
- `./spsp hash <file>` - generates the hash of a file using the SHA-256 algorithm
- `./spsp transfer <file>` - transfers a file through sftp to the SPSP server (your SSH key needs to be validated by SPSP to use this command)
- `./spsp auto` - automatically runs the Transfer Tool (this needs to be combined with a CRON task, see below for more information); add `--no-archive` or `-NA` to not keep the sent files
- `./spsp help` - displays the help
## How to prepare FASTQ files with the metadata file
This step assumes that you have already followed the guide on [spsp.ch](https://spsp.ch/). It only explains how you should organize your files for the Transfer Tool to work properly.
- Start by **identifying** if your sequences are from **viruses** or **bacteria** (if mixed, you need to separate them)
- Make sure that the sequences described in your metadata file **match** the FASTQ files
- **Create a subfolder** named after the current date (for example: 26-06-12) **inside the bacteria or viruses directory**, depending on the type of your sequences
- **COPY** your FASTQ files and the metadata file inside the freshly created folder
**IT IS VERY IMPORTANT TO ALWAYS PUT YOUR FASTQ FILES AND METADATA FILE INSIDE A SUBFOLDER IN THE BACTERIA/VIRUSES DIRECTORY, OR THE TT WILL IGNORE THE FILES**
## How to transfer files easily
If you want to quickly and easily send a batch of FASTQ files with their metadata, just follow these instructions:
- Follow the instructions in [How to prepare FASTQ files with the metadata file](#how-to-prepare-fastq-files-with-the-metadata-file)
- Launch the pipeline by typing `./spsp auto` which will trigger the **automatic** mode
- Once the transfer is over, you should find the sent files (encrypted archive and hash file) in the `sent` folder
Before the transfer, your directory should look like this:
- /bacteria
  - /26-06-20
    - sequence1.fastq
    - sequence2.fastq
    - sequence3.fastq
    - metadata-file.xlsx
- /viruses
- /sent
- /logs
- spsp
- README.md
After the transfer, it should look like this:
- /bacteria
- /viruses
- /sent
  - 26-06-20.tar.gz.gpg
  - 26-06-20.tar.gz.sha256
- /logs
- spsp
- README.md
## Use the automatic mode in combination with a CRON task
If you want to use the automatic mode on a daily basis, you need to set up a [CRON](https://en.wikipedia.org/wiki/Cron) task.
We recommend the following settings:
```
0 5 * * * /path/to/spsp/spsp auto >> /path/to/spsp.log
```
This will launch the Transfer Tool at 5 AM every day of the week using the automatic mode and save the output inside a file called `spsp.log` (this will be the main log file).
In order, this is what happens:
1) Checks that the `.outbox`, `sent`, `viruses`, `bacteria` and `logs` folders exist.
2) Creates a log file named after the current date inside the `logs` directory
3) Checks if the connection to SPSP works
4) Scans the `viruses` and `bacteria` directories for any folder; if one is found, checks that it contains at least one `.fastq` or `.fastq.gz` file and one `.xlsx` file
5) Compresses the folder to tar.gz and moves the archive to the `.outbox` directory, then deletes the initial folder
6) Then, for every file inside `.outbox`, generates the hash of the file using SHA-256
7) Encrypts the file using the SPSP public key and deletes the initial unencrypted compressed file
8) Transfers `*.sha256` (hash) and `*.gpg` (encrypted tar.gz) files to the corresponding subdirectory (`viruses` or `bacteria`) on the remote server
9) (Optional) If you used the automatic mode with the `--no-archive` option, the sent files will not be moved to the `sent` folder and **will be erased**
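The folder check in step 4 can be pictured with the following sketch. It is only an illustration of the rule as described, not the code of the `spsp` script: `has_match` and `folder_is_ready` are hypothetical helpers, and the example folders are created in a scratch directory with made-up file names.

```bash
# Hypothetical helpers mirroring step 4: a folder qualifies only if it holds
# at least one .fastq or .fastq.gz file and at least one .xlsx file.
has_match() {
    # $1 = directory, $2 = file name pattern
    [ -n "$(find "$1" -maxdepth 1 -type f -name "$2" | head -n 1)" ]
}

folder_is_ready() {
    { has_match "$1" '*.fastq' || has_match "$1" '*.fastq.gz'; } &&
        has_match "$1" '*.xlsx'
}

# Example run in a scratch directory with made-up file names.
demo=$(mktemp -d)
mkdir "$demo/ready" "$demo/incomplete"
touch "$demo/ready/sequence1.fastq.gz" "$demo/ready/metadata-file.xlsx"
touch "$demo/incomplete/sequence1.fastq"

for dir in "$demo/ready" "$demo/incomplete"; do
    if folder_is_ready "$dir"; then
        echo "$(basename "$dir"): would be processed"
    else
        echo "$(basename "$dir"): would be skipped"
    fi
done
```

Here the `incomplete` folder is reported as skipped because it has no `.xlsx` metadata file.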
If any error occurs during the process, the script writes the error to the log file inside the `logs` directory and stops automatically to avoid any further errors.
Keep in mind that the CRON task redirects the output of the automatic mode to a file called `spsp.log`. This should be your starting point to check whether any error occurred. Then, you can check the log file inside the `logs` folder for more information.
Also, make sure that any copy of `fastq` or `fastq.gz` files into the directory has completed before 5 AM (based on the recommended settings); otherwise the script will send incomplete files.
Finally, as files may be quite large (several GB per file), it is up to each institution to decide if all the archives should be kept inside the `sent` folder (default behavior) or not (use the `--no-archive` option).
## Debugging
The Transfer Tool will automatically exit on any error. If you need to debug it, make sure to always check the `logs` folder which contains all the logs. You can quickly identify the log of the day by looking at the name. If you open the log file with a text editor, you will see a short description of what went wrong, allowing you to understand what needs to be fixed. If you don't understand the error message or you don't know what to do, don't hesitate to [contact us](mailto:spsp-support@sib.swiss).
## Authors
* **Dillenn TERUMALAI** - *Initial work* - [dillenn.terumalai@sib.swiss](mailto:dillenn.terumalai@sib.swiss)
# Contact
SPSP Support
[spsp-support@sib.swiss](mailto:spsp-support@sib.swiss)
\ No newline at end of file
This diff is collapsed.