SPSP data retrieval and processing
Script 00
Performs read trimming and computes quality-control metrics.
Method | Tool |
---|---|
Read trimming | Trimmomatic-0.38 |
Read quality assessment | FastQC-0.11.8 |
Contamination screening | Kraken-1.1 |
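Kraken 1.1 summarizes its classification with `kraken-report`, a tab-separated table giving, per taxon, the clade percentage, the clade read count, the reads assigned directly, a rank code, the NCBI taxid and the indented scientific name. The minimal sketch below shows one way to turn such a report into an order-level contamination percentage. It is not the logic of generate_trimming-cleaning_report.pl: the function name `order_contamination` is hypothetical, the denominator is assumed to be all reads (classified or not), and reads classified only above order level are assumed not to count as contamination.

```python
def order_contamination(report_text, expected_order):
    """Percentage of all reads whose clade is an order other than expected_order.

    Assumes the tab-separated kraken-report layout: clade %, clade reads,
    reads assigned directly, rank code, NCBI taxid, indented name.
    The denominator (all reads, including unclassified) is an assumption.
    """
    total = 0
    foreign = 0
    for line in report_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        clade_reads = int(fields[1])
        rank, taxid, name = fields[3].strip(), fields[4].strip(), fields[5].strip()
        if rank == "U" or taxid == "1":
            # Unclassified reads plus the root clade together make up all reads.
            total += clade_reads
        elif rank == "O" and name != expected_order:
            # Clade counts already include every read assigned below this order.
            foreign += clade_reads
    return 100.0 * foreign / total if total else 0.0


if __name__ == "__main__":
    # Toy report: intermediate ranks between root and order are omitted.
    toy_report = "\n".join([
        "  2.00\t20\t20\tU\t0\tunclassified",
        " 98.00\t980\t0\t-\t1\troot",
        " 95.00\t950\t0\tO\t91347\t      Enterobacterales",
        "  3.00\t30\t0\tO\t186826\t      Lactobacillales",
    ])
    print(order_contamination(toy_report, "Enterobacterales"))  # 3.0
```

With the toy report, 30 of 1000 reads fall in a foreign order, so the result is 3.0 %, which the thresholds below would rate orange.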
Quality metrics
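From these outputs, the script generate_trimming-cleaning_report.pl computes three quality metrics and a quality score (green/orange/red). The consensus thresholds from Bioinformatics_consensus_v1.xlsx are summarized in the table below. Intervals use ISO notation, in which an outward-facing bracket excludes the endpoint (e.g. ]0.2,5] means 0.2 < x ≤ 5):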
Metric | Green | Orange | Red | Description |
---|---|---|---|---|
Contamination [%] | [0,0.2] | ]0.2,5] | ]5,Inf[ | Percentage of reads assigned to a taxonomic order other than the expected one |
Read length after trimming [%] | [75,100] | [67,75[ | [0,67[ | 100 × median(read length after trimming) / median(read length before trimming) |
Coverage [X] | [60,100] | [30,60[ | [0,30[ | number_of_reads * median(read length after trimming)/genome_length |
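The metric definitions and thresholds map directly onto a few small functions. The sketch below is an illustration, not the code of generate_trimming-cleaning_report.pl: all function names are hypothetical, `number_of_reads` is assumed to count reads after trimming, each interval is encoded as a (low, low_inclusive, high, high_inclusive) tuple, and values outside every interval return None (as transcribed, the table does not cover coverage above 100X).

```python
import math
from statistics import median

# Thresholds transcribed from the table, one (low, low_inclusive, high,
# high_inclusive) tuple per colour; e.g. ]0.2,5] -> (0.2, False, 5.0, True).
THRESHOLDS = {
    "contamination": {
        "green": (0.0, True, 0.2, True),
        "orange": (0.2, False, 5.0, True),
        "red": (5.0, False, math.inf, False),
    },
    "read_length": {
        "green": (75.0, True, 100.0, True),
        "orange": (67.0, True, 75.0, False),
        "red": (0.0, True, 67.0, False),
    },
    "coverage": {
        "green": (60.0, True, 100.0, True),
        "orange": (30.0, True, 60.0, False),
        "red": (0.0, True, 30.0, False),
    },
}


def in_interval(x, low, low_inc, high, high_inc):
    """True if x lies in the interval described by the four bounds."""
    above = x >= low if low_inc else x > low
    below = x <= high if high_inc else x < high
    return above and below


def classify(metric, value):
    """Return 'green', 'orange' or 'red', or None if no interval matches."""
    for colour, bounds in THRESHOLDS[metric].items():
        if in_interval(value, *bounds):
            return colour
    return None


def read_length_pct(lengths_before, lengths_after):
    """Median trimmed read length as a percentage of the untrimmed median."""
    return 100.0 * median(lengths_after) / median(lengths_before)


def coverage(n_reads, lengths_after, genome_length):
    """Estimated depth: reads x median trimmed read length / genome length."""
    return n_reads * median(lengths_after) / genome_length


if __name__ == "__main__":
    depth = coverage(1_000_000, [100, 150, 200], 5_000_000)
    print(depth, classify("coverage", depth))  # 30.0 orange
```

Encoding the brackets explicitly keeps the boundary cases (exactly 0.2 % contamination, exactly 75 % read length) in the same class as the table.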
Script 01
Performs MLST typing, de novo assembly and SNP calling.
Method | Tool |
---|---|
MLST | MentaLiST-0.2.4 |
Assembly | SPAdes-3.13 |
SNP calling (mapping) | samtools-1.9, Picard-2.18.20, BWA-0.7.17 |
SNP calling (variant calling) | GATK-3.8.1.0, VCFtools-0.1.16 |
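MentaLiST calls alleles with a k-mer voting scheme; once every locus has an allele number, the sequence type is the ST whose profile matches those calls exactly. The sketch below shows only that final lookup, with a hypothetical function `assign_st`, made-up locus names and a toy profile table. It does not reproduce MentaLiST's allele calling or its handling of novel or missing alleles.

```python
from typing import Dict, Optional, Sequence, Tuple


def assign_st(
    calls: Dict[str, int],
    loci: Sequence[str],
    profiles: Dict[Tuple[int, ...], int],
) -> Optional[int]:
    """Return the ST whose allele profile matches the calls exactly, else None."""
    if any(locus not in calls for locus in loci):
        return None  # an untyped locus means there is no complete profile
    key = tuple(calls[locus] for locus in loci)
    return profiles.get(key)


if __name__ == "__main__":
    # Toy three-locus scheme with made-up allele numbers and STs.
    loci = ["locusA", "locusB", "locusC"]
    profiles = {(1, 1, 2): 1, (1, 3, 2): 2}
    print(assign_st({"locusA": 1, "locusB": 3, "locusC": 2}, loci, profiles))  # 2
```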
Script 02
02-Comparative_genomics-cgMLST.sh
Computes a cgMLST tree for the strains within a project.
Method | Tool |
---|---|
cgMLST | MentaLiST-0.2.4 |
Neighbor-joining tree | Python 3 (make_njtree) |
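A cgMLST tree can be built from pairwise allelic distances. The sketch below is not the make_njtree script; it assumes the distance between two strains is the number of loci at which both have an allele call and the calls differ (missing calls, shown as None, are ignored), and builds an unrooted tree with the standard neighbor-joining Q-criterion of Saitou and Nei, returned as Newick. Function names are hypothetical; Q ties are broken by input order, and non-additive data can yield negative branch lengths, which production tools often clamp to zero.

```python
from itertools import combinations


def allele_distance(a, b):
    """Loci where both strains have an allele call (not None) and they differ."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree as a Newick string (needs >= 3 taxa).

    names must be unique; dist[i][j] is the distance between names[i] and names[j].
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # D[x][y]: current distance between nodes x and y, keyed by Newick label.
    D = {x: {y: float(dist[i][j]) for j, y in enumerate(names)}
         for i, x in enumerate(names)}
    while len(D) > 3:
        n = len(D)
        r = {x: sum(D[x].values()) for x in D}
        # Join the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y).
        a, b = min(combinations(D, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from the new internal node to every remaining node.
        D[u] = {u: 0.0}
        for k in list(D):
            if k not in (a, b, u):
                D[u][k] = D[k][u] = 0.5 * (D[a][k] + D[b][k] - D[a][b])
        for x in (a, b):
            del D[x]
            for row in D.values():
                row.pop(x, None)
    # Resolve the last three nodes as a star (unrooted trifurcation).
    a, b, c = D
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    # Toy cgMLST profiles (allele numbers per locus, None = missing call).
    profiles = {"S1": [1, 2, 3, 4], "S2": [1, 2, 3, 5],
                "S3": [1, 7, None, 5], "S4": [2, 7, 3, 5]}
    names = list(profiles)
    dist = [[allele_distance(profiles[p], profiles[q]) for q in names]
            for p in names]
    print(neighbor_joining(names, dist))
```

Keying nodes by their Newick labels keeps the sketch short; a real implementation would use integer indices and a matrix for large projects.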
Script 03
03-Comparative_genomics-Global-ST.sh
Computes a global core-genome SNP (cgSNP) tree for each species and a local whole-genome SNP (wgSNP) tree for each combination of species and sequence type (ST, from MLST). The core genome is defined by the species-specific MLST scheme. For each local tree, the reference genome is taken from within the ST: the first genome of that ST to be sequenced becomes the reference for that ST.
Method | Tool |
---|---|
Tree | Gubbins-2.3.4 |
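The reference-selection rule for the local trees (the first genome of each ST to be sequenced becomes its reference) amounts to a group-by over (species, ST). In the sketch below, the `Sample` record, its `sequenced_on` field and the function `local_tree_references` are hypothetical stand-ins for whatever metadata the pipeline actually uses; ties on the sequencing date are broken by input order.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    sample_id: str
    species: str
    st: int
    sequenced_on: date  # hypothetical metadata field: sequencing date


def local_tree_references(
    samples: Iterable[Sample],
) -> Dict[Tuple[str, int], Tuple[str, List[str]]]:
    """Map (species, ST) to (reference sample, all members of that group).

    The reference is the earliest-sequenced genome of the group; ties keep
    input order because min() returns the first minimal element.
    """
    groups: Dict[Tuple[str, int], List[Sample]] = defaultdict(list)
    for s in samples:
        groups[(s.species, s.st)].append(s)
    return {
        key: (min(members, key=lambda m: m.sequenced_on).sample_id,
              [m.sample_id for m in members])
        for key, members in groups.items()
    }


if __name__ == "__main__":
    toy = [Sample("A", "species_x", 1, date(2019, 5, 2)),
           Sample("B", "species_x", 1, date(2019, 3, 1)),
           Sample("C", "species_x", 2, date(2019, 6, 1))]
    print(local_tree_references(toy))
```

Because the reference is fixed by sequencing order, adding new genomes to an ST does not change that ST's reference, which keeps successive local trees comparable.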